ViSMaP: Unsupervised Summarization of Hour-Long Videos Using Meta-Prompting and Short-Form Datasets

by Techaiapp

Video captioning models are typically trained on datasets consisting of short videos, usually under three minutes in length.