Abstract: Open-ended long-form video question answering, a challenging task in visual information retrieval, automatically generates a natural-language answer from the referenced video content ...