Abstract: A good knowledge-based visual question answering (KB-VQA) model requires detailed visual information, semantically clear questions, and relevant external knowledge to address open visual ...
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Please cite this work with the following BibTeX: @inproceedings{cocchi2024augmenting, title={{Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering}}, ...
Abstract: The visual question answering (VQA) method applied to remote sensing images (RSIs) can complete the interaction of image information and text information, which avoids professional barriers ...