Abstract: A good knowledge-based visual question answering (KB-VQA) model requires detailed visual information, semantically clear questions, and relevant external knowledge to address open visual ...
Please cite this work with the following BibTeX: @inproceedings{cocchi2024augmenting, title={{Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering}}, ...
Abstract: The visual question answering (VQA) method applied to remote sensing images (RSIs) can complete the interaction of image information and text information, which avoids professional barriers ...