setTimeout in JavaScript Visual Representation

High-level visual representations in the human brain are aligned with large language models

The human brain extracts complex information from visual inputs, including objects, their spatial and semantic interrelations, and their interactions with the environment. However, a quantitative ...

IEEE

Entity-Enhanced Question Representation for Knowledge-Based Visual Question Answering

Abstract: A good knowledge-based visual question answering (KB-VQA) model requires detailed visual information, semantically clear questions, and relevant external knowledge to address open visual ...

IEEE

Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities

Abstract: This paper proposes a single-stage training approach that semantically aligns three modalities (audio, visual, and text) using a contrastive learning framework. Contrastive training has ...

GitHub

Towards Visual Grounding: A Survey

If you find any work missing or have any suggestions (papers, implementations, and other resources), feel free to open a pull request. We will add the missing papers to this repo as soon as possible. You ...
