Java Array Visual Representation

High-level visual representations in the human brain are aligned with large language models

The human brain extracts complex information from visual inputs, including objects, their spatial and semantic interrelations, and their interactions with the environment. However, a quantitative ...

GitHub

Efficient Visual Representation Learning with Bidirectional State Space Model

May 2, 2024: Vision Mamba (Vim) was accepted at ICML 2024. Feb. 10, 2024: We updated the Vim-tiny/small weights and training scripts. By placing the class token at ...

IEEE

Entity-Enhanced Question Representation for Knowledge-Based Visual Question Answering

Abstract: A good knowledge-based visual question answering (KB-VQA) model requires detailed visual information, semantically clear questions, and relevant external knowledge to address open visual ...

IEEE

Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training

Abstract: The rapidly evolving field of robotics necessitates methods that can facilitate the fusion of multiple modalities. Specifically, when it comes to interacting with tangible objects, ...
