The project is in an experimental, pre-alpha, exploratory phase, with the goal of eventual productionization. We move fast, break things, and explore various aspects of the seamless developer experience ...
ChartMuseum is a chart question answering benchmark designed to evaluate the reasoning capabilities of large vision-language models (LVLMs) over real-world chart images. The benchmark consists of 1162 ...
Abstract: Visual grounding seeks to localize the image region corresponding to a free-form text description. Recently, the strong multimodal capabilities of Large Vision-Language Models (LVLMs) have ...
Abstract: Programming based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models ...