Test-time compute has empowered multimodal large language models to generate extended reasoning chains, yielding strong performance on tasks such as multimodal math reasoning. However, we observe that ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results