DETR-based methods, which use multi-layer transformer decoders to refine object queries iteratively, have shown promising performance in 3D indoor object detection. However, the scene point features ...
TL;DR: GAGS learns a 3D Gaussian field associated with semantic features, which enables accurate open-vocabulary 3D visual grounding in the scene. Abstract: 3D open-vocabulary scene understanding, ...