Abstract: 4D video control is essential in video generation as it enables the use of sophisticated lens techniques, such as multicamera shooting and dolly zoom, which are currently unsupported by ...
Abstract: The vision-language tracking task aims to perform object tracking based on references from various modalities. Existing Transformer-based vision-language tracking methods have made remarkable ...