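NVIDIA's AI research team published the CVPR 2026 paper LocateAnything, a vision-language detection model that decodes bounding boxes in parallel. Trained on 138M high-quality samples, it significantly improves localization accuracy and throughput, and it is currently ranked #1 on HuggingFace.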
This #CVPR2026 paper from our research team is trending #1 on @HuggingFace 🤗
Meet LocateAnything: a vision-language detection model that rethinks bounding box prediction. For AI agents and robots, “seeing” is only useful if a model can pinpoint where something is quickly enough to act on it.
Trained on 138M high-quality samples, LocateAnything decodes bounding boxes in parallel instead of one coordinate at a time, improving localization accuracy while dramatically increasing throughput for visual grounding and detection.
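To make the throughput point concrete, here is a toy sketch of why emitting a whole box per decoding step needs fewer sequential model calls than emitting one coordinate per step. This is not LocateAnything's implementation: the `TOY_BOXES` table, the stand-in "model" functions, and the assumption that one parallel step yields all four coordinates of a box are hypothetical, chosen only to count steps.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical stand-in for model predictions; a real model would compute these.
TOY_BOXES: List[Box] = [
    (10.0, 20.0, 110.0, 220.0),
    (50.0, 60.0, 90.0, 120.0),
    (0.0, 0.0, 32.0, 32.0),
]


def predict_coord(box_idx: int, coord_idx: int) -> float:
    """Toy 'model call' that returns a single coordinate of a single box."""
    return TOY_BOXES[box_idx][coord_idx]


def predict_box(box_idx: int) -> Box:
    """Toy 'model call' that returns all four coordinates of a box at once."""
    return TOY_BOXES[box_idx]


def decode_sequential(
    coord_fn: Callable[[int, int], float], num_boxes: int
) -> Tuple[List[Box], int]:
    """One coordinate per step: 4 * num_boxes sequential calls."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        coords = []
        for c in range(4):
            coords.append(coord_fn(b, c))
            steps += 1
        boxes.append((coords[0], coords[1], coords[2], coords[3]))
    return boxes, steps


def decode_parallel(
    box_fn: Callable[[int], Box], num_boxes: int
) -> Tuple[List[Box], int]:
    """All four coordinates per step: num_boxes sequential calls."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        boxes.append(box_fn(b))
        steps += 1
    return boxes, steps


seq_boxes, seq_steps = decode_sequential(predict_coord, len(TOY_BOXES))
par_boxes, par_steps = decode_parallel(predict_box, len(TOY_BOXES))
print(f"sequential steps: {seq_steps}, parallel steps: {par_steps}")
```

Both paths return the same boxes here; only the number of sequential calls differs (4 per box versus 1 per box under this sketch's assumption). Actual speedups depend on the real architecture and hardware, which the post does not detail.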
Project page: https://t.co/O7JMe8tzFM