@DrJimFan: The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physica...

@DrJimFan 3 Information level 3 Published: 2026-04-01T15:03 Fetched: 2026-05-03 12:13

🔗 Original link

AI industry trends research

Summary

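The team open-sources CaP-X, an agentic robotics framework with perception, control, and visualization tools, and releases the CaP-Gym and CaP-Bench benchmarks. CaP-RL raises a 7B model's success rate from 20% to 72%, and the synthesized programs transfer to real robots.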

Objective facts

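Open-sources the CaP-X agentic robotics framework, including perception, control, and other APIs
Releases the CaP-Gym benchmark, with 187 manipulation tasks
CaP-RL raises a 7B model's success rate from 20% to 72% after 50 training iterations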

CaP-X CaP-Gym CaP-Bench CaP-RL

Original text

The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs, actuation APIs, and auto synthesize skill libraries as they go. CaP-X is a strict superset of our old stack, because policies like VLAs are “just” API calls as well. It solves many tasks zero-shot that a learned policy would struggle with.

And we are doing much more than vibing. CaP-X is our most systematic, scientific study on agentic robotics so far:

We build a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point cloud), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays) that work across different robots.
CaP-Gym: LLM’s first Physical Exam! 187 manipulation tasks across RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, mobile manipulation. Sim and real. Can’t wait to see the gradients flow from CaP-Gym to the next wave of frontier LLM releases.
CaP-Bench: we benchmark 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and more) across 8 evaluation tiers. We systematically vary API abstraction level, agentic harness, and visual grounding methods. Lots of insights in our paper.
CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 out of 7 tasks without task-specific tuning.
CaP-RL: if you get a gym, you get RL ;). A 7B OSS model jumps from 20% to 72% success after only 50 training iterations. The synthesized programs transfer to real robots with minimal sim-to-real gap.

3 years ago, our team created Voyager, one of the earliest agentic AI that plays and learns in Minecraft continuously. Its key ideas — skill libraries, self-reflection loops, and in-context planning — have since influenced many modern agentic designs.

Today, the agent graduates from Minecraft and gets a real job. It’s April Fool’s, but this Claw is getting its hands dirty for real!

Link in thread:

likes: 720 | retweets: 114 | replies: 100 | views: 70078
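
Illustrative note: the post says an agent composes perception and actuation APIs, and that policies like VLAs are "just" API calls as well. The sketch below shows one way that idea could look in Python. It is not CaP-X code: every name in it (ToolRegistry, segment, plan_grasp, move_to, vla_policy, pick_and_place) is hypothetical, the tools are stubs that return canned values, and coordinates are integer millimetres chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int, int]  # hypothetical (x, y, z) in millimetres


@dataclass
class ToolRegistry:
    """Hypothetical registry: every capability is a named callable."""
    tools: Dict[str, Callable]

    def call(self, name: str, *args):
        return self.tools[name](*args)


def segment(obj_name: str) -> Point:
    # Stub for a segmentation + depth pipeline: returns a canned 3D centroid.
    scene = {"cube": (400, 0, 20), "bowl": (500, 200, 50)}
    return scene[obj_name]


def plan_grasp(target: Point) -> Point:
    # Stub grasp planner: approach pose 100 mm above the target.
    x, y, z = target
    return (x, y, z + 100)


def move_to(pose: Point, log: List[str]) -> None:
    # Stub controller: records the commanded pose instead of moving a robot.
    log.append(f"move_to{pose}")


def vla_policy(instruction: str, log: List[str]) -> None:
    # Stub learned policy: exposed through the same interface as other tools.
    log.append(f"vla: {instruction}")


def build_registry() -> ToolRegistry:
    return ToolRegistry(tools={
        "segment": segment,
        "plan_grasp": plan_grasp,
        "move_to": move_to,
        "vla_policy": vla_policy,
    })


def pick_and_place(reg: ToolRegistry, obj: str, dest: str) -> List[str]:
    """The kind of program an agent might write: plain calls into the registry."""
    log: List[str] = []
    reg.call("move_to", reg.call("plan_grasp", reg.call("segment", obj)), log)
    reg.call("vla_policy", f"grasp the {obj}", log)
    reg.call("move_to", reg.call("plan_grasp", reg.call("segment", dest)), log)
    reg.call("vla_policy", f"release over the {dest}", log)
    return log


if __name__ == "__main__":
    for step in pick_and_place(build_registry(), "cube", "bowl"):
        print(step)
```

The point of the design is that the learned policy is registered and called exactly like the hand-written tools, so a generated program can mix both freely.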