
@DrJimFan: The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physica...

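@DrJimFan · Information level: 3 (scale: 1 = noise/discard; 2 = weak; 3 = ordinary fact; 4 = important industry development; 5 = extremely major event. The score measures informational significance and is not investment advice.) Published: 2026-04-01T15:03 · Scraped: 2026-05-03 12:13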
Summary

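The team open-sourced CaP-X, an agentic robotics framework that includes perception, control, and visualization tools, and released the CaP-Gym and CaP-Bench benchmarks. CaP-RL raised a 7B model's success rate from 20% to 72%, and the synthesized programs transfer to real robots.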

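Objective facts
  • Open-sourced the CaP-X agentic robotics framework, with APIs for perception, control, and more
  • Released the CaP-Gym benchmark, containing 187 manipulation tasks
  • CaP-RL raised a 7B model's success rate from 20% to 72% after 50 training iterations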
CaP-X CaP-Gym CaP-Bench CaP-RL

Original post

The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception and actuation APIs, and automatically synthesize skill libraries as they go. CaP-X is a strict superset of our old stack, because policies like VLAs are “just” API calls as well. It solves many tasks zero-shot that a learned policy would struggle with.
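The claim that a VLA policy is "just" another API call can be made concrete with a minimal sketch. Everything below is hypothetical: the `Scene` type and the functions `segment`, `move_to`, `grasp`, `vla_policy`, and `agent_program` are illustrative stand-ins with stubbed behavior, not the CaP-X API. The only point is that an agent-written program can compose perception, actuation, and a learned policy through one uniform function-call interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]

@dataclass
class Scene:
    """Toy world state; every name here is hypothetical, not the CaP-X API."""
    objects: Dict[str, Position]
    gripper: Position = (0.0, 0.0, 0.5)
    holding: Optional[str] = None
    log: List[tuple] = field(default_factory=list)

def segment(scene: Scene, name: str) -> Optional[Position]:
    # Stand-in for a perception API (e.g. segmentation + depth): position or None.
    return scene.objects.get(name)

def move_to(scene: Scene, pos: Position) -> None:
    # Stand-in for an actuation API; a real stack would run an IK solver here.
    scene.gripper = pos
    scene.log.append(("move_to", pos))

def grasp(scene: Scene, name: str) -> bool:
    # Succeeds only if the gripper is at the object's position.
    if scene.objects.get(name) == scene.gripper:
        scene.holding = name
        scene.log.append(("grasp", name))
        return True
    return False

def vla_policy(scene: Scene, instruction: str) -> bool:
    # A learned policy exposed as one more function call; stubbed to record its input.
    scene.log.append(("vla", instruction))
    return True

def agent_program(scene: Scene) -> bool:
    # The kind of program an agent might write: scripted API calls for steps it
    # can express directly, plus a policy call for the step it delegates.
    cup = segment(scene, "cup")
    if cup is None:
        return False
    move_to(scene, cup)
    if not grasp(scene, "cup"):
        return False
    return vla_policy(scene, "pour the cup into the bowl")
```

Calling `agent_program(Scene(objects={"cup": (0.3, 0.1, 0.0)}))` returns `True` and leaves `move_to`, `grasp`, and `vla` entries in the scene's log; with no cup in the scene it returns `False` before moving. Because the policy sits behind the same kind of call as the scripted skills, swapping one for the other does not change the program's structure.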

And we are doing much more than vibing. CaP-X is our most systematic, scientific study on agentic robotics so far:

  • We build a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point cloud), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays) that work across different robots.
  • CaP-Gym: LLM’s first Physical Exam! 187 manipulation tasks across RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, mobile manipulation. Sim and real. Can’t wait to see the gradients flow from CaP-Gym to the next wave of frontier LLM releases.
  • CaP-Bench: we benchmark 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and more) across 8 evaluation tiers. We systematically vary API abstraction level, agentic harness, and visual grounding methods. Lots of insights in our paper.
  • CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 out of 7 tasks without task-specific tuning.
  • CaP-RL: if you get a gym, you get RL ;). A 7B OSS model jumps from 20% to 72% success after only 50 training iterations. The synthesized programs transfer to real robots with minimal sim-to-real gap.
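The "if you get a gym, you get RL" remark rests on one property: each episode yields a binary success signal that can serve directly as a reward. The post does not describe CaP-RL's training algorithm, so the following is a toy stand-in, not CaP-RL: a softmax policy over four hypothetical candidate programs, each with an illustrative (made-up) success probability, trained with REINFORCE and a mean-reward baseline for 50 iterations.

```python
import math
import random
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_success(logits: List[float], success_probs: List[float]) -> float:
    """Success rate the current policy achieves in expectation."""
    return sum(p * s for p, s in zip(softmax(logits), success_probs))

def reinforce(success_probs: List[float], iterations: int = 50,
              batch_size: int = 64, lr: float = 2.0,
              seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    n = len(success_probs)
    logits = [0.0] * n  # start from a uniform policy over candidate programs
    history = [expected_success(logits, success_probs)]
    for _ in range(iterations):
        probs = softmax(logits)
        batch = []
        for _ in range(batch_size):
            k = rng.choices(range(n), weights=probs)[0]  # sample a program
            reward = 1.0 if rng.random() < success_probs[k] else 0.0  # binary success
            batch.append((k, reward))
        baseline = sum(r for _, r in batch) / batch_size  # mean-reward baseline
        grad = [0.0] * n
        for k, reward in batch:
            advantage = reward - baseline
            for j in range(n):
                # d/d logit_j of log softmax(logits)[k] = 1[j == k] - probs[j]
                grad[j] += advantage * ((1.0 if j == k else 0.0) - probs[j])
        logits = [w + lr * g / batch_size for w, g in zip(logits, grad)]
        history.append(expected_success(logits, success_probs))
    return logits, history

# Hypothetical per-program success probabilities, for illustration only.
logits, history = reinforce([0.2, 0.1, 0.3, 0.9])
print(f"expected success: {history[0]:.2f} -> {history[-1]:.2f}")
```

The baseline subtracts the batch's mean success, so a program that merely matches the average gets no push and only above-average programs gain probability. In this toy the "programs" are a fixed menu; a model that generates program text would follow the same sample, execute, score, update loop at a much larger scale.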

Three years ago, our team created Voyager, one of the earliest agentic AIs to play and learn continuously in Minecraft. Its key ideas (skill libraries, self-reflection loops, and in-context planning) have since influenced many modern agentic designs.
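Two of the ideas credited to Voyager, a skill library and a self-reflection loop, can be sketched in a few lines. This is a hypothetical illustration, not Voyager's or CaP-X's code: `scripted_llm` is a stub standing in for an LLM call, the environment is a plain dict with a `done` flag, and programs run through Python's built-in `exec` with no sandboxing. The loop feeds each failure message back into the next proposal and stores a program in the library only after it succeeds, so later requests for the same task reuse it.

```python
from typing import Callable, Dict, List, Optional

# Task description -> program source that has already succeeded once.
SkillLibrary = Dict[str, str]
Proposer = Callable[[str, SkillLibrary, List[str]], str]

def run_program(source: str, env: dict) -> Optional[str]:
    """Run a candidate program in a toy env; return an error message, or None on success."""
    try:
        exec(source, {"env": env})  # toy only: no sandboxing
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None if env.get("done") else "task not completed"

def solve(task: str, env: dict, library: SkillLibrary,
          propose_program: Proposer, max_attempts: int = 3) -> bool:
    if task in library:
        # Reuse a stored skill instead of synthesizing a new program.
        return run_program(library[task], env) is None
    feedback: List[str] = []
    for _ in range(max_attempts):
        source = propose_program(task, library, feedback)
        error = run_program(source, env)
        if error is None:
            library[task] = source  # only verified programs enter the library
            return True
        feedback.append(error)  # self-reflection: the next proposal sees this failure
    return False

def scripted_llm(task: str, library: SkillLibrary, feedback: List[str]) -> str:
    # Hypothetical LLM stub: the first draft calls a helper that does not exist;
    # after seeing an error message it returns a working program.
    if not feedback:
        return "env['done'] = undefined_helper()"
    return "env['done'] = True"
```

With an empty library, `solve("open drawer", {}, library, scripted_llm)` fails once with a `NameError`, succeeds on the second attempt, and leaves the working program stored under `"open drawer"`. In-context planning, the third idea, is not modeled here.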

Today, the agent graduates from Minecraft and gets a real job. It’s April Fool’s, but this Claw is getting its hands dirty for real!


likes: 720 | retweets: 114 | replies: 100 | views: 70078