NVIDIA发布Nemotron 3 Nano Omni模型,该模型能统一处理屏幕、文档、音频、视频和文本,实现单模型多模态感知到动作的循环,旨在降低推理复杂度和成本。
Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on...Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on fragmented model chains—separate stacks for vision, audio, and text. This increases inference hops and orchestration complexity, driving up inference costs while weakening cross-modal context consistency. NVIDIA Nemotron 3 Nano Omni…
Source