NVIDIA AI retweeted a user's test: running the nemotron 3 omni model with q8 quantization on a DGX Spark (128GB VRAM), reaching an inference speed of 56 tok/s through Hermes Agent.
RT @sudoingX: nemotron 3 omni q8 on dgx spark 128gb vram cranking via hermes agent at 56 tok/s. first night of real local agentic on this b…
likes: 182 | retweets: 13 | replies: 24 | views: 29239