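Microsoft Research observed on SocialReasoning Bench that AI agents execute tasks competently but fail to consistently improve the user's position, even when explicitly instructed to optimize for the user's interests. This pattern holds across multiple models.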
Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest. https://t.co/6zVr3qDE5X https://t.co/gzPJANSvIG
likes: 23 | retweets: 6 | replies: 1 | views: 5993