@MSFTResearch: Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the use...

@MSFTResearch 3 Information level 3 Published: 2026-05-11T17:30 Fetched: 2026-05-12 04:03

🔗 Original link

Research AI

Summary

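Using SocialReasoning Bench, Microsoft Research observed that AI agents execute tasks competently but fail to consistently improve the user's position, even when explicitly instructed to optimize for the user's interest. The pattern is stable across multiple models.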

Objective facts

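AI agents execute tasks competently but fail to consistently improve the user's position
Even with explicit instructions to optimize for the user's interest, agents still fail to reliably improve that position
The pattern appears consistently across multiple models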

Microsoft Research SocialReasoning Bench

Original text

Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest. https://t.co/6zVr3qDE5X https://t.co/gzPJANSvIG

likes: 23 | retweets: 6 | replies: 1 | views: 5993