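DeepSWE, a new standard for agentic coding benchmarks, is released today. Top models' performance on public leaderboards draws close attention. The benchmark aims to raise the bar for evaluating coding tasks.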
RT @serenaa_ge: Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks.
On public leaderboards, top models often look…