
@teortaxesTex: no, that era has *ended*, nobody is seriously benchmaxxing now. Originally this term meant training on actual test, at best on paraphrases, ...

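@teortaxesTex | Information level: 3 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely major event; the score measures information salience and is not investment advice) | Published: 2026-06-08T01:10 | Fetched: 2026-06-08 05:19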
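Summary

A tweet on AI benchmarking trends argues that the benchmaxxing era has ended. The term originally meant training on the actual test set or, at best, on paraphrases of it, which prompted continuously updated benchmarks such as LCB. Since around early 2025 (≈R1), models have stopped crashing on post-cutoff slices.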
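The "post-cutoff slice" idea can be illustrated with a minimal sketch: split benchmark problems by release date relative to a model's training cutoff and compare pass rates on each side. A large drop after the cutoff would hint at contamination of the earlier problems. The `Result` type, the helper names, the dates, and the sample outcomes below are all hypothetical and are not taken from LCB or from the tweet.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    problem_id: str
    released: date  # date the benchmark problem was published
    passed: bool    # whether the model solved it


def pass_rate(results):
    """Fraction of solved problems, or None for an empty slice."""
    if not results:
        return None
    return sum(r.passed for r in results) / len(results)


def split_by_cutoff(results, cutoff):
    """Split results into (pre-cutoff, post-cutoff) slices by release date."""
    pre = [r for r in results if r.released <= cutoff]
    post = [r for r in results if r.released > cutoff]
    return pre, post


# Hypothetical evaluation results and training cutoff (illustrative only).
results = [
    Result("p1", date(2024, 3, 1), True),
    Result("p2", date(2024, 5, 1), True),
    Result("p3", date(2024, 9, 1), True),
    Result("p4", date(2024, 11, 1), False),
]
cutoff = date(2024, 6, 30)

pre, post = split_by_cutoff(results, cutoff)
# A large positive gap means the model does much worse on unseen problems.
gap = pass_rate(pre) - pass_rate(post)
print(f"pre={pass_rate(pre):.2f} post={pass_rate(post):.2f} gap={gap:.2f}")
```

In this reading, the tweet's claim is that the gap between the two slices stopped being dramatic around early 2025.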

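Objective facts
  • The benchmaxxing era has ended
  • Continuously updated benchmarks such as LCB have emerged
  • Around early 2025 (≈R1), models stopped crashing on post-cutoff slices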
LCB

Original text

no, that era has ended, nobody is seriously benchmaxxing now. Originally this term meant training on actual test, at best on paraphrases, and we came up with continuously updated benchmarks like LCB. Around early 2025 (≈R1), models stopped crashing on post-cutoff slices. https://t.co/Y87LKngIXk

likes: 34 | retweets: 0 | replies: 2 | views: 4553