
@levie: Opus 4.8 is out, and we've been testing it with the Box AI agent on our most complex real-world knowledge worker tasks with enterprise docum...

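@levie · Information level: 3 (1 = noise/discard; 2 = weak; 3 = ordinary fact; 4 = important industry development; 5 = major event). This score reflects how significant the information is and is not investment advice. Published: 2026-05-28T20:12 · Fetched: 2026-05-28 23:20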
Summary

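Opus 4.8 has been released and tested on complex knowledge-worker tasks involving enterprise documents. The new version outperforms Opus 4.7 on report drafting, legal NDA review, financial data analysis, and other tasks. Reported results include 87% vs 77% on an industrial goods report and 90% vs 84% on a consumer products evaluation.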

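Key facts
  • Opus 4.8 has been released, and testing has begun
  • Outperforms 4.7 on tasks such as report drafting and NDA review
  • Industrial goods report score: 87%, vs 77% for 4.7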
Tags: Box, Opus

Original post

Opus 4.8 is out, and we've been testing it with the Box AI agent on our most complex real-world knowledge worker tasks with enterprise documents.

Opus 4.8 is measurably better at the generative and analytical work enterprises care about most, like writing reports, synthesizing data, and reviewing complex enterprise documents across a range of industries. Here are some quick examples of wins vs. Opus 4.7:

  • Report drafting: Opus 4.8 outperforms on a majority of report drafting tasks, producing more complete and accurate analytical reports. On an industrial goods reporting task, it scored 87% vs 77% for Opus 4.7; on a consumer products launch evaluation, 90% vs 84%.

  • Review and verification: On a legal NDA review task requiring verification of contract terms against compliance criteria, Opus 4.8 catches more relevant clauses and flags more potential issues, with near-perfect consistency across all trials.

  • Financial data analysis: On a corporate lending analysis task comparing syndicated vs bilateral loan structures, Opus 4.8 extracts more accurate financial metrics from source documents, leading by nearly 8 percentage points.

  • Consumer products launch evaluation: On a task requiring assessment of a product launch across multiple performance dimensions, Opus 4.8 captured evaluation criteria that Opus 4.7 missed, producing a more thorough analysis that covered all required factors rather than just the most obvious ones.

  • Legal NDA review: On a task verifying NDA terms against compliance criteria, Opus 4.8 identified more relevant clauses and flagged potential issues that Opus 4.7 missed. Its outputs were also highly predictable — producing nearly identical quality across independent runs.

  • Public sector grant analysis: When analyzing library grant documentation against eligibility criteria, Opus 4.8 correctly extracted and validated nearly all required data points, catching specific eligibility details that Opus 4.7 overlooked or misinterpreted.

Opus 4.8 will be rolling out shortly to Box customers to deploy in Box AI agents. Learn more here: https://t.co/D3vID1tWWv

likes: 235 | retweets: 20 | replies: 36 | views: 48185