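Anthropic has released the Claude Opus 4.8 model. Testing shows it excels at prototyping and one-shot features, but it struggles with the last 10% and with edge cases in existing codebases, and it is prone to hallucinations. The new model also supports dynamic workflows, parallel subagents, and effort control.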
I got a few hours of early-access testing with Anthropic’s newly released model, Opus 4.8. In this episode, I walk through real coding, design, and strategy tasks across Claude Code and Claude Cowork, and give you my unfiltered view of what impressed me and what didn’t.
Listen or watch on YouTube, Spotify, or Apple Podcasts
What you’ll learn:
Where Opus 4.8 excels: greenfield prototypes, one-shot features, and fast execution
Where it struggles: the last 10%, edge cases in existing codebases, and hallucinations
How Opus 4.8 compares to Opus 4.7 on business strategy work
Why I’m still reaching for Opus 4.7 on data-heavy strategy and roadmap work
The new features shipping alongside the model: dynamic workflows with parallel subagents and effort control in Claude.ai and Cowork
The prompting and harness strategy I’d use to get the most out of it
In this episode, we cover:
(00:00) Introduction to Opus 4.8
(00:44) Benchmark performance and pricing
(01:53) First coding test: Building a prototyping tool
(03:00) Where it failed: The last 10% problem
(03:27) The hallucination problem
(04:23) Testing Opus 4.8 on existing codebases
(05:24) The ambition test: Building games for a 9-year-old
(07:03) Business strategy test: 4.7 vs 4.8
(08:23) The roadmap test
(09:17) Final verdict
References:
• System Card: Claude Opus 4.8: https://cdn.sanity.io/files/4zrzovbb/website/c886650a2e96fc0925c805a1a7ca77314ccbf4a6.pdf
• Introducing Claude Opus 4.8 on X:
Where to find Claire Vo:
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email jordan@penname.co.