The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

Latent Space 4 Information level 4 Published: 2026-05-28T18:41 Fetched: 2026-05-28 22:13


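AI companies

Summary

Cognition announced the completion of a $1 billion Series D round, with investors including Lux Capital, General Catalyst, and 8VC. Enterprise usage has grown more than 10x since the start of the year, and annualized revenue has reached $492 million. Devin launched two years ago as the first AI software engineer.

Key facts

Cognition completed a $1 billion Series D round
Enterprise usage has grown more than 10x since the start of the year
Annualized revenue reached $492 million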

Cognition Devin Lux Capital General Catalyst 8VC

Original text

The new AIEWF website is live! CFPs close in 2 days and we will run our first New Engineer Orientation this weekend, get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!
One of the central tensions in the agents industry is that even while there are major decacorn agent labs like Sierra, Decagon, Notion and Cursor being built up, it is also true that it has never been easier to DIY agents, with a plethora of agent frameworks like LangGraph and Pydantic and Flue, and managed agents from Anthropic and Gemini and Amazon. There has been a wave of companies building their own background agents from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal.
You’d think Cognition might feel a bit threatened, but they’re not - even after all this, they were way oversubscribed for the $1B Series D they just announced:
Cognition (@cognition), 2026-05-27: [...] @Lux_Capital, @generalcatalyst, and @8vc. Our enterprise usage has grown >10x since the start of this year, and our run-rate revenue grew to $492M. We launched Devin two years ago as the first AI software engineer. Since [...]
Walden Yan, coiner of context engineering and Chief Product Officer/Cofounder of Cognition, invited OpenInspect’s Cole Murray to talk about why the Devin is in the Details.
Full conversation live on the pod today:
In retrospect, async agents were the most AGI-pilled bet you could make in 2024: the models weren’t good enough yet to vibecode, people didn’t trust AI enough to let it rip, and nobody (including early Cognition) was sure about the form factors.
Now it is obvious:
The first wave of AI coding tools made the developer faster but kept them heavily in the loop. Copilot and Cursor’s tab autocomplete are prime examples. However, the workflow was still centered around and bottlenecked by the developer’s local workflow: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time.
The second wave was local agents: Claude Code, Windsurf, Cursor’s agents pane. First one terminal, then increasingly many, all running concurrently.
The current Age of Async Agents points to a different future focused more on agent orchestration which drives end-to-end development.

According to previous guest Steve Yegge, there are eight finer-grained levels of agent adoption, but we have collapsed them into three.
As Cursor’s Michael Truell put it in The third era of AI software development:
Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.

The agent should not sit solely inside the developer’s flow. It should be set up to work in the background, so that you can give it a task, a repo, a machine, a shell, a browser, tests, memory, and review loops, and let it go do the work somewhere else.
In less than a year, the sentiment has shifted from avoiding multi-agent systems:
to suggesting approaches that actually work:
From coining “context engineering” to building the infrastructure behind Devin’s 7x PR growth and jump from 16% to 80% of commits across Cognition repos, Walden Yan has had a front-row seat to the background-agent shift. In this episode, Cognition co-founder and CPO Walden Yan joins swyx alongside Cole Murray, creator of OpenInspect, to unpack why everyone is building their own Devin, what changed after the December 2025 model inflection, and why “spec to pull request” is now becoming a real production workflow.
We go deep on the architecture of background agents: harness-in-the-box vs out-of-the-box, why Devin separates the “brain” from the machine, why repo setup is still one of the hardest problems, why Docker is not always enough, and how full VMs, snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together. Walden and Cole also dig into memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, PMs shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.
And as agents eat software… and software eats the world… you can draw the conclusion on what is next:
We discuss:
Why the engineering world is waking up to background agents and cloud agents
The December 2025 model inflection that made spec-to-PR workflows practical
Devin’s 7x merged PR growth and rise from 16% to 80% of commits
Why Cole built OpenInspect as an open-source background-agent system
The economics of $20/seat agent products and why monetization is tricky
What Cognition actually sells beyond Devin: infra, onboarding, integrations, and adoption
Harness in the box vs out of the box, and why architecture matters
Why Devin separates the brain from the machine for security and permissions
Repo setup, scoped secrets, Docker Compose, and agent-ready dev environments
Why full VMs matter when agents need to run real applications and test them
Android, macOS, Windows, nested virtualization, and machine-specific agent work
Why testing is much harder than “computer use”
Screenshots, video verification, and the “I know it works” merge moment
GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments
Why MCP alone is not enough for first-class Slack and enterprise integrations
Memory, Knowledge, skills, Claude.md, and why retrieval is still unsolved
Devin’s auto-generated memories and the challenge of memory pruning
Always-on agents as permanent PMs for issues, tickets, and product areas
Sub-agents, meta-Devin management, and what multi-agent systems actually add
Why pure auto-merge vibe coding breaks down after about two weeks
AI code smells, lint rules, reward hacking, and Semgrep for agent-written code
GitAI, inline context, and preserving the “why” behind code changes
Local testing, mock servers, older codebases, and preparing companies for agents
Windsurf 2.0 and the handoff between local foreground agents and cloud background agents
SRE auto-triage, support workflows, and agents as first responders
PMs, marketing, and non-engineers creating pull requests from Slack
AI agent budgets, $1k-$5k per engineer spend, and hybrid frontier/sub-frontier systems
The rise of autonomous coding factories and who Cognition is hiring
Walden Yan
X: https://x.com/walden_yan
LinkedIn: https://www.linkedin.com/in/waldenyan/
Cole Murray
X: https://x.com/_colemurray
LinkedIn: https://www.linkedin.com/in/colemurray/
OpenInspect / Background Agents: https://github.com/ColeMurray/background-agents
Timestamps
00:00:00 Introduction
00:00:43 Why Everyone Is Building Their Own Devin
00:01:57 Devin’s 2025 Ramp: 7x PR Growth and 80% of Commits
00:03:49 OpenInspect and the Rise of Open-Source Background Agents
00:07:59 What Cognition Actually Sells Beyond Devin
00:09:56 Background Agent Architecture: Harness In vs Out of the Box
00:12:08 Separating the Brain from the Machine
00:14:07 Repo Setup, Secrets, Docker, and Full VMs
00:19:13 Why Testing Is Harder Than Computer Use
00:22:40 Video Verification and the “I Know It Works” Merge Moment
00:23:19 GitHub UX, Devin Review, and AI Code Review
00:25:42 MCP, Slack, and Enterprise Agent Integrations
00:28:59 Memory, Knowledge, and Always-On Agents
00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin
00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay
00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore
00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems
00:56:10 Making Codebases Agent-Ready
00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff
01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases
01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories
01:06:51 Hiring at Cognition and OpenInspect Consulting
01:07:45 Outro
Transcript
Introduction: Walden Yan, Cole Murray, and Context Engineering
Swyx [00:00:00]: All right, we’re in the studio with Walden Yan, co-founder of Cognition, CPO.
Walden [00:00:08]: Happy to be here.
Swyx [00:00:09]: Which is a cool title. And coiner of context engineering.
Walden [00:00:15]: Although I think there are many people who’d used the terms in various ways beforehand, but I did find that people, both internally and externally, enjoyed the upgrade from prompt engineering or model wrapping into maybe a more thoughtful way to build agents.
Swyx [00:00:33]: For those who haven’t caught up on that, I have on screen the Don’t Build Multi-Agents post, which you should go read and which we might refer to, and Cole Murray, who created OpenInspect.
Cole [00:00:43]: Great to be here.
Swyx [00:00:43]: So let’s talk about it. Everyone is building their own Devins. What’s going on?
The December Shift: From Handholding Models to Autonomous PRs
Cole [00:00:51]: So I think the engineering world is waking up to this idea of background agents, cloud agents, whatever you’d like to call it. And I think we saw a shift around the December timeframe of 2025, where the models, Opus 4.5 and GPT 5.2, reached a capability where we moved away from handholding the model to being able to more or less autonomously drive the model. And what I mean by that is that we could pretty much go from a specification to a completed pull request, assuming the spec was good enough, with very little friction. And that paradigm alone, I think, changed a lot of how we interact with agents, and opened this world where background agents became more practical.
Swyx [00:01:41]: I think for Cole, everyone experienced this in December, but I feel like there was just this increasing ramp, right? There was this moment which was, I think, Sonnet 3.7, where you guys rewrote Devin in one night or something. So describe 2025 or how it felt from your side.
Walden [00:02:01]: In retrospect, we always thought it was ramping up, but then even now, over the last three, four months from today, it’s been ramping up even faster. So it’s almost funny to be talking about how big of a leap Sonnet 3.7 was, and honestly, a lot of it was stripping out parts of Devin that were no longer needed with that jump in intelligence. But I also just think that with a lot of the recent leaps, especially if you look at models like Opus and the latest GPT models, they are reaching levels of autonomy where people are finding that they actually can just be hands-off. And people who were once debating, “Oh, do I need to be in the weeds with my model in the IDE? Can I just completely move it off into the cloud?” That’s a more serious conversation, and we’ve seen that in all of our growth charts. Internally there’s this funny graph where our usage, our merged PRs, has grown 7X since I forget what it was called.
Swyx [00:02:57]: I think Dev, maybe tweeted that. Yes.
Walden [00:03:01]: It grew like 7X over the last, I think it was, two months, three months, something like that. And then you see our engineering headcount growth. It’s gone up by 10% or something.
Swyx [00:03:11]: We were afraid to release this. So this is Devin commit percentages on all Devin repos: it was 16% in January and is now 80% in March.
Walden [00:03:25]: It’s a big shift right now. And so it makes sense that a lot of people are now thinking about buying Devin, but also maybe trying to build their own. I have a lot of fun building Devin, so I can see why other people would want to build their own cloud agents as well. Matt, well, maybe it’s good to hear what initially inspired you to try to build OpenInspect?
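As a rough back-of-the-envelope check on the figures quoted above (merged PRs up about 7x while engineering headcount rose about 10%, both stated loosely in the conversation), the implied growth in merged PRs per engineer can be sketched as follows. The function name and the exact inputs are illustrative assumptions, not figures Cognition has published in this form.

```python
def per_engineer_growth(pr_growth: float, headcount_growth_pct: float) -> float:
    """Return the implied multiple of merged PRs per engineer.

    pr_growth: multiple by which merged PRs grew (e.g. 7.0 for "7x").
    headcount_growth_pct: headcount growth in percent (e.g. 10.0 for "10%").
    """
    headcount_multiple = 1.0 + headcount_growth_pct / 100.0
    return pr_growth / headcount_multiple


# Approximate figures from the conversation: ~7x PRs, ~10% headcount growth.
multiple = per_engineer_growth(7.0, 10.0)
print(f"Merged PRs per engineer grew roughly {multiple:.1f}x")
```

Under those approximate inputs, nearly all of the growth is per-engineer output rather than hiring, which is the point both speakers are making.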
OpenInspect: Ramp, Cloud Agents, and Open Source
Cole [00:03:49]: OpenInspect came about primarily through my clients: observing how they were using tools like Claude and OpenAI’s Codex at the time, and seeing some of the friction that they were having with it. Primarily, Claude was being used through Slack, and a big issue they ran into was that the sessions that were launched were specific to whoever called it via Slack. And so if a PM was the one who invoked the session and they then went to pass context to engineering, engineering couldn’t see the session. And that in itself was a deal breaker, because the PM says, “Hey, engineering, can you jump in?” But there’s nothing to jump in on unless they’re copy-pasting out the single response that came back. And so seeing some of these problems, I had built a similar architecture internally, just to experiment with and test out different ideas as this trend of moving off of localhost was starting to become… And as Ramp released their blog post, I had a lot of the pieces for this already in place, and just thought it would be funny to see what Claude could do purely from the blog post. And on my X account, there’s actually a thread where I live tweeted going through this
Cole [00:05:14]: comparing GPT and Claude as both of them are going through it.
Swyx [00:05:17]: On the announcement thing or something else?
Cole [00:05:19]: Right after it got released. We can put it in the show notes. Yeah, it was helpful that I already knew how to verify the system. I knew what I was looking for. I think Ramp did a great job of really illustrating the technical aspects of how to build something. It was much more than just, “Hey, we built a great system.” It was, “And here’s how you can build it too.” And so I resonated a lot with that, just with the problems that I was already seeing, and looking around, I didn’t really see anything in the open source community that met this type of system. I think there are a lot that run on localhost, like Superset, Conductor, and many others. But nothing that was actually running in the cloud. And so I built it, and I thought it was interesting to just open source it and allow anyone to then have a foundation that they can mix and match on top of.
The Business of Background Agents: Open Source vs. Devin
Swyx [00:06:16]: So literally after Devin was launched, there was OpenDevin, which became All Hands. I don’t know if you tried that or
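The session-visibility problem Cole describes above (a session tied to whoever invoked the agent, so teammates have nothing to jump in on) can be illustrated with a minimal sketch in which sessions are keyed by the conversation thread rather than by the invoking user. Every name here (AgentSession, SessionStore, the thread-id format) is hypothetical and chosen for illustration; this is not how OpenInspect, Ramp, or any Slack integration is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    """One background-agent session, visible to everyone in its thread."""
    thread_id: str   # hypothetical key, e.g. channel id plus thread timestamp
    started_by: str  # who first invoked the agent
    participants: set[str] = field(default_factory=set)
    messages: list[tuple[str, str]] = field(default_factory=list)


class SessionStore:
    """Looks sessions up by thread, never by the invoking user."""

    def __init__(self) -> None:
        self._by_thread: dict[str, AgentSession] = {}

    def open_or_join(self, thread_id: str, user: str) -> AgentSession:
        # Create the session on first use; later callers join the same one.
        session = self._by_thread.get(thread_id)
        if session is None:
            session = AgentSession(thread_id=thread_id, started_by=user)
            self._by_thread[thread_id] = session
        session.participants.add(user)
        return session

    def post(self, thread_id: str, user: str, text: str) -> AgentSession:
        # Record a message against the shared session for this thread.
        session = self.open_or_join(thread_id, user)
        session.messages.append((user, text))
        return session


store = SessionStore()
store.post("C123:1716912000", "pm_alice", "Add CSV export to the reports page")
session = store.post("C123:1716912000", "eng_bob", "Use the existing exporter module")
print(sorted(session.participants))
print(len(session.messages))
```

Because the lookup key is the thread, the engineer who joins later lands in the same session the PM started and sees its full message history, instead of starting from a copy-pasted response.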
Walden [00:06:22]: I was going to say, one of the things that interested me a lot with OpenInspect was, you didn’t try to go make it then something you monetize. There are a lot of, I think, these open source projects would then go and really try to, raise V
Swyx [00:06:36]: That’s why no OpenDevin. Yeah.
Walden [00:06:38]: Yeah, and how did you think about that? I thought that was very interesting.
Cole [00:06:44]: I thought, and just from what I had seen across my clients, that having a background agent system is going to become critical infrastructure within their company. And so because of that, I wanted to open source it so that they could fork it and put in whatever customization they wanted. To that question though, I get asked all the time, “Oh, are you going to raise? Are you going to turn this into a service?”
Walden [00:07:08]: I’m sure you’ve gotten offers.
Cole [00:07:09]: But primarily I don’t want to do that, for a few reasons. One, I don’t want to compete for $20 a seat. I think that is just a really difficult business. I think it’s very easy to copy the main pieces of it. Again, I built this fairly quickly. And I think because you are not owning, I guess, the entire stack, it’s hard to monetize. You have money being made at the sandbox layer with Daytona, E2b, and many other players. You have money being made at the model layer. And you sit in this weird in-between gray area where, what are you actually selling? You’re selling, I guess, the infrastructure. You’re selling the integrations, maybe.
Swyx [00:07:55]: Let’s ask the guy. What are you selling?
Walden [00:07:59]: Well, yeah, there’s multiple layers to this in practice, and actually it’s funny you mentioned the infrastructure, ‘cause when we got started building Devin as well, we had to go figure out how to make the infrastructure as well because,
Swyx [00:08:10]: You had to build this two years before everyone else?
Swyx [00:08:15]: Including, the model side
Walden [00:08:17]: It was not very polished at the start. When we just built it off of raw VMs from cloud providers like EC2, the boot-up time was so slow. And especially then, turning off the machines, saving them, and then being able to bring them back up again when you want Devin to wake up again later: it would just be out cold for like 10 minutes, because that’s just how long these systems took. They were not built for this repeated down-and-up usage. And so we actually had to go do all of that. And as a result now, one thing we offer when we go and sell Devin to people is, you don’t have to worry about all the compute side of things. We’ll make it work. We’ll make it work in your cloud if you want it to. But aside from the product, and I want to go into the agents and the tuning of the intelligence part later, I think a big part of what we do at Cognition as well is to just make sure that your company learns and uses and adopts these coding agents. ‘Cause I think for especially the largest enterprises in the world, you find that there are a lot of people who want to move over to using AI for their day-to-day workloads. But because of the way projects are planned, and because not everyone is literate in using AI in these ways, having a team of engineers who can actually go in and onboard you, set up all the integrations you need and the automations you need to really get to that level of leverage with AI, is super helpful. And so we do that. We act as thought partners to the customers that we work with as well.
Swyx [00:09:56]: So let’s talk about, architectural stuff. I think that’s always, that is something that was the topic of conversation between the two of you. Is this, the mental model that you want to start with or something else? I’ll just leave the floor open to you guys.
Agent Architecture: Harness in the Box vs. Out of the Box
Cole [00:10:11]: I think maybe we can start here with just a general view of what the pieces of a background agent system are. And then maybe we can go into some of the nuances of decisions that you can make.
Swyx [00:10:22]: But I guess maybe what Walden is saying is the agent is in this open code box, I guess. Right? This is infra, and then that’s the agent. And you had this discussion about whether you put the agent in here or out externally. Can you tease that out?
Cole [00:10:39]: In a background agent system, you have a decision to make of where the agent is actually going to run. This is typically described as the harness in the box or out of the box. With running the agent in the box, you’re making some trade-offs by doing that. The negative trade-off you’re making is primarily security. Because the agent is running in that box, unless you otherwise design it, all of your secrets need to go into that box as well. And given the nature of AI, it can be unpredictable, and you could very easily end up accidentally exfilling your secrets, or other unintended behavior. Now, the out of the box is the idea that we are going to have the actual agent running not directly in the sandbox, and we will have, quote-unquote, the brain of the agent running in some type of worke