Andrej Karpathy Rebranded Vibe Coding. Here’s What Engineering Leaders Need To Do About It.


On the one-year anniversary of “vibe coding,” Andrej Karpathy proposed retiring the term in favor of “agentic engineering.” The distinction he drew was precise: vibe coding is describing what you want and accepting whatever comes back. Agentic engineering means you design the system, specify the constraints, and use AI to accelerate an implementation you have already thought through. One is conversation. The other is engineering.
Most software organizations are doing both at once and calling them the same thing. That is where the costly mistakes come from.
One of my engineering leads puts it bluntly – not as a policy position, but as a field observation. In his experience, vibe-coded PRs consistently fall short in the same places: edge-case handling, error paths, failure modes. Not because the AI forgot them; because the developer never specified them. They described the outcome, accepted what the agent produced because it looked right, and shipped it. The tests pass because they were written against the existing code, not against the behavior the system actually requires.
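The distinction is easy to see in miniature. Here is a hypothetical sketch (the `parse_price` function and both tests are invented for illustration): a test derived from the code’s observed output merely documents whatever the agent produced, while a behavior test encodes a requirement someone had to think about – including what failure looks like.

```python
# A hypothetical function an agent might produce: parses "$1,234.56" into cents.
def parse_price(text: str) -> int:
    cleaned = text.replace("$", "").replace(",", "")
    return int(round(float(cleaned) * 100))

# Test written *against the code*: the expected value was obtained by
# running the implementation and pasting back what it returned. It can
# only ever confirm the status quo.
def test_matches_current_output():
    assert parse_price("$1,234.56") == 123456

# Test written *against required behavior*: it encodes a decision the
# developer had to make explicitly -- malformed input must fail loudly,
# not be silently coerced.
def test_rejects_malformed_input():
    try:
        parse_price("free")  # spec: non-numeric input must raise
    except ValueError:
        return
    raise AssertionError("malformed input was silently accepted")
```

Both tests go green, but only the second one would have forced the “what if the input is garbage?” conversation before the PR was opened.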
The agent didn’t omit anything. The developer didn’t know what to ask for.
His answer is not to reject AI coding tools. It is to require developers to demonstrate that they understand the problem – edge cases, scaling assumptions, failure modes – before the PR merges. If you can’t explain why a solution is designed the way it is, you haven’t designed it. You’ve accepted it.
He’s right. And the data backs him up. PR review times on AI-assisted teams have increased by 91% – not because the AI writes worse code, but because reviewers are now responsible for reconstructing the understanding that developers skipped. That’s harder review work, not easier. And it’s expensive.
What AI Does to Roles — And What It Doesn’t
There is a widespread perception among technology leaders that AI coding tools have dissolved the distinction between who builds and who reviews – that the agent writes well enough that the old quality gates are a relic of a slower era.
That thinking confuses output with understanding.
Developer, tester, architect – these roles were never about producing artifacts. They were about understanding the system well enough to know when something was wrong before it became someone else’s problem. A developer who spots a race condition spots it because they understand the concurrency model. A tester who asks “what happens if the user does something unexpected?” asks it because they are reasoning about the system’s behavior. An architect who realizes a solution works now but will break at scale sees it because they hold the whole system in their head.
These are not production tasks. They are acts of comprehension. You cannot delegate an insight to an agent.
What has changed is that you can now generate a hundred lines of code without doing the thinking those hundred lines used to require. The output is there. The understanding behind it may not be. A reviewer looking at a vibe-coded PR isn’t really reviewing the code – they’re trying to reconstruct whether the developer who submitted it actually understands what they built.
The roles aren’t obsolete. They’re being stress-tested. The developer who designed the solution – who can name every edge case, every failure mode, every scaling consideration – matters more than ever. The one who accepted whatever the agent produced because it looked good and passed the tests is a drag on the very speed the organization thinks it’s gaining.
Three Failure Paths Engineering Managers Need to Watch
This is not hypothetical. These patterns are repeating across organizations deploying AI coding tools at scale.
The green pipeline problem. A green pipeline means the code does what it was asked to do. It doesn’t mean the developer asked for the right thing, or asked completely. A good engineer knows to look behind the green checkmark. A manager who is too far from the work can’t tell from a dashboard whether green means safe or merely fast and under-specified.
The missing-path problem. A developer who doesn’t understand a system’s failure modes can’t specify them. The agent can’t surface what the developer didn’t know to ask for. In production, the happy path is where things work. The unhappy paths are where you find out what the system is really made of. AI agents, as Karpathy notes, excel at the first 80% of an implementation – the part that flows naturally from a well-defined goal. The last 20% – edge cases, failure handling, scaling constraints – requires an engineer who has thought about the system. That 20% is where vibe-coded work falls apart.
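A minimal sketch of that 80/20 split, with invented names (`load_config` is a hypothetical example, not anyone’s actual code): the happy-path version is what flows naturally from a one-line prompt; every branch in the second version exists only because someone asked an unhappy-path question the prompt never contained.

```python
import json

# The "first 80%" -- what an agent readily produces from "load a JSON config".
def load_config_happy(path):
    with open(path) as f:
        return json.load(f)

# The "last 20%" -- each branch answers a question the developer had to
# think to ask. None of it emerges from describing the happy path.
def load_config(path, defaults=None):
    defaults = dict(defaults or {})
    try:
        with open(path) as f:
            loaded = json.load(f)
    except FileNotFoundError:
        return defaults  # What if the file isn't there yet?
    except json.JSONDecodeError as err:
        # What if it's corrupt? Fail loudly rather than run on garbage.
        raise ValueError(f"config {path!r} is not valid JSON: {err}") from None
    if not isinstance(loaded, dict):
        raise ValueError(f"config {path!r} must be a JSON object")
    return {**defaults, **loaded}  # Which side wins when keys collide?
```

The second function is three times longer not because the problem got harder, but because the unstated questions finally got answered.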
The confidence problem. AI-generated code reads authoritatively. The structure is clean, the naming is coherent, the comments are plausible. It doesn’t look like code written by someone who wasn’t sure – even when its core assumption is a bet that something won’t happen. Human code carries the fingerprints of doubt: a “TODO: handle this case” comment, a defensive check that signals the developer’s uncertainty. AI code usually lacks those markers. Reviewers have to supply the skepticism themselves, and that takes judgment a reviewer only has if they understand the system well enough to know what to question.
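Those fingerprints are easy to illustrate with a contrived example (both functions and the business rule in the TODO are invented): the first version reads confidently and quietly bets that `orders` is never empty; the second carries the guard and the open question that show a human wondered what could go wrong.

```python
# Reads authoritatively: clean, coherent -- and it quietly bets that
# `orders` is never an empty list. The bet is invisible until it loses.
def average_order_value(orders):
    return sum(o["total"] for o in orders) / len(orders)

# Bears the fingerprints of doubt: the guard and the TODO are evidence
# that someone asked "what if there are no orders?" and flagged that
# the right answer isn't theirs to decide alone.
def average_order_value_defensive(orders):
    if not orders:
        # TODO: confirm with product whether "no orders" should be 0.0 or an error
        return 0.0
    return sum(o.get("total", 0.0) for o in orders) / len(orders)
```

Neither version is “better style” in the abstract; the point is that the doubt in the second one is information a reviewer can act on, and the first one gives the reviewer nothing to question.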
What Engineering Leaders Need to Do Differently
There is a version of engineering leadership that sounds sophisticated and is quietly dangerous in this environment: the manager who has stepped back from the code to focus on delivery metrics, measures the AI rollout by velocity numbers and adoption rates, and reads a senior engineer’s insistence on deep code review as resistance to change.
That manager is optimizing the output of the process rather than the quality of the judgment applied inside it. In a fast-moving AI environment, that is a compounding error.
Technical proximity is not micromanagement. It’s not about writing the code or reviewing every PR. It’s about staying close enough to how the systems actually behave that you can tell the difference between a team moving fast because it understands what it’s building and a team moving fast because it skipped the hard part.
A manager who doesn’t read every PR doesn’t need to. But they need to understand what their senior engineers are looking for when they do. That distinction – between “this passed the tests” and “this is sound” – never shows up in a summary. It’s only available through contact.
My team runs three rituals that have nothing to do with status updates and everything to do with maintaining that contact.
Two hours a week in an architecture working session. Two hours a week in sprint planning. Two hours each sprint in demos with the whole team.
The architecture sessions are where systems thinking lives – not tickets, not documents, but live conversation about why things are designed the way they are and which alternatives were rejected. A manager who sits in those sessions for six months builds a working model of the system that no dashboard can replicate.
Sprint planning is where understanding gets tested. We use planning poker – everyone estimates independently before the reveal. When the estimates diverge sharply, the conversation that follows is almost always the most important one in the sprint. Not because we’re debating a number. Because divergent estimates mean divergent mental models. One person thinks the work is a 2. Another thinks it’s a 13. That gap is not a disagreement about effort. It’s evidence that two people aren’t looking at the same problem.
Divergent estimates don’t measure complexity. They measure where your team’s shared understanding of the system breaks down.
Demos keep everyone honest about what was actually built versus what was intended, keep the team cross-trained on what each person is working on, and give the manager the most important signal of all: whether the people building the system can explain what they built and why the trade-offs they made were the right ones.
An AI agent can generate a demo. It cannot explain its reasoning when questioned. The developers who can are the ones you can’t afford to lose.
Karpathy’s rebrand from vibe coding to agentic engineering isn’t a naming update. It’s a job description.
Organizations that ignore AI will fall behind. Those that vibe it will ship failures at scale. Those that engineer it – deliberately, with understanding at every layer – are the ones building something that will actually hold up in production.
That is not a productivity debate. It’s an accountability debate. The code looks done. The pipeline is green. The PR is open.
Whether it’s ready is still a human call. Make sure your team – and you – are close enough to the work to make it.



