99% Correct Is Still Failure: The Last Mile for Mission-Critical AI

AI can now write code faster than any human alive, and most of the time it’s more than good enough. That’s the magic powering the entire vibe coding wave. But there’s a category of software where “most of the time” just doesn’t cut it: the code running a fighter jet, a power grid, an autonomous vehicle, a piece of medical hardware. When that code is wrong, the consequences aren’t a bug. They’re a recall, an accident, a national security incident. In this episode of Talking AI, Matt Paige sits down with Ryan Aytay, the former CEO of Tableau and now President and COO of CodeMetal, which just raised $125 million to close that gap. Ryan explains what he calls “the last mile” for mission-critical industries: the verification, validation, and provability layer that sits between AI-generated code and the systems where failure is catastrophic.

The last mile is where failure hides.

AI code generation tools like Claude Code, Cursor, and Codex are excellent at getting you to 70, 80, 90, even 99% correct. But for mission-critical systems, that remaining fraction is where catastrophic failures live. Every AI tool, when asked if it can guarantee production-ready code, will say “almost, but not quite.” CodeMetal exists to close that gap with provable correctness, not just confidence scores.

It’s a behavioral problem, not a coding problem.

One of Ryan’s sharpest insights is that the real challenge isn’t whether AI can write syntactically correct code. It’s whether a million lines of translated legacy C++ will behave exactly the same way in production on the same hardware. That behavioral assurance at scale is what no code generation tool currently provides, and it’s the core of what CodeMetal delivers through formal methods, fuzzing, concolic testing, and hardware-in-the-loop validation.
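
To make "behaves exactly the same way" concrete, here is a minimal sketch of differential fuzzing, one of the techniques named above: the same random inputs go to the original routine and the translated one, and any difference in output is flagged. This is an illustration only, not CodeMetal's tooling. The two checksum functions and the `differential_fuzz` helper are hypothetical stand-ins invented for this example.

```python
import random


def legacy_checksum(data: bytes) -> int:
    """Hypothetical stand-in for a legacy routine: 16-bit additive checksum."""
    total = 0
    for b in data:
        total = (total + b) & 0xFFFF
    return total


def translated_checksum(data: bytes) -> int:
    """Hypothetical stand-in for the translated routine; should match the legacy one."""
    return sum(data) & 0xFFFF


def differential_fuzz(reference, candidate, trials=10_000, seed=0):
    """Run both functions on the same random inputs.

    Returns the first input on which they disagree, or None if no
    disagreement was found within the given number of trials.
    """
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        length = rng.randint(0, 64)
        data = bytes(rng.randrange(256) for _ in range(length))
        if reference(data) != candidate(data):
            return data
    return None


if __name__ == "__main__":
    mismatch = differential_fuzz(legacy_checksum, translated_checksum)
    print("no mismatch found" if mismatch is None else f"mismatch on {mismatch!r}")
```

A clean run here only means no difference was found in the sampled inputs. It does not show that none exists, which is the gap the "prove versus guarantee" distinction later in these notes is about.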

Rewiring the city without the power going out.

The most vivid example Ryan shares is a customer with over a million lines of legacy C++ that needed to be translated to Rust while running on the same hardware system. He describes it as wanting to rewire a city without the power going out, a task that was essentially impossible until now. CodeMetal completed it in weeks with zero behavioral changes and provable correctness.

Prove is a stronger word than guarantee.

Ryan draws a meaningful distinction between guaranteeing that code works and proving it. CodeMetal’s approach uses formal verification methods (mathematical proof rather than testing alone) to demonstrate that translated code behaves identically to the original. That provability is what makes the value proposition tangible for defense, automotive, and semiconductor customers, and it’s what enables outcomes-based pricing rather than traditional per-seat licensing.
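
A toy way to see the difference between testing and proving: when the input domain is small enough, checking every possible input is a proof by exhaustion for that domain, not a sample. The sketch below does this for a hypothetical 8-bit saturating add. The function names and the `first_counterexample` helper are assumptions made up for this example. Real formal verification tools generally rely on solvers and proof techniques to cover domains far too large to enumerate, so this shows the idea, not how CodeMetal does it.

```python
from itertools import product


def legacy_saturating_add(a: int, b: int) -> int:
    """Hypothetical legacy routine: add two 8-bit values, clamping at 255."""
    total = a + b
    if total > 255:
        return 255
    return total


def translated_saturating_add(a: int, b: int) -> int:
    """Hypothetical translated routine; intended to behave identically."""
    return min(a + b, 255)


def first_counterexample(reference, candidate, domain):
    """Compare both functions on every pair of inputs drawn from `domain`.

    Returns the first pair on which they differ, or None if they agree
    on the entire domain (equivalence shown for that domain only).
    """
    for a, b in product(domain, repeat=2):
        if reference(a, b) != candidate(a, b):
            return (a, b)
    return None


U8 = range(256)  # every unsigned 8-bit value

if __name__ == "__main__":
    result = first_counterexample(legacy_saturating_add, translated_saturating_add, U8)
    print("equivalent on all 8-bit inputs" if result is None else f"differs at {result}")
```

Because all 65,536 input pairs are checked, a `None` result establishes equivalence for 8-bit inputs, and a translation that wraps around instead of clamping would be caught with a specific counterexample.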

The biggest risk is doing nothing.

Ryan’s closing advice for leaders is direct: the biggest risk for people, companies, and nations isn’t that AI makes a mistake. It’s that they don’t use it at all. He encourages everyone to try every tool, compare results, educate themselves, and stay in the conversation. The learning curve is steep, but if you take a break from it, you’ll fall behind. And if something doesn’t work today, check back in a week, because it probably will.

Watch or listen to the full episode: https://hatchworks.com/talking-ai/

Key Moments

  • 00:00 – Matt’s intro: the gap between vibe coding and mission-critical code
  • 02:47 – From Tableau fanboy to the trust gap in AI
  • 03:52 – Why Ryan left Salesforce/Tableau for CodeMetal
  • 05:55 – “Is it safe for the things I depend on every day?”
  • 06:45 – 99% correct is still failure for mission-critical systems
  • 08:20 – The sycophantic nature of AI: “Heck yeah, I can do that”
  • 09:22 – It’s not a coding problem, it’s a behavioral problem at scale
  • 11:22 – Human in the loop isn’t enough: hardware in the loop
  • 14:30 – What is fuzzing? Formal methods explained in plain English
  • 16:02 – How a sub-100-person company leverages AI across every function
  • 18:19 – The Shopify mandate: using AI reflexively
  • 21:33 – Rewiring the city without the power going out: the million-line translation
  • 24:38 – Defense use cases: drones, autonomous vehicles, and simulation
  • 26:28 – “Prove is even a stronger word than guarantee”
  • 28:32 – Accountability and the coming wave of AI insurance
  • 32:54 – Token usage, the Uber CTO’s blown budget, and outcomes-based pricing
  • 36:26 – SaaS isn’t dead, it’s evolving: Ryan’s Salesforce/Tableau perspective
  • 40:08 – The biggest risk is doing nothing
  • 42:07 – Where to find CodeMetal (and they’re hiring)

Key Links