You're Reading DORA Backwards
Let me save you six months of pain: you don’t get better by shipping faster. You ship faster by getting better.
I know that sounds obvious. But I’ve watched team after team, company after company, read the DORA research and walk away with exactly the wrong conclusion. They see that elite performers deploy multiple times per day, and they think, “We need to deploy more often.” They see that elite performers have a change failure rate under 5%, and they think, “We need fewer failures.”
No kidding. That’s like reading that Olympic swimmers have low body fat and deciding the path to a gold medal is to stop eating.
DORA metrics are outcomes. They are the scoreboard. They are not the playbook. And treating the scoreboard like a playbook is one of the most expensive mistakes an engineering leader can make.
The Four Metrics, Backwards
Let’s walk through each one, because every single DORA metric breaks the same way when you read it backwards.
Deployment Frequency
What the research says: Elite teams deploy on demand, often multiple times per day.
What leaders hear: “We need to deploy more often.”
What they do: Push the team to ship faster. Shorten release cycles. Pressure for daily deploys. Maybe throw in a “DevOps transformation” initiative. Maybe buy a tool.
What actually happens: The team ships more often, but the work isn’t smaller. It’s just less finished. Deployments go out with half-baked features behind flags. Bugs increase. Rollbacks increase. The team is now worse at delivery, but the deployment frequency number went up, so the slide deck looks great.
What actually produces high deployment frequency: Work that’s small enough to ship safely. Stories that are properly split. A CI/CD pipeline that’s trustworthy. An architecture that supports independent deployment. A team that has confidence in their test coverage. Deployment frequency is the exhaust of a well-running engine. You don’t make the engine faster by blowing more exhaust.
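To make that concrete, here’s a minimal sketch of the gating idea, in Python purely for illustration. The commands are placeholders (deploy.sh is hypothetical); the shape is what matters: every step has to earn the next one, and that’s what makes deploy-on-demand safe rather than reckless.

```python
import subprocess
import sys

# Sketch of a pipeline where each step gates the next. The commands are
# placeholders (deploy.sh is hypothetical); you earn deploy-on-demand by
# making every individual step trustworthy.
STEPS = [
    ["pytest", "-q"],                  # a test suite the team actually trusts
    ["python", "-m", "build"],         # a reproducible artifact
    ["./deploy.sh", "--env", "prod"],  # hypothetical deploy script
]

for step in STEPS:
    print("running:", " ".join(step))
    if subprocess.run(step).returncode != 0:
        sys.exit(f"stopped at: {' '.join(step)} -- fix the step, not the metric")
print("shipped: frequency is the exhaust, not the engine")
```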
Lead Time for Changes
What the research says: Elite teams go from commit to production in under an hour.
What leaders hear: “We need to move faster.”
What they do: Strip out review steps. Reduce testing. Pressure developers to “just ship it.” Remove what they call “bottlenecks,” which are often the only quality gates keeping production stable.
What actually happens: Code reaches production faster. Code also breaks production faster. The team spends more time firefighting than building. Morale drops. The best developers start updating their LinkedIn profiles.
What actually produces short lead times: Removing friction, not safeguards. Automated testing that actually tests something. Code review processes that are fast because the changes are small and well-scoped, not because nobody’s looking. Sane architecture that means your change doesn’t require coordinating with three other teams. Clear requirements that mean developers build the right thing the first time instead of reworking it twice.
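If you want to know where the friction actually lives, measure it. Here’s a minimal sketch with made-up timestamps and field names standing in for whatever your VCS and deploy tooling actually expose:

```python
from datetime import datetime

# Hypothetical stage timestamps for one change, pulled from your VCS and
# deploy tooling. The field names are illustrative, not a real API.
change = {
    "committed": "2024-03-04T09:12:00",
    "pr_opened": "2024-03-04T09:30:00",
    "approved":  "2024-03-05T14:05:00",
    "merged":    "2024-03-05T14:20:00",
    "deployed":  "2024-03-05T16:45:00",
}

stages = ["committed", "pr_opened", "approved", "merged", "deployed"]
ts = {k: datetime.fromisoformat(v) for k, v in change.items()}

# Print how long the change sat in each stage. The biggest gap is the
# friction worth investigating -- not the safeguard worth deleting.
for start, end in zip(stages, stages[1:]):
    hours = (ts[end] - ts[start]).total_seconds() / 3600
    print(f"{start} -> {end}: {hours:.1f}h")
```

The biggest gap tells you where to look. Notice the review step in the sample data: more than a day of waiting, not a day of reviewing. That’s friction. The review itself was the safeguard.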
Change Failure Rate
What the research says: Elite teams have a change failure rate between 0% and 15%.
What leaders hear: “We need fewer failures.”
What they do: Add more approval gates. Require more sign-offs. Implement heavyweight change management processes. Create a Change Advisory Board that meets weekly to review deployments. Congratulations. You’ve just invented waterfall with extra steps.
What actually happens: Deployments slow to a crawl. Developers batch changes into larger, riskier releases because the cost of going through the process is so high. When those big releases inevitably break, they break spectacularly. Change failure rate might go down in frequency, but the blast radius goes up. You’ve traded paper cuts for compound fractures.
What actually produces low change failure rates: Small changes that are easy to understand and easy to roll back. Automated tests that catch regressions before production does. Standards and architecture patterns that make it hard to write dangerous code. Developers who understand the system they’re changing, which means breaking down knowledge silos and doing real code review, not rubber-stamp approvals.
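You can even make “small changes” a property of the pipeline instead of a plea in retrospectives. Here’s an illustrative CI guard; the base branch and the 400-line budget are assumptions, not doctrine:

```python
import subprocess
import sys

# A sketch of a CI guard that nudges changes to stay small. The base
# branch and the 400-line budget are assumptions -- tune to taste.
BASE = "origin/main"
MAX_CHANGED_LINES = 400

# Count lines added + deleted relative to the base branch.
out = subprocess.run(
    ["git", "diff", "--numstat", f"{BASE}...HEAD"],
    capture_output=True, text=True, check=True,
).stdout

changed = 0
for line in out.splitlines():
    added, deleted, _path = line.split("\t", 2)
    if added != "-":  # binary files report "-"
        changed += int(added) + int(deleted)

if changed > MAX_CHANGED_LINES:
    print(f"Diff touches {changed} lines (budget: {MAX_CHANGED_LINES}). "
          "Consider splitting this change.")
    sys.exit(1)
print(f"Diff size OK: {changed} lines.")
```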
Mean Time to Recovery (MTTR)
What the research says: Elite teams recover from failures in under an hour.
What leaders hear: “We need to fix things faster when they break.”
What they do: Buy monitoring tools. Implement on-call rotations. Create incident response playbooks. Invest in observability platforms. None of which is wrong, exactly. But it’s focused entirely on the response and not at all on the design.
What actually happens: The team gets better at putting out fires. The number of fires doesn’t change. You’ve built a really efficient fire department for a building with no sprinkler system.
What actually produces fast recovery: Systems that are designed for failure. Feature flags that let you turn things off without redeploying. Architecture that isolates failures instead of cascading them. Small deployments that make it obvious what changed. Automated rollback capability. And (here’s the unsexy part) fewer failures in the first place, because the work going in is well-tested and well-scoped.
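Here’s the smallest version of “designed for failure” I can write down: a kill switch. This sketch uses an environment variable where a real system would use a flag service, and every name in it is hypothetical:

```python
import os

# Minimal kill-switch sketch: the flag lives outside the deploy artifact
# (an env var here for brevity; a flag service or config store in real
# life), so flipping it requires no redeploy. All names are illustrative.
def flag_enabled(name: str, default: bool = False) -> bool:
    value = os.environ.get(f"FLAG_{name.upper()}")
    return default if value is None else value == "on"

def recommendations(user_id: str) -> list[str]:
    if not flag_enabled("NEW_RANKER", default=True):
        return ["fallback-popular-items"]  # safe degraded path
    return new_ranker(user_id)  # the risky new code path

def new_ranker(user_id: str) -> list[str]:
    # imagine the new feature here
    return ["personalized-item"]

print(recommendations("u123"))
```

Set FLAG_NEW_RANKER=off and the next request degrades gracefully. That’s recovery measured in seconds, and nobody had to redeploy anything.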
The Cargo Cult Problem
There’s a pattern here, and it’s the same pattern I see with companies that try to copy FAANG practices.
They read that Google does trunk-based development. They read that Netflix deploys thousands of times a day. They read that Spotify has squads and tribes. And they think, “If we do those things, we’ll get those results.”
But Google has thousands of engineers, a custom build system, and decades of investment in testing infrastructure. Netflix has a culture of radical autonomy backed by radical accountability. Spotify has publicly said their “model” was aspirational documentation, not a description of how they actually worked.
A 40-person engineering org adopting Spotify’s squad model is like a garage band adopting Metallica’s tour rider. The outputs are not the inputs. The visible practices are not the invisible foundations.
DORA metrics have the exact same problem. The research describes what elite performance looks like. It does not describe how to get there. And the distance between those two things is where millions of dollars in failed “transformations” go to die.
What the Metrics Are Actually Good For
I’m not saying DORA metrics are useless. I’m saying they’re useful as a diagnostic, not a prescription.
If your deployment frequency is low, that’s a signal. It tells you to go look at why. Is the work too big? Is the pipeline too slow? Is the team afraid to ship because they don’t trust their tests?
If your change failure rate is high, that’s a signal. Go look at the testing strategy, the code review process, the story splitting discipline, the architecture.
If your lead time is long, that’s a signal. Go find the friction. Is it in review? In testing? In deployment? In requirements that keep changing mid-sprint?
The metric tells you where to look. It does not tell you what to do. And the moment you start optimizing for the metric instead of fixing the underlying capability, you’ve lost the plot.
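Used as a diagnostic, the scoreboard is cheap to read. A rough sketch against hypothetical deploy records; your deploy tooling is the real source of truth:

```python
from datetime import datetime

# The four metrics as a read-only diagnostic. The records are made up:
# (deployed_at, oldest_commit_at, failed, restored_at).
deploys = [
    ("2024-03-01T10:00", "2024-03-01T08:30", False, None),
    ("2024-03-01T15:00", "2024-03-01T13:00", True,  "2024-03-01T15:40"),
    ("2024-03-02T11:00", "2024-03-02T09:15", False, None),
]
p = datetime.fromisoformat

n = len(deploys)
days = (p(deploys[-1][0]) - p(deploys[0][0])).days + 1
lead_times = sorted(p(d) - p(c) for d, c, _, _ in deploys)
failures = [(d, r) for d, _, failed, r in deploys if failed]

print(f"Deployment frequency: {n / days:.1f}/day")
print(f"Median lead time:     {lead_times[n // 2]}")
print(f"Change failure rate:  {len(failures) / n:.0%}")
for deployed, restored in failures:
    print(f"Time to restore:      {p(restored) - p(deployed)}")
```

Notice that the script only reads history. It can tell you lead time is long. It cannot tell you to delete code review.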
The Uncomfortable Truth
Here’s the part nobody wants to hear: the things that actually produce elite DORA metrics are boring. They’re not tool purchases. They’re not reorganizations. They’re not “transformations.”
They’re:
- Splitting stories so work is actually completable in a sprint
- Writing real acceptance criteria so developers build the right thing
- Doing genuine code review so knowledge spreads and quality improves
- Managing tech debt intentionally so the codebase doesn’t fight you
- Setting AI coding standards so the new tools help instead of creating new problems
- Building team practices that reduce friction instead of adding ceremony
These things don’t make good conference talks. They don’t fit on a vendor’s slide deck. They don’t have acronyms or certifications. They’re just the slow, deliberate, unsexy work of getting better at the fundamentals.
But they’re what actually moves the scoreboard.
Stop Reading the Scoreboard
If your leadership team is looking at DORA metrics and building a roadmap to improve them, they’re working backwards. You don’t improve the metrics. You improve the team’s capability, and the metrics follow.
You don’t get fit by staring at the scale. You get fit by changing what you eat and how you train. The scale just confirms it’s working.
DORA tells you what healthy looks like. It doesn’t tell you how to get healthy. That part is on you. And it starts with looking at the systems your team operates in, not the numbers they produce.
The numbers are the scoreboard. Fix the game.