Why switching from story points to hours won't fix your estimates

The planning session has been running for twenty minutes on a single ticket. Two developers have voted 5 and 13 in three consecutive rounds. Finally someone says it: “This is why I hate story points. Can we just estimate in hours? At least that’s something concrete.”

It’s a reasonable impulse. Hours are real. Hours connect to calendars and deadlines. Hours are something a project manager can put in a spreadsheet.

But switching to hours doesn’t fix what just happened in that room.

The disagreement that didn’t go away

When your two developers voted 5 and 13, they weren’t disagreeing about a number. They were answering different questions.

The developer who voted 5 was thinking about effort. It’s a well-scoped change, a couple of files, probably half a day of actual work. The developer who voted 13 was thinking about risk. That code path touches the payment service, and she’s watched what happens when someone deploys to it without checking the three systems it feeds into.

Ask them in hours, and you get “four hours” and “two days.” You’ve swapped the unit and kept the disagreement.

The root problem isn’t what you’re measuring with. It’s that you’re using one measurement for what are actually several different questions — questions that don’t always move together, and that different people on the team are often answering differently without knowing it.

What a single number can’t carry

A ticket can be low-effort and high-risk: one line of code in the payment service. Another can be high-effort and low-risk: a tedious but completely understood data migration you’ve run before. A third can be low on both dimensions but so uncertain that nobody knows what they’re building until they start.

These tickets might all get the same story-point score. They’d probably get similar hour estimates too. But they carry completely different risks for the sprint.

Effort, risk, complexity, and uncertainty don’t compress cleanly into a single scale. Effort is about how much work. Risk is about what external forces could derail it. Complexity is about how hard the code is to understand and navigate safely. Uncertainty is about how many unknowns you’re walking into. High on any one of them means the ticket is hard. High on two means it’s a sprint risk. But they’re not interchangeable — and collapsing them into one number loses the information about which dimension is the problem.
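One way to make that concrete is to keep the four scores side by side rather than summing them. The sketch below is illustrative only: the 1-to-5 scale, the threshold of 4 for "high", and the names Ticket, high_dimensions, and classify are assumptions made for this example, not part of any particular tool.

```python
from dataclasses import dataclass

# Assumed scale for this sketch: each dimension is scored 1 (low) to 5 (high).
HIGH = 4  # hypothetical cutoff for calling a dimension "high"


@dataclass
class Ticket:
    name: str
    effort: int
    risk: int
    complexity: int
    uncertainty: int


def high_dimensions(ticket: Ticket, threshold: int = HIGH) -> list[str]:
    """Return the names of the dimensions scored at or above the threshold."""
    scores = {
        "effort": ticket.effort,
        "risk": ticket.risk,
        "complexity": ticket.complexity,
        "uncertainty": ticket.uncertainty,
    }
    return [dim for dim, score in scores.items() if score >= threshold]


def classify(ticket: Ticket) -> str:
    """High on one dimension: hard. High on two or more: sprint risk."""
    count = len(high_dimensions(ticket))
    if count == 0:
        return "routine"
    if count == 1:
        return "hard"
    return "sprint risk"


# Two of the tickets described above, with made-up scores.
payment_fix = Ticket("one-line payment fix", effort=1, risk=5, complexity=2, uncertainty=2)
migration = Ticket("familiar data migration", effort=5, risk=1, complexity=2, uncertainty=1)

for t in (payment_fix, migration):
    print(t.name, "->", classify(t), high_dimensions(t))
```

Both example tickets come out as "hard", but for different reasons: one flags risk, the other flags effort. That difference is what a single number drops.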

That’s the information you need to make good sprint decisions, and it’s what disappears when you argue about whether the ticket is a 5 or a 13, or four hours or two days. The failure is especially sharp for integration-heavy work, where implementation effort is rarely the dimension that breaks the sprint.

What separating the questions does

When you vote on dimensions separately — effort first, then risk, then complexity, then uncertainty — the planning meeting changes character.

The developer who voted 5 and the developer who voted 13 often agree once you ask them separately. Low effort, high risk: they’re on the same page about both, they just got forced to compress two assessments into one card. The spread in the vote wasn’t a stalemate — it was information pointing at two different things simultaneously.

Now you have something to act on. Risk is high, effort is low: assign this to the person who knows the payment service, budget time for testing, and flag it before it hits the sprint. That’s a different planning conversation than negotiating between 5 and 13.
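As a rough sketch of what that looks like in data, compare the spread of the single-card vote with the spread inside each dimension. Everything here is hypothetical: the 1-to-5 scale, the per-dimension scores, the tolerance of 1, and the summarize helper are assumptions chosen to illustrate the idea.

```python
from statistics import median

DIMENSIONS = ("effort", "risk", "complexity", "uncertainty")


def summarize(votes: dict[str, list[int]], tolerance: int = 1) -> dict[str, dict]:
    """Per dimension: the median vote, the spread, and whether it needs discussion."""
    summary = {}
    for dim in DIMENSIONS:
        scores = votes[dim]
        spread = max(scores) - min(scores)
        summary[dim] = {
            "median": median(scores),
            "spread": spread,
            "discuss": spread > tolerance,  # only wide spreads need airtime
        }
    return summary


# The single-card vote from the planning session: 5 versus 13.
single_card_spread = 13 - 5

# Hypothetical per-dimension votes from the same two developers (1 to 5 scale).
votes = {
    "effort": [1, 2],
    "risk": [5, 5],
    "complexity": [2, 3],
    "uncertainty": [2, 2],
}

result = summarize(votes)
print("single-card spread:", single_card_spread)
for dim, stats in result.items():
    print(dim, stats)
```

With these made-up numbers no dimension is flagged for discussion, even though the single-card votes were eight points apart. The two developers agree that effort is low and risk is high.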

Hours don’t give you this. Neither does any other single-number system. The problem isn’t the unit; it’s the dimension count.

The debate that keeps going

The story points vs hours argument has been running since agile teams started counting sprints. Teams switch from one to the other, find the same estimation failures on the other side, and sometimes switch back. Both sides have reasonable points. Neither side is asking the right question.

The question isn’t which unit to use. It’s whether your estimation process surfaces what makes a ticket hard before you commit it to a sprint. Part of why this keeps going is that the word “complexity” does too much work — it became a catch-all for four distinct dimensions that need separate votes.

If you want to try this with your team, Estimate Well runs structured multi-dimensional estimation sessions in which the team votes separately on effort, risk, complexity, and uncertainty before arriving at a final number. It’s free, with no account needed: share a link and you’re in.