The claim before the metric

Why better measurement starts by asking what you need to be able to say.

A product team I worked with had a problem they could not quite name.

Their conversion rate had improved by eleven per cent over the quarter. The dashboards looked clean. Leadership was pleased. The number had moved in the right direction.

The first question, of course, was attribution. That quarter had included three meaningful changes: a redesigned signup flow, a new pricing page, and a paid acquisition channel that brought in a different kind of user. A more disciplined plan would have made those effects easier to distinguish: sequencing the changes, holding out a control group, tracking cohorts separately, or analysing the acquisition channel independently. Standard hygiene, and it mattered.
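To make "tracking cohorts separately" concrete, here is a minimal sketch in Python. The rows, field names, and cohort labels are hypothetical; the only point is that conversion is reported per acquisition channel and per signup flow, rather than as one blended quarterly number that mixes all three changes together.

```python
# A minimal sketch of cohort-separated conversion tracking.
# The rows, field names, and cohort labels are hypothetical, purely for illustration.

from collections import defaultdict

# Each signup records which acquisition channel it came from, which signup
# flow it saw, and whether the user eventually converted.
signups = [
    {"channel": "organic", "flow": "old", "converted": True},
    {"channel": "organic", "flow": "new", "converted": False},
    {"channel": "paid",    "flow": "new", "converted": True},
    {"channel": "paid",    "flow": "new", "converted": True},
    # ... real data would have many more rows
]

def conversion_by_cohort(rows, keys=("channel", "flow")):
    """Report conversion separately for each (channel, flow) cohort,
    rather than as one blended number for the quarter."""
    totals = defaultdict(lambda: [0, 0])  # cohort -> [conversions, signups]
    for row in rows:
        cohort = tuple(row[k] for k in keys)
        totals[cohort][0] += row["converted"]
        totals[cohort][1] += 1
    return {cohort: conv / n for cohort, (conv, n) in totals.items()}

print(conversion_by_cohort(signups))
# {('organic', 'old'): 1.0, ('organic', 'new'): 0.0, ('paid', 'new'): 1.0}
```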

But that was the surface issue. The deeper question was what they wanted to be able to claim.

Did they want to say conversion had increased? That was true. Did they want to say the signup redesign had caused the increase? The evidence did not support that. Did they want to say the people who converted received more value afterwards? That would require a different kind of evidence again, and they had not collected it.

The team had measurement. They did not yet have evidence for the claim they wanted to make.

This is not a rare situation. It is one of the most common situations I encounter in product organisations, including teams with mature analytics and experimentation practices. The surface symptoms vary. The underlying issue is the same. Teams have data, and they have dashboards, and they often have lots of both. What they lack is a clear sense of what their data actually allows them to say.

The problem is that we have learned to treat measurement as if it were automatically evidence for the claim we want to make.

It is not.

Metrics are not neutral

Every metric carries an implied claim. The claim is usually unspoken, often unexamined, and frequently stronger than the data can support.

Feature adoption implies users found something useful enough to engage with. But adoption can be driven by curiosity, prominence in the interface, the path of least resistance, a lack of alternatives, or a one-off campaign. The metric may be real. The claim about usefulness may or may not be.

Conversion rate implies that an experience improved enough to drive a behavioural change. But conversion can move because of pricing, traffic mix, seasonality, a campaign, a competitor's outage, or a measurement artefact. The number can be accurate while the interpretation is wrong.
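One way to see how the number can be accurate while the interpretation is wrong is a simple mix-shift calculation. The figures below are invented: each segment converts at exactly the rate it did before, yet the headline rate still rises, because the traffic mix moved towards the better-converting segment.

```python
# A hypothetical illustration of a traffic-mix effect: each segment converts
# at the same rate in both quarters, yet the blended number rises.

def blended_rate(segments):
    """Overall conversion rate across segments given (visitors, rate) pairs."""
    converted = sum(visitors * rate for visitors, rate in segments.values())
    visitors = sum(visitors for visitors, _ in segments.values())
    return converted / visitors

# Invented numbers. Per-segment rates are unchanged; only the mix shifts.
q1 = {"organic": (7000, 0.10), "paid": (3000, 0.02)}
q2 = {"organic": (8000, 0.10), "paid": (2000, 0.02)}

print(f"Q1 blended: {blended_rate(q1):.1%}")  # 7.6%
print(f"Q2 blended: {blended_rate(q2):.1%}")  # 8.4%
# The headline rate improved by roughly ten per cent in relative terms,
# without either segment converting any better than it did before.
```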

Satisfaction scores imply users valued the service. They may have. They may also reflect demand effects, the wording of the question, the moment in the journey when the survey appeared, or the fact that dissatisfied users had already left and were not present to answer.

Outcome movement implies that something changed for participants. In a social programme, that movement may be connected to the intervention. It may also be connected to life events, regression to the mean, selection effects, or the fact that people who were likely to improve anyway happened to be enrolled.

None of this is an argument against measurement. It is an argument for being precise about what each measurement entitles us to claim. A satisfaction score is a real signal. It is a signal about a specific thing. Treating it as evidence for something else is where teams get into trouble.

The dashboard is not the evidence

A dashboard shows movement. It cannot, by itself, decide what that movement means.

This sounds obvious when stated directly. It is much less obvious in practice, because the visual authority of a well-designed dashboard tends to overwhelm the methodological caution underneath it.

A green arrow pointing up reads as good news. A neat line chart implies clarity. A percentage change, presented confidently in a steering meeting, can feel like a conclusion rather than an invitation to ask better questions. Most people looking at a dashboard for thirty seconds are not going to interrogate whether the data supports the implied claim. They are going to act on the impression.

The result, across organisations, is a particular kind of false confidence. Teams ship things, observe metric movement, attribute the movement to what they shipped, and update their roadmaps accordingly. Sometimes that attribution is correct. Often it is partially correct. Sometimes it is entirely wrong, and the team is now optimising in the wrong direction with high confidence.

The remedy is rarely just a better dashboard or a more sophisticated analysis layer. Those can improve visibility, but they do not resolve the prior question of what the evidence is evidence for.

A metric without a clear claim is just a number waiting to be overinterpreted.

What claim discipline looks like

Claim discipline is a small set of questions, asked in order, before the measurement work begins.

The first is: what are we trying to say? Not what are we trying to measure. What is the statement we want to be able to make at the end of this work, to whom, and in support of what decision?

The second is: what standard of evidence does that statement require? A roadmap conversation can be supported by directional signals. A scaling decision needs more. A claim that becomes part of a regulatory submission, a foundation grant report, an investment case, or an external case for a programme needs more again. The standard of evidence should be set by the decision the claim will inform, not by what is convenient to collect.

The third, and often the most useful, is: what weaker statement could we honestly make today? This separates the strong claim a team would like to make from the genuine claim their evidence currently supports. A team may want to say "this change improved user activation," but the evidence may only support "activation increased during the period in which this change was released." Those are different statements. The gap between them is the work.

The fourth is: what would need to change for us to support the stronger claim? Sometimes the answer is better instrumentation. Sometimes it is a different study design. Sometimes it is a comparison group. Sometimes it is implementation data that distinguishes one variant from another. Sometimes it is pre-specification of what counts as a positive result. Sometimes it is simply waiting long enough to know whether the effect persists.

The answer is often structural. More data of the same kind will not fix a design that cannot support the claim being made. A team that runs an A/B test on an audience that does not represent the population they care about cannot generalise the result, no matter how clean the statistics. A charity that only surveys participants at the end of a programme cannot make a robust contribution claim on that evidence alone, no matter how positive the responses. The structural problem sits upstream of the data.

The claim is a design material

This matters particularly for designers, and particularly for design leaders, because we often sit in the part of the organisation where claims are being formed implicitly all the time.

Designers shape what gets tested. They define variants, entry points, populations, journeys, success states, and moments of measurement. They influence which metrics become primary and which become secondary. They produce the dashboards, prototypes, artefacts, and narratives that translate raw information into the impressions that drive decisions. The design work is not separate from the experimental and measurement apparatus. It is part of it.

A confusing intervention produces unclear evidence. A treatment that includes too many simultaneous changes makes attribution difficult. A poorly timed prompt produces misleading data. A success metric that ignores excluded users creates false confidence. A dashboard that presents weak evidence with visual certainty can make a tentative signal feel like proof.

This means the claim itself is something designers can shape, refine, and sharpen, the way they shape any other design material. A vague claim might say: "we believe this tool will help teams make better decisions." A claim shaped with more care might say: "teams using this tool can identify priority issues faster and with greater confidence than they could using the previous process." The second sentence forces better questions. What does "better" mean? Faster by how much? Greater confidence measured how? Compared with what? For which teams? In what context? Over what time period? What would count as failure?

The claim sharpens the product, the experiment, and the evidence. It also protects the organisation from overstating what has been learned. Perhaps the tool improves confidence but not speed. Perhaps it helps experienced teams but confuses new ones. Perhaps the evidence supports a usability claim, but not yet an outcome claim. That is not a failure of measurement. It is what good measurement is for.

The teams I have seen handle this well are not always the ones with the most sophisticated analytics. They are the ones where someone, often a designer, researcher, or product leader, asks the awkward question early. What are we actually claiming here? What evidence would make that claim credible? What evidence would falsify it? What else could explain what we are seeing? Those questions slow teams down at the start and speed them up across the year.

Better claims create better learning

The point of claim discipline is not to make organisations cautious for the sake of it. It is to make learning more useful.

When claims are vague, evidence becomes political. Teams argue about whether the numbers are good enough, whether a result counts, whether the dashboard is telling the right story, whether the board will accept it, whether the funder will understand it.

When claims are precise, learning becomes easier. The organisation can see what is known, what is not known, and what kind of evidence would reduce uncertainty. That creates better product decisions, better funding conversations, better programme design, and more honest conversations about value.

This way of thinking shapes a lot of what I am working on at the moment, including recent Studiologie work on evidence systems for social sector organisations. But the discipline applies wherever teams are trying to connect evidence, decisions, and change.

  • Define the claim before choosing the metric.
  • Understand the decision before building the dashboard.
  • Ask what kind of evidence would make the answer meaningful before asking whether the number has moved.

It is slower at the start. It is much faster, and much more honest, across the life of the work.