Probabilistic Thinking

Base Rates

Anchor predictions in how often this kind of thing actually happens. The single most undervalued number in any forecast — and the one your intuition almost always ignores.

12 min read · Topic: Prediction & probability · Level: Introductory

Here's a question. A young man you've just met is quiet, neat, detail-oriented, and tells you he likes long solitary walks and reading. Is he more likely to be a librarian or a farmer?

Most people say librarian. The description sounds like a librarian. But consider the numbers: in the United States, there are roughly twenty farmers for every librarian. Even if the "librarian personality" is three times more common among librarians than among farmers — which is a generous assumption — there are still far more quiet, neat, book-loving farmers in absolute terms than there are librarians of any description. The correct answer, by a wide margin, is farmer.
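
To see the arithmetic, here it is in a few lines of Python. The 20:1 ratio and the 3× factor come from the paragraph above; the 30%-versus-10% split is just one illustrative way to realize that factor.

```python
# Rough US population ratio: ~20 farmers for every librarian.
farmers, librarians = 20, 1

# Generous assumption: the "librarian personality" is 3x more common
# among librarians. Say 30% of librarians fit it vs. 10% of farmers.
fit_librarian, fit_farmer = 0.30, 0.10

fitting_librarians = librarians * fit_librarian  # 0.3 per 21 people
fitting_farmers = farmers * fit_farmer           # 2.0 per 21 people

p_farmer = fitting_farmers / (fitting_farmers + fitting_librarians)
print(f"P(farmer | fits the description) = {p_farmer:.0%}")  # ~87%
```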

This little puzzle, adapted from a famous experiment by Daniel Kahneman and Amos Tversky, is the clearest demonstration of what psychologists call the base rate fallacy: our tendency to reason from the vividness of a description while ignoring how common or rare the underlying category is. The "base rate" is just the prior probability — how often this kind of thing happens in the world before we know anything specific. And when we ignore it, our predictions get spectacularly, systematically wrong.

This essay is about that number. Where it comes from, why it's so hard to remember, how professionals in medicine, finance, and forecasting use it to stay calibrated, and how you can train yourself to reach for it before anything else.

What a base rate actually is

A base rate is just a frequency. It answers the question: among all the cases like this, how often does the thing I'm asking about happen? It is the view from thirty thousand feet — what's true of the population — before you zoom in on any specific individual, deal, or diagnosis.

Take the librarian question apart. The "description" — quiet, neat, bookish — is what Kahneman calls specific evidence. It feels informative. It feels like it's telling you something. But to turn that evidence into a good prediction, you have to weight it against the base rate. How many librarians exist vs. how many farmers? That ratio matters enormously, and it's the number our brains skip over.

[Figure: The librarian illusion. Rough ratio in the US: ~20 farmers for every 1 librarian. Even if "fits the type" is 3× more common among librarians, farmers win by volume.]
The description feels like strong evidence. But specificity without base-rate context is a mirage — rarity in the population overwhelms "fit."

The underlying principle generalizes way beyond librarians. Any time you're trying to predict whether something is true of an individual case — is this patient sick, will this startup succeed, is this email a scam, will this hire work out — you need two numbers. First: how often is this true in general (the base rate)? Second: how much does the specific evidence in front of me shift that number up or down? Without the first, the second is noise.
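
Those two numbers are exactly the inputs to Bayes' rule in odds form. A minimal sketch (the function name and the example figures are illustrative, not from any particular source):

```python
def update(base_rate: float, likelihood_ratio: float) -> float:
    """Combine a base rate with the strength of specific evidence.

    likelihood_ratio = P(evidence | true) / P(evidence | false).
    Odds form of Bayes' rule: posterior odds = prior odds * LR.
    """
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Evidence that's 3x more likely when the thing is true
# barely moves a 5% base rate:
print(f"{update(0.05, 3):.0%}")  # ~14%
```

Notice that without the base_rate argument, the function has nothing to multiply: the likelihood ratio by itself is not a probability.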

Why we systematically ignore them

The base rate fallacy isn't laziness. It's a predictable feature of how human cognition works. Kahneman and Tversky built a career on mapping its shape, and the core mechanism is this: we replace the hard question with an easier one.

The hard question — "given the population this person belongs to, weighted by how much this evidence updates the probability, what's the likelihood?" — requires arithmetic and a reference class. So the mind substitutes a different question: "how much does this description remind me of the category?" That's easy. You just check your internal stereotype. It feels like you've answered the question, when in fact you've answered a totally different one.

The core bias

When specific, vivid evidence is available, the human mind treats the base rate as if it doesn't exist. The story overrides the statistics — even when we're told the statistics explicitly.

What makes this worse is that vivid evidence almost always feels more trustworthy than statistics. A detailed anecdote about one startup founder triggers a confident prediction; knowing that 90% of startups fail doesn't, even though that second number is vastly more predictive. Journalists know this. Politicians know this. Your own memory exploits this against you every day. One salient example beats a thousand data points, and that's precisely why base-rate thinking has to be a discipline rather than an instinct.

Medicine: the mammogram problem

Perhaps the most famous base-rate problem in the world involves a medical test. It's worth walking through carefully because it shows, in undeniable numbers, how badly intuition fails.

Classic Problem

The mammogram puzzle

A woman in her forties takes a routine mammogram. The test is 90% accurate: it correctly identifies 90% of cancers, and falsely flags only about 9% of healthy women. Her test comes back positive. What's the chance she actually has breast cancer?

Most people — including most doctors, in a famous 1978 Harvard study — say around 80 or 90%. The actual answer is closer to 9%.

The reason is the base rate. Among women in their forties, the prior probability of having breast cancer at any given moment is about 1%. Out of 1,000 such women, only around 10 have cancer. Of those 10, the test catches 9 (90% sensitivity). But the other 990 healthy women also get tested — and 9% of them, about 89 women, get a false positive. So when you add it up: about 98 positive tests, of which only 9 are real. That's about 9%.

[Figure: Of 1,000 women tested, 10 have cancer (the 1% base rate) and 990 are healthy. Of the 10, 9 test positive and 1 is missed; of the 990, 89 test positive (false positives) and 901 are correctly cleared. Of all positive tests: 9 real + 89 false alarms = a 9% chance it's real.]
A 90% accurate test on a 1% base rate produces mostly false alarms. The rarity of the condition swamps the accuracy of the test.
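
The whole tree fits in a few lines of code. This sketch reproduces the 9% figure and then varies the base rate to show rarity swamping accuracy (the computation is standard Bayes' rule; the function name is mine):

```python
def positive_predictive_value(base_rate, sensitivity, false_pos_rate):
    """P(condition | positive test), via Bayes' rule."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * false_pos_rate
    return true_pos / (true_pos + false_pos)

# The mammogram numbers: 1% base rate, 90% sensitivity, 9% false positives.
print(f"{positive_predictive_value(0.01, 0.90, 0.09):.0%}")  # ~9%

# Same test, different populations:
for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv = positive_predictive_value(prevalence, 0.90, 0.09)
    print(f"base rate {prevalence:6.1%} -> positive means {ppv:.0%}")
```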

The result is counterintuitive but the math is airtight, and it has enormous practical consequences. Screening programs for rare conditions — HIV in low-risk populations, rare genetic disorders, certain cancers in younger demographics — produce enormous numbers of false positives, leading to unnecessary biopsies, anxiety, and sometimes treatment of conditions that don't exist. Doctors who understand base rates communicate test results differently: "positive" is not a diagnosis; it's a nudge, and how big a nudge depends on who you are.

The general rule

When testing for rare things, even accurate tests produce mostly false positives. The rarer the condition, the more cautiously you should interpret a positive result. This is why doctors don't scan asymptomatic patients indiscriminately — the math guarantees that the false alarms, and the interventions they trigger, will outnumber the real cases you find.

Business: the startup question

Every founder who pitches an investor is making an implicit bet: my startup is different. The investor, if they're any good, has a base rate in their head: most startups fail. The question is never whether your specific pitch is compelling — every pitch is compelling, that's the whole point of pitches. The question is: how much should this specific evidence move me off the baseline of "probably fails"?

~90% of startups fail within 10 years.
~75% of VC-backed startups never return investor capital.
~1% become companies valued over $100M.

These aren't pessimistic numbers — they're just the base rate. A good investor doesn't approach a pitch asking "will this work?" They ask "does this founder, in this market, with this traction, look like one of the 10% of outliers — and does the evidence in front of me justify overriding the 90% prior?" That framing forces calibration. It's why experienced investors pass on deals that sound exciting and back deals that sound boring: they're reasoning from frequencies, not narratives.
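
That framing can be made quantitative: how strong would the evidence have to be to override the prior? A sketch in the same odds form as before (the 10% success rate is the round number implied by the ~90% failure figure above):

```python
def required_likelihood_ratio(base_rate: float, target: float) -> float:
    """How strong must evidence be to move base_rate up to target?"""
    prior_odds = base_rate / (1 - base_rate)
    target_odds = target / (1 - target)
    return target_odds / prior_odds

# To believe "this startup is more likely to succeed than fail"
# against a 10% base rate, the pitch must be 9x more likely to
# come from a future winner than from a future failure:
print(required_likelihood_ratio(0.10, 0.50))  # 9.0
```

Almost no pitch clears that bar on charisma alone, which is exactly the investor's point.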

Hiring

Hiring managers do the same math, whether they know it or not. The base rate for successful senior hires at most companies sits between 50% and 70%. A candidate who performs brilliantly in a single interview is not necessarily above that baseline — interviewing is a skill, and the correlation between interview performance and job performance is famously weak. The base-rate-aware approach is to structure interviews around work samples, to call references, and to treat first-impression charisma with skepticism. The magnetic candidate is not a separate category from the base rate. They are a sample from it.

Project estimation

When a team estimates how long a project will take, they almost always go inside the project: they look at the specific tasks, imagine how they'll go, and add them up. This is called the "inside view," and it is reliably optimistic. The base-rate approach — the "outside view," a phrase coined by Kahneman — is to ask: how long have projects like this actually taken in the past? That number is almost always longer. Software engineers building their sixth release feature can still honestly believe this one will take two weeks, when every previous comparable feature took eight. The base rate is sitting right there in their own history, and they still don't use it.

The inside view tells you how it will go. The outside view tells you how it usually goes. The difference between them is your overconfidence.
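
In code, the outside view is little more than a percentile lookup over your own history. A minimal sketch with hypothetical durations (the numbers are made up for illustration):

```python
from statistics import median, quantiles

# Actual durations, in weeks, of the team's last comparable features
# (hypothetical data):
past_durations = [7, 9, 6, 12, 8, 10, 8]

inside_view = 2  # the sum-of-tasks estimate everyone believes

outside_view = median(past_durations)      # what usually happens: 8
p90 = quantiles(past_durations, n=10)[-1]  # a conservative bound

print(f"inside view: {inside_view} wk | "
      f"outside view: {outside_view} wk | 90th pct: {p90} wk")
```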

Investing and forecasting

Markets punish people who ignore base rates with almost theological regularity. Retail investors routinely believe they can pick winning stocks; the base rate for doing so, over multi-year periods, is roughly the same as a coin flip — and after fees, worse. Professional fund managers believe they can beat their benchmarks; the long-run base rate is that most of them don't. Somewhere between 80 and 90% of actively managed equity funds underperform their index over 15-year windows. This is not a secret. It is not hidden data. It is one of the most thoroughly replicated findings in finance. And yet individual investors, and many professionals, continue to treat themselves as the exception.

The best forecasters in the world — the kind Philip Tetlock identified in his "superforecaster" research — reason differently. They start every forecast by asking: what's the base rate for something like this? Only then do they update on specifics. This is the "outside view first, inside view second" discipline. It sounds simple, but almost nobody does it consistently.

Two ways to forecast the same thing:

The inside view asks "How will THIS one go?" It lists the steps and adds them up, imagines things going to plan, feels detailed and rigorous, and ignores that plans usually slip. Reliably too optimistic.

The outside view asks "How do these usually go?" It finds a reference class, looks at past base rates, feels boring and generic, and catches the things you'd miss. Reliably better calibrated.

Do both. Weight the outside view more than feels natural.
The inside view is seductive because it feels tailored. The outside view is powerful because it's based on what has actually happened.

Everyday decisions

You use base rates constantly without noticing, and you misuse them constantly without noticing. Both are worth making conscious.

When you hear a loud noise in a city, you assume construction or a car backfiring, not a gunshot. That's base-rate thinking: gunshots are rare, construction noises common. When you meet someone new who mentions they're a dentist, you assume they went to dental school, not that they're lying. Base rates again: most people who tell you their profession are telling the truth. When your flight is delayed, you assume weather or mechanical issues, not a crisis, because those are the common causes.

But the misuses are more interesting. Media coverage systematically distorts your sense of base rates in ways that degrade your decisions. Plane crashes are rare, but each one is reported globally, so people overestimate the risk of flying and drive instead — which is, by base rate, far more dangerous. Crime rates in most developed countries have been falling for decades, but news coverage of violent crime has not, so public perception of danger stays stubbornly high. Kidnapping by strangers is vanishingly rare; most missing children are with a parent or someone they know. And so on. When vivid stories determine your felt sense of how often something happens, your base-rate calibration drifts, sometimes catastrophically.

Be suspicious of

Anything that makes a specific story vivid while burying the statistics — true crime podcasts, cable news, viral social media threads about rare events. None of them are lying, but none of them are calibrating you either. The implicit "this happens" can be true while the implicit "this happens often" is wildly false.

Finding the right reference class

Here's the catch the naive version of base-rate thinking misses: the base rate depends on which reference class you pick. And there's no single right answer.

Take a concrete case. You're evaluating a new restaurant opening in your neighborhood. What's the base rate for failure? It depends on what you consider the relevant reference class:

Same restaurant, four different base rates (failure rate):

All restaurants (any type, anywhere): ~60%
Same city, same cuisine (narrower population): ~45%
Operator's 2nd+ restaurant (track record matters): ~30%
Funded restaurant group (most specific class): ~15%
Tightening the reference class — from "all restaurants" to "funded group with proven operator" — radically changes the base rate. The art is picking the narrowest class you have reliable data for.

Pick too wide a class ("all restaurants") and your base rate is accurate in general but ignores meaningful information about this specific case. Pick too narrow a class ("restaurants opened by this exact operator in this exact neighborhood with this exact concept") and you have almost no data — maybe a sample size of one, or zero. The art is finding the narrowest reference class for which you still have enough historical cases to extract a reliable frequency.
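
That tradeoff can be put in numbers: a narrower class gives a more relevant rate with a wider uncertainty band. A rough sketch using a normal-approximation 95% interval (the class sizes and rates are illustrative, echoing the figure above):

```python
from math import sqrt

def interval_95(rate: float, n: int) -> tuple[float, float]:
    """Rough 95% confidence interval for a frequency observed in
    n cases (normal approximation; good enough for illustration)."""
    half = 1.96 * sqrt(rate * (1 - rate) / n)
    return max(0.0, rate - half), min(1.0, rate + half)

# Hypothetical reference classes, widest to narrowest:
classes = [
    ("all restaurants",         0.60, 50_000),
    ("same city, same cuisine", 0.45, 400),
    ("operator's 2nd+ venue",   0.30, 25),
    ("this exact concept",      0.15, 3),
]

for name, rate, n in classes:
    lo, hi = interval_95(rate, n)
    print(f"{name:<24} ~{rate:.0%} failure (n={n}: {lo:.0%}-{hi:.0%})")
```

By the narrowest class the interval spans nearly the whole range, which is the precise sense in which you've collapsed into anecdote.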

This is the technique Bent Flyvbjerg calls "reference class forecasting," and it's been used to predict the cost and timeline of large infrastructure projects far more accurately than traditional inside-view estimation. The UK government now requires it for major projects, precisely because planners are chronically over-optimistic and base rates from similar projects are the corrective.

Where base rates fail

Base-rate thinking isn't magic, and it has real limitations worth naming.

Truly novel events have no reference class. The first time something happens — a genuinely new technology, a first-of-its-kind geopolitical event — the historical base rate doesn't exist, or is too thin to trust. People who insisted on base-rate thinking alone would have missed the early internet, the first smartphones, the COVID-19 pandemic. For novel situations, you need analogy-building and causal reasoning, not just frequency lookup.

Base rates change. The frequency of something in the past is not always a good guide to its frequency now. Crime rates, disease rates, industry failure rates — all shift over time as conditions change. A base rate from 1990 may be misleading in 2026, and using stale base rates with false confidence is its own failure mode.

Strong specific evidence can legitimately override base rates. If the patient is a 25-year-old woman with no risk factors but is pregnant and has severe right-sided abdominal pain, you don't stop at "appendicitis is more common" — you consider ectopic pregnancy because the specific evidence shifts the probabilities dramatically. The base rate is the starting point, not the ceiling.

Some individuals are legitimate outliers. If you only reasoned from base rates, you'd never fund a startup, never make a risky career bet, never back an underdog. The base rate for writing a bestselling novel is terrible. The base rate for starting a company that changes an industry is terrible. Sometimes you're betting on the tail of the distribution, and that requires evidence strong enough to override the prior — not a refusal to engage with the prior at all.

How to actually use them

Base-rate thinking becomes a reflex with practice. The steps are simple but take deliberate repetition.

The base-rate discipline

1. Ask the base-rate question first

Before evaluating any specific case, ask: "How often does this kind of thing happen in general?" Make yourself answer with a number, even a rough one. The act of estimating pulls the base rate into the room.

2. Pick the narrowest reference class with real data

Zoom in from "all X" to "X in this situation" — but stop zooming when the sample size gets too small to trust. You want specificity without collapsing into anecdote.

3. Consider the specific evidence second

Now, and only now, ask how much this particular case deviates from the baseline. Is the evidence strong enough to genuinely shift the prior? Usually it isn't.

4. Be suspicious of vivid stories

The more compelling the narrative in front of you, the more alert you should be to base-rate neglect. Charisma, detail, and confidence are not evidence. They're the shape of the trap.

5. Anchor, then adjust

Start with the base rate as your estimate. Then adjust — typically less than you think — based on what makes this case different. The base rate is the anchor; the evidence is the nudge. Not the other way around.

The principle, restated

Before you judge a specific case, find out how often cases like it go the way you're predicting. That number is your anchor. Everything else — the story, the detail, the feeling — is adjustment around it, and usually smaller adjustment than you think.

Base rates are boring. That's part of why they work. They don't tell you an exciting story about the case in front of you; they tell you what usually happens, which is almost always less dramatic than the specific pitch. The librarian is probably a farmer. The positive test is probably a false positive. The startup will probably fail. The project will probably be late. The fund manager will probably underperform. Not always — sometimes the specific evidence really does override the prior. But far more often than our intuitions suggest, the boring number is right, and the exciting story is what's fooling us.

Training yourself to reach for the base rate first is one of the cheapest and most durable upgrades you can make to your thinking. It costs nothing. It requires no tools. And it gets you closer, on average, to the truth than whatever your gut was about to say.