Bayes' Rule
The mathematics of changing your mind. Update beliefs by exactly how much new evidence justifies — no more, no less. The single most useful equation in all of probabilistic thinking.
In the 1740s, in a small parsonage in Tunbridge Wells, England, a non-conformist minister named Thomas Bayes was working on a problem that had nothing to do with his sermons. He was trying to figure out how to reason backwards. Most of probability, in his time, was concerned with reasoning forward — given a fair coin, what's the probability of three heads in a row? Bayes was interested in the inverse problem: given that you observed three heads in a row, what's the probability the coin is fair? It turned out to be a much harder question, and the answer he worked out — published only in 1763, two years after his death — would, two and a half centuries later, become the mathematical foundation for everything from medical diagnosis to spam filters to the AI systems writing this sentence.
The principle Bayes worked out is now called Bayes' Rule, and at its heart it answers one question: how much should new evidence change my belief? That's it. The whole apparatus — the formula, the philosophy, the multi-billion-dollar industry of statistical software built on it — exists to answer that one question precisely. Not approximately. Not directionally. By exactly the right amount, given what you already knew and what the evidence is telling you.
Most people reason about evidence in a way that systematically violates Bayes' Rule. They get extreme — flipping from "definitely not" to "definitely yes" the moment evidence arrives. Or they barely move at all, treating evidence as decorative when it should be transformative. Or they double-count evidence they already considered, or ignore the prior probability entirely. Bayes' Rule corrects all of this. It tells you the exact amount your belief should shift, and why.
This essay is about that rule: where it came from, how it works, why it's the engine behind a remarkable range of modern thought, and where it can go badly wrong in human hands.
A puzzle that breaks intuition
Before we get to the math, let's start with a problem that almost everyone — including most doctors who've been asked it — gets dramatically wrong. The problem is famous in cognitive psychology, and it works because it isolates the exact failure of reasoning that Bayes' Rule fixes.
The puzzle
The medical test problem
A disease affects 1 in 1,000 people. There's a test for it that's 99% accurate: if you have the disease, the test is 99% likely to be positive. If you don't have the disease, the test is 99% likely to be negative.
You take the test. It comes back positive. What's the probability you actually have the disease?
Most people answer somewhere around 99%. The test is 99% accurate, after all. A 1996 study by Gerd Gigerenzer found that even physicians, given essentially this exact problem, gave answers averaging around 80%. The actual answer is closer to 9%. Not 99%. Not 80%. About one in eleven.
This is so counterintuitive that most people refuse to accept it on first hearing. The math is correct, but it feels wrong, and the gap between what your intuition says and what the answer actually is — between 99% and 9% — is the gap that Bayes' Rule is designed to close. To see why the answer is 9%, we need to look at the situation in a way that makes the structure visible.
The natural-frequency picture
The clearest way to see what's going on — and the way Gigerenzer found people could solve these problems correctly — is to stop thinking in percentages and start thinking in actual people. Imagine 1,000 people, all of whom take the test. Then walk through what happens to them, one branch at a time.
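1,000 people take the test
    1 actually has the disease
        the test flags this person: about 1 true positive
    999 are healthy
        the test wrongly flags 1% of them: about 10 false positives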
Now the answer is visible. Among 1,000 people, only 1 actually has the disease — and the test correctly catches that person 99% of the time, so we'll round to 1 true positive. The other 999 people are healthy, and although the test is 99% accurate on them, that 1% false-positive rate applied to 999 people means about 10 of them will get a wrong positive result. So when you add it all up: about 11 positive tests in total, only 1 of which is real. That's roughly 1 out of 11, or 9%.
The intuition that fails here is the failure to keep both numbers in mind at once: how rare the disease is (the prior) and how accurate the test is (the evidence). Most people latch onto the second number and forget about the first, but the prior is doing enormous work — when a condition is rare, even an accurate test produces mostly false positives, because there are so many more healthy people available to be wrongly flagged than sick people available to be correctly flagged.
The answer
The probability you actually have the disease is
≈ 9%. Not 99%. The test is excellent. The disease is just very rare, and rarity matters.
This kind of reasoning — where you start with a base rate, observe some evidence, and update your belief by exactly the amount the evidence justifies — is Bayes' Rule. The natural-frequency tree above is just a visual way of computing it. Once you can see what's going on with people-counts, you can reach for the formula whenever you need numerical precision.
The formula, finally
The actual formula looks intimidating in standard notation, but each piece corresponds to something we just walked through.
Bayes' Rule
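P(H|E) = P(E|H) × P(H) / P(E)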
Read in plain English, the formula says: your updated belief in a hypothesis (H), given new evidence (E), equals how well that hypothesis predicts the evidence, times how plausible the hypothesis was beforehand, divided by how often the evidence happens at all. Each piece does specific work.
The prior, P(H), is your belief before the evidence arrives — the base rate, the starting probability, what you would have said yesterday. In the medical test problem, this is the 1-in-1,000 rate of the disease in the population. The likelihood, P(E|H), is how well the hypothesis predicts the evidence — the test's sensitivity, the 99% true-positive rate. The marginal, P(E), is how often the evidence shows up overall, regardless of whether the hypothesis is true — in our case, all positive tests, both real and false. And the posterior, P(H|E), is what you actually want: your updated belief after the evidence has done its work.
Plug in the medical-test numbers — prior 0.001, likelihood 0.99, marginal 0.011 (the proportion of all 1,000 people who get any positive result, real or false) — and you get a posterior of about 0.09, or 9%. The same answer the natural-frequency tree gave us. The formula is just the visual reasoning, compressed into algebra.
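If you'd rather check the arithmetic than take it on faith, here is a minimal sketch in Python (not part of Bayes's essay or anyone's library, just the numbers above run both ways):

```python
# The medical-test update, two ways, using the numbers from the essay.

# The formula: P(H|E) = P(E|H) * P(H) / P(E)
prior = 0.001           # P(H): 1 in 1,000 people have the disease
sensitivity = 0.99      # P(E|H): chance of a positive test if you have it
false_positive = 0.01   # P(E|not H): chance of a positive test if you don't

marginal = sensitivity * prior + false_positive * (1 - prior)  # P(E): all positives
posterior = sensitivity * prior / marginal                     # P(H|E)
print(f"posterior = {posterior:.3f}")  # ~0.090

# The same thing in people-counts: imagine 1,000 people taking the test.
true_positives = 1000 * prior * sensitivity             # about 1 person
false_positives = 1000 * (1 - prior) * false_positive   # about 10 people
print(f"{true_positives:.0f} real positive out of about "
      f"{true_positives + false_positives:.0f} total positives")  # 1 out of 11
```

Both routes land on the same 9%, which is the point: the formula and the people-counting picture are one calculation wearing two outfits.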
A useful intuition
Bayes' Rule says: your new belief is your old belief, scaled by how surprising the evidence is. If the evidence was already expected even without the hypothesis, the rule barely updates you. If the evidence would have been very surprising without the hypothesis, the rule updates you a lot. The "ratio of surprises" is the engine.
Bayes, Laplace, and the long road
Thomas Bayes himself never published the rule that bears his name. He worked on it in the 1740s and 1750s, set it aside in unfinished form, and died in 1761 without seeing it in print. His friend Richard Price found the manuscript among Bayes's papers, edited it, added a worked example, and submitted it to the Royal Society of London, where it was published in 1763 under the title "An Essay Towards Solving a Problem in the Doctrine of Chances." It received polite but limited attention, and might have faded into obscurity entirely.
What rescued it was the work of Pierre-Simon Laplace, the French mathematician who, working independently a few decades later, reinvented and dramatically extended the same principle. Laplace's Théorie analytique des probabilités (1812) is essentially the founding document of modern probability theory, and it generalized Bayes's specific result into a general framework for reasoning under uncertainty. Laplace used it to solve problems in astronomy, demographics, and jurisprudence — including, famously, calculating how confident we should be that the sun will rise tomorrow given that we've seen it rise every previous day. (His answer, accounting for finite evidence, was reassuringly close to certain but importantly not exactly so.)
For most of the 19th and early 20th centuries, Bayesian reasoning was sidelined within statistics, partly because it required specifying a "prior" probability that was sometimes hard to justify objectively. The dominant school of statistical thought, the "frequentist" school led by Ronald Fisher and Jerzy Neyman, deliberately avoided priors and worked only with evidence-based calculations. This worked well for many scientific problems but had clear limitations — it couldn't really tell you the probability that a particular hypothesis was true, only the probability of seeing your data if it were.
Bayes came back, and came back triumphantly, for two reasons. First, computers got fast enough in the late 20th century to handle the calculations Bayesian reasoning typically requires, which had been prohibitive by hand. Second, the explosion of machine learning made Bayesian thinking suddenly central to the most important technical projects of the era — spam filtering, recommendation systems, natural language processing, and eventually the large language models that now shape the information environment of the entire planet. The non-conformist minister whose unfinished essay sat in a drawer for two centuries had, it turned out, written the operating manual for a great deal of modern AI.
Updating: how evidence accumulates
One of the most useful properties of Bayes' Rule is that it tells you how to handle multiple pieces of evidence — how each new observation should change your beliefs. The answer is elegant: the posterior from one round becomes the prior for the next. You can update again. And again. Each new piece of evidence shifts your belief by exactly the amount it deserves, and the shifts compound coherently over time.
This is a profoundly important property. It means you don't have to evaluate all evidence at once. You can take in evidence as it arrives, update accordingly, and arrive at exactly the same final belief you'd have reached if you had seen all the evidence simultaneously. The rule is, in a deep sense, the right way to handle a stream of incoming information — and it's why Bayesian methods dominate fields like sequential medical testing, real-time AI systems, and any context where evidence arrives over time rather than all at once.
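As a concrete sketch (Python again, with the essay's test numbers, and assuming the two test results are independent given whether you actually have the disease, which the essay doesn't spell out):

```python
# The posterior from one test becomes the prior for the next.
# Assumes the two test results are independent given the true disease state.

def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update: return P(H|E) from the prior and the two likelihoods."""
    marginal = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / marginal

prior = 0.001                              # 1-in-1,000 disease, as before
after_one = update(prior, 0.99, 0.01)      # first positive test
after_two = update(after_one, 0.99, 0.01)  # second positive test, using the new prior

print(f"after one positive test:  {after_one:.3f}")   # ~0.090
print(f"after two positive tests: {after_two:.3f}")   # ~0.907
```

One positive test on a rare disease leaves you at 9%; a second independent positive pushes you above 90%. Each result moves you exactly as far as it deserves.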
It also produces a nice intuition about how skepticism should work. If you start with a strong prior (say, 99.9% confident that a perpetual motion machine is impossible), you need a lot of evidence to budge that belief — but Bayes' Rule says you should budge it if the evidence is strong enough. Conversely, if you start with a weak prior (say, 50/50 on whether your team will hit its quarterly target), even modest evidence should swing your belief substantially. The rule tells you exactly how much each piece of evidence is worth, given where you started.
Where Bayes lives today
Bayes' Rule isn't just a mathematical curiosity — it's the silent infrastructure underneath enormous swaths of modern technology. Most of the time you won't notice it. But the moment you do, you start seeing it everywhere.
Medicine
Every medical test you've ever taken was, implicitly, a Bayesian update. Doctors who reason well don't just look at test results — they combine the result with the prior probability that you have the condition (your symptoms, your age, your risk factors), and arrive at a posterior probability that determines what to do next. This is why the same positive test result can mean very different things in different patients: a 25-year-old with a positive cardiac test usually doesn't have heart disease (prior too low), while a 70-year-old with chest pain and the same positive result usually does (prior much higher). Same evidence, different priors, different posteriors.
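A small sketch makes the point concrete. The test characteristics and priors below are invented round numbers for illustration, not clinical data:

```python
# Same evidence, different priors, different posteriors.
# Hypothetical test characteristics and priors, not clinical data.

def posterior(prior, sensitivity=0.9, false_positive=0.1):
    return sensitivity * prior / (
        sensitivity * prior + false_positive * (1 - prior)
    )

print(f"{posterior(0.02):.2f}")  # 25-year-old, prior 2%: posterior ~0.16
print(f"{posterior(0.50):.2f}")  # 70-year-old with chest pain, prior 50%: ~0.90
```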
Spam filters
Email spam filters were, for many years, the most widely deployed Bayesian system on Earth. The classic algorithm — known as a "naive Bayes classifier" — works by calculating, for each incoming email, the probability that it's spam given its words, using the words' frequencies in past spam vs. legitimate email as evidence. The algorithm gets a tiny update from each word and combines them all using Bayes' Rule. It's fast, it works remarkably well, and it's the reason most of us spend less time deleting Viagra ads than people did in 2003.
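Here's a toy sketch of the idea. The word probabilities are invented for illustration; a real filter estimates them from millions of past messages:

```python
# Toy sketch of the naive Bayes idea behind classic spam filters.
# Word probabilities are invented; a real filter learns them from past mail.

import math

p_spam = 0.4  # prior: fraction of past mail that was spam
p_word_given_spam = {"viagra": 0.30, "meeting": 0.02, "free": 0.25}
p_word_given_ham  = {"viagra": 0.001, "meeting": 0.10, "free": 0.05}

def spam_probability(words):
    # Work in log space so products of many small probabilities don't underflow.
    log_spam = math.log(p_spam)
    log_ham = math.log(1 - p_spam)
    for w in words:
        if w in p_word_given_spam:  # the "naive" part: treat words as independent
            log_spam += math.log(p_word_given_spam[w])
            log_ham += math.log(p_word_given_ham[w])
    spam, ham = math.exp(log_spam), math.exp(log_ham)
    return spam / (spam + ham)  # Bayes' Rule: normalize into a posterior

print(f"{spam_probability(['free', 'viagra']):.3f}")  # ~0.999
print(f"{spam_probability(['meeting']):.3f}")         # ~0.118
```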
Search and AI
Modern search engines, recommendation systems, and AI models are built on Bayesian foundations more thoroughly than most users realize. When Netflix recommends a movie, it's combining its prior beliefs about your tastes with new evidence (what you watched yesterday) using methods derived from Bayes' Rule. When a self-driving car sees a partial silhouette in the dark, it's updating prior probabilities about object types based on the partial evidence. Large language models are trained using objectives that are deeply Bayesian in structure — they're learning probability distributions over language given prior context.
Courtrooms
Bayesian reasoning has had a more controversial history in courts of law. Some legal scholars have argued that juries should be taught Bayes' Rule explicitly, since evaluating evidence against a presumption of innocence is a textbook Bayesian update. Others worry that explicit math in jury rooms creates more problems than it solves, including spurious precision and the seductive "prosecutor's fallacy" — where a small probability of false positive evidence gets confused with a small probability of innocence. Both critiques have merit. Bayes is the right framework, but applying it well requires care.
The prosecutor's fallacy
A prosecutor argues: "The DNA match between the defendant and the crime scene has a 1-in-a-million chance of being a coincidence. Therefore, the defendant is almost certainly guilty." This sounds compelling — but it's a Bayesian error.
The 1-in-a-million figure is P(DNA match | innocent), the probability of a match if the defendant is innocent. What the jury actually wants to know is P(innocent | DNA match), which is different. To get the second from the first, you need a prior — the base rate for guilt before the DNA evidence. If the defendant was selected from a database of millions, the prior is low, and even a one-in-a-million match could leave reasonable doubt. The fallacy is using P(E|H) as if it were P(H|E), which is exactly the error Bayes' Rule was invented to correct.
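To see the gap numerically, here is a sketch with invented numbers: suppose the defendant was found by searching a database of five million people, exactly one of whom is the true source of the DNA.

```python
# The prosecutor's fallacy, with invented numbers for illustration.
# The defendant was found by searching a database of 5 million people,
# exactly one of whom is the true source of the DNA.

database_size = 5_000_000
p_match_given_innocent = 1e-6   # the "1-in-a-million" coincidence rate
p_match_given_guilty = 1.0      # a guilty person always matches

prior_guilty = 1 / database_size  # before the DNA evidence: one name among millions

posterior_guilty = (p_match_given_guilty * prior_guilty) / (
    p_match_given_guilty * prior_guilty
    + p_match_given_innocent * (1 - prior_guilty)
)
print(f"P(guilty | match) = {posterior_guilty:.2f}")  # ~0.17, far from certainty
```

With that prior, a one-in-a-million match leaves the probability of guilt around 17%, not 99.9999%. The other evidence in the case, whatever raises or lowers the prior, is doing most of the work.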
How humans systematically get Bayes wrong
The medical-test puzzle at the start of this essay isn't just a curiosity. It's pointing at something deep: humans, including very smart, very educated humans, reason about probability in ways that systematically violate Bayes' Rule. Cognitive psychologists have spent fifty years cataloging the specific failure modes, and a few of them are worth knowing about.
Base rate neglect. The most common failure — the one we've already met — is ignoring the prior probability and reasoning entirely from the evidence. People hear "99% accurate test" and immediately think "99% likely to be true," forgetting that what the result means depends on how rare the condition is. This is the failure that Bayes' Rule corrects most directly, and it's so widespread that even doctors trained in statistics regularly fall into it.
Confirmation bias as a Bayesian error. Confirmation bias — the tendency to seek and overweight evidence that confirms what you already believe — is, from a Bayesian perspective, a specific malfunction. A perfect Bayesian reasoner holds confirming and disconfirming evidence to the same standard; what matters is only how much more likely the evidence is under the hypothesis than under the alternative. Humans systematically tilt the scale, treating ambiguous evidence as supportive when it should be neutral, and discounting strong contrary evidence as somehow not counting. This is essentially Bayes' Rule with the likelihoods rigged.
Anchoring on first impressions. Once people form an initial belief, they tend to update too little in light of new evidence — sticking close to where they started rather than moving as far as the evidence justifies. This is the opposite failure of "extreme updating," but it's just as common, and it's especially pronounced when the initial belief was formed quickly or under emotional pressure. Bayes' Rule prescribes a specific amount of movement; humans habitually fall short of it.
Failure to update when surprised. A subtler error is the tendency to dismiss surprising evidence as a fluke instead of treating it as a signal. When evidence arrives that doesn't fit our model, the Bayesian response is to update the model. The human response is often to dismiss the evidence ("that study must be wrong"). Sometimes the evidence really is a fluke; often it's the system telling us our prior was miscalibrated, and we miss the message.
Bayes' Rule is what good thinking looks like in math. Most of human cognition is a series of small, predictable departures from it — and the more you can recognize the departures, the more accurate your beliefs become.
Where Bayes' Rule fails
For all its power, Bayes' Rule has real limitations, and pretending otherwise is its own kind of error. The math is exact, but the inputs to the math are often anything but.
The prior problem
The most famous critique of Bayesian reasoning is the difficulty of specifying priors. Where does the prior probability come from? Sometimes it's a real frequency (1 in 1,000 people have this disease). But often it's a subjective estimate, and different reasonable people will pick different priors, and the math will give them different posteriors even from the same evidence. This isn't a bug of Bayes — it's just an honest disclosure that probabilistic reasoning depends on what you bring to the table. But it's a real limit on how "objective" any Bayesian calculation can be when priors are contested.
Made-up numbers
The seductive thing about Bayes' Rule is that you can plug numbers in and get an answer to any precision you want. The dangerous thing is that those numbers are often guesses dressed up as data. A confident-looking calculation that begins "let's say the prior is 5%" is doing real work only if 5% is grounded in something more than vibes. Many published Bayesian arguments — particularly in popular writing about AI risk, religious epistemology, or speculative scenarios — produce specific-sounding posteriors built on entirely made-up priors and likelihoods. The math is correct; the inputs are fiction; the posterior inherits the fiction.
Computational explosion
For complex problems with many variables, exact Bayesian calculation becomes computationally intractable. This is why most modern Bayesian methods in practice are approximations — Markov chain Monte Carlo, variational inference, particle filters — that give reasonable answers without solving the full equation. These approximations work well most of the time, but they introduce their own errors, and a Bayesian system in practice is always a compromise between mathematical purity and what the available compute can handle.
When priors are wrong
Finally — and most importantly — Bayes' Rule will faithfully produce the wrong posterior if your prior is wrong. The math doesn't fix bad priors; it propagates them. If you start with a strongly miscalibrated belief about how the world works, Bayes' Rule will tell you exactly how much new evidence should move you, and that exact amount will still leave you closer to wrong than right. This is why, in practice, Bayesian reasoning works best when paired with active checking of priors — looking for cases where your starting beliefs might be off, and using diverse sources of evidence rather than one channel that could be systematically biased.
How to actually use it
You don't need to do algebra to think Bayesian. The discipline is mostly about asking the right questions in the right order — and most of the work is captured by a small set of habits that, with practice, become automatic.
The Bayesian discipline
State your prior before looking at evidence
Before evaluating any new information, write down (or think clearly about) what you believed beforehand. This makes it possible to notice how much you've actually updated, and prevents the all-too-human move of pretending you always thought what the evidence is now telling you.
Ask: how likely is this evidence if I'm wrong?
This is the single most powerful Bayesian habit. Strong evidence is evidence that's much more likely under your hypothesis than against it. Evidence that would have appeared anyway, regardless of whether you're right, isn't really evidence — it's noise. The crucial check: would I have seen this even if I were mistaken?
Use natural frequencies for hard problems
When percentages and conditional probabilities get confusing, switch to actual people, actual events, actual counts. "Out of 1,000 cases, how many would have this property?" is dramatically easier to reason about than "what's the conditional probability." The math is the same; the human-readable version is easier.
Update incrementally, not all at once
Don't try to evaluate every piece of evidence simultaneously. Take them one at a time, update your belief, and use that updated belief as the prior for the next piece. The math compounds coherently, and you'll find that doing many small updates is far easier than one big one — and produces the same final answer.
Move only as far as the evidence justifies
The two failure modes are updating too little (anchoring) and updating too much (extreme reaction). Bayes prescribes the right amount, and the right amount is usually moderate. If a single piece of evidence has shifted you 80 percentage points, ask whether that's really proportional. Probably not. The odds-form sketch after this list shows how to check.
Be honest about your priors
If you can't articulate where your prior came from, the calculation downstream is suspect. "I just feel like" priors aren't useless, but they should be flagged as estimates rather than data. The discipline of stating priors clearly — and acknowledging when they're guesses — keeps Bayesian thinking honest rather than letting it become a tool for laundering opinions through math.
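One way to make "the right amount" concrete is the odds form of Bayes' Rule: posterior odds equal prior odds times the likelihood ratio, and the likelihood ratio is exactly the "how likely is this evidence if I'm wrong?" question from above. Here is a minimal sketch, with made-up numbers:

```python
# Bayes' Rule in odds form: posterior odds = prior odds * likelihood ratio.
# The likelihood ratio answers "how much more likely is this evidence if I'm
# right than if I'm wrong?" The numbers below are illustrative.

def updated_probability(prior_prob, likelihood_ratio):
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Starting at 20% confidence:
print(f"{updated_probability(0.20, 2):.2f}")   # 2-to-1 evidence:  ~0.33
print(f"{updated_probability(0.20, 16):.2f}")  # 16-to-1 evidence: ~0.80
# A 60-point swing needs roughly 16-to-1 evidence; few single observations are that strong.
```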
The principle, restated
New evidence should update your belief by an amount proportional to how surprising the evidence would be if you were wrong. Not all the way, not none of the way — exactly the amount the math prescribes. Most reasoning errors come from skipping the prior, mistaking the likelihood, or refusing to update at all. Bayes' Rule is just the discipline of doing each of those steps deliberately.
Thomas Bayes died in 1761, and his unfinished essay sat among his papers for two years before anyone read it. It contained the math that would eventually become the foundation of statistical inference, machine learning, modern medicine, and a great deal of AI — though Bayes himself, working in a Tunbridge Wells parsonage, can't have imagined any of it. He was just trying to figure out how, in light of evidence, a wise person should change their mind.
That question turned out to have a precise answer. Two and a half centuries later, the answer is still the right one. Most of us still don't follow it consistently, and the gap between what Bayes says we should believe and what we actually believe is the source of most reasoning errors that feel, from the inside, like good thinking. The rule itself isn't hard to learn. The discipline of applying it — of asking, every time evidence arrives, what your prior was and how surprising the evidence would be if you were wrong — takes longer to internalize, and longer still to make automatic.
But once it is automatic, you'll notice something. The world becomes a slightly less confusing place. Surprises become more informative. Confident-sounding claims become easier to evaluate. Your beliefs stop swinging wildly with each new headline, and stop refusing to move at all in the face of accumulated evidence. They start doing what they were supposed to do all along — tracking the world, in proportion to what you've actually seen of it. That's what Bayes' Rule is for. And it's why a non-conformist minister's quiet little theorem turned out to be one of the most important sentences ever written about how to think.
