
Probability Academy

Six interactive lessons

A self-contained tour through the foundations of probability — written for the curious reader, not the math major. Each lesson includes a working live demonstration that you can run in your browser. Read in order or jump to whatever interests you.

Lesson 1 of 6

The Law of Large Numbers

Why averages converge — slowly, inevitably

Roll a six-sided die once. The average value rolled? Could be anything from 1 to 6. Roll it ten times. Average is closer to 3.5, but maybe still 2.7 or 4.4. Roll it ten thousand times. Average will be very close to 3.5. This is the Law of Large Numbers.

Formally: as the number of independent trials of a random experiment grows, the average of the observed outcomes converges to the expected value with probability one. The longer you sample, the smaller the gap.

lim(n→∞) (X₁ + X₂ + ... + Xₙ) / n = E[X]

The crucial subtlety: the Law promises convergence in relative frequency, but not in absolute count. A truly fair coin can produce 5,100 heads vs. 4,900 tails after 10,000 flips — that's a difference of 200, but a relative deviation of only 2%. Run 10 million flips and the absolute difference will typically grow to a few thousand, yet the relative deviation shrinks to a few hundredths of a percent.
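A minimal sketch of this convergence in plain JavaScript (runnable in a browser console or in Node; the batch sizes are arbitrary choices):

// Law of Large Numbers: running average of fair die rolls vs. the true mean 3.5.
function rollDie() {
  return 1 + Math.floor(Math.random() * 6); // uniform on {1, ..., 6}
}

let sum = 0;
let count = 0;
for (const n of [10, 100, 10_000, 1_000_000]) {
  while (count < n) {
    sum += rollDie();
    count += 1;
  }
  const avg = sum / count;
  console.log(`rolls: ${n}, running avg: ${avg.toFixed(4)}, drift: ${(avg - 3.5).toFixed(4)}`);
}
// The drift column typically shrinks on the order of 1/√n.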

Live demo — die-roll averages

Each click rolls another batch and tracks the running average vs. the theoretical mean of 3.5.


Practical takeaway: A casino doesn't need any individual game to favor it strongly. A 1% house edge plus enough volume guarantees that the casino's running average converges to "we win." Volume is what makes the Law of Large Numbers a business model.

Lesson 2 of 6

Monte Carlo simulation

Solving math problems by rolling dice

Most probabilities cannot be computed by hand once a system gets complex. The probability that exactly two specific numbers come up in a 5-from-50 lottery? Tractable. The probability of a portfolio surviving a 30-year retirement under stochastic returns? No closed-form formula exists.

The Monte Carlo method, named after the casino, sidesteps the analytical impossibility by simulation: generate thousands or millions of random samples that mirror the system, count the outcomes, divide by the trial count. If you generate enough samples, the Law of Large Numbers tells you the empirical frequency converges to the true probability.

Estimating π by Monte Carlo is the classic introduction. Pick random points in a unit square. Count the fraction inside the inscribed quarter-circle. Multiply by 4. That number converges to π.
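As a minimal sketch in plain JavaScript (the sample count of one million is an arbitrary choice):

// Monte Carlo estimate of π: the fraction of uniform random points in the unit
// square that land inside the quarter-circle x² + y² ≤ 1, multiplied by 4.
const N = 1_000_000;
let inside = 0;
for (let i = 0; i < N; i++) {
  const x = Math.random();
  const y = Math.random();
  if (x * x + y * y <= 1) inside++;
}
const piEstimate = (4 * inside) / N;
console.log(piEstimate, 'error:', Math.abs(piEstimate - Math.PI));
// With a million points the error is typically around 0.002.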

Live demo — Monte Carlo π estimator

Why this matters: Almost every modern simulator on this Probability Lab uses Monte Carlo. The Eurojackpot engine running right now is a Monte Carlo simulator: 18+ billion columns generated, then the empirical frequency of each outcome is tabulated and compared to theoretical predictions.

Lesson 3 of 6

The gambler's fallacy

Why "due for a win" is mathematically wrong

You flip a fair coin five times in a row. Heads, heads, heads, heads, heads. What's the probability the next flip is tails?

Most people feel tails is more likely. The intuition: things must average out, the coin owes us. This is the gambler's fallacy. The actual probability of tails on flip six remains exactly 50%. The coin has no memory. Each flip is independent. Past flips contain no information about future flips.

P(flipₙ = T | flip₁, flip₂, ..., flipₙ₋₁) = P(flipₙ = T) = 0.5

The Law of Large Numbers does not say that streaks correct themselves. It says the long-run average converges. Flip a thousand more times and you'll accumulate roughly 500 heads and 500 tails on top of the streak, so the cumulative ratio drifts back toward 50/50 — not because nature is "owed" tails, but because new fair flips drown out the old streak.
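This is easy to test empirically. A minimal simulation sketch in JavaScript (the streak length of 5 and the flip count are arbitrary choices):

// Gambler's fallacy check: after 5 heads in a row, is tails any more likely?
const FLIPS = 1_000_000;
let streak = 0;      // current run of consecutive heads
let observed = 0;    // flips that immediately follow a 5-head streak
let tailsAfter = 0;  // how many of those follow-up flips were tails
for (let i = 0; i < FLIPS; i++) {
  const heads = Math.random() < 0.5;
  if (streak >= 5) {
    observed++;
    if (!heads) tailsAfter++;
  }
  streak = heads ? streak + 1 : 0;
}
console.log('P(tails | 5 heads in a row) ≈', (tailsAfter / observed).toFixed(3));
// Prints ≈ 0.500: the coin has no memory of the streak.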

Common misconception: "After 8 reds in a row at the roulette wheel, black is more likely to come up next." False. The wheel doesn't remember its history. Each spin is a fresh independent trial.

For a live interactive demonstration of the gambler's fallacy with thousands of flips, see the Human Bias Lab.

Lesson 4 of 6

Bayes' theorem

Updating beliefs in light of evidence

You take a medical test for a rare disease. The test is 99% accurate. Your test comes back positive. What's the probability you actually have the disease?

Most people answer "99%" or "around there." The right answer depends on how rare the disease is: in the worked example below it's 50%, and for a rarer disease it can drop below 10%. The shock comes from forgetting one piece: when the disease is rare, most positives are false positives.

Bayes' theorem makes this precise:

P(disease | positive) = P(positive | disease) × P(disease) ÷ P(positive)

Plug in: 1% of the population has the disease (P(disease) = 0.01); the test catches 99% of true cases (P(positive | disease) = 0.99); but 1% of healthy people also test positive. Of every 10,000 people, 100 are sick and 9,900 are healthy: 99 true positives + 99 false positives = 198 positives total. Only 99 of those 198 are sick. Probability you're sick given a positive test: 99 / 198 = 50%.

Vary the disease prevalence and the answer shifts dramatically. This is the heart of Bayesian reasoning: a piece of evidence (positive test) updates the prior (your starting belief about probability of disease) by a factor that depends on how informative the evidence really is.
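The same arithmetic in a small JavaScript function (the function and parameter names are mine, not from the demo):

// Posterior probability of disease given a positive test, via Bayes' theorem.
// prevalence    = P(disease)
// sensitivity   = P(positive | disease)
// falsePositive = P(positive | healthy)
function posteriorGivenPositive(prevalence, sensitivity, falsePositive) {
  const truePos = sensitivity * prevalence;
  const falsePos = falsePositive * (1 - prevalence);
  return truePos / (truePos + falsePos);
}

console.log(posteriorGivenPositive(0.01, 0.99, 0.01));  // 0.5: the worked example above
console.log(posteriorGivenPositive(0.001, 0.99, 0.01)); // ≈ 0.09: a rarer disease drops it to ~9%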

Live demo — Bayesian disease test
(demo controls: prevalence 1.00% · sensitivity 99.0% · specificity 99.0%)

Lesson 5 of 6

Combinatorics & the lottery

Why your odds are what they are

The Eurojackpot drawing picks 5 numbers from 1–50 (the main pool) and 2 numbers from 1–12 (the euro pool). How many possible combinations are there?

For the main pool, we want all subsets of size 5 from 50 numbers, with order not mattering. The formula:

C(n, k) = n! ÷ ( k! × (n − k)! )

So C(50, 5) = 50! ÷ (5! × 45!) = 2,118,760. For the euros: C(12, 2) = 66. Total combinations: 2,118,760 × 66 = 139,838,160.

That's the denominator. The probability of hitting the jackpot 5+2 with one ticket is 1 in 139,838,160. To put that in perspective: if you played every single Friday for an entire human lifetime (80 years × 52 weeks = 4,160 tickets), your probability of winning would still be 0.003%.
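Both numbers are easy to verify. A minimal JavaScript sketch, using BigInt so the products stay exact:

// n choose k without factorials: multiply and divide step by step.
function choose(n, k) {
  let result = 1n;
  for (let i = 1n; i <= BigInt(k); i++) {
    result = (result * (BigInt(n) - i + 1n)) / i; // exact: the numerator is always divisible by i
  }
  return result;
}

const total = choose(50, 5) * choose(12, 2);
console.log(total);                 // 139838160n, matching the text
console.log(4160 / Number(total)); // ≈ 0.0000297, i.e. ~0.003% for a lifetime of tickets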

Other tier probabilities follow from the same combinatorics — the closer to the full match, the smaller the count of "winning" combinations relative to the 139.8 million total.
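As a sketch of that claim, reusing the choose() function from the previous snippet (the product below is the standard hypergeometric counting argument):

// P(exactly a of the 5 main numbers and b of the 2 euro numbers match).
function tierProbability(a, b) {
  const mainWays = choose(5, a) * choose(45, 5 - a); // a hits among the drawn, 5-a picks from the 45 non-drawn
  const euroWays = choose(2, b) * choose(10, 2 - b);
  return Number(mainWays * euroWays) / 139838160;
}

console.log(tierProbability(5, 2)); // jackpot 5+2: ≈ 7.15e-9, i.e. 1 in 139,838,160
console.log(tierProbability(5, 1)); // 5+1: ≈ 1.43e-7, i.e. 1 in ~6.99 million
console.log(tierProbability(3, 0)); // 3+0: ≈ 0.0032, i.e. about 1 in 313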

Concrete comparison: You are 50× more likely to be struck by lightning twice in your life than to win the Eurojackpot with one ticket.

Lesson 6 of 6

Entropy & information

Measuring how much randomness "is there"

Three sequences of eight coin flips. Sequence A: HHHHHHHH. Sequence B: HTHTHTHT. Sequence C: HHTTHTHH. Which one carries the most information?

Counterintuitively, the answer is C. Sequences A and B are predictable: once you see a few characters, you know all the rest. Sequence C, being more irregular, requires you to learn each new character — there's no compression. Information equals surprise.

Claude Shannon formalized this in 1948. The entropy of a probability distribution measures the average surprise per outcome:

H(X) = − Σₓ P(x) log₂ P(x)

For a fair coin, H = −0.5×log₂(0.5) − 0.5×log₂(0.5) = 1 bit. Maximum entropy for 2 outcomes. For a biased coin that lands heads 90% of the time, H ≈ 0.469 bits. Less surprise per flip; more compressible; less random.
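Shannon's formula is a one-liner for any discrete distribution. A minimal JavaScript sketch (the example distributions come from the text above):

// Shannon entropy in bits of a distribution given as an array of probabilities.
function entropy(probs) {
  return probs.reduce((h, p) => (p > 0 ? h - p * Math.log2(p) : h), 0);
}

console.log(entropy([0.5, 0.5])); // 1 bit: fair coin, the maximum for two outcomes
console.log(entropy([0.9, 0.1])); // ≈ 0.469 bits: the biased coin above
console.log(entropy([1.0, 0.0])); // 0 bits: a certain outcome carries no surprise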

Why it matters: Cryptography depends on having truly high-entropy random sources. The continuous simulator on this Probability Lab uses crypto.getRandomValues(), which the browser backs with the operating system's cryptographically secure entropy — closer to the theoretical maximum than ordinary Math.random().

And entropy gives a deep answer to the question this whole Lab is built around: what does it mean for a process to be truly random? The answer, from Shannon: a process is maximally random when its outcomes carry maximum entropy — when no shortcut, no pattern, no compression can predict the next outcome from the previous ones. Anything else is bias.

end of curriculum · explore the live Probability Lab to see these principles operating on billions of samples