A Bayesian Story

When Does a Cell Decide to Divide?

No background in biology or statistics required. This deck walks through one idea at a time: how to describe a dividing cell in the language of probability, and how to ask data whether that description holds up — before any results are in.

19 slides

The Biology

One Cell, Across Generations

Put a single bacterium under a microscope and watch: it grows longer, then splits into two smaller cells. Each of those grows and splits again. Followed over time, this produces a lineage — a family tree of cells, where every branch is one cell cycle: the stretch from one cell's birth to the moment it divides.

Four numbers describe each cell cycle in the data we'll eventually fit against:

birth mass, m_b growth rate, α division time, t split ratio, f

In words: a cell is born at some mass m_b, grows at rate α, and after a duration t splits into two daughters. One daughter keeps a fraction f of the mother's final mass; the other keeps the rest, 1−f.

Generation: 0

Watch it grow, then divide at a roughly — but not exactly — fixed size. That "roughly, but not exactly" is the whole mystery this deck is about: what decides the precise moment of division?

a simulated cell growing, pinching, and splitting into two unequal daughters (f and 1−f)

The Open Question

What Actually Triggers Division?

A cell has no clock. So how does it "know" when to divide? One long-standing idea, the accumulator hypothesis, says: some internal quantity — call it protein, or just "stuff" — builds up as the cell grows. Once enough of it has piled up, the odds of dividing shoot up.

Since a cell's length grows roughly exponentially, the natural stand-in for "how much has piled up" is simply how much bigger the cell has gotten since it was born:

$$p(t) = m_b\left(e^{\alpha t} - 1\right)$$

m_b = mass at birth · α = growth rate · t = time since birth · e ≈ 2.718

Read it piece by piece: e^αt is "how many times bigger the cell has gotten" after time t. Subtract the 1 it started with, and multiply by its birth mass, and p(t) is just the raw amount of new growth — nothing exotic, just size added since birth.

Birth Mass (m_b) 1.0

Growth Rate (α) 0.40

the axis rescales as you move the sliders — nothing gets clipped

Survival Analysis Basics

The Language of "Not Yet"

To turn "risk of dividing" into math, we borrow a toolkit built for any "time until an event" question. Two functions matter:

Survival, S(t): the probability the cell has not yet divided by time t. It starts at S(0) = 1 and falls toward 0.

Hazard, h(t): the risk of dividing right now, given it hasn't divided yet. Not "what fraction of all cells ever divide at time t" — but "of the cells still standing at time t, what fraction divide in the next instant."

$$S(t) = \exp\!\left(-\int_0^t h(s)\,ds\right)$$

exp(·) undoes a natural logarithm · the integral just adds up hazard over every instant from 0 to t

Know the hazard, and survival follows automatically. Multiply them together and you get something new — the division-time density, written f_div(t). It answers: "how probable was it, exactly, that this cell divided at this precise moment t?"

$$f_{div}(t) = h(t)\,S(t)$$

Remember the name f_div(t) — it reappears, unchanged, a few slides from now as the actual number we compute for every real cell in the dataset.

multiply the top two curves pointwise and you get the bump on the bottom — f_div(t)

The Modified Hazard Model

Three Knobs Set the Risk Curve

Our hypothesis (slide 3) says the real driver isn't time directly — it's the accumulated growth, p. So we write survival as a function of p instead, using a flexible curve shape that starts near 1 (safe) and drops toward 0 (division near-certain) once p passes some threshold:

$$S(p) = \Big(1 + (p / \textcolor{#C85A32}{u})^{\textcolor{#C85A32}{v}}\Big)^{-\textcolor{#C85A32}{\omega_2}}$$

Survival alone only tells half the story — the coral dashed curve in the figure is the hazard, h(p): the instantaneous risk of dividing at accumulated growth p, given the cell hasn't divided yet. It falls straight out of S(p) by differentiation, and it's exactly what powers the "risk right now" curve from a few slides back:

$$h(p) = \frac{\textcolor{#C85A32}{\omega_2}\,\textcolor{#C85A32}{v}}{\textcolor{#C85A32}{u}}\cdot\frac{(p/u)^{v-1}}{1+(p/u)^v}$$

same three knobs, no new parameters — h(p) is just S(p)'s built-in rate of collapse

u (scale): roughly, how much growth is typically needed before division becomes likely.
v (sharpness): how abruptly risk switches on near that point — big v means "safe... then division!" small v means a slow ramp.
ω₂ (flexibility): a second knob that bends the curve independently of v, so the model isn't locked into one fixed shape.

All three are just numbers we don't know yet — the whole second half of this deck is about how we estimate them from data.

Scale (u) 1.5

Sharpness (v) 6.0

Flexibility (ω₂) 1.0

both axes re-fit themselves to whatever the sliders produce, so the curve never gets chopped off

Natural Variation

No Two Cells Are Exactly Alike

So far we've built a story for when a cell divides. But two more numbers from slide 2 — growth rate α and split ratio f — also wobble from cell to cell, even in an identical environment. We need a probability distribution for each: a shape describing which values are common and which are rare.

Growth rate α is always positive and tends to cluster around a typical value with a long right tail. A Gamma distribution — a flexible shape used for exactly this kind of "positive, skewed" quantity — fits naturally.

Split ratio f is always between 0 and 1 (a daughter can't get more than 100% or less than 0% of the mother's mass). A Beta distribution, built specifically to live on the interval from 0 to 1, is the natural fit.

$$\alpha \sim \text{Gamma}(\mu_\alpha, \sigma_\alpha) \qquad\qquad f \sim \text{Beta}(\mu_f, \sigma_f)$$

The "~" reads as "is drawn from." Both distributions are described here by an everyday mean (μ) and standard deviation (σ) so the parameters stay easy to reason about.

two very different shapes for two very different kinds of number

The Metric of Fit

What Does "Likely" Actually Mean?

Time for a detour. Flip a coin 10 times and see: H H T H H H T H H H — 8 heads, 2 tails. Pick any candidate value for θ, the coin's true chance of heads, and ask: "if θ were the truth, how probable was exactly this sequence?"

$$L(\theta) = \theta^{8}(1-\theta)^{2}$$

That number, computed for a given θ, is the likelihood. It is not the probability that θ is correct — it's a score for how well a candidate θ explains data we've already fixed and observed. Slide the control below and watch which values score highest.

Candidate Bias (θ) 0.50

Likelihood L(0.50) = 0.00098

the peak sits at θ = 0.8 — exactly the observed fraction of heads

Scaling Up

The Same Idea, Now on Real Cells

A coin had one unknown (θ) and 10 data points. Our cell model has seven unknowns — u, v, ω₂, μ_α, σ_α, μ_f, σ_f — and thousands of observed cell cycles. The mechanics don't change at all, only the scale.

For one cell cycle, three separate scores get multiplied together: how probable its division time was (that's f_div(t) from slide 4), how probable its growth rate was (the Gamma), and how probable its split ratio was (the Beta):

$$L(\text{cell}) = f_{div}(t \mid m_b, \alpha) \times \text{Gamma}(\alpha) \times \text{Beta}(f)$$

Treat every cell cycle as independent, and the likelihood of the whole dataset is just all those numbers multiplied together. Because multiplying thousands of tiny fractions causes computer underflow, we take logarithms first, turning the giant product into a sum:

$$\ln L(\text{dataset}) = \sum_i \ln L(\text{cell}_i)$$

One candidate set of seven numbers in → one single score out. That score is what the rest of this deck searches for the best (or most plausible) values of.

same logic, more unknowns, more data — nothing conceptually new

Before Looking at the Data

Priors: Setting the Boundaries First

Before any fitting happens, each of the seven unknowns gets an allowed range — broad, and shaped loosely around the data's scale, but flat inside it. Every value inside the range starts out equally plausible; nothing outside it is considered at all. This starting assumption is called a prior.

$$\text{posterior} \;\propto\; \text{likelihood} \times \text{prior}$$

∝ means "proportional to" — the two sides match up to an overall scaling constant we never need to compute

Here the prior is mainly just a fence: it rules out impossible values (a negative growth rate, for instance) but rarely changes which candidate looks better once you're inside the fence — that ranking job belongs almost entirely to the likelihood.

Think of the prior as drawing the boundaries of a map before exploring it. It says where you're allowed to look, not which spot is best.

a flat prior: a fence, not a preference

Parameter Optimization

MLE: Find the Peak, Fast

Maximum Likelihood Estimation (MLE) searches for the single combination of seven numbers that scores highest on the likelihood. Picture searching a hilly landscape blindfolded: feel the ground's slope under your feet, and always take a step uphill. Keep going until the ground is flat in every direction — that's a peak.

A landscape can have more than one hill. A blind climber can get stuck on a small local hill and never find the tallest one nearby.

To avoid that trap, the search is run from several different starting points, and only the highest peak found across all of them is kept.

MLE hands back exactly one best guess, with no sense of how confident to be in it. Its real job here is narrower: give the next technique, MCMC, a sensible place to start looking.

one guess, chosen from several attempts

Markov Chain Monte Carlo

MCMC: Map the Whole Range, Not Just the Peak

MLE only finds one point. But we don't just want a best guess — we want to know how confident to be in it. MCMC (Markov Chain Monte Carlo) answers that by building up a whole cloud of plausible parameter combinations instead of a single one.

The idea: propose a small random jump from the current guess to a nearby one, and score it with likelihood × prior. A jump to a better spot is (almost) always taken. A jump to a worse spot is taken sometimes, in proportion to how much worse it is — never simply thrown away. Do this thousands of times, and the trail of visited points ends up dense exactly where the data says plausibility is high, and sparse where it's low.

Steps: 0 | Acceptance Rate: — This toy walker uses plain accept/reject on random jumps. The real fit instead uses a library called emcee, which runs 40 walkers together and lets each one's jump be informed by where the others currently sit.

click "Step" a few times — accepted jumps in teal, rejected attempts in coral

Making the Walk Trustworthy

Three Practical Safeguards

A random walk like this needs some quality control before its output can be trusted as a fair map of plausibility. Three checks matter most:

1. Burn-in

The first stretch of steps is still shaking off wherever the walk started. Like letting your eyes adjust before really looking around a dark room — those early steps are run, then thrown away entirely.

2. Production run

Once settled, every step from here on is kept. Across 40 parallel walkers running for 3,000 steps each, that's 120,000 recorded parameter combinations.

3. Thinning

Consecutive steps resemble each other, like footprints close together. Keeping only every 5th step trims some of that redundancy before summarizing.

The Posterior Distribution

Turning a Cloud of Points into Answers

After the walk finishes, we have a giant cloud of recorded points in the seven-dimensional parameter space. Look at just one parameter's values across that whole cloud — ignore the other six — and its histogram is the answer for that parameter: the posterior distribution, meaning "what we believe about this parameter after (post) seeing the data."

From that histogram, two summaries matter most:

The median — the middle value — becomes the single best point estimate.

The 95% credible interval — the range between the 2.5th and 97.5th percentile of all recorded values — supports a direct statement: "given this model and this data, there's a 95% chance the true value falls in this range."

a histogram of one parameter, taken across every recorded sample

Looking Ahead

The Story So Far, and What's Next

We hypothesized an accumulator, wrote it as a survival curve, gave the other traits their own distributions, learned what "likelihood" means from a coin, set boundaries with a prior, and built two ways to search the resulting seven-dimensional space — one for a quick best guess, one for the full picture of uncertainty.

Next: running the real dataset through this pipeline, checking whether the fitted model's simulated lineages actually resemble the real ones, and reporting what the posterior says about u, v, ω₂, and the rest.

Results 1 / 4

Did the Walk Work?

We unleashed 40 walkers for 3,000 steps each, discarding the first 1,000 as burn-in. We did this across 9 different nutrient environments (sugars).

Looking at the Glucose (Glu) dataset as an example: the walkers accepted proposed jumps ~46.5% of the time — right in the sweet spot between getting stuck (too low) and aimless wandering (too high).

The "trace plot" on the right shows the paths of all 40 walkers over time. We want to see a dense, overlapping "fuzzy caterpillar" with no single walker drifting away from the pack. This confirms they have successfully mapped the core plausibility zone.

Trace plot for Glucose showing dense overlapping paths

Trace Plot for Glucose: all 7 parameters show excellent mixing and stability.

Results 2 / 4

The 7-Dimensional Answer

The "Corner Plot" projects our 7D cloud of samples down into 2D slices. The diagonal shows the final posterior histogram for each parameter. The off-diagonal contours show how parameters correlate.

Notice the tilted egg shapes? For example, u and v are strongly negatively correlated. If the model assumes a slightly higher growth threshold (higher u), it must compensate by making the threshold sharper (higher v) to still fit the data well.

Glucose (Glu) Medians & 95% CIs (Example)

Scale (u): 1.53 [1.51, 1.55]
Sharpness (v): 8.37 [8.06, 8.69]
Flexibility (ω₂): 0.61 [0.57, 0.65]

Corner plot for Glucose. The dark center contours are the most plausible regions.

Results 3 / 4

Do the Virtual Cells Look Real?

We take the best-fit parameters and run the lineage simulator for thousands of generations. We then overlay the simulated histograms (orange) on top of the real biological data (blue). If our model is a good representation of biology, the shapes should overlap almost perfectly.

Simulated (orange) vs Real (blue) Distributions (Ace)

Empirical vs Model Survival Curve (Ace)

Simulated (orange) vs Real (blue) Distributions (DMa)

Empirical vs Model Survival Curve (DMa)

Simulated (orange) vs Real (blue) Distributions (Glucose)

Empirical vs Model Survival Curve (Glucose)

Simulated (orange) vs Real (blue) Distributions (Glycerol)

Empirical vs Model Survival Curve (Glycerol)

Simulated (orange) vs Real (blue) Distributions (Lactose)

Empirical vs Model Survival Curve (Lactose)

Simulated (orange) vs Real (blue) Distributions (Pyruvate)

Empirical vs Model Survival Curve (Pyruvate)

Simulated (orange) vs Real (blue) Distributions (Rhamnose)

Empirical vs Model Survival Curve (Rhamnose)

Simulated (orange) vs Real (blue) Distributions (Sucrose)

Empirical vs Model Survival Curve (Sucrose)

Simulated (orange) vs Real (blue) Distributions (Xylose)

Empirical vs Model Survival Curve (Xylose)

Results 4 / 4

How Does the Environment Shift the Decision?

We ran this entire process for 9 different carbon sources. By plotting the peak (mode) values of the shape parameters against each other, we see how the cell's underlying logic adapts to nutrient quality.

Scatter plots comparing u, v, and omega2 across 9 sugars

Each dot is the best-fit model for a different nutrient environment.

There is a clear structure here. Sugars that support very fast growth (like Glucose and Sucrose) tend to cluster with lower threshold scales (u) and higher flexibility (ω₂). Cells in slower, poorer environments (like Rhamnose or Lactose) push the threshold u higher, meaning they demand more relative accumulated growth before committing to division.

Results Extension

MCMC Autocorrelation

The autocorrelation function (ACF) measures how much a step in our MCMC chain depends on the steps before it. High autocorrelation means the chain is slow to explore the parameter space because consecutive proposals are closely related.

For a reliable MCMC run, the correlation should decay toward zero as the lag increases. A rapid drop indicates that the chain is collecting independent samples from the target posterior distribution.

Evaluating the Lag

A fast decay suggests a higher effective sample size (ESS), meaning the posterior estimates are based on well-mixed, independent chains.

Autocorrelation plot for Glucose. Fast decay shows effective convergence.