Blog

notes on machine learning, math, and things I find interesting

The Fisher information matrix and the Hessian
2026-06-18 · notebook

One matrix wearing three hats: the precision of the best estimate in statistics, the metric of information geometry, and the curvature behind second-order optimizers. Built up from the Hessian of a log-likelihood — the score's two moments, the osculating circle at the MLE, the Cramér–Rao bound, the KL-divergence Hessian, the reparameterization tensor law, and the Gauss–Newton decomposition.

The information theory of matrix completion
2026-06-02 · notebook · code

Suh's completion capacity recasts matrix completion as a Shannon problem: how many entries does one observation resolve? A walk through the theory, an implementation that verifies every closed form, a GF(2) decoder that exhibits the threshold, and several verified extensions.

The many sides of PCA
2026-06-02 · notebook

One method, five derivations: best linear approximation, minimal reconstruction error, metric MDS, maximum variance with decorrelation, and the SVD. All reduce to the eigenvectors of the covariance matrix.

Maximum likelihood and maximum a posteriori
2026-06-02 · notebook

The two standard methods for estimating model parameters: definitions, worked examples, how they relate as the sample grows, and the zero-count problem that motivates smoothing.

Several views on cross-entropy
2026-05-31 · notebook

From the likelihood of a Bernoulli model, cross-entropy follows as the negative log-likelihood, and the same quantity reappears as KL divergence, code length, a proper scoring rule, and mode-covering KL. Concludes with the comparison to squared error for classification.