Beyond Kalman: A Mathematical Odyssey
From Rockets to Robots, and Now Deep Learning—The Odyssey of the Kalman Filter
By the time Rudolf Kalman walked onto the stage in 1960, the world was in flux—technologically, politically, and mathematically. What followed was not just an invention, but an invitation to a century-long conversation about uncertainty, state, and belief.
It began, as many good ideas do, in obscurity.
Rudolf E. Kálmán, Hungarian-born and sharp-witted, wasn’t the type to shy from mathematical boldness. In the 1950s, the world of control theory was a tangled mess of ad-hoc filters, empirical tweaking, and analog intuition. Engineers needed a map, but they were walking with a compass in fog. Kalman offered a flashlight.
His 1960 paper on what would later be called the Kalman Filter opened with a deceptively simple idea: if the world is governed by linear dynamics with Gaussian noise, then we can estimate its hidden states with optimal precision. The math was elegant, recursive, and—as he would later argue—more Bayesian than many appreciated.
State Prediction:
x̂ₖ|ₖ₋₁ = A · x̂ₖ₋₁|ₖ₋₁ + B · uₖ₋₁
Pₖ|ₖ₋₁ = A · Pₖ₋₁|ₖ₋₁ · Aᵗ + Q
Measurement Update:
Kₖ = Pₖ|ₖ₋₁ · Hᵗ · (H · Pₖ|ₖ₋₁ · Hᵗ + R)⁻¹
x̂ₖ|ₖ = x̂ₖ|ₖ₋₁ + Kₖ · (yₖ - H · x̂ₖ|ₖ₋₁)
Pₖ|ₖ = (I - Kₖ · H) · Pₖ|ₖ₋₁
Here was the beauty: Kalman took in measurements yₖ, muddied by noise, and updated his belief about the underlying state xₖ. The filter was recursive—no need to store past data—and yet probabilistic, grounded in a prediction-correction cycle.
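To make the cycle concrete, here is a minimal sketch of one predict–correct step in Python with NumPy. The matrices A, B, H, Q, R follow the notation above; the function name and structure are illustrative, not lifted from Kalman's paper.

```python
import numpy as np

def kalman_step(x, P, u, y, A, B, H, Q, R):
    """One predict-correct cycle of the linear Kalman filter."""
    # Predict: push the state estimate and its covariance forward.
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q

    # Update: blend the prediction with the noisy measurement y.
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

Notice that nothing about the past is stored: the pair (x, P) is a sufficient summary of every measurement seen so far, which is exactly what makes the filter recursive.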
In the years that followed, Kalman filters became the unsung heroes behind Apollo navigation, missile tracking, and early AI. To many engineers, it felt like alchemy.
But like all alchemy, there were limits.
When the World Refuses to Be Linear
It wasn’t long before reality raised its unruly head. The Kalman filter presumed linear systems and Gaussian noise. But what if the system spun and danced—nonlinear, chaotic, anything but Gaussian?
In the 1980s, as autonomous robotics took its first awkward steps and aerospace systems grew increasingly complex, engineers began hacking at the model. Enter the Extended Kalman Filter (EKF)—a gallant, if slightly desperate, attempt to bend the linear model into nonlinear shapes.
The EKF was based on linearizing nonlinear functions using first-order Taylor expansions:
Assumes: Nonlinear dynamics, Gaussian noise
State Model:
xₖ₊₁ = f(xₖ, uₖ) + process noise
yₖ = h(xₖ) + measurement noise
Linearize with Jacobians:
Fₖ = ∂f/∂x at x̂ₖ₋₁
Hₖ = ∂h/∂x at x̂ₖ|ₖ₋₁
Apply Kalman-style prediction and update using Fₖ and Hₖ
But now, instead of working with A and H, we compute Jacobians Fₖ and Hₖ—local derivatives of our nonlinear functions. The math creaked under its own approximation, but it worked, just enough, in just enough cases.
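A sketch of one EKF cycle shows how little changes: the nonlinear f and h do the actual propagation, while their Jacobians carry the covariance algebra. The callables jac_f and jac_h are assumed to be supplied by the user; the names are mine.

```python
import numpy as np

def ekf_step(x, P, u, y, f, h, jac_f, jac_h, Q, R):
    """One EKF cycle: propagate through the nonlinear model,
    linearize locally for the covariance algebra."""
    # Predict through the true nonlinear dynamics...
    x_pred = f(x, u)
    # ...but propagate uncertainty with the local linearization F.
    F = jac_f(x, u)
    P_pred = F @ P @ F.T + Q

    # Update: linearize the measurement model at the prediction.
    H = jac_h(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```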
Still, the discomfort lingered. Linearization was mathematically inelegant. Worse, it could diverge—spectacularly—if the system wandered too far from where the Taylor series held. Kalman, ever the purist, reportedly distanced himself from the EKF.
If the original filter was a jazz solo in Bayesian belief, the EKF felt like sheet music played on a broken piano.
Enter the Particles: Monte Carlo Saves the Day
Then came the 1990s, and with it, a revolution not from control theory, but from computer science.
As computing power exploded, a group of researchers—including Arnaud Doucet and Nando de Freitas—began to lean into randomness. Why force your model into linear clothes when you could simulate its evolution using particles?
The Particle Filter, or Sequential Monte Carlo method, breaks the filter wide open. Instead of a single Gaussian estimate of belief, it represents uncertainty as a cloud of weighted particles—each one a hypothesis about the true state.
Assumes: Arbitrary nonlinearities and noise
Steps:
Predict:
xₖ⁽ⁱ⁾ ~ p(xₖ | xₖ₋₁⁽ⁱ⁾, uₖ)
Update weights:
wₖ⁽ⁱ⁾ ∝ p(yₖ | xₖ⁽ⁱ⁾)
Resample:
Draw new particles based on weights wₖ⁽ⁱ⁾
Estimation:
x̂ₖ ≈ weighted average of particles
At each time step, the particle filter:
Propagates particles forward via the state dynamics.
Weighs them based on how well their predicted measurements match the real ones.
Resamples to focus on likely regions of the state space.
No derivatives. No Gaussian assumptions. Just pure Bayesian simulation.
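Here is a minimal bootstrap particle filter step in the same NumPy style, assuming user-supplied callables for sampling the transition model and evaluating the measurement likelihood; the names sample_transition and likelihood are illustrative.

```python
import numpy as np

def particle_filter_step(particles, weights, u, y,
                         sample_transition, likelihood, rng):
    """One bootstrap particle filter cycle.
    sample_transition(x, u, rng) draws from p(x_k | x_{k-1}, u);
    likelihood(y, x) evaluates p(y_k | x_k). Both are user-supplied."""
    n = len(particles)
    # Predict: push each hypothesis through the (possibly nonlinear) dynamics.
    particles = np.array([sample_transition(x, u, rng) for x in particles])
    # Update: reweight by how well each particle explains the measurement.
    weights = weights * np.array([likelihood(y, x) for x in particles])
    weights /= weights.sum()
    # Resample: concentrate particles in likely regions of state space.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    weights = np.full(n, 1.0 / n)
    # Point estimate: the (now uniform) particle mean.
    x_hat = particles.mean(axis=0)
    return particles, weights, x_hat
```

Resampling at every step, as here, is the simplest variant; practical implementations usually resample only when the effective sample size drops too low, which limits the degeneracy noted in the comparison table below.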
The tradeoff? Computational complexity. But as CPUs grew cheaper, the elegance of approximation-by-simulation became irresistible.
The Unscented Vision: A Quiet Genius Steps In
In 1997, Simon Julier and Jeffrey Uhlmann proposed another path: what if we could capture the nonlinear transformation of a Gaussian without linearization? Their solution: the Unscented Kalman Filter (UKF).
The UKF uses a set of carefully chosen “sigma points” that capture the mean and covariance of the state. These points are propagated through the nonlinear dynamics directly—no Jacobians required.
Assumes: Nonlinear dynamics, Gaussian noise
Key Idea:
Use sigma points to represent uncertainty, not derivatives
Steps:
Generate sigma points around x̂ₖ₋₁
Propagate through nonlinear functions:
χᵢ → f(χᵢ), γᵢ → h(χᵢ)
Recalculate mean and covariance
Apply Kalman-style update using transformed sigma points
It was a moment of quiet genius. The UKF provides better accuracy than the EKF in many scenarios, with only a modest computational overhead. The math is almost poetic:
Take your belief distribution.
Summarize it using a small set of points.
Let each one feel the system's nonlinear push.
Reconstruct your new belief.
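The engine underneath is the unscented transform itself, sketched below for the basic symmetric sigma-point set. The spread parameter kappa follows Julier and Uhlmann's original formulation; production filters often use the scaled variant with extra tuning parameters.

```python
import numpy as np

def unscented_transform(x, P, f, kappa=0.0):
    """Propagate a Gaussian belief (x, P) through a nonlinear f
    using 2n + 1 symmetric sigma points."""
    n = len(x)
    # Sigma points: the mean, plus symmetric offsets along the
    # columns of a matrix square root of the scaled covariance.
    L = np.linalg.cholesky((n + kappa) * P)
    sigma = np.vstack([x, x + L.T, x - L.T])        # shape (2n+1, n)
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Push every sigma point through the nonlinearity directly.
    y = np.array([f(s) for s in sigma])
    # Reconstruct the transformed mean and covariance from the cloud.
    y_mean = w @ y
    diff = y - y_mean
    y_cov = (w[:, None] * diff).T @ diff
    return y_mean, y_cov
```

A full UKF applies this transform twice per step, once through f for the prediction and once through h for the expected measurement, then performs a Kalman-style update built from the resulting cross-covariance.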
Where We Are Now: Filters in the Age of AI
Today, the landscape has fractured beautifully.
Deep learning has entered the scene, offering data-driven models of state transitions and observation functions. Recurrent neural networks now play the role of f(x), learned from data. Neural Kalman Filters and Deep Bayesian Filters try to fuse the rigors of probability with the flexibility of black-box models.
At the same time, the Ensemble Kalman Filter (EnKF) has gained popularity in fields like weather forecasting, where high-dimensional state spaces make traditional filters impractical. The EnKF blends the ideas of particle filters and Kalman filters: simulate a small ensemble of state vectors, apply updates statistically, and avoid full covariance matrices.
Assumes: High-dimensional systems, Gaussian noise
Steps:
Maintain N particles (ensemble members)
Predict each forward
Compute sample mean and covariance:
x̄ₖ = average of ensemble
Pₖ = empirical covariance of ensemble
Apply Kalman update to each particle
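Below is a sketch of one stochastic EnKF cycle, assuming a linear observation matrix H and the classic perturbed-observations update. For readability this version forms the sample covariance P explicitly; weather-scale implementations work with ensemble anomalies directly, since never materializing P is precisely the appeal.

```python
import numpy as np

def enkf_step(ensemble, u, y, f, H, Q, R, rng):
    """One stochastic EnKF cycle with perturbed observations.
    ensemble: (N, n) array of state samples; H: (m, n) observation matrix."""
    N = ensemble.shape[0]
    # Predict: run every member through the dynamics, adding process noise.
    ensemble = np.array([f(x, u) for x in ensemble])
    ensemble += rng.multivariate_normal(np.zeros(Q.shape[0]), Q, size=N)
    # Empirical mean and covariance stand in for the full P.
    x_bar = ensemble.mean(axis=0)
    X = ensemble - x_bar
    P = X.T @ X / (N - 1)
    # Kalman gain from the sample covariance.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    # Update each member against its own perturbed copy of the measurement.
    y_pert = y + rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N)
    ensemble = ensemble + (y_pert - ensemble @ H.T) @ K.T
    return ensemble, ensemble.mean(axis=0)
```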
Each of these evolutions—EKF, UKF, Particle Filter, EnKF, Deep Filters—is not a rejection of Kalman, but a response to the same question: how do we believe, and update that belief, in a world filled with uncertainty?
A Closing Thought: Belief as Geometry
There’s something deeply human in all of this. The Kalman filter, at its heart, is about belief. About taking a step into the unknown, using what we know, updating gently with each new piece of information.
Mathematically, it’s a dance through distributions. Sometimes we stay Gaussian. Sometimes we throw particles into the void. Sometimes we learn the rules themselves. But always, we ask: what is true, given what I’ve seen?
Kalman opened the door. Others walked through, stumbled, invented, and improvised. What they found was not just better filters, but better metaphors for thought.
And so the story continues.
Model | Introduced By / Year | Handles Nonlinearity? | Assumes Gaussian Noise? | Core Idea | Pros | Cons |
---|---|---|---|---|---|---|
Kalman Filter (KF) | Rudolf E. Kálmán, 1960 | ❌ No | ✅ Yes | Recursive Bayesian update for linear systems | Optimal for linear-Gaussian systems; computationally efficient | Fails for nonlinear or non-Gaussian systems |
Extended Kalman Filter (EKF) | NASA / 1960s–1970s | ✅ Approximate (via Taylor expansion) | ✅ Yes | Linearize system via Jacobians | Works in mildly nonlinear systems | Can diverge; linearization can be fragile |
Unscented Kalman Filter (UKF) | Julier & Uhlmann, 1997 | ✅ Better approximation | ✅ Yes | Use sigma points to capture nonlinear transformations | No need for Jacobians; better accuracy than EKF | Higher computational cost than EKF |
Particle Filter (PF) | Gordon, Salmond, Smith, 1993 | ✅ Yes | ❌ No | Use particles to approximate entire belief distribution | Works with any nonlinearity or noise type | Computationally intensive; degeneracy risk |
Ensemble Kalman Filter (EnKF) | Evensen, 1994 | ✅ Statistical approximation | ✅ Yes (with some relaxations) | Use ensemble of state samples instead of full covariance | Scales to high dimensions (e.g. weather models) | Assumes Gaussianity in updates |
Neural / Learned Filters | 2010s–present (various authors) | ✅ Yes (learned from data) | ❌ No (learned noise models) | Use deep learning to model transition/observation | Highly flexible; adapts to real data | Requires large training data; hard to interpret |
References
Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME–Journal of Basic Engineering, 82(1), 35–45. [The seminal paper that introduced the Kalman Filter.]
Kalman, R. E., & Bucy, R. S. (1961). New Results in Linear Filtering and Prediction Theory. Journal of Basic Engineering, 83(3), 95–108. [Extended the Kalman Filter framework and mathematical theory.]
Jazwinski, A. H. (1970). Stochastic Processes and Filtering Theory. Academic Press. [A comprehensive mathematical treatment of estimation and filtering.]
Maybeck, P. S. (1979). Stochastic Models, Estimation, and Control (Vol. 1). Academic Press. [A classic text used to teach Kalman filtering and extensions.]
Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation. IEE Proceedings F (Radar and Signal Processing), 140(2), 107–113. [Introduced the Particle Filter to engineering audiences.]
Evensen, G. (1994). Sequential Data Assimilation with a Nonlinear Quasi-Geostrophic Model Using Monte Carlo Methods to Forecast Error Statistics. Journal of Geophysical Research: Oceans, 99(C5), 10143–10162. [Foundational paper on the Ensemble Kalman Filter.]
Julier, S. J., & Uhlmann, J. K. (1997). A New Extension of the Kalman Filter to Nonlinear Systems. Proceedings of SPIE 3068: Signal Processing, Sensor Fusion, and Target Recognition VI, 182–193. [Introduced the Unscented Kalman Filter (UKF).]
Doucet, A., de Freitas, N., & Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. Springer. [A widely used textbook on Particle Filtering.]
Haarnoja, T., Ajay, A., Abbeel, P., & Levine, S. (2016). Backprop KF: Learning Discriminative Deterministic State Estimators. Advances in Neural Information Processing Systems (NeurIPS). [Early example of combining neural networks with Kalman-style filtering.]
Karl, M., et al. (2017). Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. International Conference on Learning Representations (ICLR). [Part of the movement toward learned latent state-space models.]