Online Natural Gradient as a Kalman Filter


Gradient descent and its stochastic variants are widely used iterative methods in learning tasks. Although gradient descent is guaranteed to converge to a local minimum when the step size is chosen appropriately, its performance depends heavily on how the likelihood distribution is parameterized, and it does not exploit the curvature of the underlying manifold. Natural gradient descent, on the other hand, performs the optimization in distribution space rather than parameter space and therefore avoids these shortcomings. Its stochastic (online) version can be formulated as an extended Kalman filter, as shown in Yann Ollivier’s paper “Online Natural Gradient as a Kalman Filter” (https://arxiv.org/pdf/1703.00209.pdf), which also supplies principled step-size values.
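
To make the correspondence concrete, here is a minimal sketch (my own illustration, not code from the paper or the notebook) for the simplest case: a linear-Gaussian observation model `y_t = theta^T x_t + noise`. For this model the posterior covariance `P` maintained by the Kalman filter plays the role of the inverse Fisher information, so the Kalman gain step is exactly an online natural-gradient update with a decaying effective step size. The model, variable names, and noise variance are assumptions chosen for illustration.

```python
import numpy as np

# Sketch: online natural-gradient update written as a Kalman filter
# for a linear-Gaussian model y_t = theta^T x_t + noise (variance sigma2).
# P approximates the inverse Fisher information of the model, so the
# Kalman gain K = P x / (x^T P x + sigma2) is a natural-gradient step.

rng = np.random.default_rng(0)
dim, sigma2 = 3, 0.25
theta_true = rng.normal(size=dim)

theta = np.zeros(dim)   # parameter estimate
P = np.eye(dim)         # posterior covariance (approx. inverse Fisher)

for t in range(500):
    x = rng.normal(size=dim)                                  # input
    y = x @ theta_true + rng.normal(scale=np.sqrt(sigma2))    # noisy observation

    # Kalman filter correction = online natural-gradient step for this model
    y_hat = x @ theta
    S = x @ P @ x + sigma2          # innovation variance
    K = P @ x / S                   # Kalman gain
    theta = theta + K * (y - y_hat) # move along the natural gradient
    P = P - np.outer(K, x) @ P      # shrink covariance (step size decays)

print("estimate:", np.round(theta, 3))
print("truth:   ", np.round(theta_true, 3))
```

For nonlinear models the same recursion applies with `x` replaced by the Jacobian of the prediction, which is the extended Kalman filter studied in the paper.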

I’ve prepared a Jupyter notebook on Yann Ollivier’s paper “Online Natural Gradient as a Kalman Filter”. The notebook contains more details and visualizations. Check here.

Written on June 4, 2020