Online EM Algorithm#
In this section we review the online EM algorithm for exponential families with hidden data [Cappe2009], and apply it to the Generalized Hyperbolic distribution.
Online EM for Exponential Families#
Consider the exponential family with density:
where \(x\) is observable, \(y\) is hidden, and
The expectation parameter is \(\eta = \nabla\psi(\theta)\), and the Legendre dual of \(\psi\) is \(\phi(\eta) = \eta^\top \theta - \psi(\theta)\), with \(\theta = \nabla\phi(\eta)\).
Regret Framework#
The parameter estimation process can be viewed as an online game. At time \(t-1\) we make a prediction \(\theta_{t-1}\). At time \(t\) the environment reveals an observation \(x_t\), and the loss is \(l(x_t, \theta_{t-1}) = -\log f(x_t | \theta_{t-1})\). The regret from \(t = 1\) to \(T\) is:
where \(D_\psi(\theta \| \theta_0) = \psi(\theta) - \psi(\theta_0) - \nabla\psi(\theta_0)^\top (\theta - \theta_0)\) is the Bregman divergence, which coincides with the KL divergence \(D_{KL}(\theta_0 \| \theta)\).
Update Rules#
The online EM algorithm updates the expectation parameters as follows. Starting from \(\eta_0 = \nabla\psi(\theta_0)\):
where
is the conditional expectation of the sufficient statistics, and \(\tau_t = \tau_0 + t\) is the step size.
Regret Bound#
Theorem. With \(\tau_t = \tau_0 + t\), the regret (1) of the online EM algorithm (2) is:
where \(\theta_{ML}\) is the penalized MLE:
\(\eta_{ML} = \nabla\psi(\theta_{ML})\), and \(D_{KL}(x_t, \theta_{t-1} \| x_t, \theta_{ML})\) is the KL divergence between the conditional densities \(f(y | x_t, \theta_{t-1})\) and \(f(y | x_t, \theta_{ML})\).
Proof. The first part of the regret is:
The conditional expectation of the joint log-density satisfies:
using \(\phi(\eta) = \theta^\top \eta - \psi(\theta)\) and \(D_\phi(\eta_t \| \eta_{t-1}) = D_\psi(\theta_{t-1} \| \theta_t)\).
The key identity for the accumulated conditional expectations is:
Applying this to the second part of the regret:
Combining both parts yields (3). \(\square\)
Application to the GH Distribution#
For the joint GH distribution (2), which is an exponential family, the online EM algorithm updates the sufficient statistics as follows. Given a sequence of observations \(x_1, x_2, \ldots, x_T\):
where the conditional expectations are given by (1).
The parameters \(\theta_t = (\mu_t, \gamma_t, \Sigma_t, p_t, a_t, b_t)\) are then recovered from the sufficient statistics using the same formulas as the M-step (see (3)):
and
The online EM algorithm has comparable convergence rate to the batch EM algorithm, but each step processes only a single observation, making it much faster per iteration. The step size \(\tau_t = \tau_0 + t\) corresponds to starting with an initial weight of \(\tau_0\) “pseudo-observations.”
Limitations for Curved Exponential Families#
The online EM algorithm of [Cappe2009] may not converge for curved exponential families of the form:
where \(\theta(u)\) is a nonlinear mapping projecting the parameter \(u\) to a higher-dimensional space. The batch EM algorithm for curved exponential families solves:
while the online EM tends to solve:
which may not have a solution when \(\dim(u) < \dim(\theta)\). Therefore, the online EM algorithm cannot be directly applied to the factor analysis model described in Factor Analysis for Generalized Hyperbolic Distributions.