Online EM Algorithm
In this section we review the online EM algorithm for exponential families with hidden data [Cappe2009], and apply it to the Generalized Hyperbolic distribution.
Online EM for Exponential Families
Consider the exponential family with density:
\[ f(x, y | \theta) = h(x, y) \exp\big( \theta^\top s(x, y) - \psi(\theta) \big), \]
where \(x\) is observable, \(y\) is hidden, and
\[ \psi(\theta) = \log \iint h(x, y) \exp\big( \theta^\top s(x, y) \big) \, dx \, dy \]
is the log-partition function.
The expectation parameter is \(\eta = \nabla\psi(\theta)\), and the Legendre dual of \(\psi\) is \(\phi(\eta) = \eta^\top \theta - \psi(\theta)\), with \(\theta = \nabla\phi(\eta)\).
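As a concrete check of this duality, the following sketch works through the one-dimensional Bernoulli family, where \(\psi(\theta) = \log(1 + e^\theta)\) (an illustrative stand-in, not the GH family used later):

```python
import math

def psi(theta):
    # Log-partition of the Bernoulli family in natural parameters.
    return math.log(1.0 + math.exp(theta))

def eta_of(theta):
    # Expectation parameter: eta = psi'(theta) = sigmoid(theta).
    return 1.0 / (1.0 + math.exp(-theta))

def phi(eta):
    # Legendre dual of psi: the negative Bernoulli entropy.
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def theta_of(eta):
    # Inverse map: theta = phi'(eta) = logit(eta).
    return math.log(eta / (1.0 - eta))

theta = 0.7
eta = eta_of(theta)
# Duality: phi(eta) = eta * theta - psi(theta), and grad phi inverts grad psi.
assert abs(phi(eta) - (eta * theta - psi(theta))) < 1e-12
assert abs(theta_of(eta) - theta) < 1e-12
```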
Regret Framework
The parameter estimation process can be viewed as an online game. At time \(t-1\) we make a prediction \(\theta_{t-1}\). At time \(t\) the environment reveals an observation \(x_t\), and we incur the loss \(l(x_t, \theta_{t-1}) = -\log f(x_t | \theta_{t-1})\). The regret from \(t = 1\) to \(T\) is:
\[ R_T = \sum_{t=1}^{T} l(x_t, \theta_{t-1}) - \min_{\theta} \Big( \sum_{t=1}^{T} l(x_t, \theta) + \tau_0 D_\psi(\theta \| \theta_0) \Big), \]
where \(D_\psi(\theta \| \theta_0) = \psi(\theta) - \psi(\theta_0) - \nabla\psi(\theta_0)^\top (\theta - \theta_0)\) is the Bregman divergence, which coincides with the KL divergence \(D_{KL}(\theta_0 \| \theta)\).
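The coincidence of the Bregman and KL divergences is easy to verify numerically; the sketch below again uses the Bernoulli family with \(\psi(\theta) = \log(1 + e^\theta)\) as an illustrative example:

```python
import math

def psi(theta):
    # Bernoulli log-partition function.
    return math.log(1.0 + math.exp(theta))

def sigmoid(theta):
    # psi'(theta): the expectation parameter of the Bernoulli family.
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta, theta0):
    # D_psi(theta || theta0) = psi(theta) - psi(theta0) - psi'(theta0) (theta - theta0).
    return psi(theta) - psi(theta0) - sigmoid(theta0) * (theta - theta0)

def kl_bernoulli(p0, p):
    # KL divergence between Bernoulli(p0) and Bernoulli(p).
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))

theta, theta0 = 1.3, -0.4
# The Bregman divergence of psi equals the KL divergence with swapped arguments.
assert abs(bregman(theta, theta0) - kl_bernoulli(sigmoid(theta0), sigmoid(theta))) < 1e-12
```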
Update Rules
The online EM algorithm updates the expectation parameters as follows. Starting from \(\eta_0 = \nabla\psi(\theta_0)\):
\[ \eta_t = \eta_{t-1} + \frac{1}{\tau_t} \big( \bar{s}(x_t; \theta_{t-1}) - \eta_{t-1} \big), \qquad \theta_t = \nabla\phi(\eta_t), \]
where
\[ \bar{s}(x; \theta) = \mathbb{E}\big[ s(x, y) \,\big|\, x, \theta \big] \]
is the conditional expectation of the sufficient statistics under \(f(y | x, \theta)\), and \(\tau_t = \tau_0 + t\) is the step size.
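As an illustration of these updates, the following sketch runs online EM on a toy two-component Gaussian mixture with unit variances and known equal weights, where the component label plays the role of the hidden variable \(y\); the model choice and all names are illustrative, not part of the GH treatment below:

```python
import math
import random

def online_em_mixture(xs, mu, tau0=2.0):
    """Online EM for a two-component Gaussian mixture with unit variances and
    known equal weights; only the component means are estimated (a toy sketch)."""
    # Expectation parameters: per component k, E[1{y=k}] and E[1{y=k} x].
    eta = [[0.5, 0.5 * mu[0]], [0.5, 0.5 * mu[1]]]
    for t, x in enumerate(xs, start=1):
        # E-step for one observation: responsibilities under theta_{t-1}.
        w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
        z = w[0] + w[1]
        r = [wk / z for wk in w]
        # Stochastic-approximation step on the expectation parameters,
        # with step size 1 / tau_t and tau_t = tau0 + t.
        step = 1.0 / (tau0 + t)
        for k in range(2):
            eta[k][0] += step * (r[k] - eta[k][0])
            eta[k][1] += step * (r[k] * x - eta[k][1])
        # M-step: map the expectation parameters back to the means.
        mu = [eta[k][1] / eta[k][0] for k in range(2)]
    return mu

random.seed(0)
data = [random.gauss(-2.0, 1.0) if random.random() < 0.5 else random.gauss(2.0, 1.0)
        for _ in range(5000)]
mu = online_em_mixture(data, mu=[-1.0, 1.0])  # estimates approach (-2, 2)
```

Each observation is processed once; the running expectation parameters play the role of \(\eta_t\), and the division in the M-step is the map \(\nabla\phi\) for this toy family.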
Regret Bound
Theorem. With \(\tau_t = \tau_0 + t\), the regret (1) of the online EM algorithm (2) is:
where \(\theta_{ML}\) is the penalized MLE:
\[ \theta_{ML} = \arg\min_{\theta} \Big( \sum_{t=1}^{T} l(x_t, \theta) + \tau_0 D_\psi(\theta \| \theta_0) \Big), \]
\(\eta_{ML} = \nabla\psi(\theta_{ML})\), and \(D_{KL}(x_t, \theta_{t-1} \| x_t, \theta_{ML})\) is the KL divergence between the conditional densities \(f(y | x_t, \theta_{t-1})\) and \(f(y | x_t, \theta_{ML})\).
Proof. The first part of the regret is:
The conditional expectation of the joint log-density satisfies:
using \(\phi(\eta) = \theta^\top \eta - \psi(\theta)\) and \(D_\phi(\eta_t \| \eta_{t-1}) = D_\psi(\theta_{t-1} \| \theta_t)\).
The key identity for the accumulated conditional expectations is:
Applying this to the second part of the regret:
Combining both parts yields (3). \(\square\)
Application to the GH Distribution
For the joint GH distribution (2), which is an exponential family, the online EM algorithm updates the sufficient statistics as follows. Given a sequence of observations \(x_1, x_2, \ldots, x_T\):
where the conditional expectations are given by (1).
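For the GH family the hidden mixing variable \(y\) is conditionally Generalized Inverse Gaussian (GIG) given \(x\), so the conditional expectations reduce to ratios of modified Bessel functions of the second kind. A sketch, assuming SciPy is available; the function name `gig_moments` and the \((\lambda, a, b)\) parameterization below are illustrative:

```python
from scipy.special import kv  # modified Bessel function of the second kind

def gig_moments(lam, a, b):
    """Moments E[y] and E[1/y] of a GIG(lam, a, b) variable with density
    proportional to y**(lam - 1) * exp(-(a*y + b/y) / 2), for a, b > 0."""
    s = (a * b) ** 0.5
    r = (b / a) ** 0.5
    e_y = r * kv(lam + 1.0, s) / kv(lam, s)
    e_inv_y = kv(lam - 1.0, s) / (r * kv(lam, s))
    return e_y, e_inv_y

# Example E-step quantities for one observation (parameter values are arbitrary).
e_y, e_inv_y = gig_moments(lam=-0.5, a=2.0, b=3.0)
```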
The parameters \(\theta_t = (\mu_t, \gamma_t, \Sigma_t, p_t, a_t, b_t)\) are then recovered from the sufficient statistics using the same formulas as the M-step (see (3)):
and
The online EM algorithm has a convergence rate comparable to that of the batch EM algorithm, but each step processes only a single observation, making it much faster per iteration. The step size \(\tau_t = \tau_0 + t\) corresponds to starting with an initial weight of \(\tau_0\) “pseudo-observations.”
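The pseudo-observation interpretation can be checked directly: with \(\tau_t = \tau_0 + t\), unrolling the recursion shows that \(\eta_T\) is exactly the weighted average of \(\tau_0\) copies of \(\eta_0\) and the \(T\) conditional statistics. A minimal numeric check (the scalar values are arbitrary placeholders for \(\bar{s}(x_t; \theta_{t-1})\)):

```python
# With tau_t = tau0 + t, the online update
#   eta_t = eta_{t-1} + (sbar_t - eta_{t-1}) / tau_t
# is the running average of tau0 copies of eta_0 and the sbar's.
tau0 = 3.0
eta0 = 0.7
sbars = [0.2, 1.5, -0.3, 0.9, 0.4]  # stand-ins for conditional statistics

eta = eta0
for t, s in enumerate(sbars, start=1):
    eta += (s - eta) / (tau0 + t)

T = len(sbars)
avg = (tau0 * eta0 + sum(sbars)) / (tau0 + T)
assert abs(eta - avg) < 1e-12  # recursion equals the weighted average
```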
Limitations for Curved Exponential Families
The online EM algorithm of [Cappe2009] may not converge for curved exponential families of the form:
\[ f(x, y | u) = h(x, y) \exp\big( \theta(u)^\top s(x, y) - \psi(\theta(u)) \big), \]
where \(\theta(u)\) is a nonlinear mapping embedding the parameter \(u\) into a higher-dimensional natural-parameter space. The batch EM algorithm for curved exponential families solves the M-step maximization:
\[ u_t = \arg\max_{u} \big( \theta(u)^\top \bar{\eta}_t - \psi(\theta(u)) \big), \]
where \(\bar{\eta}_t\) is the running average of the conditional sufficient statistics, while the online EM tends to solve the system:
\[ \nabla\psi(\theta(u)) = \eta_t, \]
which may not have a solution when \(\dim(u) < \dim(\theta)\). Therefore, the online EM algorithm cannot be directly applied to the factor analysis model described in Factor Analysis for Generalized Hyperbolic Distributions.
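The dimension mismatch can be made concrete with a toy curved family \(N(u, u^2)\), chosen purely for illustration (and fully observed, unlike the factor model): the sufficient statistics \(s(x) = (x, x^2)\) are two-dimensional while \(u\) is scalar, so \(\nabla\psi(\theta(u)) = \eta_t\) is an overdetermined system with no exact solution in general.

```python
import math

# Curved family: x ~ N(u, u^2). The natural parameter theta(u) is
# 2-dimensional while u is scalar, so E_u[s(x)] = (u, 2u^2) traces a curve.
xs = [0.5, 1.0, 2.0, 3.5]  # a small illustrative sample

# Running averages of the sufficient statistics s(x) = (x, x^2):
# the eta that online EM tracks.
eta1 = sum(xs) / len(xs)
eta2 = sum(x * x for x in xs) / len(xs)

# Solving grad psi(theta(u)) = eta needs one u with eta1 = u and eta2 = 2u^2;
# the two equations give different answers, so the system has no solution.
u_from_first = eta1                    # 1.75
u_from_second = math.sqrt(eta2 / 2.0)  # about 1.48

# The batch M-step instead maximizes the 1-D expected log-likelihood in u,
# which is always well-posed even though eta lies off the curve.
```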