Factor Analysis for Generalized Hyperbolic Distributions
Another way to improve the conditioning of \(\Sigma\) is the factor
analysis approach [Tortora2013]. Instead of estimating a full covariance
matrix, we assume the structure \(\Sigma = F F^\top + D\) where
\(F \in \mathbb{R}^{d \times r}\) with \(r < d\) and
\(D \in \mathbb{R}^{d \times d}\) is a positive definite diagonal matrix.
Equivalently, a GH random vector \(X\) admits the stochastic
representation:
\[X \stackrel{d}{=} \mu + \gamma Y + \sqrt{Y}(F Z + \varepsilon),\]
where \(Z \in \mathbb{R}^r \sim N(0, I)\) and
\(\varepsilon \sim N(0, D)\) are independent.
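As an illustration, the representation above can be sampled directly. The sketch below is a hypothetical helper (the name `sample_gh_factor` is not from the source); it relies on SciPy's `geninvgauss`, whose density \(\propto y^{p-1} e^{-c(y + 1/y)/2}\) matches \(\operatorname{GIG}(p, a, b)\) after setting \(c = \sqrt{ab}\) and rescaling by \(\sqrt{b/a}\):

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gh_factor(n, mu, gamma, F, D_diag, p, a, b, seed=None):
    """Draw n samples of X = mu + gamma*Y + sqrt(Y)*(F Z + eps),
    with Y ~ GIG(p, a, b), Z ~ N(0, I_r), eps ~ N(0, diag(D_diag))."""
    rng = np.random.default_rng(seed)
    d, r = F.shape
    # SciPy's geninvgauss(p, c) has density prop. to y^(p-1) exp(-c(y+1/y)/2);
    # with c = sqrt(a*b) and scale sqrt(b/a) this is GIG(p, a, b) as used here.
    y = geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a),
                        size=n, random_state=rng)
    z = rng.standard_normal((n, r))
    eps = rng.standard_normal((n, d)) * np.sqrt(D_diag)
    x = mu + np.outer(y, gamma) + np.sqrt(y)[:, None] * (z @ F.T + eps)
    return x, y, z
```

The mixing variable \(Y\) is drawn once per observation and scales both the skewness term and the Gaussian factor part, which is what produces the heavy tails.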
Joint Distribution
The conditional distribution of \(X\) given \(Y\) and \(Z\)
is \(N(\mu + \gamma Y + \sqrt{Y} F Z, \, D Y)\). The joint distribution
of \((X, Y, Z)\) is:
(1)\[\begin{split}&f(x, y, z | \mu, \gamma, F, D, p, a, b) \\
&= \frac{1}{\sqrt{(2\pi)^{d+r} |D|}}
\frac{(a/b)^{p/2}}{2 K_p(\sqrt{ab})}
y^{p - 1 - d/2} \\
&\quad \times \exp\!\left(-\frac{1}{2} z^\top z
- \frac{1}{2}(b \, y^{-1} + a \, y)
- \frac{1}{2}(x - \mu - \gamma y - F \sqrt{y} \, z)^\top
D^{-1}(x - \mu - \gamma y - F \sqrt{y} \, z) \, y^{-1}\right)\end{split}\]
for \(y > 0\).
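Density (1) can be transcribed directly and checked against the product of its three factors, \(N(x; \mu + \gamma y + \sqrt{y} F z, Dy) \cdot \operatorname{GIG}(y; p, a, b) \cdot N(z; 0, I)\). The helper below is an illustrative sketch (the name `log_joint` is hypothetical); \(D\) is assumed diagonal and passed as a vector:

```python
import numpy as np
from scipy.special import kv

def log_joint(x, y, z, mu, gamma, F, D_diag, p, a, b):
    """Log of the joint density (1) of (X, Y, Z); D = diag(D_diag)."""
    d, r = F.shape
    resid = x - mu - gamma * y - np.sqrt(y) * (F @ z)
    quad = resid @ (resid / D_diag) / y   # (x - m)^T D^{-1} (x - m) / y
    return ((p / 2) * np.log(a / b) - np.log(2 * kv(p, np.sqrt(a * b)))
            - 0.5 * (d + r) * np.log(2 * np.pi) - 0.5 * np.log(D_diag).sum()
            + (p - 1 - d / 2) * np.log(y)
            - 0.5 * (z @ z) - 0.5 * (b / y + a * y) - 0.5 * quad)
```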
Curved Exponential Family
The density (1) belongs to a curved exponential family:
\[f(x, y, z) = h(x, y, z) \exp\!\left(\theta(u)^\top t(x, y, z)
- \psi(\theta(u))\right),\]
where \(\theta(\cdot)\) is a nonlinear mapping from the parameter
\(u = (\mu, \gamma, F, D, p, a, b)\) to a higher-dimensional space.
The sufficient statistics \(t(x, y, z)\) consist of ten components:
\[s_1 = y^{-1}, \quad s_2 = y, \quad s_3 = \log y, \quad
s_4 = x, \quad s_5 = x y^{-1}, \quad s_6 = x x^\top y^{-1},\]
\[s_7 = x z^\top y^{-1/2}, \quad s_8 = z y^{-1/2}, \quad
s_9 = z y^{1/2}, \quad s_{10} = z z^\top.\]
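For a single observation, the ten statistics can be collected as follows (an illustrative sketch; the keys `s1`–`s10` simply mirror the notation above):

```python
import numpy as np

def suff_stats(x, y, z):
    """The ten sufficient statistics t(x, y, z) for one observation."""
    ry = np.sqrt(y)
    return {
        "s1": 1.0 / y, "s2": y, "s3": np.log(y),
        "s4": x, "s5": x / y, "s6": np.outer(x, x) / y,
        "s7": np.outer(x, z) / ry, "s8": z / ry,
        "s9": z * ry, "s10": np.outer(z, z),
    }
```

In the E-step these quantities are not observed directly; their conditional expectations given \(X\) are averaged over the sample instead.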
Log-Likelihood Function
The log-likelihood function for the factor analysis model is:
\[\begin{split}L_{FA}(u | s) &= -\frac{1}{2} \log|D|
- \frac{1}{2} \mu^\top D^{-1} \mu \, s_1
- \frac{1}{2} \gamma^\top D^{-1} \gamma \, s_2
+ \gamma^\top D^{-1} s_4
+ \mu^\top D^{-1} s_5 \\
&\quad - \frac{1}{2} \operatorname{tr}(D^{-1} s_6)
+ \operatorname{tr}(F^\top D^{-1} s_7)
- \mu^\top D^{-1} F s_8
- \gamma^\top D^{-1} F s_9 \\
&\quad - \frac{1}{2} \operatorname{tr}(F^\top D^{-1} F s_{10})
- \mu^\top D^{-1} \gamma
+ L_{GIG}(p, a, b | s_1, s_2, s_3),\end{split}\]
where \(L_{GIG}\) is the GIG log-likelihood defined in (8).
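The expression above transcribes term by term into code. The sketch below is hypothetical; in particular it writes out \(L_{GIG}\) under the assumption that (8) is the standard GIG log-likelihood \(\tfrac{p}{2}\log(a/b) - \log 2K_p(\sqrt{ab}) + (p-1)s_3 - \tfrac{1}{2}(b s_1 + a s_2)\):

```python
import numpy as np
from scipy.special import kv

def log_lik_fa(s, mu, gamma, F, D_diag, p, a, b):
    """L_FA(u | s); s maps 's1'..'s10' to the averaged sufficient statistics."""
    Dinv = 1.0 / D_diag
    # Assumed standard form of the GIG log-likelihood (8).
    l_gig = ((p / 2) * np.log(a / b) - np.log(2 * kv(p, np.sqrt(a * b)))
             + (p - 1) * s["s3"] - 0.5 * (b * s["s1"] + a * s["s2"]))
    return (-0.5 * np.log(D_diag).sum()
            - 0.5 * (mu * Dinv) @ mu * s["s1"]
            - 0.5 * (gamma * Dinv) @ gamma * s["s2"]
            + (gamma * Dinv) @ s["s4"]
            + (mu * Dinv) @ s["s5"]
            - 0.5 * np.trace(Dinv[:, None] * s["s6"])
            + np.trace(F.T @ (Dinv[:, None] * s["s7"]))
            - (mu * Dinv) @ (F @ s["s8"])
            - (gamma * Dinv) @ (F @ s["s9"])
            - 0.5 * np.trace(F.T @ (Dinv[:, None] * F) @ s["s10"])
            - (mu * Dinv) @ gamma
            + l_gig)
```

Note that terms not depending on any parameter, such as \(-\tfrac{1}{2}\operatorname{tr}(s_{10})\) and \(-\tfrac{d}{2}s_3\), are omitted from \(L_{FA}\), since they do not affect the M-step.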
Conditional Expectations for the E-Step
Integrating (1) over \(z\) shows that \((X, Y)\) follows the joint
distribution of a GH model with covariance structure \(\Sigma = F F^\top + D\).
Therefore the conditional distribution of \(Y\) given \(X\) is:
\[Y | X = x \sim \operatorname{GIG}\!\left(p - \frac{d}{2}, \,
a + \gamma^\top (F F^\top + D)^{-1} \gamma, \,
b + (x - \mu)^\top (F F^\top + D)^{-1} (x - \mu)\right),\]
and the conditional moments \(E[Y^\alpha | X, u]\) and
\(E[\log Y | X, u]\) follow from the moments of this GIG distribution.
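For \(\operatorname{GIG}(\lambda, A, B)\) the moments have the closed form \(E[Y^\alpha] = (B/A)^{\alpha/2} K_{\lambda+\alpha}(\sqrt{AB}) / K_\lambda(\sqrt{AB})\), and \(E[\log Y]\) is the derivative of this expression at \(\alpha = 0\). A sketch of the conditional-moment computation (hypothetical helper, with \(D\) passed as a diagonal vector):

```python
import numpy as np
from scipy.special import kv

def cond_y_moment(alpha, x, mu, gamma, F, D_diag, p, a, b):
    """E[Y^alpha | X = x] for the factor-analytic GH model."""
    sigma = F @ F.T + np.diag(D_diag)        # Sigma = F F^T + D
    sigma_inv = np.linalg.inv(sigma)
    lam = p - len(x) / 2.0                   # GIG order p - d/2
    A = a + gamma @ sigma_inv @ gamma
    B = b + (x - mu) @ sigma_inv @ (x - mu)
    w = np.sqrt(A * B)
    return (B / A) ** (alpha / 2.0) * kv(lam + alpha, w) / kv(lam, w)
```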
The conditional distribution of \((X, Z)\) given \(Y\) is Gaussian:
\[\begin{split}\begin{pmatrix} (X - \mu - \gamma Y)/\sqrt{Y} \\ Z \end{pmatrix}
\Bigg| Y, u
\sim N\!\left(0, \begin{pmatrix}
F F^\top + D & F \\ F^\top & I
\end{pmatrix}\right).\end{split}\]
Define \(\beta = F^\top (F F^\top + D)^{-1}\). The conditional
expectations of the latent factor \(Z\) are:
\[\begin{split}E[Z Y^{-1/2} | X, u] &= \beta(X - \mu) E[Y^{-1} | X, u] - \beta \gamma, \\
E[Z Y^{1/2} | X, u] &= \beta(X - \mu) - \beta \gamma \, E[Y | X, u], \\
E[Z Z^\top | X, u] &= I - \beta F
+ \beta(X - \mu)(X - \mu)^\top \beta^\top E[Y^{-1} | X, u] \\
&\quad - \beta(X - \mu)\gamma^\top \beta^\top
- \beta \gamma (X - \mu)^\top \beta^\top
+ \beta \gamma \gamma^\top \beta^\top E[Y | X, u].\end{split}\]
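Given per-observation values of \(E[Y^{-1} | X]\) and \(E[Y | X]\), these three expectations are purely algebraic. A hypothetical helper (with \(D\) passed as a diagonal vector):

```python
import numpy as np

def cond_z_moments(x, mu, gamma, F, D_diag, ey_inv, ey):
    """E[Z Y^{-1/2} | X], E[Z Y^{1/2} | X] and E[Z Z^T | X], given the
    conditional moments ey_inv = E[Y^{-1} | X] and ey = E[Y | X]."""
    r = F.shape[1]
    beta = F.T @ np.linalg.inv(F @ F.T + np.diag(D_diag))  # beta = F^T Sigma^{-1}
    bx = beta @ (x - mu)
    bg = beta @ gamma
    e_zym = bx * ey_inv - bg                  # E[Z Y^{-1/2} | X]
    e_zyp = bx - bg * ey                      # E[Z Y^{1/2}  | X]
    e_zz = (np.eye(r) - beta @ F
            + np.outer(bx, bx) * ey_inv
            - np.outer(bx, bg) - np.outer(bg, bx)
            + np.outer(bg, bg) * ey)          # E[Z Z^T | X]
    return e_zym, e_zyp, e_zz
```

As a sanity check, replacing the conditional moments by the exact values \(Y^{-1}\) and \(Y\) on simulated \((X, Y, Z)\) and averaging must reproduce the direct Monte Carlo averages of \(Z Y^{-1/2}\), \(Z Y^{1/2}\) and \(Z Z^\top\), by the tower property.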
E-Step
Given i.i.d. samples \(x_1, \ldots, x_n\) and current parameters
\(u_k = (\mu_k, \gamma_k, F_k, D_k, p_k, a_k, b_k)\), the E-step
computes all ten sufficient statistics. The first six are the same as in the
standard EM algorithm (see (2)):
\[\begin{split}s_1^{(k)} &= \frac{1}{n} \sum_{j=1}^n E[Y^{-1} | X = x_j, u_k], \\
s_2^{(k)} &= \frac{1}{n} \sum_{j=1}^n E[Y | X = x_j, u_k], \\
s_3^{(k)} &= \frac{1}{n} \sum_{j=1}^n E[\log Y | X = x_j, u_k], \\
s_4^{(k)} &= \frac{1}{n} \sum_{j=1}^n x_j, \\
s_5^{(k)} &= \frac{1}{n} \sum_{j=1}^n x_j \, E[Y^{-1} | X = x_j, u_k], \\
s_6^{(k)} &= \frac{1}{n} \sum_{j=1}^n
x_j x_j^\top E[Y^{-1} | X = x_j, u_k].\end{split}\]
The remaining four are determined by the first six using
\(\beta_k = F_k^\top (F_k F_k^\top + D_k)^{-1}\):
\[\begin{split}s_7^{(k)} &= (s_6^{(k)} - s_5^{(k)} \mu_k^\top
- s_4^{(k)} \gamma_k^\top) \beta_k^\top, \\
s_8^{(k)} &= \beta_k (s_5^{(k)} - \mu_k \, s_1^{(k)} - \gamma_k), \\
s_9^{(k)} &= \beta_k (s_4^{(k)} - \mu_k - \gamma_k \, s_2^{(k)}), \\
s_{10}^{(k)} &= I - \beta_k F_k
+ \beta_k \big(s_6^{(k)} - s_5^{(k)} \mu_k^\top
- \mu_k (s_5^{(k)})^\top + \mu_k \mu_k^\top s_1^{(k)} \\
&\quad - (s_4^{(k)} - \mu_k) \gamma_k^\top
- \gamma_k (s_4^{(k)} - \mu_k)^\top
+ \gamma_k \gamma_k^\top s_2^{(k)}\big) \beta_k^\top.\end{split}\]
The M-step then applies (3) and (2) with
\(s = (s_1^{(k)}, \ldots, s_{10}^{(k)})\).
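The four dependent statistics are again purely algebraic in \(s_1^{(k)}, \ldots, s_6^{(k)}\) and the current parameters. A hypothetical helper mirroring the formulas above (with \(D\) passed as a diagonal vector):

```python
import numpy as np

def dependent_stats(s1, s2, s3, s4, s5, s6, mu, gamma, F, D_diag):
    """Compute s7..s10 from s1..s6 and the current parameters."""
    r = F.shape[1]
    beta = F.T @ np.linalg.inv(F @ F.T + np.diag(D_diag))  # beta_k
    s7 = (s6 - np.outer(s5, mu) - np.outer(s4, gamma)) @ beta.T
    s8 = beta @ (s5 - mu * s1 - gamma)
    s9 = beta @ (s4 - mu - gamma * s2)
    inner = (s6 - np.outer(s5, mu) - np.outer(mu, s5) + np.outer(mu, mu) * s1
             - np.outer(s4 - mu, gamma) - np.outer(gamma, s4 - mu)
             + np.outer(gamma, gamma) * s2)
    s10 = np.eye(r) - beta @ F + beta @ inner @ beta.T
    return s7, s8, s9, s10
```

When the parameters used here match those that generated the data, the tower property implies these outputs agree, in expectation, with direct Monte Carlo averages of \(x z^\top y^{-1/2}\), \(z y^{-1/2}\), \(z y^{1/2}\) and \(z z^\top\), which gives a convenient test of the implementation.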
References
[Tortora2013]
Tortora, C., McNicholas, P. D., & Browne, R. P. (2013).
Mixtures of multivariate generalized hyperbolic distributions.