The Generalized Hyperbolic Distribution

In this section we briefly review the basic properties of the Generalized Hyperbolic (GH) distribution.

Definition as Normal Mixture

Let \(Y\) be a generalized inverse Gaussian (GIG) random variable with parameters \((p, a, b)\), and let \(Z\) be an independent Gaussian random vector with zero mean and covariance \(\Sigma\). Then the random vector:

(1)\[X \stackrel{d}{=} \mu + \gamma Y + \sqrt{Y} Z\]

follows the generalized hyperbolic distribution with parameters \((\mu, \gamma, \Sigma, p, a, b)\), where \(\mu, \gamma \in \mathbb{R}^d\) and \(\Sigma \in \mathbb{R}^{d \times d}\) is a positive definite matrix.
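
For intuition, sampling from (1) requires only a GIG draw and a Gaussian draw. Below is a minimal sketch (the helper sample_gh is hypothetical, not part of normix) built on scipy.stats.geninvgauss; scipy parameterizes the GIG in its symmetric form, so the general \((p, a, b)\) case is obtained by rescaling:

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gh(n, mu, gamma, Sigma, p, a, b, seed=None):
    """Draw n samples from the GH distribution via the mixture (1)."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    # Y ~ GIG(p, a, b): scipy's geninvgauss(p, w) has density proportional
    # to y**(p-1) * exp(-w*(y + 1/y)/2), i.e. the symmetric case a = b = w,
    # so rescale by sqrt(b/a) to recover general (a, b).
    y = geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a),
                        size=n, random_state=rng)
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    return mu + np.outer(y, gamma) + np.sqrt(y)[:, None] * z
```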

Parameter Interpretation

  • \(\mu\): location parameter

  • \(\gamma\): skewness parameter

  • \(\Sigma\): models the dependency structure of the multivariate distribution

  • \((p, a, b)\): GIG parameters controlling the heavy-tailedness

In general, (1) defines a multivariate heavy-tailed distribution for any non-negative mixing random variable \(Y\). Such distributions are usually called normal mixture or Gaussian mixture distributions.

Note

In many references, “normal mixture” refers to a discrete mixture of normal densities. In normix, we use “normal mixture” to refer to the distribution of any random vector that can be expressed as in (1) for some non-negative mixing variable \(Y\).

Joint GH Distribution

The joint distribution of \(X\) and \(Y\) is crucial in analyzing the GH distribution. We call the distribution of \((X, Y)\) the joint-GH distribution. Its density function is:

(2)\[\begin{split}f(x, y | \mu, \gamma, \Sigma, p, a, b) &= \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \frac{(a/b)^{p/2}}{2 K_p(\sqrt{ab})} y^{p - 1 - d/2} \\ &\quad \times \exp\left( -\frac{1}{2}(x - \mu - \gamma y)^\top \Sigma^{-1} (x - \mu - \gamma y) y^{-1} - \frac{1}{2}(b y^{-1} + a y) \right),\end{split}\]

for \(y > 0\).
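
A direct transcription of (2) into a log-density can be useful for checking derivations. The following is an illustrative sketch (not normix's implementation):

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

def joint_gh_logpdf(x, y, mu, gamma, Sigma, p, a, b):
    """Log-density of (X, Y) per (2), for a single point x and y > 0."""
    d = len(mu)
    Sinv = np.linalg.inv(Sigma)
    r = x - mu - gamma * y
    return (-(d / 2) * np.log(2 * np.pi)              # Gaussian constant
            - 0.5 * np.linalg.slogdet(Sigma)[1]       # -log |Sigma|^{1/2}
            + (p / 2) * np.log(a / b)                 # GIG normalizer ...
            - np.log(2 * kv(p, np.sqrt(a * b)))       # ... continued
            + (p - 1 - d / 2) * np.log(y)
            - 0.5 * (r @ Sinv @ r) / y
            - 0.5 * (b / y + a * y))
```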

Marginal GH Density

Integrating out \(y\) from (2), we obtain the marginal distribution of \(x\), which is the GH density function:

(3)\[\begin{split}f(x | \mu, \gamma, \Sigma, p, a, b) &= \int_0^\infty f(x, y | \mu, \gamma, \Sigma, p, a, b) \, dy \\ &= c \frac{K_{p - d/2}\left(\sqrt{(b + q(x))(a + \gamma^\top \Sigma^{-1} \gamma)}\right)} {\left(\sqrt{(b + q(x))(a + \gamma^\top \Sigma^{-1} \gamma)}\right)^{d/2 - p}} e^{(x - \mu)^\top \Sigma^{-1} \gamma},\end{split}\]

where \(q(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)\) is the squared Mahalanobis distance, and

\[c = \frac{(a/b)^{p/2} (a + \gamma^\top \Sigma^{-1} \gamma)^{d/2 - p}} {(2\pi)^{d/2} |\Sigma|^{1/2} K_p(\sqrt{ab})}.\]
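
Putting (3) and the constant \(c\) together gives the following sketch (illustrative only; a production implementation would evaluate the Bessel functions on the log scale, e.g. via scipy.special.kve, to avoid under- and overflow):

```python
import numpy as np
from scipy.special import kv

def gh_logpdf(x, mu, gamma, Sigma, p, a, b):
    """Log-density of the GH distribution per (3), for a single point x."""
    d = len(mu)
    Sinv = np.linalg.inv(Sigma)
    diff = x - mu
    q = diff @ Sinv @ diff                # q(x), squared Mahalanobis distance
    g = gamma @ Sinv @ gamma
    arg = np.sqrt((b + q) * (a + g))      # argument of the Bessel function
    log_c = ((p / 2) * np.log(a / b) + (d / 2 - p) * np.log(a + g)
             - (d / 2) * np.log(2 * np.pi)
             - 0.5 * np.linalg.slogdet(Sigma)[1]
             - np.log(kv(p, np.sqrt(a * b))))
    return (log_c + np.log(kv(p - d / 2, arg))
            - (d / 2 - p) * np.log(arg) + diff @ Sinv @ gamma)
```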

Alternative Parameterization

Similar to the GIG distribution, one can use another parameterization with \(\delta = \sqrt{b/a}\) and \(\eta = \sqrt{ab}\):

\[f(x | \mu, \gamma, \Sigma, p, \delta, \eta) = c \frac{K_{p - d/2}\left(\sqrt{(\eta + q_\delta(x))(\eta + \tilde{q}_\delta)}\right)} {\left(\sqrt{(\eta + q_\delta(x))(\eta + \tilde{q}_\delta)}\right)^{d/2 - p}} e^{(x - \mu)^\top (\delta \Sigma)^{-1} \delta \gamma},\]

where \(q_\delta(x) = (x - \mu)^\top (\delta \Sigma)^{-1} (x - \mu)\), \(\tilde{q}_\delta = (\delta \gamma)^\top (\delta \Sigma)^{-1} (\delta \gamma)\), and

\[c = \frac{(\eta + \tilde{q}_\delta)^{d/2 - p}}{(2\pi)^{d/2} |\delta \Sigma|^{1/2} K_p(\eta)}.\]
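
The two parameterizations are related by an invertible map, since \(\eta\delta = b\) and \(\eta/\delta = a\). A pair of hypothetical conversion helpers:

```python
import numpy as np

def ab_to_delta_eta(a, b):
    """(a, b) -> (delta, eta) with delta = sqrt(b/a), eta = sqrt(ab)."""
    return np.sqrt(b / a), np.sqrt(a * b)

def delta_eta_to_ab(delta, eta):
    """Inverse map: a = eta / delta, b = eta * delta."""
    return eta / delta, eta * delta
```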

Model Identifiability

From the above representation, one can observe that the GH model is not identifiable: the parameter sets \((\mu, \gamma/c, \Sigma/c, p, c\delta, \eta)\) give the same distribution for any \(c > 0\). Consequently, the Fisher information matrix of the GH distribution is singular.
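
In the \((a, b)\) parameterization the same invariance reads \((\mu, \gamma/c, \Sigma/c, p, a/c, cb)\), which can be verified numerically with the gh_logpdf sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 3, 7.5
mu, gamma = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)           # a positive definite matrix
x = rng.normal(size=d)

# Scaling (gamma, Sigma, a, b) by (1/c, 1/c, 1/c, c) leaves the density fixed.
l1 = gh_logpdf(x, mu, gamma, Sigma, p=0.7, a=2.0, b=3.0)
l2 = gh_logpdf(x, mu, gamma / c, Sigma / c, p=0.7, a=2.0 / c, b=3.0 * c)
assert np.isclose(l1, l2)
```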

There are several ways to regularize the GH family:

  1. Set \(\delta = 1\) (simplest approach)

  2. Set \(b = 1\) in the EM algorithm [Protassov2004]

  3. Fix \(b\) when \(p > -1\) and fix \(a\) when \(p < 1\) [Hu2005]

  4. Fix the determinant \(|\Sigma| = 1\) [McNeil2010]

Note

When the dimension \(d\) is high, fixing \(|\Sigma| = 1\) is recommended. Since \(|\Sigma/c| = |\Sigma|/c^d\), even a small perturbation of the matrix scale changes \(|\Sigma|\) dramatically when \(d\) is large, and inverting \(\Sigma\) becomes numerically unstable when \(|\Sigma|\) is extremely large or small.

Exponential Family Form

The joint-GH distribution (2) belongs to the exponential family with density:

\[f(x, y|\theta) = h(x, y) \exp\left(\theta^\top t(x, y) - \psi(\theta)\right)\]

Sufficient Statistics:

\[\begin{split}t(x, y) = \begin{pmatrix} \log y \\ y^{-1} \\ y \\ x \\ x y^{-1} \\ x x^\top y^{-1} \end{pmatrix}\end{split}\]

where \(t_1, t_2, t_3 \in \mathbb{R}\), \(t_4, t_5 \in \mathbb{R}^d\), and \(t_6 \in \mathbb{R}^{d \times d}\).

Natural Parameters:

The natural parameters are derived from the classical parameters \((\mu, \gamma, \Sigma, p, a, b)\). By expanding the exponent in (2):

\[\begin{split}&-\frac{1}{2}(x - \mu - \gamma y)^\top \Sigma^{-1} (x - \mu - \gamma y) y^{-1} - \frac{1}{2}(b y^{-1} + a y) \\ &= -\frac{1}{2} x^\top \Sigma^{-1} x \, y^{-1} + x^\top \Sigma^{-1} \mu \, y^{-1} + x^\top \Sigma^{-1} \gamma \\ &\quad - \frac{1}{2} \mu^\top \Sigma^{-1} \mu \, y^{-1} - \mu^\top \Sigma^{-1} \gamma - \frac{1}{2} \gamma^\top \Sigma^{-1} \gamma \, y \\ &\quad - \frac{1}{2} b \, y^{-1} - \frac{1}{2} a \, y\end{split}\]

we identify the natural parameters:

(4)\[\begin{split}\theta_1 &= p - 1 - \frac{d}{2} \quad \text{(coefficient of } \log y \text{)} \\ \theta_2 &= -\frac{1}{2}\left(b + \mu^\top \Sigma^{-1} \mu\right) \quad \text{(coefficient of } y^{-1} \text{)} \\ \theta_3 &= -\frac{1}{2}\left(a + \gamma^\top \Sigma^{-1} \gamma\right) \quad \text{(coefficient of } y \text{)} \\ \theta_4 &= \Sigma^{-1} \gamma \quad \text{(coefficient of } x \text{)} \\ \theta_5 &= \Sigma^{-1} \mu \quad \text{(coefficient of } x y^{-1} \text{)} \\ \theta_6 &= -\frac{1}{2} \Sigma^{-1} \quad \text{(coefficient of } x x^\top y^{-1} \text{)}\end{split}\]

Base Measure:

\[h(x, y) = \frac{1}{(2\pi)^{d/2}} \mathbf{1}_{y > 0}\]

Log Partition Function:

(5)\[\psi(\theta) = \frac{1}{2} \log|\Sigma| + \log 2 + \log K_p(\sqrt{ab}) + \frac{p}{2} \log\left(\frac{b}{a}\right) + \mu^\top \Sigma^{-1} \gamma\]
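
The maps (4) and (5) are straightforward to transcribe. A hypothetical helper (not normix API):

```python
import numpy as np
from scipy.special import kv

def natural_params(mu, gamma, Sigma, p, a, b):
    """Classical GH parameters -> natural parameters (4) and
    log partition function (5)."""
    d = len(mu)
    Sinv = np.linalg.inv(Sigma)
    theta = (
        p - 1 - d / 2,                       # theta1: coeff. of log y
        -0.5 * (b + mu @ Sinv @ mu),         # theta2: coeff. of 1/y
        -0.5 * (a + gamma @ Sinv @ gamma),   # theta3: coeff. of y
        Sinv @ gamma,                        # theta4: coeff. of x
        Sinv @ mu,                           # theta5: coeff. of x/y
        -0.5 * Sinv,                         # theta6: coeff. of x x^T / y
    )
    psi = (0.5 * np.linalg.slogdet(Sigma)[1] + np.log(2)
           + np.log(kv(p, np.sqrt(a * b)))
           + (p / 2) * np.log(b / a) + mu @ Sinv @ gamma)
    return theta, psi
```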

Expectation Parameters:

The expectation parameters \(\eta = \nabla\psi(\theta) = E[t(X, Y)]\) are:

(6)\[\begin{split}\eta_1 &:= E[\log Y] = \left.\frac{\partial}{\partial \alpha} \left(\sqrt{\frac{b}{a}}\right)^\alpha \frac{K_{p+\alpha}(\sqrt{ab})}{K_p(\sqrt{ab})} \right|_{\alpha=0} \\ \eta_2 &:= E[Y^{-1}] = \sqrt{\frac{a}{b}} \frac{K_{p-1}(\sqrt{ab})}{K_p(\sqrt{ab})} \\ \eta_3 &:= E[Y] = \sqrt{\frac{b}{a}} \frac{K_{p+1}(\sqrt{ab})}{K_p(\sqrt{ab})} \\ \eta_4 &:= E[X] = \mu + \gamma \eta_3 \\ \eta_5 &:= E[X Y^{-1}] = \mu \eta_2 + \gamma \\ \eta_6 &:= E[X X^\top Y^{-1}] = \Sigma + \mu \mu^\top \eta_2 + \gamma \gamma^\top \eta_3 + \mu \gamma^\top + \gamma \mu^\top\end{split}\]

where \(\eta_1, \eta_2, \eta_3 \in \mathbb{R}\), \(\eta_4, \eta_5 \in \mathbb{R}^d\), and \(\eta_6 \in \mathbb{R}^{d \times d}\).

Note that \(\eta_1, \eta_2, \eta_3\) are exactly the expectation parameters of the GIG random variable \(Y\).
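
A sketch of (6) follows; the only non-trivial entry is \(E[\log Y]\), which is the order-derivative of the GIG moment function and is approximated here by a central finite difference (one choice among several):

```python
import numpy as np
from scipy.special import kv

def gh_expectations(mu, gamma, Sigma, p, a, b, eps=1e-6):
    """Expectation parameters (6) of the joint-GH distribution."""
    w, s = np.sqrt(a * b), np.sqrt(b / a)
    k = kv(p, w)
    f = lambda al: s**al * kv(p + al, w) / k   # GIG moment function
    eta1 = (f(eps) - f(-eps)) / (2 * eps)      # E[log Y], finite difference
    eta2 = kv(p - 1, w) / (s * k)              # E[1/Y]
    eta3 = s * kv(p + 1, w) / k                # E[Y]
    eta4 = mu + gamma * eta3                   # E[X]
    eta5 = mu * eta2 + gamma                   # E[X/Y]
    eta6 = (Sigma + eta2 * np.outer(mu, mu) + eta3 * np.outer(gamma, gamma)
            + np.outer(mu, gamma) + np.outer(gamma, mu))  # E[X X^T / Y]
    return eta1, eta2, eta3, eta4, eta5, eta6
```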

Recovering Parameters from Expectations

Given all the expectation parameters, we can recover the original parameters as follows:

(7)\[\begin{split}\mu &= \frac{\eta_4 - \eta_3 \eta_5}{1 - \eta_2 \eta_3}, \\ \gamma &= \frac{\eta_5 - \eta_2 \eta_4}{1 - \eta_2 \eta_3}, \\ \Sigma &= \eta_6 - \eta_5 \mu^\top - \mu \eta_5^\top + \eta_2 \mu \mu^\top - \eta_3 \gamma \gamma^\top, \\ (p, a, b) &= \arg\max_{p,a,b} L_{\text{GIG}}(p, a, b | \eta_1, \eta_2, \eta_3),\end{split}\]

where \(L_{\text{GIG}}\) is the GIG log-likelihood function given in (8).

These equations form the M-step of the EM algorithm, where the expectations in (6) are replaced by their conditional expectations given the observed data.
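
The first three equations of (7) in code (a hypothetical helper; the GIG likelihood maximization for \((p, a, b)\) is omitted since it needs a numerical optimizer):

```python
import numpy as np

def recover_params(eta1, eta2, eta3, eta4, eta5, eta6):
    """Recover (mu, gamma, Sigma) from the expectation parameters via (7)."""
    # eta1 is only needed for the GIG likelihood step, omitted here.
    denom = 1.0 - eta2 * eta3
    mu = (eta4 - eta3 * eta5) / denom
    gamma = (eta5 - eta2 * eta4) / denom
    Sigma = (eta6 - np.outer(eta5, mu) - np.outer(mu, eta5)
             + eta2 * np.outer(mu, mu) - eta3 * np.outer(gamma, gamma))
    return mu, gamma, Sigma
```

Feeding the output of the gh_expectations sketch above through recover_params reproduces \((\mu, \gamma, \Sigma)\) up to floating-point error, which is a useful sanity check for an EM implementation.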

Hellinger Distance

While the Hellinger distance between two GH distributions has no closed form, the Hellinger distance between two joint-GH distributions is tractable.

Proposition. Let \(\theta_1 = (\mu_1, \gamma_1, \Sigma_1, p_1, a_1, b_1)\) and \(\theta_2 = (\mu_2, \gamma_2, \Sigma_2, p_2, a_2, b_2)\) be the parameters of two joint-GH distributions. The squared Hellinger distance between the two distributions is:

\[H_{\text{JGH}}^2(\theta_1 \| \theta_2) = 1 - \frac{|\Sigma_1 \Sigma_2|^{1/4}}{|\bar{\Sigma}|^{1/2}} \frac{(a_1/b_1)^{p_1/4} (a_2/b_2)^{p_2/4}} {\sqrt{K_{p_1}(\sqrt{a_1 b_1}) K_{p_2}(\sqrt{a_2 b_2})}} \frac{K_{\bar{p}}(\sqrt{\bar{b}' \bar{a}'})}{(\bar{a}'/\bar{b}')^{\bar{p}/2}} e^{-\frac{1}{4} \Delta\mu^\top \bar{\Sigma}^{-1} \Delta\gamma},\]

where:

  • \(\Delta\mu = \mu_1 - \mu_2\), \(\Delta\gamma = \gamma_1 - \gamma_2\)

  • \(\bar{\Sigma} = (\Sigma_1 + \Sigma_2)/2\)

  • \(\bar{p} = (p_1 + p_2)/2\), \(\bar{a} = (a_1 + a_2)/2\), \(\bar{b} = (b_1 + b_2)/2\)

  • \(\bar{b}' = \bar{b} + \frac{1}{4} \Delta\mu^\top \bar{\Sigma}^{-1} \Delta\mu\)

  • \(\bar{a}' = \bar{a} + \frac{1}{4} \Delta\gamma^\top \bar{\Sigma}^{-1} \Delta\gamma\)

If \(\mu_1 = \mu_2\), \(\gamma_1 = \gamma_2\), and \(\Sigma_1 = \Sigma_2\), then \(H_{\text{JGH}}(\theta_1 \| \theta_2) = H_{\text{GIG}}(p_1, a_1, b_1 \| p_2, a_2, b_2)\).

Although \(H_{\text{JGH}}\) differs from the Hellinger distance of the marginal GH distributions, it provides an upper bound on it (marginalizing out \(y\) can only decrease the Hellinger distance), so it can still be used to measure how close two GH distributions are.
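
The proposition transcribes directly into code. An illustrative sketch (a robust version would work with log-scale Bessel values):

```python
import numpy as np
from scipy.special import kv

def hellinger_jgh(theta1, theta2):
    """Squared Hellinger distance between two joint-GH distributions.

    Each theta is a tuple (mu, gamma, Sigma, p, a, b)."""
    mu1, g1, S1, p1, a1, b1 = theta1
    mu2, g2, S2, p2, a2, b2 = theta2
    dmu, dg = mu1 - mu2, g1 - g2
    Sbar = (S1 + S2) / 2
    Sbar_inv = np.linalg.inv(Sbar)
    pbar = (p1 + p2) / 2
    bbar = (b1 + b2) / 2 + dmu @ Sbar_inv @ dmu / 4   # b-bar prime
    abar = (a1 + a2) / 2 + dg @ Sbar_inv @ dg / 4     # a-bar prime
    affinity = (
        (np.linalg.det(S1) * np.linalg.det(S2)) ** 0.25
        / np.sqrt(np.linalg.det(Sbar))
        * (a1 / b1) ** (p1 / 4) * (a2 / b2) ** (p2 / 4)
        / np.sqrt(kv(p1, np.sqrt(a1 * b1)) * kv(p2, np.sqrt(a2 * b2)))
        * kv(pbar, np.sqrt(bbar * abar)) / (abar / bbar) ** (pbar / 2)
        * np.exp(-dmu @ Sbar_inv @ dg / 4)
    )
    return 1.0 - affinity
```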

Numerical Stability

The computation of (7) is not stable in terms of the relative error of the recovered parameters under certain conditions. It is, however, stable in terms of the Hellinger distance between the recovered and true distributions.

Numerical experiments show that:

  • The relative errors of \(\mu\) and \(\gamma\) are around machine epsilon

  • The relative error of some GIG parameters (especially when \(|p|\) is large) can be large

  • However, the Hellinger distance between true and estimated parameters remains small

This behavior is consistent with the ill-conditioning of the GIG optimization problem discussed in The Generalized Inverse Gaussian Distribution section.

Special Cases

Several important distributions are special cases of the GH family:

  • Normal-Inverse Gaussian (NIG): \(p = -1/2\)

  • Variance-Gamma (VG): \(b \to 0\) with \(p > 0\) (Gamma mixing)

  • Normal-Inverse Gamma (NInvG): \(a \to 0\) with \(p < 0\) (Inverse-Gamma mixing)

  • Student-t: \(p < 0\), \(a = 0\), \(\gamma = 0\)

  • Hyperbolic: \(p = 1\)

These are implemented as separate classes in normix:

  • NormalInverseGaussian

  • VarianceGamma

  • NormalInverseGamma

References

[Protassov2004]

Protassov, R. S. (2004). EM-based maximum likelihood parameter estimation for multivariate generalized hyperbolic distributions with fixed \(\lambda\). Statistics and Computing, 14, 67–77.

[Hu2005]

Hu, W. (2005). Calibration of multivariate generalized hyperbolic distributions using the EM algorithm. PhD thesis, Florida State University.

[McNeil2010]

McNeil, A. J., Frey, R., & Embrechts, P. (2010). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.