The Generalized Hyperbolic Distribution
In this section we briefly review the basic properties of the Generalized Hyperbolic (GH) distribution.
Definition as Normal Mixture
Let \(Y\) be a GIG random variable with parameters \((p, a, b)\), and let \(Z\) be an independent Gaussian random vector with zero mean and covariance \(\Sigma\). Then the random vector

\[
X = \mu + \gamma Y + \sqrt{Y} Z \tag{1}
\]

follows the generalized hyperbolic distribution with parameters \((\mu, \gamma, \Sigma, p, a, b)\), where \(\mu, \gamma \in \mathbb{R}^d\) and \(\Sigma \in \mathbb{R}^{d \times d}\) is a positive definite matrix.
Parameter Interpretation
\(\mu\): location parameter
\(\gamma\): skewness parameter
\(\Sigma\): models the dependency structure of the multivariate distribution
\((p, a, b)\): GIG parameters controlling the heavy-tailedness
In general, many multivariate heavy-tailed distributions can be defined through (1) by choosing a different non-negative mixing variable \(Y\). These distributions are usually called normal mixture or Gaussian mixture distributions.
Note
In many references, “normal mixture” refers to a discrete mixture of normal
densities. In normix, we use “normal mixture” to refer to any random variable
that can be expressed by (1).
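The mixture representation (1) translates directly into a sampling routine. Below is a minimal sketch using numpy and scipy (an illustration, not the normix API): scipy's geninvgauss has standardized density proportional to \(x^{p-1} e^{-\beta(x + 1/x)/2}\), so a GIG\((p, a, b)\) draw is a geninvgauss\((p, \beta)\) draw with \(\beta = \sqrt{ab}\), scaled by \(\sqrt{b/a}\).

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gh(mu, gamma, Sigma, p, a, b, size, rng=None):
    """Draw from GH via (1): X = mu + gamma*Y + sqrt(Y)*Z, with Y ~ GIG(p, a, b)."""
    rng = np.random.default_rng(rng)
    d = len(mu)
    # GIG(p, a, b) = sqrt(b/a) * geninvgauss(p, sqrt(a*b)) in scipy's parameterization
    y = geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a),
                        size=size, random_state=rng)
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=size)
    return mu + np.outer(y, gamma) + np.sqrt(y)[:, None] * z
```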
Joint GH Distribution
The joint distribution of \(X\) and \(Y\) is crucial in analyzing the GH distribution. We call the distribution of \((X, Y)\) the joint-GH distribution. Its density function is

\[
f(x, y) = \frac{(a/b)^{p/2}}{(2\pi)^{d/2} |\Sigma|^{1/2} \, 2 K_p(\sqrt{ab})} \, y^{p - d/2 - 1} \exp\left( -\frac{(x - \mu - \gamma y)^\top \Sigma^{-1} (x - \mu - \gamma y)}{2y} - \frac{a y + b/y}{2} \right) \tag{2}
\]

for \(y > 0\).
Marginal GH Density
Integrating out \(y\) from (2), we obtain the marginal distribution of \(x\), which is the GH density function

\[
f_{\mathrm{GH}}(x) = \frac{(a/b)^{p/2}}{(2\pi)^{d/2} |\Sigma|^{1/2} K_p(\sqrt{ab})} \left( \frac{a + \tilde{q}}{b + q(x)} \right)^{(d - 2p)/4} K_{p - d/2}\!\left( \sqrt{(a + \tilde{q})(b + q(x))} \right) e^{(x - \mu)^\top \Sigma^{-1} \gamma} \tag{3}
\]

where \(q(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)\) is the squared Mahalanobis distance, and \(\tilde{q} = \gamma^\top \Sigma^{-1} \gamma\).
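For reference, a direct transcription of (3) into numpy/scipy, shown as a sketch rather than the normix implementation; the exponentially scaled Bessel function kve keeps the Bessel factors in log-space:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.special import kve  # kve(v, z) = K_v(z) * exp(z), stable for large z

def gh_logpdf(x, mu, gamma, Sigma, p, a, b):
    """Log-density (3) of the multivariate GH distribution at a single point x."""
    d = mu.shape[0]
    cf = cho_factor(Sigma, lower=True)
    diff = x - mu
    q = diff @ cho_solve(cf, diff)          # squared Mahalanobis distance q(x)
    q_t = gamma @ cho_solve(cf, gamma)      # q~ = gamma^T Sigma^{-1} gamma
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))
    omega = np.sqrt(a * b)
    alpha = np.sqrt((a + q_t) * (b + q))
    log_kp = np.log(kve(p, omega)) - omega            # log K_p(sqrt(ab))
    log_kmix = np.log(kve(p - d / 2, alpha)) - alpha  # log K_{p-d/2}(alpha)
    return (0.5 * p * np.log(a / b) - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * logdet - log_kp
            + 0.25 * (d - 2 * p) * (np.log(a + q_t) - np.log(b + q))
            + log_kmix + diff @ cho_solve(cf, gamma))
```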
Alternative Parameterization
Similar to the GIG distribution, one can use another parameterization with \(\delta = \sqrt{b/a}\) and \(\eta = \sqrt{ab}\):

\[
f_{\mathrm{GH}}(x) = \frac{1}{(2\pi)^{d/2} |\delta\Sigma|^{1/2} K_p(\eta)} \left( \frac{\eta + \tilde{q}_\delta}{\eta + q_\delta(x)} \right)^{(d - 2p)/4} K_{p - d/2}\!\left( \sqrt{(\eta + \tilde{q}_\delta)(\eta + q_\delta(x))} \right) e^{(x - \mu)^\top (\delta\Sigma)^{-1} (\delta\gamma)} \tag{4}
\]

where \(q_\delta(x) = (x - \mu)^\top (\delta \Sigma)^{-1} (x - \mu)\) and \(\tilde{q}_\delta = (\delta \gamma)^\top (\delta \Sigma)^{-1} (\delta \gamma)\). In this form the density depends on the parameters only through \(\mu\), \(\delta\gamma\), \(\delta\Sigma\), \(p\), and \(\eta\).
Model Identifiability
From the representation (4), one can observe that the GH model is not identifiable (hence not regular): the parameter sets \((\mu, \gamma/c, \Sigma/c, p, c\delta, \eta)\) give the same distribution for every \(c > 0\). Consequently, the Fisher information matrix of the GH distribution is singular.
There are several ways to regularize the GH family:
Set \(\delta = 1\) (simplest approach)
Set \(b = 1\) in the EM algorithm [Protassov2004]
Fix \(b\) when \(p > -1\) and fix \(a\) when \(p < 1\) [Hu2005]
Fix the determinant \(|\Sigma| = 1\) [McNeil2010]
Note
When the dimension \(d\) is high, fixing \(|\Sigma| = 1\) is recommended. Since \(|\Sigma/c| = |\Sigma|/c^d\), any small perturbation of the matrix scale changes \(|\Sigma|\) dramatically when \(d\) is large, and matrix inversion becomes numerically unreliable when \(|\Sigma|\) is too large or too small.
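For illustration, the rescaling onto the \(|\Sigma| = 1\) slice can be written as follows (a hypothetical helper, not part of normix). It uses the invariance in \((a, b)\) form: scaling the mixing variable \(Y\) by \(c\) maps \((\mu, \gamma, \Sigma, p, a, b)\) to \((\mu, \gamma/c, \Sigma/c, p, a/c, cb)\) without changing the law of \(X\).

```python
import numpy as np

def fix_det_scale(mu, gamma, Sigma, p, a, b):
    """Map (mu, gamma, Sigma, p, a, b) to the equivalent set with det(Sigma) = 1."""
    d = len(mu)
    sign, logdet = np.linalg.slogdet(Sigma)  # log-det avoids overflow in high d
    c = np.exp(logdet / d)                   # det(Sigma / c) = 1 for c = det(Sigma)^(1/d)
    return mu, gamma / c, Sigma / c, p, a / c, c * b
```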
Exponential Family Form
The joint-GH distribution (2) belongs to the exponential family, with density

\[
f(x, y) = h(x, y) \exp\left( \langle \theta, t(x, y) \rangle - \psi(\theta) \right) \tag{5}
\]
Sufficient Statistics:

\[
t(x, y) = \left( \log y, \; y, \; \tfrac{1}{y}, \; x, \; \tfrac{x}{y}, \; \tfrac{x x^\top}{y} \right)
\]

where \(t_1, t_2, t_3 \in \mathbb{R}\), \(t_4, t_5 \in \mathbb{R}^d\), and \(t_6 \in \mathbb{R}^{d \times d}\).
Natural Parameters:

The natural parameters are derived from the classical parameters \((\mu, \gamma, \Sigma, p, a, b)\). By expanding the exponent in (2):

\[
-\frac{(x - \mu - \gamma y)^\top \Sigma^{-1} (x - \mu - \gamma y)}{2y} - \frac{a y + b/y}{2}
= -\frac{x^\top \Sigma^{-1} x}{2y} + \frac{x^\top \Sigma^{-1} \mu}{y} + x^\top \Sigma^{-1} \gamma
- \frac{b + \mu^\top \Sigma^{-1} \mu}{2} \cdot \frac{1}{y} - \frac{a + \gamma^\top \Sigma^{-1} \gamma}{2} \cdot y - \mu^\top \Sigma^{-1} \gamma,
\]

we identify the natural parameters:

\[
\theta_1 = p - \tfrac{d}{2} - 1, \quad
\theta_2 = -\tfrac{1}{2}\left( a + \gamma^\top \Sigma^{-1} \gamma \right), \quad
\theta_3 = -\tfrac{1}{2}\left( b + \mu^\top \Sigma^{-1} \mu \right), \quad
\theta_4 = \Sigma^{-1} \gamma, \quad
\theta_5 = \Sigma^{-1} \mu, \quad
\theta_6 = -\tfrac{1}{2} \Sigma^{-1}.
\]
Base Measure:

\[
h(x, y) = (2\pi)^{-d/2}
\]
Log Partition Function:

\[
\psi(\theta) = \log\left( 2 K_p(\sqrt{ab}) \right) - \frac{p}{2} \log\frac{a}{b} + \frac{1}{2} \log|\Sigma| + \mu^\top \Sigma^{-1} \gamma
\]
Expectation Parameters:

The expectation parameters \(\eta = \nabla\psi(\theta) = E[t(X, Y)]\) are:

\[
\begin{aligned}
\eta_1 &= E[\log Y], \qquad \eta_2 = E[Y], \qquad \eta_3 = E[1/Y], \\
\eta_4 &= E[X] = \mu + \eta_2 \gamma, \qquad \eta_5 = E[X/Y] = \eta_3 \mu + \gamma, \\
\eta_6 &= E[X X^\top / Y] = \Sigma + \eta_3\, \mu\mu^\top + \mu\gamma^\top + \gamma\mu^\top + \eta_2\, \gamma\gamma^\top,
\end{aligned} \tag{6}
\]

where \(\eta_1, \eta_2, \eta_3 \in \mathbb{R}\), \(\eta_4, \eta_5 \in \mathbb{R}^d\), and \(\eta_6 \in \mathbb{R}^{d \times d}\).
Note that \(\eta_1, \eta_2, \eta_3\) are exactly the expectation parameters of the GIG random variable \(Y\).
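The GIG expectation parameters have closed forms in terms of modified Bessel functions, \(E[Y] = \sqrt{b/a}\, K_{p+1}(\omega)/K_p(\omega)\) and \(E[1/Y] = \sqrt{a/b}\, K_{p-1}(\omega)/K_p(\omega)\) with \(\omega = \sqrt{ab}\), while \(E[\log Y]\) involves the derivative of \(K_p\) with respect to the order, which is typically approximated numerically. A sketch using these standard identities (a hypothetical helper, not the normix API):

```python
import numpy as np
from scipy.special import kv

def gig_expectation_params(p, a, b, eps=1e-6):
    """(eta1, eta2, eta3) = (E[log Y], E[Y], E[1/Y]) for Y ~ GIG(p, a, b)."""
    omega = np.sqrt(a * b)
    s = np.sqrt(b / a)
    kp = kv(p, omega)
    eta2 = s * kv(p + 1, omega) / kp       # E[Y]
    eta3 = kv(p - 1, omega) / (s * kp)     # E[1/Y] = sqrt(a/b) K_{p-1}/K_p
    # E[log Y] = log(s) + d/dp log K_p(omega), via central difference in the order
    dlogk = (np.log(kv(p + eps, omega)) - np.log(kv(p - eps, omega))) / (2 * eps)
    eta1 = np.log(s) + dlogk
    return eta1, eta2, eta3
```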
Recovering Parameters from Expectations
Given all the expectation parameters, we can recover the original parameters as follows:

\[
\gamma = \frac{\eta_5 - \eta_3 \eta_4}{1 - \eta_2 \eta_3}, \qquad
\mu = \frac{\eta_4 - \eta_2 \eta_5}{1 - \eta_2 \eta_3}, \qquad
\Sigma = \eta_6 - \eta_3\, \mu\mu^\top - \mu\gamma^\top - \gamma\mu^\top - \eta_2\, \gamma\gamma^\top, \qquad
(p, a, b) = \arg\max_{p, a, b} L_{\text{GIG}}(p, a, b;\, \eta_1, \eta_2, \eta_3), \tag{7}
\]

where \(L_{\text{GIG}}\) is the GIG log-likelihood function given in (8). The denominator \(1 - \eta_2 \eta_3\) is nonzero because \(E[Y]\, E[1/Y] > 1\) for any non-degenerate \(Y\).
These equations form the M-step in the EM algorithm, where the expectations in (6) are replaced by conditional expectations.
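The linear part of (7) is a few lines of numpy; the GIG step, maximizing \(L_{\text{GIG}}\) over \((p, a, b)\), is a separate low-dimensional numerical optimization and is omitted here. A sketch (hypothetical helper, not the normix API):

```python
import numpy as np

def recover_mu_gamma_sigma(eta2, eta3, eta4, eta5, eta6):
    """Solve the linear part of eq. (7) for (mu, gamma, Sigma)."""
    denom = 1.0 - eta2 * eta3   # < 0 for non-degenerate Y, since E[Y] E[1/Y] > 1
    gamma = (eta5 - eta3 * eta4) / denom
    mu = (eta4 - eta2 * eta5) / denom
    Sigma = (eta6 - eta3 * np.outer(mu, mu)
             - np.outer(mu, gamma) - np.outer(gamma, mu)
             - eta2 * np.outer(gamma, gamma))
    return mu, gamma, Sigma
```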
Hellinger Distance
While there is no closed form for the Hellinger distance between two GH distributions, the Hellinger distance between two joint-GH distributions is tractable.
Proposition. Let \(\theta_1 = (\mu_1, \gamma_1, \Sigma_1, p_1, a_1, b_1)\) and \(\theta_2 = (\mu_2, \gamma_2, \Sigma_2, p_2, a_2, b_2)\) be the parameters of two joint-GH distributions. The squared Hellinger distance between the two distributions is

\[
H^2_{\text{JGH}}(\theta_1 \| \theta_2) = 1 - \frac{|\Sigma_1|^{1/4} |\Sigma_2|^{1/4}}{|\bar{\Sigma}|^{1/2}}
\cdot \frac{(a_1/b_1)^{p_1/4} (a_2/b_2)^{p_2/4}}{\sqrt{K_{p_1}(\sqrt{a_1 b_1})\, K_{p_2}(\sqrt{a_2 b_2})}}
\left( \frac{\bar{b}'}{\bar{a}'} \right)^{\bar{p}/2} K_{\bar{p}}\!\left( \sqrt{\bar{a}' \bar{b}'} \right)
\exp\left( -\frac{1}{4} \Delta\mu^\top \bar{\Sigma}^{-1} \Delta\gamma \right)
\]

where:
\(\Delta\mu = \mu_1 - \mu_2\), \(\Delta\gamma = \gamma_1 - \gamma_2\)
\(\bar{\Sigma} = (\Sigma_1 + \Sigma_2)/2\)
\(\bar{p} = (p_1 + p_2)/2\), \(\bar{a} = (a_1 + a_2)/2\), \(\bar{b} = (b_1 + b_2)/2\)
\(\bar{b}' = \bar{b} + \frac{1}{4} \Delta\mu^\top \bar{\Sigma}^{-1} \Delta\mu\)
\(\bar{a}' = \bar{a} + \frac{1}{4} \Delta\gamma^\top \bar{\Sigma}^{-1} \Delta\gamma\)
If \(\mu_1 = \mu_2\), \(\gamma_1 = \gamma_2\), and \(\Sigma_1 = \Sigma_2\), then \(H_{\text{JGH}}(\theta_1 \| \theta_2) = H_{\text{GIG}}(p_1, a_1, b_1 \| p_2, a_2, b_2)\).
Although \(H_{\text{JGH}}\) differs from the Hellinger distance between the marginal GH distributions, it provides an upper bound on it (marginalizing out \(y\) can only decrease the Hellinger distance), so we can use it to measure how close two GH distributions are.
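Transcribing the proposition into code (a sketch, not the normix API; kve keeps the Bessel factors in log-space so large arguments do not overflow):

```python
import numpy as np
from scipy.special import kve

def hellinger2_jgh(theta1, theta2):
    """Squared Hellinger distance between two joint-GH distributions."""
    (mu1, g1, S1, p1, a1, b1) = theta1
    (mu2, g2, S2, p2, a2, b2) = theta2
    Sbar = 0.5 * (S1 + S2)
    dmu, dg = mu1 - mu2, g1 - g2
    pbar, abar, bbar = 0.5 * (p1 + p2), 0.5 * (a1 + a2), 0.5 * (b1 + b2)
    Sinv = np.linalg.inv(Sbar)
    b_adj = bbar + 0.25 * dmu @ Sinv @ dmu   # \bar{b}'
    a_adj = abar + 0.25 * dg @ Sinv @ dg     # \bar{a}'
    w1, w2, wbar = np.sqrt(a1 * b1), np.sqrt(a2 * b2), np.sqrt(a_adj * b_adj)
    log_k = lambda v, z: np.log(kve(v, z)) - z   # log K_v(z) without overflow
    # log Bhattacharyya coefficient; H^2 = 1 - BC
    log_bc = (0.25 * (np.linalg.slogdet(S1)[1] + np.linalg.slogdet(S2)[1])
              - 0.5 * np.linalg.slogdet(Sbar)[1]
              - 0.25 * dmu @ Sinv @ dg
              + 0.25 * p1 * np.log(a1 / b1) + 0.25 * p2 * np.log(a2 / b2)
              + 0.5 * pbar * np.log(b_adj / a_adj)
              + log_k(pbar, wbar)
              - 0.5 * (log_k(p1, w1) + log_k(p2, w2)))
    return 1.0 - np.exp(log_bc)
```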
Numerical Stability
The computation of (7) is not stable in terms of the relative error of the recovered parameters under certain conditions. It is, however, stable when the estimation error is measured by the Hellinger distance.
Numerical experiments show that:
The relative errors of \(\mu\) and \(\gamma\) are around machine epsilon
The relative error of some GIG parameters (especially when \(|p|\) is large) can be large
However, the Hellinger distance between true and estimated parameters remains small
This behavior is consistent with the ill-conditioning of the GIG optimization problem discussed in the Generalized Inverse Gaussian Distribution section.
Special Cases
Several important distributions are special cases of the GH family:
Normal-Inverse Gaussian (NIG): \(p = -1/2\)
Variance-Gamma (VG): \(b \to 0\) with \(p > 0\) (Gamma mixing)
Normal-Inverse Gamma (NInvG): \(a \to 0\) with \(p < 0\) (Inverse-Gamma mixing)
Student-t: \(p = -\nu/2 < 0\), \(a = 0\), \(b = \nu\), \(\gamma = 0\) (\(\nu\) degrees of freedom)
Hyperbolic: \(p = 1\)
These are implemented as separate classes in normix:

NormalInverseGaussian
VarianceGamma
NormalInverseGamma
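As a quick consistency check of the Student-t limit, the gh_logpdf sketch from above can be compared against scipy's multivariate t by taking \(a\) very small; the parameter values below are illustrative only:

```python
import numpy as np
from scipy.stats import multivariate_t

# Student-t with nu dof: p = -nu/2, a -> 0, b = nu, gamma = 0.
nu, d = 4.0, 2
mu = np.zeros(d)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([1.0, -0.5])

lp_gh = gh_logpdf(x, mu, np.zeros(d), Sigma, p=-nu / 2, a=1e-10, b=nu)
lp_t = multivariate_t(loc=mu, shape=Sigma, df=nu).logpdf(x)
print(lp_gh, lp_t)  # should agree to several decimal places
```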
References
Protassov, R. S. (2004). EM-based maximum likelihood parameter estimation for multivariate generalized hyperbolic distributions with fixed \(\lambda\). Statistics and Computing, 14(1), 67–77.
Hu, W. (2005). Calibration of multivariate generalized hyperbolic distributions using the EM algorithm, with applications in risk management, portfolio optimization and portfolio credit risk. PhD thesis, Florida State University.
McNeil, A. J., Frey, R., & Embrechts, P. (2010). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.