The Generalized Hyperbolic Distribution
In this section we briefly review the basic properties of the Generalized Hyperbolic (GH) distribution.
Definition as Normal Mixture
Let \(Y\) be a GIG random variable with parameters \((p, a, b)\), and let \(Z\) be an independent Gaussian random vector with zero mean and covariance \(\Sigma\). Then the random vector
\[
X = \mu + \gamma Y + \sqrt{Y}\, Z \tag{1}
\]
follows the generalized hyperbolic distribution with parameters \((\mu, \gamma, \Sigma, p, a, b)\), where \(\mu, \gamma \in \mathbb{R}^d\) and \(\Sigma \in \mathbb{R}^{d \times d}\) is a positive definite matrix.
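The representation (1) also gives a direct way to simulate GH vectors. The following is a minimal sketch, not normix's own sampler: it assumes the GIG density is proportional to \(y^{p-1} e^{-(ay + b/y)/2}\) (the convention consistent with the special cases listed later in this section) and obtains it from SciPy's one-parameter `scipy.stats.geninvgauss` by rescaling with \(\sqrt{b/a}\). The helper names `sample_gig` and `sample_gh` are hypothetical.

```python
import numpy as np
from scipy import special, stats


def sample_gig(p, a, b, size, rng):
    """Draw GIG(p, a, b) samples, density proportional to y**(p-1) * exp(-(a*y + b/y)/2).

    scipy.stats.geninvgauss(p, c) has density proportional to
    w**(p-1) * exp(-c*(w + 1/w)/2); with c = sqrt(a*b) and
    scale = sqrt(b/a) it matches the (a, b) form assumed here.
    """
    return stats.geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a),
                                 size=size, random_state=rng)


def sample_gh(mu, gamma, sigma, p, a, b, size, seed=0):
    """Sample X = mu + gamma*Y + sqrt(Y)*Z as in (1); also return the mixing variable Y."""
    rng = np.random.default_rng(seed)
    y = sample_gig(p, a, b, size, rng)
    z = rng.multivariate_normal(np.zeros(len(mu)), sigma, size=size)
    x = mu + y[:, None] * gamma + np.sqrt(y)[:, None] * z
    return x, y


mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p, a, b = -0.5, 2.0, 1.0
x, y = sample_gh(mu, gamma, sigma, p, a, b, size=100_000)

# GIG mean: E[Y] = sqrt(b/a) * K_{p+1}(sqrt(ab)) / K_p(sqrt(ab)),
# so taking expectations in (1) gives E[X] = mu + gamma * E[Y].
mean_y = np.sqrt(b / a) * special.kv(p + 1, np.sqrt(a * b)) / special.kv(p, np.sqrt(a * b))
print(x.mean(axis=0), mu + gamma * mean_y)
```

The last line compares the sample mean of \(X\) with \(\mu + \gamma E[Y]\); the two should agree up to Monte Carlo error.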
Parameter Interpretation
\(\mu\): location parameter
\(\gamma\): skewness parameter
\(\Sigma\): models the dependency structure of the multivariate distribution
\((p, a, b)\): GIG parameters controlling the heavy-tailedness
More generally, many multivariate heavy-tailed distributions can be defined by (1) when the GIG variable is replaced by some other positive random variable \(Y\) independent of \(Z\). These distributions are usually called normal mixture or Gaussian mixture distributions.
Note
In many references, “normal mixture” refers to a discrete mixture of normal densities. In normix, we use “normal mixture” to refer to any random variable that can be expressed by (1).
Joint GH Distribution
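The joint distribution of \(X\) and \(Y\) is crucial in analyzing the GH distribution. We call the distribution of \((X, Y)\) the joint-GH distribution. Since, conditionally on \(Y = y\), \(X\) is normal with mean \(\mu + \gamma y\) and covariance \(y\Sigma\), its density function is
\[
f(x, y) = \frac{(a/b)^{p/2}}{2 K_p(\sqrt{ab})\,(2\pi)^{d/2} |\Sigma|^{1/2}}\; y^{p - d/2 - 1} \exp\!\left( -\frac{(x - \mu - \gamma y)^\top \Sigma^{-1} (x - \mu - \gamma y)}{2y} - \frac{a y}{2} - \frac{b}{2y} \right) \tag{2}
\]
for \(y > 0\), where \(K_p\) is the modified Bessel function of the second kind.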
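A numerical check of (2) is to compare it with the product of the GIG density of \(Y\) and the conditional normal density \(N(\mu + \gamma y, y\Sigma)\) of \(X\) given \(Y = y\). This sketch assumes the GIG convention \(f_{\text{GIG}}(y) \propto y^{p-1} e^{-(ay + b/y)/2}\); `log_kv` and `joint_gh_logpdf` are hypothetical helpers, and `scipy.special.kve` (the exponentially scaled Bessel function) is used so that \(\log K_p\) does not underflow for large arguments.

```python
import numpy as np
from scipy import special, stats


def log_kv(v, z):
    """log K_v(z) computed from the exponentially scaled Bessel function kve."""
    return np.log(special.kve(v, z)) - z


def joint_gh_logpdf(x, y, mu, gamma, sigma, p, a, b):
    """Log of the joint-GH density (2) at a single point (x, y), y > 0."""
    d = len(mu)
    r = x - mu - gamma * y
    quad = r @ np.linalg.solve(sigma, r)
    _, logdet = np.linalg.slogdet(sigma)
    log_const = (0.5 * p * np.log(a / b) - np.log(2.0) - log_kv(p, np.sqrt(a * b))
                 - 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet)
    return log_const + (p - d / 2 - 1) * np.log(y) - quad / (2 * y) - a * y / 2 - b / (2 * y)


mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p, a, b = 1.5, 2.0, 0.7
x, y = np.array([0.4, 0.2]), 0.8

# Reference value: GIG density of Y times the conditional normal density of X given Y = y.
gig = stats.geninvgauss(p, np.sqrt(a * b), scale=np.sqrt(b / a))
ref = gig.logpdf(y) + stats.multivariate_normal(mu + gamma * y, y * sigma).logpdf(x)
val = joint_gh_logpdf(x, y, mu, gamma, sigma, p, a, b)
print(val, ref)
```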
Marginal GH Density
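Integrating out \(y\) from (2), we obtain the marginal distribution of \(X\), which is the GH density function:
\[
f(x) = \frac{(a/b)^{p/2}}{K_p(\sqrt{ab})\,(2\pi)^{d/2} |\Sigma|^{1/2}} \left( \frac{b + q(x)}{a + \tilde{q}} \right)^{\frac{p - d/2}{2}} K_{p - d/2}\!\left( \sqrt{(b + q(x))(a + \tilde{q})} \right) e^{(x - \mu)^\top \Sigma^{-1} \gamma} \tag{3}
\]
where \(q(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)\) is the squared Mahalanobis distance, and \(\tilde{q} = \gamma^\top \Sigma^{-1} \gamma\).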
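The marginal GH density can be evaluated in log space and checked against a direct numerical integration of the joint density (2) over \(y\). A sketch, assuming the same GIG convention as above, with hypothetical helper names and parameter values chosen only for illustration:

```python
import numpy as np
from scipy import integrate, special


def log_kv(v, z):
    """log K_v(z) via the exponentially scaled Bessel function kve."""
    return np.log(special.kve(v, z)) - z


def gh_logpdf(x, mu, gamma, sigma, p, a, b):
    """Log of the marginal GH density at a single point x.

    Uses q(x) = (x-mu)^T Sigma^{-1} (x-mu), q_tilde = gamma^T Sigma^{-1} gamma
    and a Bessel function of order p - d/2.
    """
    d = len(mu)
    dx = x - mu
    q = dx @ np.linalg.solve(sigma, dx)
    q_tilde = gamma @ np.linalg.solve(sigma, gamma)
    nu = p - d / 2
    return (0.5 * p * np.log(a / b) - log_kv(p, np.sqrt(a * b))
            - 0.5 * d * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(sigma)[1]
            + 0.5 * nu * np.log((b + q) / (a + q_tilde))
            + log_kv(nu, np.sqrt((b + q) * (a + q_tilde)))
            + dx @ np.linalg.solve(sigma, gamma))


def joint_pdf(y, x, mu, gamma, sigma, p, a, b):
    """Joint-GH density (2), used only as a reference for the integration check."""
    d = len(mu)
    r = x - mu - gamma * y
    log_c = (0.5 * p * np.log(a / b) - np.log(2.0) - log_kv(p, np.sqrt(a * b))
             - 0.5 * d * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(sigma)[1])
    return np.exp(log_c + (p - d / 2 - 1) * np.log(y)
                  - r @ np.linalg.solve(sigma, r) / (2 * y) - a * y / 2 - b / (2 * y))


mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p, a, b = 1.5, 2.0, 0.7
x = np.array([0.4, 0.2])

closed_form = np.exp(gh_logpdf(x, mu, gamma, sigma, p, a, b))
numeric, _ = integrate.quad(joint_pdf, 0, np.inf, args=(x, mu, gamma, sigma, p, a, b))
print(closed_form, numeric)
```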
Alternative Parameterization
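Similar to the GIG distribution, one can use another parameterization with \(\delta = \sqrt{b/a}\) and \(\eta = \sqrt{ab}\) (so that \(a = \eta/\delta\) and \(b = \eta\delta\)):
\[
f(x) = \frac{1}{K_p(\eta)\,(2\pi)^{d/2} |\delta\Sigma|^{1/2}} \left( \frac{\eta + q_\delta(x)}{\eta + \tilde{q}_\delta} \right)^{\frac{p - d/2}{2}} K_{p - d/2}\!\left( \sqrt{(\eta + q_\delta(x))(\eta + \tilde{q}_\delta)} \right) e^{(x - \mu)^\top (\delta\Sigma)^{-1} (\delta\gamma)} \tag{4}
\]
where \(q_\delta(x) = (x - \mu)^\top (\delta \Sigma)^{-1} (x - \mu)\) and \(\tilde{q}_\delta = (\delta \gamma)^\top (\delta \Sigma)^{-1} (\delta \gamma)\).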
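As a sanity check, the sketch below evaluates the marginal density in the \((a, b)\) parameterization and in the \((\delta, \eta)\) parameterization at the same point; both helpers are hypothetical and assume the GIG convention \(f_{\text{GIG}}(y) \propto y^{p-1} e^{-(ay + b/y)/2}\).

```python
import numpy as np
from scipy import special


def log_kv(v, z):
    """log K_v(z) via the exponentially scaled Bessel function kve."""
    return np.log(special.kve(v, z)) - z


def gh_logpdf_ab(x, mu, gamma, sigma, p, a, b):
    """Marginal GH log-density in the (a, b) parameterization."""
    d = len(mu)
    dx = x - mu
    q = dx @ np.linalg.solve(sigma, dx)
    qt = gamma @ np.linalg.solve(sigma, gamma)
    nu = p - d / 2
    return (0.5 * p * np.log(a / b) - log_kv(p, np.sqrt(a * b))
            - 0.5 * d * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(sigma)[1]
            + 0.5 * nu * np.log((b + q) / (a + qt))
            + log_kv(nu, np.sqrt((b + q) * (a + qt)))
            + dx @ np.linalg.solve(sigma, gamma))


def gh_logpdf_delta_eta(x, mu, gamma, sigma, p, delta, eta):
    """Same density in the (delta, eta) form; Sigma and gamma enter only via delta*Sigma, delta*gamma."""
    d = len(mu)
    s, g = delta * sigma, delta * gamma
    dx = x - mu
    q = dx @ np.linalg.solve(s, dx)   # q_delta(x)
    qt = g @ np.linalg.solve(s, g)    # tilde q_delta
    nu = p - d / 2
    return (-log_kv(p, eta) - 0.5 * d * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(s)[1]
            + 0.5 * nu * np.log((eta + q) / (eta + qt))
            + log_kv(nu, np.sqrt((eta + q) * (eta + qt)))
            + dx @ np.linalg.solve(s, g))


mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p, a, b = 1.5, 2.0, 0.7
delta, eta = np.sqrt(b / a), np.sqrt(a * b)   # inverse map: a = eta/delta, b = eta*delta
x = np.array([0.4, 0.2])

v_ab = gh_logpdf_ab(x, mu, gamma, sigma, p, a, b)
v_de = gh_logpdf_delta_eta(x, mu, gamma, sigma, p, delta, eta)
print(v_ab, v_de)
```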
Model Identifiability
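From the above representation, one can see that the GH model is not identifiable: for any \(c > 0\), the parameter sets \((\mu, \gamma, \Sigma, p, \delta, \eta)\) and \((\mu, \gamma/c, \Sigma/c, p, c\delta, \eta)\) give the same distribution, because the density depends on \(\delta\), \(\gamma\) and \(\Sigma\) only through \(\delta\Sigma\) and \(\delta\gamma\). Therefore, the Fisher information matrix of the GH distribution is singular.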
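The non-identifiability can be seen numerically. In the \((a, b)\) parameterization the map \((\gamma, \Sigma, \delta) \mapsto (\gamma/c, \Sigma/c, c\delta)\) with \(\eta\) fixed reads \((\gamma, \Sigma, a, b) \mapsto (\gamma/c, \Sigma/c, a/c, cb)\). A sketch with a hypothetical `gh_logpdf` helper (same GIG convention as above) shows that the marginal log-density is unchanged for several values of \(c\):

```python
import numpy as np
from scipy import special


def log_kv(v, z):
    """log K_v(z) via the exponentially scaled Bessel function kve."""
    return np.log(special.kve(v, z)) - z


def gh_logpdf(x, mu, gamma, sigma, p, a, b):
    """Marginal GH log-density in the (a, b) parameterization."""
    d = len(mu)
    dx = x - mu
    q = dx @ np.linalg.solve(sigma, dx)
    qt = gamma @ np.linalg.solve(sigma, gamma)
    nu = p - d / 2
    return (0.5 * p * np.log(a / b) - log_kv(p, np.sqrt(a * b))
            - 0.5 * d * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(sigma)[1]
            + 0.5 * nu * np.log((b + q) / (a + qt))
            + log_kv(nu, np.sqrt((b + q) * (a + qt)))
            + dx @ np.linalg.solve(sigma, gamma))


mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p, a, b = 1.5, 2.0, 0.7
x = np.array([0.4, 0.2])

base = gh_logpdf(x, mu, gamma, sigma, p, a, b)
# Rescale (gamma, Sigma, a, b) -> (gamma/c, Sigma/c, a/c, c*b); mu and p stay the same.
scaled = [gh_logpdf(x, mu, gamma / c, sigma / c, p, a / c, c * b) for c in (0.1, 3.0, 50.0)]
print(base, scaled)
```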
There are several ways to regularize the GH family:
Set \(\delta = 1\) (simplest approach)
Set \(b = 1\) in the EM algorithm [Protassov2004]
Fix \(b\) when \(p > -1\) and fix \(a\) when \(p < 1\) [Hu2005]
Fix the determinant \(|\Sigma| = 1\) [McNeil2010]
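Each of the first, second and fourth constraints picks one member of the same equivalence class, so a parameter set can be moved onto it with a single choice of \(c\) in the scaling map \((\gamma, \Sigma, a, b) \mapsto (\gamma/c, \Sigma/c, a/c, cb)\). A minimal sketch with hypothetical helper names; the \(p\)-dependent rule of [Hu2005], and how any of these would be enforced inside an estimation routine, are not shown:

```python
import numpy as np


def rescale(gamma, sigma, a, b, c):
    """(gamma, Sigma, a, b) -> (gamma/c, Sigma/c, a/c, c*b); mu and p are unchanged.

    This is the (a, b) form of (gamma, Sigma, delta) -> (gamma/c, Sigma/c, c*delta) with eta fixed.
    """
    return gamma / c, sigma / c, a / c, c * b


def fix_delta_one(gamma, sigma, a, b):
    """delta = sqrt(b/a) becomes 1 when c = 1/delta = sqrt(a/b)."""
    return rescale(gamma, sigma, a, b, np.sqrt(a / b))


def fix_b_one(gamma, sigma, a, b):
    """b becomes 1 when c = 1/b."""
    return rescale(gamma, sigma, a, b, 1.0 / b)


def fix_det_one(gamma, sigma, a, b):
    """|Sigma / c| = 1 when c = |Sigma|^(1/d); slogdet avoids overflow for large d."""
    d = sigma.shape[0]
    _, logdet = np.linalg.slogdet(sigma)
    return rescale(gamma, sigma, a, b, np.exp(logdet / d))


gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
a, b = 2.0, 0.7
g1, s1, a1, b1 = fix_delta_one(gamma, sigma, a, b)
g2, s2, a2, b2 = fix_b_one(gamma, sigma, a, b)
g3, s3, a3, b3 = fix_det_one(gamma, sigma, a, b)
print(np.sqrt(b1 / a1), b2, np.linalg.det(s3))
```

Every constraint leaves \(\eta = \sqrt{ab}\) unchanged, since \((a/c)(cb) = ab\).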
Note
When the dimension \(d\) is high, fixing \(|\Sigma| = 1\) is recommended. Since \(|\Sigma/c| = |\Sigma|/c^d\), a small change in the scale of the matrix changes \(|\Sigma|\) dramatically when \(d\) is large. Computations involving \(|\Sigma|\) and \(\Sigma^{-1}\) can then become numerically unreliable if \(|\Sigma|\) is allowed to drift to very large or very small values.
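A short NumPy illustration of this scale sensitivity, with the dimensions chosen only for illustration: a 1% rescaling of \(\Sigma = I\) with \(d = 1000\) multiplies \(|\Sigma|\) by \(1.01^{1000} \approx 2 \times 10^4\), and for \(\Sigma = 2I\) with \(d = 1100\) the determinant itself overflows double precision, while `numpy.linalg.slogdet` still returns a finite log-determinant.

```python
import numpy as np

d = 1000
# A 1% change of scale of Sigma = I multiplies |Sigma| by 1.01**d.
sign, logdet = np.linalg.slogdet(1.01 * np.eye(d))
print(logdet / np.log(10))   # log10 |1.01 I| = d * log10(1.01), about 4.32

# For Sigma = 2I with d = 1100, |Sigma| = 2**1100 exceeds the float64 range ...
print(np.linalg.det(2.0 * np.eye(1100)))   # inf (overflow)
# ... while the log-determinant is still an ordinary finite number.
print(np.linalg.slogdet(2.0 * np.eye(1100)))
```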
Exponential Family Form
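The joint-GH distribution (2) belongs to the exponential family with density
\[
f(x, y \mid \theta) = h(x, y) \exp\!\left( \langle \theta, t(x, y) \rangle - \psi(\theta) \right), \tag{5}
\]
where \(\langle \cdot, \cdot \rangle\) is the sum of the componentwise inner products (the trace inner product for the matrix component).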
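Sufficient Statistics:
\[
t_1 = \log y, \quad t_2 = y^{-1}, \quad t_3 = y, \quad t_4 = x\, y^{-1}, \quad t_5 = x, \quad t_6 = x x^\top y^{-1},
\]
where \(t_1, t_2, t_3 \in \mathbb{R}\), \(t_4, t_5 \in \mathbb{R}^d\), and \(t_6 \in \mathbb{R}^{d \times d}\).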
Natural Parameters:
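The natural parameters are derived from the classical parameters \((\mu, \gamma, \Sigma, p, a, b)\). By expanding the exponent in (2),
\[
-\frac{(x - \mu - \gamma y)^\top \Sigma^{-1} (x - \mu - \gamma y)}{2y} = -\frac{x^\top \Sigma^{-1} x}{2y} + \frac{\mu^\top \Sigma^{-1} x}{y} + \gamma^\top \Sigma^{-1} x - \frac{\mu^\top \Sigma^{-1} \mu}{2y} - \frac{y\, \gamma^\top \Sigma^{-1} \gamma}{2} - \mu^\top \Sigma^{-1} \gamma,
\]
we identify the natural parameters:
\[
\theta_1 = p - \frac{d}{2} - 1, \quad \theta_2 = -\frac{1}{2}\left(b + \mu^\top \Sigma^{-1} \mu\right), \quad \theta_3 = -\frac{1}{2}\left(a + \gamma^\top \Sigma^{-1} \gamma\right), \quad \theta_4 = \Sigma^{-1} \mu, \quad \theta_5 = \Sigma^{-1} \gamma, \quad \theta_6 = -\frac{1}{2} \Sigma^{-1}.
\]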
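The map from classical to natural parameters, and its inverse, can be sketched as follows. The sketch assumes the ordering \(t = (\log y, y^{-1}, y, x/y, x, xx^\top/y)\) and the corresponding \(\theta = (p - d/2 - 1, -(b + \mu^\top\Sigma^{-1}\mu)/2, -(a + \gamma^\top\Sigma^{-1}\gamma)/2, \Sigma^{-1}\mu, \Sigma^{-1}\gamma, -\Sigma^{-1}/2)\) obtained by expanding the exponent of (2); the function names are hypothetical and not normix's API.

```python
import numpy as np


def to_natural(mu, gamma, sigma, p, a, b):
    """Classical parameters -> (theta_1, ..., theta_6) of the joint-GH family."""
    d = len(mu)
    sinv = np.linalg.inv(sigma)
    return (p - d / 2 - 1,
            -0.5 * (b + mu @ sinv @ mu),
            -0.5 * (a + gamma @ sinv @ gamma),
            sinv @ mu,
            sinv @ gamma,
            -0.5 * sinv)


def from_natural(th1, th2, th3, th4, th5, th6):
    """Inverse map: Sigma = -theta_6^{-1}/2, mu = Sigma theta_4, gamma = Sigma theta_5."""
    d = len(th4)
    sigma = -0.5 * np.linalg.inv(th6)
    mu = sigma @ th4
    gamma = sigma @ th5
    p = th1 + d / 2 + 1
    b = -2.0 * th2 - mu @ th4      # uses mu^T Sigma^{-1} mu = mu^T theta_4
    a = -2.0 * th3 - gamma @ th5   # uses gamma^T Sigma^{-1} gamma = gamma^T theta_5
    return mu, gamma, sigma, p, a, b


mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
theta = to_natural(mu, gamma, sigma, 1.5, 2.0, 0.7)
recovered = from_natural(*theta)
print(recovered)
```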
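Base Measure:
\[
h(x, y) = (2\pi)^{-d/2}\, \mathbb{1}\{y > 0\}.
\]
Log Partition Function:
\[
\psi(\theta) = \log 2 + \log K_p(\sqrt{ab}) + \frac{p}{2} \log\frac{b}{a} + \frac{1}{2} \log|\Sigma| + \mu^\top \Sigma^{-1} \gamma,
\]
where the classical parameters on the right-hand side are functions of \(\theta\): \(\Sigma = -\frac{1}{2}\theta_6^{-1}\), \(\mu = \Sigma\theta_4\), \(\gamma = \Sigma\theta_5\), \(p = \theta_1 + d/2 + 1\), \(b = -2\theta_2 - \mu^\top\theta_4\), and \(a = -2\theta_3 - \gamma^\top\theta_5\).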
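Putting the pieces together, \(\log h + \langle\theta, t\rangle - \psi\) should reproduce the joint log-density (2). This sketch checks that against SciPy's GIG and multivariate normal densities. It assumes \(t = (\log y, y^{-1}, y, x/y, x, xx^\top/y)\), the base measure \(h = (2\pi)^{-d/2}\), and \(\psi = \log 2 + \log K_p(\sqrt{ab}) + \tfrac{p}{2}\log(b/a) + \tfrac12\log|\Sigma| + \mu^\top\Sigma^{-1}\gamma\); the helper names are hypothetical.

```python
import numpy as np
from scipy import special, stats


def natural_params(mu, gamma, sigma, p, a, b):
    """theta_1..theta_6 of the joint-GH family."""
    d = len(mu)
    sinv = np.linalg.inv(sigma)
    return [p - d / 2 - 1, -0.5 * (b + mu @ sinv @ mu), -0.5 * (a + gamma @ sinv @ gamma),
            sinv @ mu, sinv @ gamma, -0.5 * sinv]


def suff_stats(x, y):
    """t(x, y) = (log y, 1/y, y, x/y, x, x x^T / y)."""
    return [np.log(y), 1.0 / y, y, x / y, x, np.outer(x, x) / y]


def log_partition(mu, gamma, sigma, p, a, b):
    """psi(theta), written in terms of the classical parameters."""
    eta = np.sqrt(a * b)
    log_k = np.log(special.kve(p, eta)) - eta   # log K_p(sqrt(ab))
    return (np.log(2.0) + log_k + 0.5 * p * np.log(b / a)
            + 0.5 * np.linalg.slogdet(sigma)[1] + mu @ np.linalg.solve(sigma, gamma))


def expfam_logpdf(x, y, mu, gamma, sigma, p, a, b):
    """log h(x, y) + <theta, t(x, y)> - psi(theta) with h = (2 pi)^(-d/2) for y > 0."""
    d = len(mu)
    theta = natural_params(mu, gamma, sigma, p, a, b)
    # np.sum(th * tt) is the plain product for scalars, the dot product for vectors,
    # and the trace inner product tr(theta_6 t_6) for the symmetric matrices.
    inner = sum(np.sum(th * tt) for th, tt in zip(theta, suff_stats(x, y)))
    return -0.5 * d * np.log(2 * np.pi) + inner - log_partition(mu, gamma, sigma, p, a, b)


mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p, a, b = 1.5, 2.0, 0.7
x, y = np.array([0.4, 0.2]), 0.8

gig = stats.geninvgauss(p, np.sqrt(a * b), scale=np.sqrt(b / a))
direct = gig.logpdf(y) + stats.multivariate_normal(mu + gamma * y, y * sigma).logpdf(x)
via_expfam = expfam_logpdf(x, y, mu, gamma, sigma, p, a, b)
print(via_expfam, direct)
```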
Expectation Parameters:
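The expectation parameters \(\eta = \nabla\psi(\theta) = E[t(X, Y)]\) are:
\[
\begin{aligned}
\eta_1 &= E[\log Y], \quad \eta_2 = E[Y^{-1}], \quad \eta_3 = E[Y], \\
\eta_4 &= E[X Y^{-1}] = \eta_2 \mu + \gamma, \\
\eta_5 &= E[X] = \mu + \eta_3 \gamma, \\
\eta_6 &= E[X X^\top Y^{-1}] = \Sigma + \eta_2 \mu\mu^\top + \eta_3 \gamma\gamma^\top + \mu\gamma^\top + \gamma\mu^\top,
\end{aligned} \tag{6}
\]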
where \(\eta_1, \eta_2, \eta_3 \in \mathbb{R}\), \(\eta_4, \eta_5 \in \mathbb{R}^d\), and \(\eta_6 \in \mathbb{R}^{d \times d}\).
Note that \(\eta_1, \eta_2, \eta_3\) are exactly the expectation parameters of the GIG random variable \(Y\).
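The relations \(\eta_4 = \eta_2\mu + \gamma\), \(\eta_5 = \mu + \eta_3\gamma\) and \(\eta_6 = \Sigma + \eta_2\mu\mu^\top + \eta_3\gamma\gamma^\top + \mu\gamma^\top + \gamma\mu^\top\) can be checked by Monte Carlo using (1). The sketch uses the standard GIG moment formula \(E[Y^r] = \delta^r K_{p+r}(\eta)/K_p(\eta)\), where here the unsubscripted \(\eta = \sqrt{ab}\) and \(\delta = \sqrt{b/a}\), and assumes the same GIG convention as (2); parameter values and sample size are arbitrary.

```python
import numpy as np
from scipy import special, stats

mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p, a, b = 1.0, 2.0, 1.0
delta, eta = np.sqrt(b / a), np.sqrt(a * b)

# GIG moments E[Y^r] = delta^r K_{p+r}(eta) / K_p(eta)
eta2 = special.kv(p - 1, eta) / (delta * special.kv(p, eta))   # E[1/Y]
eta3 = delta * special.kv(p + 1, eta) / special.kv(p, eta)     # E[Y]

# Remaining expectation parameters in closed form
eta4 = eta2 * mu + gamma
eta5 = mu + eta3 * gamma
eta6 = (sigma + eta2 * np.outer(mu, mu) + eta3 * np.outer(gamma, gamma)
        + np.outer(mu, gamma) + np.outer(gamma, mu))

# Monte Carlo estimates from the mixture representation (1)
rng = np.random.default_rng(1)
n = 200_000
y = stats.geninvgauss.rvs(p, eta, scale=delta, size=n, random_state=rng)
z = rng.multivariate_normal(np.zeros(2), sigma, size=n)
x = mu + y[:, None] * gamma + np.sqrt(y)[:, None] * z
mc4 = (x / y[:, None]).mean(axis=0)
mc5 = x.mean(axis=0)
mc6 = (x[:, :, None] * x[:, None, :] / y[:, None, None]).mean(axis=0)
print(mc4, eta4)
print(mc5, eta5)
print(mc6, eta6)
```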
Recovering Parameters from Expectations
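Given all the expectation parameters, we can recover the original parameters by inverting (6):
\[
\begin{aligned}
\mu &= \frac{\eta_5 - \eta_3 \eta_4}{1 - \eta_2 \eta_3}, \qquad \gamma = \frac{\eta_4 - \eta_2 \eta_5}{1 - \eta_2 \eta_3}, \\
\Sigma &= \eta_6 - \eta_2 \mu\mu^\top - \eta_3 \gamma\gamma^\top - \mu\gamma^\top - \gamma\mu^\top, \\
(p, a, b) &= \arg\max_{p, a, b}\, L_{\text{GIG}}(p, a, b \mid \eta_1, \eta_2, \eta_3),
\end{aligned} \tag{7}
\]
where \(L_{\text{GIG}}\) is the GIG log-likelihood function given in (8). The denominator \(1 - \eta_2\eta_3\) is nonzero (in fact negative) for non-degenerate \(Y\), since \(E[Y^{-1}]\,E[Y] > 1\) by Jensen's inequality.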
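These equations form the M-step of the EM algorithm, in which the expectations in (6) are replaced by the sample averages of the conditional expectations \(E[t(x_i, Y) \mid X = x_i]\) over the observations \(x_1, \dots, x_n\).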
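The closed-form part of the recovery, \(\mu = (\eta_5 - \eta_3\eta_4)/(1 - \eta_2\eta_3)\), \(\gamma = (\eta_4 - \eta_2\eta_5)/(1 - \eta_2\eta_3)\) and \(\Sigma = \eta_6 - \eta_2\mu\mu^\top - \eta_3\gamma\gamma^\top - \mu\gamma^\top - \gamma\mu^\top\), can be sketched as below. `recover_normal_part` is a hypothetical helper, and the \((p, a, b)\) maximization of \(L_{\text{GIG}}\) is deliberately left out. The round trip builds \(\eta_4, \eta_5, \eta_6\) from known parameters and recovers them; \(\eta_2, \eta_3\) are illustrative values with \(\eta_2\eta_3 > 1\), as Jensen's inequality requires for a non-degenerate \(Y\).

```python
import numpy as np


def recover_normal_part(eta2, eta3, eta4, eta5, eta6):
    """(mu, gamma, Sigma) from expectation parameters.

    The GIG part (p, a, b) requires maximizing L_GIG and is not handled here.
    """
    denom = 1.0 - eta2 * eta3   # negative for non-degenerate Y, since E[1/Y] E[Y] > 1
    mu = (eta5 - eta3 * eta4) / denom
    gamma = (eta4 - eta2 * eta5) / denom
    sigma = (eta6 - eta2 * np.outer(mu, mu) - eta3 * np.outer(gamma, gamma)
             - np.outer(mu, gamma) - np.outer(gamma, mu))
    return mu, gamma, sigma


# Round trip: build eta_4, eta_5, eta_6 from known (mu, gamma, Sigma), then invert.
mu = np.array([0.0, 1.0])
gamma = np.array([0.5, -0.2])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
eta2, eta3 = 1.3, 0.9   # illustrative values of E[1/Y] and E[Y] with eta2 * eta3 > 1
eta4 = eta2 * mu + gamma
eta5 = mu + eta3 * gamma
eta6 = (sigma + eta2 * np.outer(mu, mu) + eta3 * np.outer(gamma, gamma)
        + np.outer(mu, gamma) + np.outer(gamma, mu))
mu_hat, gamma_hat, sigma_hat = recover_normal_part(eta2, eta3, eta4, eta5, eta6)
print(mu_hat, gamma_hat, sigma_hat, sep="\n")
```

In an EM iteration, the same function would receive averaged conditional expectations instead of exact moments.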
Hellinger Distance
While there is no closed-form expression for the Hellinger distance between two GH distributions, the Hellinger distance between the corresponding joint-GH distributions is tractable.
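Proposition. Let \(\theta_1 = (\mu_1, \gamma_1, \Sigma_1, p_1, a_1, b_1)\) and \(\theta_2 = (\mu_2, \gamma_2, \Sigma_2, p_2, a_2, b_2)\) be the parameters of two joint-GH distributions. The squared Hellinger distance between the two distributions (with the convention \(H^2 = 1 - \int \sqrt{f_1 f_2}\)) is
\[
H^2_{\text{JGH}}(\theta_1 \| \theta_2) = 1 - \frac{|\Sigma_1|^{1/4} |\Sigma_2|^{1/4}}{|\bar{\Sigma}|^{1/2}}\, e^{-\frac{1}{4} \Delta\mu^\top \bar{\Sigma}^{-1} \Delta\gamma}\, \frac{(a_1/b_1)^{p_1/4} (a_2/b_2)^{p_2/4} (\bar{b}'/\bar{a}')^{\bar{p}/2} K_{\bar{p}}\big(\sqrt{\bar{a}' \bar{b}'}\big)}{\sqrt{K_{p_1}(\sqrt{a_1 b_1})\, K_{p_2}(\sqrt{a_2 b_2})}}
\]
where: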
\(\Delta\mu = \mu_1 - \mu_2\), \(\Delta\gamma = \gamma_1 - \gamma_2\)
\(\bar{\Sigma} = (\Sigma_1 + \Sigma_2)/2\)
\(\bar{p} = (p_1 + p_2)/2\), \(\bar{a} = (a_1 + a_2)/2\), \(\bar{b} = (b_1 + b_2)/2\)
\(\bar{b}' = \bar{b} + \frac{1}{4} \Delta\mu^\top \bar{\Sigma}^{-1} \Delta\mu\)
\(\bar{a}' = \bar{a} + \frac{1}{4} \Delta\gamma^\top \bar{\Sigma}^{-1} \Delta\gamma\)
If \(\mu_1 = \mu_2\), \(\gamma_1 = \gamma_2\), and \(\Sigma_1 = \Sigma_2\), then \(H_{\text{JGH}}(\theta_1 \| \theta_2) = H_{\text{GIG}}(p_1, a_1, b_1 \| p_2, a_2, b_2)\).
Although \(H_{\text{JGH}}\) differs from the Hellinger distance between the marginal GH distributions, it is an upper bound for it (integrating out \(Y\) cannot increase the Hellinger distance), so it can be used to measure how close two GH distributions are.
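A sketch of the proposition in log space, together with a cross-check that does the Gaussian integral over \(x\) in closed form and integrates the remaining function of \(y\) numerically. The helper names are hypothetical, and the GIG convention \(f_{\text{GIG}}(y) \propto y^{p-1} e^{-(ay + b/y)/2}\) is assumed.

```python
import numpy as np
from scipy import integrate, special, stats


def log_kv(v, z):
    """log K_v(z) via the exponentially scaled Bessel function kve."""
    return np.log(special.kve(v, z)) - z


def hellinger2_jgh(th1, th2):
    """Squared Hellinger distance 1 - BC between joint-GH laws; th = (mu, gamma, sigma, p, a, b)."""
    mu1, g1, s1, p1, a1, b1 = th1
    mu2, g2, s2, p2, a2, b2 = th2
    dmu, dg = mu1 - mu2, g1 - g2
    sbar = 0.5 * (s1 + s2)
    pbar, abar, bbar = 0.5 * (p1 + p2), 0.5 * (a1 + a2), 0.5 * (b1 + b2)
    bbar_p = bbar + 0.25 * dmu @ np.linalg.solve(sbar, dmu)
    abar_p = abar + 0.25 * dg @ np.linalg.solve(sbar, dg)
    log_bc = (0.25 * (np.linalg.slogdet(s1)[1] + np.linalg.slogdet(s2)[1])
              - 0.5 * np.linalg.slogdet(sbar)[1]
              - 0.25 * dmu @ np.linalg.solve(sbar, dg)
              + 0.25 * (p1 * np.log(a1 / b1) + p2 * np.log(a2 / b2))
              + 0.5 * pbar * np.log(bbar_p / abar_p)
              + log_kv(pbar, np.sqrt(abar_p * bbar_p))
              - 0.5 * (log_kv(p1, np.sqrt(a1 * b1)) + log_kv(p2, np.sqrt(a2 * b2))))
    return 1.0 - np.exp(log_bc)


def hellinger2_by_quadrature(th1, th2):
    """Cross-check: Gaussian overlap in x in closed form, then numerical integration over y."""
    mu1, g1, s1, p1, a1, b1 = th1
    mu2, g2, s2, p2, a2, b2 = th2
    gig1 = stats.geninvgauss(p1, np.sqrt(a1 * b1), scale=np.sqrt(b1 / a1))
    gig2 = stats.geninvgauss(p2, np.sqrt(a2 * b2), scale=np.sqrt(b2 / a2))
    sbar = 0.5 * (s1 + s2)
    det_term = (np.linalg.det(s1) * np.linalg.det(s2)) ** 0.25 / np.sqrt(np.linalg.det(sbar))

    def integrand(y):
        dm = (mu1 - mu2) + (g1 - g2) * y   # difference of the conditional means
        bc_normal = det_term * np.exp(-dm @ np.linalg.solve(sbar, dm) / (8 * y))
        return np.sqrt(gig1.pdf(y) * gig2.pdf(y)) * bc_normal

    bc, _ = integrate.quad(integrand, 0, np.inf)
    return 1.0 - bc


th1 = (np.array([0.0, 1.0]), np.array([0.5, -0.2]),
       np.array([[1.0, 0.3], [0.3, 2.0]]), 1.5, 2.0, 0.7)
th2 = (np.array([0.2, 0.8]), np.array([0.3, 0.0]),
       np.array([[1.2, 0.1], [0.1, 1.5]]), -0.5, 1.0, 1.2)
print(hellinger2_jgh(th1, th2), hellinger2_by_quadrature(th1, th2))
```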
Numerical Stability
Under certain conditions, the computation of (7) is unstable in terms of the relative error of the recovered parameters. Measured by the Hellinger distance, however, it is comparatively stable.
Numerical experiments show that:
The relative errors of \(\mu\) and \(\gamma\) are around machine epsilon
The relative error of some GIG parameters (especially when \(|p|\) is large) can be large
However, the Hellinger distance between true and estimated parameters remains small
This behavior is consistent with the ill-conditioning of the GIG optimization problem discussed in the section The Generalized Inverse Gaussian Distribution.
Special Cases
Several important distributions are special cases of the GH family:
Normal-Inverse Gaussian (NIG): \(p = -1/2\)
Variance-Gamma (VG): \(b \to 0\) with \(p > 0\) (Gamma mixing)
Normal-Inverse Gamma (NInvG): \(a \to 0\) with \(p < 0\) (Inverse-Gamma mixing)
Student-t: \(p < 0\), \(a = 0\), \(\gamma = 0\)
Hyperbolic: \(p = 1\)
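To illustrate one of these limits, the sketch below simulates the Student-t case through (1) with \(d = 1\), \(\mu = 0\), \(\gamma = 0\), \(\Sigma = 1\), and compares sample quantiles with SciPy's Student-t quantiles. It assumes the standard choice \(p = -\nu/2\), \(b = \nu\) for \(\nu\) degrees of freedom, for which the mixing law is an inverse gamma with shape and scale \(\nu/2\); it does not use normix's classes.

```python
import numpy as np
from scipy import stats

nu = 5.0                       # degrees of freedom (illustrative)
rng = np.random.default_rng(2)
n = 200_000

# GIG(p = -nu/2, a = 0, b = nu) has density proportional to y**(-nu/2 - 1) * exp(-nu / (2y)),
# i.e. an inverse gamma law with shape nu/2 and scale nu/2.
y = stats.invgamma.rvs(nu / 2, scale=nu / 2, size=n, random_state=rng)

# Representation (1) with d = 1, mu = 0, gamma = 0, Sigma = 1.
x = np.sqrt(y) * rng.standard_normal(n)

qs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
print(np.quantile(x, qs))
print(stats.t.ppf(qs, df=nu))
```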
These are implemented as separate classes in normix:
NormalInverseGaussian
VarianceGamma
NormalInverseGamma
References
[Protassov2004] Protassov, R. S. (2004). EM-based maximum likelihood parameter estimation for multivariate generalized hyperbolic distributions.
[Hu2005] Hu, W. (2005). Calibration of multivariate generalized hyperbolic distributions using the EM algorithm.
[McNeil2010] McNeil, A. J., Frey, R., & Embrechts, P. (2010). Quantitative Risk Management. Princeton University Press.