The Generalized Inverse Gaussian Distribution

In this section we review the definition and statistical properties of the generalized inverse Gaussian (GIG) distribution.

Definition

The generalized inverse Gaussian distribution is a continuous probability distribution with the density function:

(1)\[f(x|p,a,b) = \frac{(a/b)^{p/2}}{2K_p(\sqrt{ab})} x^{p-1} \exp\left(-\frac{1}{2}(b x^{-1} + a x)\right), \quad x > 0,\]

where \(K_p(\cdot)\) is the modified Bessel function of the second kind and the parameters \((p, a, b)\) satisfy:

\[\begin{split}\begin{cases} b > 0, \, a \geq 0 & \text{if } p < 0 \\ b > 0, \, a > 0 & \text{if } p = 0 \\ b \geq 0, \, a > 0 & \text{if } p > 0 \end{cases}\end{split}\]

Throughout this package we assume that \(a > 0\) and \(b > 0\) for simplicity.

Alternative Parameterization

Another useful way to parameterize the GIG distribution is to set \(\delta = \sqrt{b/a}\) and \(\eta = \sqrt{ab}\). In that case the density function can be written as:

(2)\[f(x|p, \delta, \eta) = \frac{\delta^p}{2K_p(\eta)} x^{p-1} \exp\left(-\frac{\eta}{2}(\delta x^{-1} + \delta^{-1} x)\right), \quad x > 0.\]

Note that \(\delta\) serves as a scale parameter of the GIG distribution.

Moment Generating Function

The moment generating function of a GIG distributed random variable \(X\) is given by:

\[E[e^{uX}] = \left(\sqrt{\frac{a}{a-2u}}\right)^p \frac{K_p(\sqrt{b(a-2u)})}{K_p(\sqrt{ab})} = \left(\sqrt{\frac{\eta}{\eta-2\delta u}}\right)^p \frac{K_p(\sqrt{\eta^2-2\delta u})}{K_p(\eta)}.\]

Moments

The moments of the GIG distribution have a particularly elegant form:

(3)\[E[X^\alpha] = \left(\sqrt{\frac{b}{a}}\right)^\alpha \frac{K_{p+\alpha}(\sqrt{ab})}{K_p(\sqrt{ab})} = \delta^\alpha \frac{K_{p+\alpha}(\eta)}{K_p(\eta)}.\]

This formula is implemented in the moment_alpha() method.

Exponential Family Form

The GIG distribution belongs to the exponential family with density:

\[f(x|\theta) = h(x) \exp\left(\theta^\top t(x) - \psi(\theta)\right)\]

Sufficient Statistics:

\[\begin{split}t(x) = \begin{pmatrix} \log x \\ x^{-1} \\ x \end{pmatrix}\end{split}\]

Natural Parameters:

The natural parameters \(\theta = (\theta_1, \theta_2, \theta_3)\) are derived from the classical parameters \((p, a, b)\):

(4)\[\begin{split}\theta_1 &= p - 1 \quad \text{(unbounded)} \\ \theta_2 &= -\frac{b}{2} < 0 \\ \theta_3 &= -\frac{a}{2} < 0\end{split}\]

The inverse transformation is:

\[p = \theta_1 + 1, \quad b = -2\theta_2, \quad a = -2\theta_3\]

Base Measure:

\[h(x) = \mathbf{1}_{x > 0}\]

Log Partition Function:

(5)\[\psi(\theta) = \log 2 + \log K_p(\sqrt{ab}) + \frac{p}{2} \log\left(\frac{b}{a}\right)\]

where \(p = \theta_1 + 1\), \(a = -2\theta_3\), and \(b = -2\theta_2\).

Expectation Parameters:

The expectation parameters \(\eta = \nabla\psi(\theta) = E[t(X)]\) are:

(6)\[\begin{split}\eta_1 &= E[\log X] = \frac{\partial \psi}{\partial \theta_1} \\ \eta_2 &= E[X^{-1}] = \sqrt{\frac{a}{b}} \frac{K_{p-1}(\sqrt{ab})}{K_p(\sqrt{ab})} \\ \eta_3 &= E[X] = \sqrt{\frac{b}{a}} \frac{K_{p+1}(\sqrt{ab})}{K_p(\sqrt{ab})}\end{split}\]

Unfortunately we do not have an analytical formula for \(\eta_1 = E[\log X]\). In practice it can only be approximated numerically.

Maximum Likelihood Estimation

Given the expectation parameters \((\eta_1, \eta_2, \eta_3)\), computing the natural parameters \((p, a, b)\) by solving the above equations is a challenging problem. Let \(x_1, x_2, \ldots, x_n\) be a sequence of sample data, then the maximum likelihood estimator (MLE) of GIG is given by:

(7)\[(\hat{p}, \hat{a}, \hat{b}) = \arg\max_{p,a,b} L_{GIG}(p, a, b | \hat{\eta}_1, \hat{\eta}_2, \hat{\eta}_3),\]

where \(L_{GIG}\) is the log-likelihood function (excluding constants):

(8)\[L_{GIG}(p, a, b | \eta_1, \eta_2, \eta_3) = -\frac{1}{2} b \hat{\eta}_1 - \frac{1}{2} a \hat{\eta}_2 + p \hat{\eta}_3 + \frac{p}{2} \log(a/b) - \log(K_p(\sqrt{ab})),\]

and the sufficient statistics are:

\(\hat{\eta}_1 = \frac{1}{n} \sum_{k=1}^n x_k^{-1}\)
\(\hat{\eta}_2 = \frac{1}{n} \sum_{k=1}^n x_k\)
\(\hat{\eta}_3 = \frac{1}{n} \sum_{k=1}^n \log(x_k)\)

One can verify that the optimal solution \((\hat{p}, \hat{a}, \hat{b})\) must satisfy (6) where \((\eta_1, \eta_2, \eta_3)\) are replaced by \((\hat{\eta}_1, \hat{\eta}_2, \hat{\eta}_3)\).

Numerical Challenges

As discussed in [Jorgensen2012], there is no analytical expression for \(\hat{p}\) or even its partial derivatives. Most literature suggests fixing \(p\) when maximizing the log-likelihood function.

Even when \(p\) is fixed, [Hu2005] reports that when \(|p|\) is large (say, above 10), there might be no solution for the first two equations in (6).

Hellinger Distance

To measure estimation errors, one good choice is the Hellinger distance between the true and estimated parameters.

Proposition. Let \((p_1, a_1, b_1)\) and \((p_2, a_2, b_2)\) be the parameters of two GIG distributions. The squared Hellinger distance between the two distributions is given by:

\[H_{GIG}^2(p_1, a_1, b_1 \| p_2, a_2, b_2) = 1 - \frac{(a_1/b_1)^{p_1/4} (a_2/b_2)^{p_2/4}} {\sqrt{K_{p_1}(\sqrt{a_1 b_1}) K_{p_2}(\sqrt{a_2 b_2})}} \frac{K_{\bar{p}}(\sqrt{\bar{a}\bar{b}})}{(\bar{a}/\bar{b})^{\bar{p}/2}},\]

where \(\bar{p} = (p_1 + p_2)/2\), \(\bar{a} = (a_1 + a_2)/2\), and \(\bar{b} = (b_1 + b_2)/2\).

Special Cases

There are several important special cases of the GIG distribution:

Inverse Gaussian (IG): when \(p = -1/2\)
Gamma: when \(p > 0\) and \(b \to 0\), giving \(\text{Gamma}(p, a/2)\)
Inverse Gamma: when \(p < 0\) and \(a \to 0\), giving \(\text{InvGamma}(-p, b/2)\)

These special cases are implemented as separate classes in normix:

InverseGaussian
Gamma
InverseGamma

References

[Jorgensen2012]

Jørgensen, B. (2012). Statistical Properties of the Generalized Inverse Gaussian Distribution. Springer.