The Generalized Inverse Gaussian Distribution

In this section we review the definition and statistical properties of the generalized inverse Gaussian (GIG) distribution.

Definition

The generalized inverse Gaussian distribution is a continuous probability distribution with the density function:

(1)\[f(x|p,a,b) = \frac{(a/b)^{p/2}}{2K_p(\sqrt{ab})} x^{p-1} \exp\left(-\frac{1}{2}(b x^{-1} + a x)\right), \quad x > 0,\]

where \(K_p(\cdot)\) is the modified Bessel function of the second kind and the parameters \((p, a, b)\) satisfy:

\[\begin{split}\begin{cases} b > 0, \, a \geq 0 & \text{if } p < 0 \\ b > 0, \, a > 0 & \text{if } p = 0 \\ b \geq 0, \, a > 0 & \text{if } p > 0 \end{cases}\end{split}\]

Throughout this package we assume that \(a > 0\) and \(b > 0\) for simplicity.
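For concreteness, density (1) can be evaluated directly with NumPy and SciPy. The following is a minimal sketch, independent of the normix implementation; the function name gig_pdf is illustrative:

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import kv  # modified Bessel function of the second kind

    def gig_pdf(x, p, a, b):
        """Density (1) of GIG(p, a, b) for x > 0, assuming a > 0 and b > 0."""
        norm = (a / b) ** (p / 2) / (2.0 * kv(p, np.sqrt(a * b)))
        return norm * x ** (p - 1) * np.exp(-0.5 * (b / x + a * x))

    # sanity check: the density integrates to one
    total, _ = quad(gig_pdf, 0, np.inf, args=(0.5, 2.0, 3.0))
    print(total)  # ~1.0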

Alternative Parameterization

Another useful way to parameterize the GIG distribution is to set \(\delta = \sqrt{b/a}\) and \(\eta = \sqrt{ab}\). In that case the density function can be written as:

(2)\[f(x|p, \delta, \eta) = \frac{\delta^{-p}}{2K_p(\eta)} x^{p-1} \exp\left(-\frac{\eta}{2}(\delta x^{-1} + \delta^{-1} x)\right), \quad x > 0.\]

Note that \(\delta\) serves as a scale parameter of the GIG distribution.
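Incidentally, this is exactly the parameterization used by scipy.stats.geninvgauss: its shape argument b corresponds to \(\eta\) and its scale corresponds to \(\delta\). A quick check (a sketch, not part of normix):

    import numpy as np
    from scipy.special import kv
    from scipy.stats import geninvgauss

    p, a, b = 0.5, 2.0, 3.0
    delta, eta = np.sqrt(b / a), np.sqrt(a * b)

    # density (2) evaluated by hand at one point...
    x = 1.3
    manual = (delta ** (-p) * x ** (p - 1)
              * np.exp(-0.5 * eta * (delta / x + x / delta)) / (2.0 * kv(p, eta)))

    # ...against scipy's geninvgauss with shape eta and scale delta
    print(np.isclose(manual, geninvgauss.pdf(x, p, eta, scale=delta)))  # True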

Moment Generating Function

The moment generating function of a GIG distributed random variable \(X\) is given by:

\[E[e^{uX}] = \left(\sqrt{\frac{a}{a-2u}}\right)^p \frac{K_p(\sqrt{b(a-2u)})}{K_p(\sqrt{ab})} = \left(\sqrt{\frac{\eta}{\eta-2\delta u}}\right)^p \frac{K_p(\sqrt{\eta(\eta-2\delta u)})}{K_p(\eta)}, \quad u < \frac{a}{2}.\]
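As a sanity check, the closed form can be compared with \(\int_0^\infty e^{ux} f(x|p,a,b)\,dx\) computed numerically. A sketch, with illustrative parameter values:

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import kv

    p, a, b = 0.5, 2.0, 3.0
    u = 0.4  # the MGF requires u < a/2

    closed = ((a / (a - 2 * u)) ** (p / 2)
              * kv(p, np.sqrt(b * (a - 2 * u))) / kv(p, np.sqrt(a * b)))

    # numerical E[exp(uX)] against density (1)
    norm = (a / b) ** (p / 2) / (2.0 * kv(p, np.sqrt(a * b)))
    integrand = lambda x: np.exp(u * x) * norm * x ** (p - 1) * np.exp(-0.5 * (b / x + a * x))
    numeric, _ = quad(integrand, 0, np.inf)
    print(closed, numeric)  # the two values agree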

Moments

The moments of the GIG distribution have a particularly elegant form:

(3)\[E[X^\alpha] = \left(\sqrt{\frac{b}{a}}\right)^\alpha \frac{K_{p+\alpha}(\sqrt{ab})}{K_p(\sqrt{ab})} = \delta^\alpha \frac{K_{p+\alpha}(\eta)}{K_p(\eta)}.\]

This formula is implemented in the moment_alpha() method.
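A standalone version of (3), equivalent in spirit to moment_alpha() (sketch only; the actual method lives in normix):

    import numpy as np
    from scipy.special import kv

    def gig_moment(alpha, p, a, b):
        """E[X^alpha] for GIG(p, a, b), per equation (3)."""
        eta = np.sqrt(a * b)
        return (b / a) ** (alpha / 2) * kv(p + alpha, eta) / kv(p, eta)

    print(gig_moment(1.0, 0.5, 2.0, 3.0))   # E[X]
    print(gig_moment(-1.0, 0.5, 2.0, 3.0))  # E[1/X]
    print(gig_moment(0.5, 0.5, 2.0, 3.0))   # fractional moments work too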

Exponential Family Form

The GIG distribution belongs to the exponential family with density:

\[f(x|\theta) = h(x) \exp\left(\theta^\top t(x) - \psi(\theta)\right)\]

Sufficient Statistics:

\[\begin{split}t(x) = \begin{pmatrix} \log x \\ x^{-1} \\ x \end{pmatrix}\end{split}\]

Natural Parameters:

The natural parameters \(\theta = (\theta_1, \theta_2, \theta_3)\) are derived from the classical parameters \((p, a, b)\):

(4)\[\begin{split}\theta_1 &= p - 1 \quad \text{(unbounded)} \\ \theta_2 &= -\frac{b}{2} < 0 \\ \theta_3 &= -\frac{a}{2} < 0\end{split}\]

The inverse transformation is:

\[p = \theta_1 + 1, \quad b = -2\theta_2, \quad a = -2\theta_3\]
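These two maps are simple enough to state in code (the helper names are illustrative):

    def to_natural(p, a, b):
        """Classical (p, a, b) -> natural (theta1, theta2, theta3), per (4)."""
        return p - 1.0, -b / 2.0, -a / 2.0

    def from_natural(theta1, theta2, theta3):
        """Natural (theta1, theta2, theta3) -> classical (p, a, b)."""
        return theta1 + 1.0, -2.0 * theta3, -2.0 * theta2

    # round trip
    assert from_natural(*to_natural(0.5, 2.0, 3.0)) == (0.5, 2.0, 3.0)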

Base Measure:

\[h(x) = \mathbf{1}_{x > 0}\]

Log Partition Function:

(5)\[\psi(\theta) = \log 2 + \log K_p(\sqrt{ab}) + \frac{p}{2} \log\left(\frac{b}{a}\right)\]

where \(p = \theta_1 + 1\), \(a = -2\theta_3\), and \(b = -2\theta_2\).
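Equation (5) says that \(e^{\psi(\theta)}\) is the normalizing constant \(\int_0^\infty h(x) e^{\theta^\top t(x)}\,dx\), which the following sketch verifies numerically:

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import kv

    def log_partition(theta1, theta2, theta3):
        """psi(theta) from (5); requires theta2 < 0 and theta3 < 0."""
        p, a, b = theta1 + 1.0, -2.0 * theta3, -2.0 * theta2
        return np.log(2.0) + np.log(kv(p, np.sqrt(a * b))) + 0.5 * p * np.log(b / a)

    t1, t2, t3 = -0.5, -1.5, -1.0  # i.e. (p, a, b) = (0.5, 2.0, 3.0)
    val, _ = quad(lambda x: x ** t1 * np.exp(t2 / x + t3 * x), 0, np.inf)
    print(np.log(val), log_partition(t1, t2, t3))  # the two values agree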

Expectation Parameters:

The expectation parameters \(\eta = \nabla\psi(\theta) = E[t(X)]\) (not to be confused with the parameter \(\eta = \sqrt{ab}\) of the alternative parameterization) are:

(6)\[\begin{split}\eta_1 &= E[\log X] = \frac{\partial \psi}{\partial \theta_1} \\ \eta_2 &= E[X^{-1}] = \sqrt{\frac{a}{b}} \frac{K_{p-1}(\sqrt{ab})}{K_p(\sqrt{ab})} \\ \eta_3 &= E[X] = \sqrt{\frac{b}{a}} \frac{K_{p+1}(\sqrt{ab})}{K_p(\sqrt{ab})}\end{split}\]

There is no closed-form expression for \(\eta_1 = E[\log X]\): differentiating (5) with respect to \(\theta_1\) gives

\[E[\log X] = \frac{1}{2}\log\left(\frac{b}{a}\right) + \frac{\partial}{\partial p}\log K_p(\sqrt{ab}),\]

and the derivative of \(K_p\) with respect to its order \(p\) has no closed form in general, so in practice it must be approximated numerically.
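One practical approach (a sketch, with an illustrative function name) is a central finite difference on the order of the Bessel function:

    import numpy as np
    from scipy.special import kv

    def e_log_x(p, a, b, h=1e-5):
        """E[log X] = 0.5*log(b/a) + d/dp log K_p(sqrt(ab)), via central differences."""
        eta = np.sqrt(a * b)
        dlogK_dp = (np.log(kv(p + h, eta)) - np.log(kv(p - h, eta))) / (2.0 * h)
        return 0.5 * np.log(b / a) + dlogK_dp

    print(e_log_x(0.5, 2.0, 3.0))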

Maximum Likelihood Estimation

Given the expectation parameters \((\eta_1, \eta_2, \eta_3)\), recovering the classical parameters \((p, a, b)\) by inverting the above equations is a challenging problem. Let \(x_1, x_2, \ldots, x_n\) be a sequence of sample data; then the maximum likelihood estimator (MLE) of the GIG parameters is given by:

(7)\[(\hat{p}, \hat{a}, \hat{b}) = \arg\max_{p,a,b} L_{GIG}(p, a, b | \hat{\eta}_1, \hat{\eta}_2, \hat{\eta}_3),\]

where \(L_{GIG}\) is the log-likelihood function (excluding constants):

(8)\[L_{GIG}(p, a, b | \hat{\eta}_1, \hat{\eta}_2, \hat{\eta}_3) = p \hat{\eta}_1 - \frac{1}{2} b \hat{\eta}_2 - \frac{1}{2} a \hat{\eta}_3 + \frac{p}{2} \log(a/b) - \log K_p(\sqrt{ab}),\]

and the empirical averages of the sufficient statistics, ordered as in (6), are:

  • \(\hat{\eta}_1 = \frac{1}{n} \sum_{k=1}^n \log x_k\)

  • \(\hat{\eta}_2 = \frac{1}{n} \sum_{k=1}^n x_k^{-1}\)

  • \(\hat{\eta}_3 = \frac{1}{n} \sum_{k=1}^n x_k\)

One can verify that the optimal solution \((\hat{p}, \hat{a}, \hat{b})\) must satisfy (6) where \((\eta_1, \eta_2, \eta_3)\) are replaced by \((\hat{\eta}_1, \hat{\eta}_2, \hat{\eta}_3)\).
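As a concrete illustration of (7) and (8) with \(p\) held fixed (a common simplification discussed in the next subsection), here is a sketch built on scipy.optimize; all names are illustrative and this is not the normix estimator. The optimization runs on \(\log a\) and \(\log b\) to enforce positivity:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import kv
    from scipy.stats import geninvgauss

    # simulate from a known GIG (scipy shape = sqrt(ab), scale = sqrt(b/a))
    p_true, a_true, b_true = -0.5, 2.0, 3.0
    x = geninvgauss.rvs(p_true, np.sqrt(a_true * b_true),
                        scale=np.sqrt(b_true / a_true), size=5000, random_state=0)

    # empirical expectation parameters, in the order of (6)
    eta1, eta2, eta3 = np.mean(np.log(x)), np.mean(1.0 / x), np.mean(x)

    def neg_loglik(params, p):
        a, b = np.exp(params)  # optimize on the log scale so that a, b > 0
        return -(p * eta1 - 0.5 * b * eta2 - 0.5 * a * eta3
                 + 0.5 * p * np.log(a / b) - np.log(kv(p, np.sqrt(a * b))))

    res = minimize(neg_loglik, x0=[0.0, 0.0], args=(p_true,), method="Nelder-Mead")
    a_hat, b_hat = np.exp(res.x)
    print(a_hat, b_hat)  # close to (2.0, 3.0) for large samples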

Numerical Challenges

As discussed in [Jorgensen2012], there is no analytical expression for \(\hat{p}\), nor even for the partial derivative of the log-likelihood with respect to \(p\), since that derivative involves \(\partial K_p / \partial p\). Most of the literature therefore suggests fixing \(p\) when maximizing the log-likelihood function.

Even when \(p\) is fixed, [Hu2005] reports that when \(|p|\) is large (say, above 10), there might be no solution for the first two equations in (6).

Hellinger Distance

To measure estimation error, one good choice is the Hellinger distance between the true and the estimated distributions.

Proposition. Let \((p_1, a_1, b_1)\) and \((p_2, a_2, b_2)\) be the parameters of two GIG distributions. The squared Hellinger distance between the two distributions is given by:

\[H_{GIG}^2(p_1, a_1, b_1 \| p_2, a_2, b_2) = 1 - \frac{(a_1/b_1)^{p_1/4} (a_2/b_2)^{p_2/4}} {\sqrt{K_{p_1}(\sqrt{a_1 b_1}) K_{p_2}(\sqrt{a_2 b_2})}} \frac{K_{\bar{p}}(\sqrt{\bar{a}\bar{b}})}{(\bar{a}/\bar{b})^{\bar{p}/2}},\]

where \(\bar{p} = (p_1 + p_2)/2\), \(\bar{a} = (a_1 + a_2)/2\), and \(\bar{b} = (b_1 + b_2)/2\).
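The proposition translates directly into code (a sketch, with an illustrative name):

    import numpy as np
    from scipy.special import kv

    def gig_hellinger2(p1, a1, b1, p2, a2, b2):
        """Squared Hellinger distance between GIG(p1, a1, b1) and GIG(p2, a2, b2)."""
        pm, am, bm = (p1 + p2) / 2, (a1 + a2) / 2, (b1 + b2) / 2
        bc = ((a1 / b1) ** (p1 / 4) * (a2 / b2) ** (p2 / 4)
              / np.sqrt(kv(p1, np.sqrt(a1 * b1)) * kv(p2, np.sqrt(a2 * b2)))
              * kv(pm, np.sqrt(am * bm)) / (am / bm) ** (pm / 2))
        return 1.0 - bc

    print(gig_hellinger2(0.5, 2.0, 3.0, 0.5, 2.0, 3.0))   # 0.0 for identical parameters
    print(gig_hellinger2(0.5, 2.0, 3.0, -0.5, 1.0, 4.0))  # strictly between 0 and 1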

Special Cases

There are several important special cases of the GIG distribution:

  • Inverse Gaussian (IG): when \(p = -1/2\)

  • Gamma: when \(p > 0\) and \(b \to 0\), giving \(\text{Gamma}(p, a/2)\) with shape \(p\) and rate \(a/2\); a numerical check of this limit follows the list

  • Inverse Gamma: when \(p < 0\) and \(a \to 0\), giving \(\text{InvGamma}(-p, b/2)\) with shape \(-p\) and scale \(b/2\)
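A sketch of the Gamma limit using scipy; note that scipy.stats.gamma takes a scale parameter, so rate \(a/2\) corresponds to scale \(2/a\):

    import numpy as np
    from scipy.stats import gamma, geninvgauss

    # GIG(p, a, b) with p > 0 and b -> 0 approaches Gamma(shape=p, rate=a/2)
    p, a, b = 2.0, 3.0, 1e-6
    x = np.linspace(0.1, 5.0, 50)

    gig_vals = geninvgauss.pdf(x, p, np.sqrt(a * b), scale=np.sqrt(b / a))
    gamma_vals = gamma.pdf(x, p, scale=2.0 / a)
    print(np.max(np.abs(gig_vals - gamma_vals)))  # tiny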

These special cases are implemented as separate classes in normix:

  • InverseGaussian

  • Gamma

  • InverseGamma

References

[Hu2005]

Hu, W. (2005). Calibration of Multivariate Generalized Hyperbolic Distributions Using the EM Algorithm, with Applications in Risk Management, Portfolio Optimization and Portfolio Credit Risk. PhD thesis, Florida State University.

[Jorgensen2012]

Jørgensen, B. (2012). Statistical Properties of the Generalized Inverse Gaussian Distribution. Springer.