Thursday 29 October 2015

Entropy of Sign-Magnitude Coding of Uniformly Quantized Laplacian Variables

This is the third post in a series discussing uniform quantization of Laplacian stochastic variables and is about the entropy of separately coding the sign and magnitude of uniformly quantized Laplacian variables.

We begin by showing that the distribution of the magnitude of a uniformly quantized Laplacian variable is the same as the distribution of the uniformly quantized magnitude of the Laplacian variable, which is shown in [1] to be equivalent to the distribution of a corresponding uniformly quantized Exponential variable.

Let $\hat{x}$ be the uniformly quantized version, with step size $\Delta$, of a Laplacian variable $x$ with $\text{Laplace}(0,b)$ distribution and let $\hat{m}$ and $\hat{s}$ be the variables denoting magnitude and sign of $\hat{x}$ respectively. We have $\hat{m} = |\hat{x}| \in \{0, \Delta, 2\Delta, \cdots\}$ and $\hat{s} = \text{sign}(\hat{x}) \in \{-1, 0, +1\}$.

Let $f_\hat{x}(\hat{x})$ and $\Phi_\hat{x}(\hat{x})$ denote the probability mass function and cumulative distribution function of $\hat{x}$ respectively. These are given as follows.

\begin{align}f_\hat{x}(\hat{x}) &= \begin{cases}1-e^{-\frac{\Delta}{2b}} & \text{if $\hat{x} = 0$}\\ \frac{1}{2}e^{-\frac{|\hat{x}|}{b}}\left(e^{\frac{\Delta}{2b}} - e^{-\frac{\Delta}{2b}}\right) & \text{otherwise}\end{cases}\\\Phi_\hat{x}(\hat{x}) &= \begin{cases}\frac{1}{2}e^{\frac{\hat{x}}{b}}e^{\frac{\Delta}{2b}} & \text{if $\hat{x} < 0$}\\1-\frac{1}{2}e^{-\frac{\hat{x}}{b}}e^{-\frac{\Delta}{2b}} & \text{if $\hat{x} \geq 0$}\end{cases}\end{align}
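As a quick sanity check, the following minimal Python sketch (the values of $b$ and $\Delta$ are arbitrary and chosen only for illustration) compares the probability mass function above against a Monte Carlo estimate obtained by quantizing Laplacian samples.

    import numpy as np

    rng = np.random.default_rng(0)
    b, delta = 1.0, 0.7
    x = rng.laplace(loc=0.0, scale=b, size=1_000_000)
    xq = delta * np.round(x / delta)              # uniform quantization with step size delta

    def pmf(k, b, delta):
        # closed-form mass at the quantization point k*delta
        if k == 0:
            return 1.0 - np.exp(-delta/(2*b))
        return 0.5*np.exp(-abs(k)*delta/b) * (np.exp(delta/(2*b)) - np.exp(-delta/(2*b)))

    for k in [-2, -1, 0, 1, 2]:
        empirical = np.mean(np.isclose(xq, k*delta))
        print(k, empirical, pmf(k, b, delta))     # empirical and analytical masses agree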

The cumulative distribution function $\Phi_\hat{m}(\hat{m})$ of the discrete variable $\hat{m}$ is given as follows

\begin{align}\Phi_\hat{m}(\hat{m}) = \Phi_\hat{x}(\hat{m}) - \Phi_\hat{x}(-\hat{m}_{+1})\end{align},

where $\hat{m}_{+1} = \hat{m} + \Delta$ denotes the quantization point immediately succeeding $\hat{m}$. Substituting the value of $\Phi_\hat{x}(\hat{x})$ from above we have

\begin{align}\Phi_\hat{m}(\hat{m}) &= 1 - e^{-\frac{\hat{m}}{b}} e^{-\frac{\Delta}{2b}}\end{align},

which can be readily recognized as the cumulative distribution function of an $\text{Exponential}(1/b)$ variable uniformly quantized with step size $\Delta$. Therefore, the entropy of $\hat{m}$, denoted by $H_\hat{m}$, is as given in [2]. Note that a generic version of the above equivalence may be proven for distributions symmetric about zero.
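The equivalence may also be checked numerically; the short sketch below (again with illustrative values of $b$ and $\Delta$) evaluates both cumulative distribution functions on the first few quantization points.

    import numpy as np

    b, delta = 1.0, 0.7
    m = delta * np.arange(0, 12)                  # quantization points of the magnitude

    cdf_magnitude = 1.0 - np.exp(-m/b) * np.exp(-delta/(2*b))
    cdf_quantized_exp = 1.0 - np.exp(-(m + delta/2) / b)   # Exponential(1/b), P(x <= m + delta/2)

    print(np.allclose(cdf_magnitude, cdf_quantized_exp))   # True: the two distributions coincide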

Next, we find the entropy of the stochastic variable $\hat{s}$ denoting the sign, which takes values from the set $\hat{S} = \{+1, 0, -1\}$. Let $f_\hat{s}(\hat{s})$ denote the probability mass function of the discrete variable $\hat{s}$. It is given as follows.

\begin{align}f_\hat{s}(\hat{s}) = \begin{cases}1 - e^{-\frac{\Delta}{2b}} & \text{if $\hat{s} = 0$}\\ \frac{1}{2} e^{-\frac{\Delta}{2b}} & \text{if $\hat{s} = \pm 1$}\end{cases}\end{align}.

Encoding $\hat{s} = 0$ carries no more information than what is already contained in $\hat{m}$, as $\hat{s} = 0$ if and only if $\hat{m} = 0$. Therefore, we only need to encode the non-zero signs. The entropy of the reduced-size alphabet may be computed using the theorem for the general case given below.

Let $A = \{\chi_1, \chi_2, \cdots, \chi_n\}$ be a finite-size coding alphabet with corresponding probabilities $P = \{p_1, p_2, \cdots, p_n\}$. Now let $D = \{d_1, d_2, \cdots, d_n\}$ be the probabilities that the symbols from $A$ are coded. That is, if $d_i = 1$ then the $i$-th symbol is always coded, and if $d_i = 0.5$ then it is coded 50% of the time and inferred correctly at the decoder the other 50% of the time. There are no errors in the decoding process due to the reduced alphabet size. Let $H_{A^{-}}$ denote the entropy of the reduced-size coding alphabet. Then $H_{A^{-}}$ is given by the straightforward theorem below.

Theorem 1. Entropy of the reduced-size alphabet

\begin{align}H_{A^{-}} = -\sum_{i = 1}^{n} p_i d_i \log \left( \frac{p_i d_i}{\sum_{j} p_j d_j} \right)\end{align}
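Theorem 1 translates directly into a short helper; the function below (the name reduced_entropy and the toy example are illustrative, natural logarithm assumed) computes $H_{A^{-}}$ from the probabilities $P$ and the coding probabilities $D$.

    import numpy as np

    def reduced_entropy(p, d):
        # entropy of a reduced-size alphabet per Theorem 1 (natural logarithm)
        pd = np.asarray(p, float) * np.asarray(d, float)
        z = pd.sum()                              # probability that a symbol is coded at all
        pd = pd[pd > 0]                           # symbols that are never coded drop out
        return -np.sum(pd * np.log(pd / z))

    # toy example: three symbols, the last one is always inferred rather than coded
    print(reduced_entropy([0.5, 0.25, 0.25], [1, 1, 0]))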

For the case of $\hat{s}$ above, we set $D = \{1, 0, 1\}$, that is, we do not encode the symbol '0' but infer it correctly based on the value of $\hat{m}$. The entropy of coding $\hat{s}$ using this scheme, denoted by $H_{\hat{S}^-}$, is given as follows.

\begin{align}H_{\hat{S}^-} &= -2\frac{1}{2} e^{-\frac{\Delta}{2b}} \log \frac{1}{2}\nonumber\\ &= e^{-\frac{\Delta}{2b}} \log 2\end{align}

Finally, using the values of the entropies $H_\hat{x}$ and $H_\hat{m}$ from [1] and [2], and $H_{\hat{S}^-}$ from above, we have the following upon simplification.

\begin{align}H_\hat{x} = H_\hat{m} + H_{\hat{S}^-}\end{align}

Therefore, encoding the magnitude and the reduced sign has the same entropy as encoding the uniformly quantized Laplacian variable.
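This identity is easy to confirm numerically; the sketch below (illustrative $b$ and $\Delta$, natural logarithms, sums truncated where the tails are negligible) evaluates $H_\hat{x}$, $H_\hat{m}$ and $H_{\hat{S}^-}$ directly from the probability mass functions.

    import numpy as np

    b, delta = 1.3, 0.8
    k = np.arange(1, 500)                          # truncated tail; the remaining mass is negligible

    # quantized Laplace: zero bin plus two symmetric tails
    f0 = 1.0 - np.exp(-delta/(2*b))
    fk = 0.5*np.exp(-k*delta/b) * (np.exp(delta/(2*b)) - np.exp(-delta/(2*b)))
    H_x = -(f0*np.log(f0) + 2*np.sum(fk*np.log(fk)))

    # magnitude = quantized Exponential(1/b): same zero bin, doubled tail masses
    H_m = -(f0*np.log(f0) + np.sum(2*fk*np.log(2*fk)))

    # reduced sign
    H_s = np.exp(-delta/(2*b)) * np.log(2)

    print(H_x, H_m + H_s)                          # the two values agree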

References:
[1] Aravindh Krishnamoorthy, Entropy of Uniformly Quantized Laplace and Half-Laplace Distributions, Applied Mathematics and Engineering Blog, October 2015.
[2] Aravindh Krishnamoorthy, Entropy of Uniformly Quantized Exponential Distribution, Applied Mathematics and Engineering Blog, October 2015.

Monday 5 October 2015

Entropy of Uniformly Quantized Laplace and Half-Laplace Distributions

This post is about entropy of discrete stochastic variables that are derived by quantizing continuous stochastic variables. A good introduction to the concept of entropy in Information Theory is in [1]. This post is a continuation of the previous post [2].

Let $x \in \mathbb{R}$ be a continuous stochastic variable with $\text{Laplace}(0, b)$ distribution and let $\hat{x} \in \hat{X}$ such that $\hat{X} \subset \mathbb{R}$ and $0 \in \hat{X}$ be its uniformly quantized version with step size $\Delta$. We would like to find the entropy of the discrete stochastic variable $\hat{x}$, denoted by $H_\hat{x}$, which provides us an estimate of the average number of bits required to encode $\hat{x}$.

HALF-LAPLACE DISTRIBUTION


The half-Laplace distribution refers to the distribution obtained by folding the zero-mean Laplace density about the origin. The new distribution is equivalent to the distribution of the stochastic variable $y = |x|$, i.e. the absolute value of the original Laplace-distributed variable.

Let $\varphi(x)$ and $\Phi(x)$ denote the probability density function and cumulative distribution function of the Laplace variable $x$. These are defined as follows.

\begin{align}\varphi(x) &= \frac{1}{2b} e^{-\frac{|x|}{b}} \\ \Phi(x) &= \begin{cases} \frac{1}{2} e^{\frac{x}{b}} & \text{if $x < 0$} \\ 1 - \frac{1}{2} e^{-\frac{x}{b}} & \text{if $x \geq 0$}\end{cases}\end{align}

The half-Laplace cumulative distribution function of $y$, denoted by $\breve{\Phi}(y)$, is then given as follows.

\begin{align} \breve{\Phi}(y) &= \begin{cases} 0 & \text{if $y = 0$} \\ \Phi(y) - \Phi(-y) & \text{if $y > 0$}\end{cases} \\ &= 1 - e^{-\frac{y}{b}} \label{eqn:le} \end{align}

The expression in equation (\ref{eqn:le}) may be directly recognized as the cumulative distribution function of $\text{Exponential}(1/b)$. Therefore, the entropy of half-Laplace distribution may be found according to the expressions in [2] with $\lambda = 1/b$. 
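The folding step can be verified in a few lines of Python; the sketch below (arbitrary $b$, a grid of non-negative points) checks that $\Phi(y) - \Phi(-y)$ coincides with the $\text{Exponential}(1/b)$ cumulative distribution function.

    import numpy as np

    b = 0.9
    y = np.linspace(0.0, 8.0, 200)                 # non-negative evaluation points

    def laplace_cdf(x, b):
        return np.where(x < 0, 0.5*np.exp(x/b), 1.0 - 0.5*np.exp(-x/b))

    folded = laplace_cdf(y, b) - laplace_cdf(-y, b)
    print(np.allclose(folded, 1.0 - np.exp(-y/b)))  # True: half-Laplace equals Exponential(1/b)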

LAPLACE DISTRIBUTION


Let $f(\hat{x})$ denote the probability mass function of the quantized variable $\hat{x}$. It is related to the probability density function $\varphi(x)$ as follows.
\begin{align} f(\hat{x}) = \int_{\hat{x} - \frac{\Delta}{2}}^{\hat{x} + \frac{\Delta}{2}} \varphi(x) dx \label{eqn:lq} \end{align}
Further, let $\hat{X}{}^+ \subset \hat{X}$ be the subset containing the positive quantized values. The negative entropy $-H_\hat{x}$ is given (by definition) as follows. Note that the simplification is possible due to the symmetry of the Laplace distribution.

\begin{align} -H_\hat{x} &= \sum_{\hat{x} \in \hat{X}} f(\hat{x}) \log f(\hat{x}) \\ &= f(0) \log f(0) + 2 \sum_{\hat{x} \in \hat{X}{}^+} f(\hat{x}) \log f(\hat{x}) \end{align}

By using the definition from equation (\ref{eqn:lq}) we have the following:

\begin{align} -H_\hat{x} &= \left(1 - e^{-\frac{\Delta}{2b}}\right) \log \left[1 - e^{-\frac{\Delta}{2b}} \right] \nonumber\\ &\quad +  \left(e^{\frac{\Delta}{2b}} - e^{-\frac{\Delta}{2b}}\right) \log \left[ \frac{1}{2} \left(e^{\frac{\Delta}{2b}} - e^{-\frac{\Delta}{2b}}\right) \right] v_1(b) \nonumber\\ &\quad + \left(e^{\frac{\Delta}{2b}} - e^{-\frac{\Delta}{2b}}\right) v_2(b) \end{align}

With the functions $v_{1,2}(b)$ given by:

\begin{align} v_1(b) &= \sum_{\hat{x} \in \hat{X}{}^+} e^{-\frac{\hat{x}}{b}} \\ v_2(b) &= \sum_{\hat{x} \in \hat{X}{}^+} e^{-\frac{\hat{x}}{b}} \log e^{-\frac{\hat{x}}{b}} \end{align}

EXACT RESULT


The analytical expressions for $v_{1,2}(b)$ may be computed starting from the following identities.

\begin{align} 1 &= \int_{-\infty}^{\infty} \varphi(x) dx = 2 \int_{0}^{\frac{\Delta}{2}} \varphi(x) dx + 2 \sum_{\hat{x} \in \hat{X}{}^+} \int_{\hat{x} - \frac{\Delta}{2}}^{\hat{x} + \frac{\Delta}{2}} \varphi(x) dx \\ -h_x &= \int_{-\infty}^{\infty} \varphi(x) \log \varphi(x) dx  \nonumber\\ &= 2 \int_{0}^{\frac{\Delta}{2}} \varphi(x) \log \varphi(x) dx + 2 \sum_{\hat{x} \in \hat{X}{}^+} \int_{\hat{x} - \frac{\Delta}{2}}^{\hat{x} + \frac{\Delta}{2}} \varphi(x) \log \varphi(x) dx \end{align}

Where $h_x$ is the differential entropy. Upon simplification, we obtain the following results.

\begin{align} v_1(b) &= \frac{e^{-\frac{\Delta}{2b}}}{e^{\frac{\Delta}{2b}} - e^{-\frac{\Delta}{2b}}} \\ v_2(b) &= \frac{-h_x - c_1 - \frac{\Delta}{2b} \left[ e^{\frac{\Delta}{2b}} + e^{-\frac{\Delta}{2b}} \right] v_1(b)}{e^{\frac{\Delta}{2b}} - e^{-\frac{\Delta}{2b}}}\end{align}

Where $c_1 = \log\left(\frac{1}{2b}\right) - 1 + \frac{\Delta}{2b}e^{-\frac{\Delta}{2b}}$.
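As a sanity check, the sketch below (illustrative $b$ and $\Delta$, natural logarithms) evaluates the closed-form expression for $-H_\hat{x}$ using $v_{1,2}(b)$ and $c_1$ above, and compares it with a direct, truncated summation of the probability mass function.

    import numpy as np

    b, delta = 1.2, 1.0
    t = delta/(2*b)
    sp, sm = np.exp(t), np.exp(-t)

    h_x = 1.0 + np.log(2*b)                        # differential entropy of Laplace(0, b)
    v1 = sm / (sp - sm)
    c1 = np.log(1/(2*b)) - 1.0 + t*sm
    v2 = (-h_x - c1 - t*(sp + sm)*v1) / (sp - sm)

    neg_H = ((1 - sm)*np.log(1 - sm)
             + (sp - sm)*np.log(0.5*(sp - sm))*v1
             + (sp - sm)*v2)

    # direct truncated summation of the pmf
    k = np.arange(1, 500)
    f0 = 1 - sm
    fk = 0.5*np.exp(-k*delta/b)*(sp - sm)
    print(neg_H, f0*np.log(f0) + 2*np.sum(fk*np.log(fk)))   # the two agree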

We see that the expressions are quite similar to the ones obtained for the Exponential distribution in [2]. This is expected, as the Laplace and Exponential distributions are very closely related.

References:
[1] Thomas M. Cover, Joy A. Thomas, Elements of Information Theory, Second Edition, Wiley 2006.
[2] Aravindh Krishnamoorthy, Entropy of Uniformly Quantized Exponential Distribution, Applied Mathematics and Engineering Blog, October 2015.

Saturday 3 October 2015

Entropy of Uniformly Quantized Exponential Distribution

This post is about entropy of discrete stochastic variables that are derived by quantizing continuous stochastic variables. A good introduction to the concept of entropy in Information Theory is in [1].

Let $x \in \{0\} \cup \mathbb{R}^+ $ be a continuous stochastic variable with $\text{Exponential}(\lambda)$ distribution and let $\hat{x} \in \{0\} \cup \hat{X}$ such that $\hat{X} \subset \mathbb{R}^+$ be its uniformly quantized version with step size $\Delta$. We would like to find the entropy of the discrete stochastic variable $\hat{x}$, denoted by $H_\hat{x}$, which provides us an estimate of the average number of bits required to encode $\hat{x}$.

For the continuous stochastic variable $x$, entropy is not defined in the usual discrete sense. However, the differential entropy, denoted by $h_x$, plays the same role as entropy does for discrete stochastic variables. A well-known approximation relating the differential entropy of a variable and the entropy of its quantized version is given as follows.

\begin{align} H_\hat{x} \approx h_x - \log \Delta \end{align}

Unfortunately, this approximation is only valid for small values of $\Delta$ and breaks down as $\Delta$ increases. For example, for the common case of $\Delta = 1$, the approximation predicts that the differential entropy and the entropy of the quantized variable are equal. This is generally not the case.
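The breakdown is easy to demonstrate numerically; the sketch below (Exponential distribution, $\lambda = 1$, $\Delta = 1$, natural logarithms) compares the approximation with the entropy obtained by directly summing the probability mass of each quantization bin.

    import numpy as np

    lam, delta = 1.0, 1.0

    # entropy of the quantized variable by direct (truncated) summation
    k = np.arange(1, 500)
    p0 = 1.0 - np.exp(-lam*delta/2)
    pk = np.exp(-lam*k*delta) * (np.exp(lam*delta/2) - np.exp(-lam*delta/2))
    H_exact = -(p0*np.log(p0) + np.sum(pk*np.log(pk)))

    # small-step approximation
    h_x = 1.0 - np.log(lam)                        # differential entropy of Exponential(lam)
    H_approx = h_x - np.log(delta)

    print(H_exact, H_approx)                       # roughly 1.30 vs 1.00 nats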

A common workaround is to use numerical simulations to calculate $H_\hat{x}$. However, it would be satisfying to have an analytical expression, notwithstanding its practical use. As we shall see below, the expression turns out to be quite cumbersome.

Let $\varphi(x)$ and $f(\hat{x})$ denote the probability density and mass functions of $x$ and $\hat{x}$ respectively. They are related as 
\begin{align}f(\hat{x}) = \int_{\hat{x} - [\hat{x} \neq 0]\frac{\Delta}{2}}^{\hat{x} + \frac{\Delta}{2}} \varphi(x) dx\end{align}. 
The indicator function $[\hat{x} \neq 0]$ is $0$ when $\hat{x} = 0$ and $1$ otherwise, and serves to set the lower bound correctly.
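In code, the Iverson bracket simply switches the lower integration limit; a minimal sketch is given below (the helper name qpmf and the parameter values are illustrative), using the closed-form Exponential cumulative distribution function in place of numerical integration.

    import numpy as np

    def qpmf(xhat, lam, delta):
        # mass of the quantized Exponential(lam) at the quantization point xhat
        F = lambda x: 1.0 - np.exp(-lam*x)         # Exponential(lam) cdf
        lo = 0.0 if xhat == 0 else xhat - delta/2  # Iverson bracket: the zero bin starts at 0
        return F(xhat + delta/2) - F(lo)

    lam, delta = 0.8, 1.0
    points = delta*np.arange(0, 50)
    print(sum(qpmf(p, lam, delta) for p in points))  # close to 1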

Now onto calculating the negative entropy $-H_\hat{x}$ which is given (by definition) as follows. The motivation behind using negative entropy instead of entropy is to avoid (many) negative signs on the right hand side expression.

\begin{align}-H_\hat{x} &= \sum_{\hat{x} \in \{0\} \cup \hat{X}} f(\hat{x}) \log f(\hat{x}) \\ &= \left(\int_{0}^{\frac{\Delta}{2}} \varphi(x) dx \right) \log \left[\int_{0}^{\frac{\Delta}{2}} \varphi(x) dx \right] \nonumber \\ &\quad + \sum_{\hat{x} \in \hat{X}} \left( \int_{\hat{x}-\frac{\Delta}{2}}^{\hat{x}+\frac{\Delta}{2}} \varphi(x) dx \right) \log \left[\int_{\hat{x}-\frac{\Delta}{2}}^{\hat{x}+\frac{\Delta}{2}} \varphi(x) dx \right]\end{align} 

Using the probability density function of the Exponential distribution, $\varphi(x) = \lambda e^{-\lambda x}$, the above equation is simplified as follows.

\begin{align} -H_\hat{x} &= \left(1-e^{-\lambda\frac{\Delta}{2}}\right) \log \left[ 1-e^{-\lambda\frac{\Delta}{2}} \right] \nonumber \\ &\quad + \left(e^{\lambda\frac{\Delta}{2}} - e^{-\lambda\frac{\Delta}{2}}\right) \log \left[e^{\lambda\frac{\Delta}{2}} - e^{-\lambda\frac{\Delta}{2}} \right] u_1(\lambda) \nonumber \\ &\quad +  \left(e^{\lambda\frac{\Delta}{2}} - e^{-\lambda\frac{\Delta}{2}}\right) u_2(\lambda) \label{eqn:hh} \end{align}
With the functions $u_{1,2}(\lambda)$ given by:
\begin{align} u_1(\lambda) &= \sum_{\hat{x} \in \hat{X}} e^{-\lambda\hat{x}} \\ u_2(\lambda) &= \sum_{\hat{x} \in \hat{X}} e^{-\lambda\hat{x}} \log e^{-\lambda\hat{x}} \end{align}

APPROXIMATE RESULT


The exact values of $u_{1,2}(\lambda)$, as we shall see below, are cumbersome. Therefore, we first obtain an approximate result by plugging the approximations for $u_{1,2}(\lambda)$ given below into equation (\ref{eqn:hh}).

\begin{align} u_1(\lambda) &= \frac{1}{\lambda} \sum_{\hat{x} \in \hat{X}} \lambda e^{-\lambda\hat{x}} \approx   \frac{1}{\lambda} \left[ \int_{0}^{\infty} \varphi(x) dx - \varphi(0) \right] = \frac{1}{\lambda} - 1\\ u_2(\lambda) &= \sum_{\hat{x} \in \hat{X}} e^{-\lambda\hat{x}} \log e^{-\lambda\hat{x}} \approx -\frac{h_x}{\lambda} - \log(\lambda)  - \log(\lambda) u_1(\lambda)\end{align}

The performance of the above approximation may be verified numerically. In the graph below, we hold the quantization step size constant at $\Delta = 1$ and vary the parameter $\lambda$. We observe that the approximation is acceptable, i.e. the difference is less than 1 bit, for the range $0 < \lambda \leq 1.5$. For higher values of $\lambda$, the approximation fails.
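A minimal sketch of such a comparison is given below (natural logarithms, $\Delta = 1$, a few illustrative values of $\lambda$, and an arbitrary truncation length for the exact summation).

    import numpy as np

    delta = 1.0

    def H_exact(lam, delta, n=2000):
        # entropy of the quantized Exponential by direct (truncated) summation
        p0 = 1.0 - np.exp(-lam*delta/2)
        k = np.arange(1, n)
        pk = np.exp(-lam*k*delta) * (np.exp(lam*delta/2) - np.exp(-lam*delta/2))
        pk = pk[pk > 0]                            # drop underflowed tail terms
        return -(p0*np.log(p0) + np.sum(pk*np.log(pk)))

    def H_approx(lam, delta):
        # equation (hh) evaluated with the approximate u_1 and u_2
        h_x = 1.0 - np.log(lam)                    # differential entropy of Exponential(lam)
        s = np.exp(lam*delta/2)
        u1 = 1/lam - 1
        u2 = -h_x/lam - np.log(lam) - np.log(lam)*u1
        p0 = 1.0 - 1/s
        return -(p0*np.log(p0) + (s - 1/s)*np.log(s - 1/s)*u1 + (s - 1/s)*u2)

    for lam in [0.25, 0.5, 1.0, 1.5, 2.5]:
        print(lam, H_exact(lam, delta), H_approx(lam, delta))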


EXACT RESULT


The exact result, also shown in the graph above, may be obtained by plugging the exact values of $u_{1,2}(\lambda)$ into equation (\ref{eqn:hh}). The exact values may be computed starting from the following identities.

\begin{align} 1 &= \int_{0}^{\infty} \varphi(x) dx = \int_{0}^{\frac{\Delta}{2}} \varphi(x) dx + \sum_{\hat{x} \in \hat{X}} \int_{\hat{x} - \frac{\Delta}{2}}^{\hat{x} + \frac{\Delta}{2}} \varphi(x) dx \\ -h_x &= \int_{0}^{\infty} \varphi(x) \log \varphi(x) dx \nonumber \\ &= \int_{0}^{\frac{\Delta}{2}} \varphi(x) \log \varphi(x) dx + \sum_{\hat{x} \in \hat{X}} \int_{\hat{x} - \frac{\Delta}{2}}^{\hat{x} + \frac{\Delta}{2}} \varphi(x) \log \varphi(x) dx\end{align}

As the derivation is lengthy, we directly present the simplified result below.

\begin{align} u_1(\lambda) &= \frac{e^{-\lambda\frac{\Delta}{2}}}{e^{\lambda\frac{\Delta}{2}} - e^{-\lambda\frac{\Delta}{2}}} \\ u_2(\lambda) &= \frac{-h_x - c_1  - \frac{\Delta}{2}\lambda  \left[e^{\lambda\frac{\Delta}{2}} + e^{-\lambda\frac{\Delta}{2}}\right] u_1(\lambda)}{e^{\lambda\frac{\Delta}{2}} - e^{-\lambda\frac{\Delta}{2}}}\end{align}

Where $c_1 = \log(\lambda) - 1 + \frac{\Delta}{2}\lambda e^{-\lambda\frac{\Delta}{2}}$. We see that even with a simple distribution function, the exact expressions quickly get out of hand and are unfortunately not as short and elegant as one would have expected.
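The exact expressions can nevertheless be checked against the defining sums in a few lines; the sketch below (illustrative $\lambda$ and $\Delta$, natural logarithms, truncated sums) compares $u_{1,2}(\lambda)$ above with $\sum e^{-\lambda\hat{x}}$ and $\sum e^{-\lambda\hat{x}} \log e^{-\lambda\hat{x}}$ evaluated directly.

    import numpy as np

    lam, delta = 0.8, 1.0
    t = lam*delta/2
    h_x = 1.0 - np.log(lam)                        # differential entropy of Exponential(lam)

    # closed-form expressions
    u1 = np.exp(-t) / (np.exp(t) - np.exp(-t))
    c1 = np.log(lam) - 1.0 + t*np.exp(-t)
    u2 = (-h_x - c1 - t*(np.exp(t) + np.exp(-t))*u1) / (np.exp(t) - np.exp(-t))

    # literal (truncated) sums over the positive quantization points
    xhat = delta*np.arange(1, 500)
    print(u1, np.sum(np.exp(-lam*xhat)))                   # the two agree
    print(u2, np.sum(np.exp(-lam*xhat) * (-lam*xhat)))     # log e^{-lam*xhat} = -lam*xhat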


References:
[1] Thomas M. Cover, Joy A. Thomas, Elements of Information Theory, Second Edition, Wiley 2006.