Deriving Gaussian Distribution

The Gaussian is a very important distribution. In this post, we will discuss it in detail: we will derive it, calculate its integral, and do MLE (Maximum Likelihood Estimation). Deriving the Gaussian distribution is more difficult in Cartesian coordinates, so we will use polar coordinates instead. Before we derive the Gaussian in polar coordinates, let’s first talk about how to change the coordinate system from Cartesian to polar.

(1) Changing the coordinate system from Cartesian to polar

Changing the coordinate system from Cartesian to polar is useful when calculating integrals: in certain cases, we prefer polar coordinates because the calculation becomes far easier. To do that, we can use the Jacobian matrix. The Jacobian matrix is defined by the partial derivatives of one vector with respect to another vector. In our case of changing Cartesian coordinates to polar coordinates, the Jacobian matrix of (x,y) in Cartesian coordinates with respect to (r,\theta) in polar coordinates is:

J = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{bmatrix}

Then, changing the coordinate system from Cartesian to polar can be done with this formula:

\boxed{dx\,dy = |J|\,dr\,d\theta} , where |J| is the determinant of J.

Let’s do an example! See picture below.

To calculate the circle area, we can do as follows.

2\int_{-r}^{r} y\,dx,  where r^2 = x^2 + y^2

2\int_{-r}^{r} \sqrt{r^2 - x^2}\,dx = \pi r^2, let’s skip the details here, since we are already familiar with the circle area formula

We can also calculate the circle area using polar coordinates, and the result is exactly the same as what we got above. See picture below.

In Cartesian coordinates, we calculate the area by dividing the circle into small boxes \Delta x \Delta y, then summing all those small box areas. This is relatively difficult since we have to determine the lower and upper bounds of the integral w.r.t. dx\,dy. Using the polar coordinate system, we can instead parameterize the small area shown on the right of the picture above by \Delta r \Delta \theta instead of \Delta x \Delta y. Let’s do that.

\int_{\text{lower}}^{\text{upper}}\int_{\text{lower}}^{\text{upper}}dx\,dy = \int_{0}^{2\pi}\int_{0}^{r}|J|\,dr\,d\theta,

with J = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}

and |J| = r\cos^2\theta + r\sin^2\theta = r(\cos^2\theta + \sin^2\theta) = r
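As a quick sanity check, here is a minimal sketch (assuming SymPy is available) that computes the same Jacobian determinant symbolically:

```python
# Compute the polar-coordinate Jacobian determinant |J| = r with SymPy.
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Jacobian of (x, y) with respect to (r, theta)
J = sp.Matrix([x, y]).jacobian(sp.Matrix([r, theta]))
print(sp.simplify(J.det()))  # prints: r
```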

Let’s proceed with our calculation.

area = \int_{0}^{2\pi}\int_{0}^{r}|J|\,dr\,d\theta = \int_{0}^{2\pi}\int_{0}^{r}r\,dr\,d\theta\\\\  area=\int_{0}^{2\pi}\frac{r^2}{2}\,d\theta=2\pi\,\frac{r^2}{2}\\\\ area = \boxed{\pi r^2}

See the result: we successfully calculated the circle area using polar coordinates, easily, and the final result matches the circle area formula we all know.
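We can also double-check this numerically. Below is a minimal sketch (assuming SciPy and NumPy are installed; R is a hypothetical radius) that evaluates the polar double integral and compares it with \pi r^2:

```python
# Numeric check: circle area as a double integral in polar coordinates.
from scipy.integrate import dblquad
import numpy as np

R = 3.0  # hypothetical circle radius

# dblquad integrates func(y, x); here the inner variable is r in [0, R],
# the outer variable is theta in [0, 2*pi], and the integrand is |J| = r.
area, _ = dblquad(lambda r, theta: r, 0, 2 * np.pi, 0, R)
print(area, np.pi * R**2)  # both print ~28.2743
```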

 

(2) Deriving the Gaussian distribution using polar coordinates

Let’s set up the conditions first. We already know that for the Gaussian distribution, the closer we are to the origin (which is (0,0) in the zero-mean case), the bigger its value is. In two dimensions, the distribution is shown in the picture below.

Another characteristic we know is that the distribution is symmetric in both the x and y axes. Using these assumptions, we will try to derive its mathematical formula. Again, we will use polar coordinates to make the derivation process easier. See picture below.

This is the Gaussian distribution seen from the top, with polar coordinates r, \, \theta overlaid. From the picture, and since x and y are independent, we can write P(x)P(y)=g(r). Let’s differentiate both sides w.r.t. \theta.

\frac{d[g(r)]}{d\theta}=\frac{d[P(x)P(y)]}{d\theta}

We know that for the Gaussian, g(r) depends only on r (the distance from the origin) and has the same value for any \theta as long as r is fixed. Thus, the left-hand side will be zero, and for the right-hand side we can use the product rule.

0=P(x)\frac{d[P(y)]}{d\theta}+P(y)\frac{d[P(x)]}{d\theta}

By substituting x=r\cos\theta and y=r\sin\theta, and using the chain rule for differentiation, we get:

0=P(x)P(y)'\,r\cos\theta + P(y)P(x)'\,(-r\sin\theta)

By substituting back r\cos\theta=x and r\sin\theta=y, we get:

0=P(x)P(y)' x - P(y)P(x)'y \\\\ \frac{P(x)'}{P(x)x} = \frac{P(y)'}{P(y)y}

For the Gaussian distribution, this differential equation holds for any x and y, and x and y are independent. This can only happen if the ratio defined by the differential equation is a constant.

\frac{P(x)'}{P(x)x} = \frac{P(y)'}{P(y)y}=C

Solving \frac{P(x)'}{P(x)x} = C,

\frac{P(x)'}{P(x)x} = C\\\\ \frac{1}{P(x)x}\frac{dP(x)}{dx} = C\\\\ \frac{1}{P(x)}dP(x) = Cx\,dx\\\\ \int \frac{1}{P(x)}dP(x)=\int Cx\,dx\\\\ \ln P(x)=\frac{C}{2}x^2 + c\\\\ P(x)=Ae^{\frac{C}{2}x^2}, \text{ with } A=e^c
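If you want to check this ODE solution mechanically, here is a small sketch (assuming SymPy) that solves the same separable equation P'(x)=CxP(x):

```python
# Solve P'(x)/(P(x)*x) = C, i.e. P'(x) = C*x*P(x), symbolically.
import sympy as sp

x, C = sp.symbols('x C')
P = sp.Function('P')

sol = sp.dsolve(sp.Eq(P(x).diff(x), C * x * P(x)), P(x))
print(sol)  # Eq(P(x), C1*exp(C*x**2/2)) -- the constant C1 plays the role of A
```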

Since this is a probability density function, it must integrate to 1, which fixes the normalization constant A.

A\int_{-\infty}^{\infty}e^{\frac{C}{2}x^2}dx=1

Let -k=\frac{C}{2} (note that C must be negative for the integral to converge, so k>0). Then we get:

A\int_{-\infty}^{\infty}e^{-kx^2}dx=1\\\\  A\sqrt{(\int_{-\infty}^{\infty}e^{-kx^2}dx)^2}=1

Because the distribution is symmetric, and x and y are independent, we can rewrite it as follows.

A\sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-kx^2}\,e^{-ky^2}\,dx\,dy}=1\\\\  A\sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-k(x^2+y^2)}\,dx\,dy}=1

Let’s solve this double integral using polar coordinates, like we already did before, substituting r^2=x^2+y^2.

A\sqrt{(\int_{0}^{2\pi}\int_{0}^{\infty}e^{-kr^2}|J|drd\theta)}=1

For the bounds of the integral w.r.t. dr, we use 0 to \infty since the Gaussian function only reaches zero at infinity. And the determinant of the Jacobian matrix |J| is again r, as we calculated before. Proceeding with our calculation, we get:

A\sqrt{(2\pi\int_{0}^{\infty}e^{-kr^2}rdr)}=1

Let u=kr^2, so \frac{du}{dr}=2kr; thus we get:

A\sqrt{2\pi\int_{0}^{\infty}e^{-u}\,r\,\frac{du}{2kr}}=1\\\\  A\sqrt{\pi\int_{0}^{\infty}\frac{1}{k}e^{-u}\,du}=1\\\\  A\sqrt{\pi\,\frac{-1}{k}e^{-u}\Big|_0^\infty}=1\\\\  A\sqrt{\frac{\pi}{k}}=1\\\\  A = \sqrt{\frac{k}{\pi}}
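We can verify this normalization numerically. A minimal sketch (assuming SciPy and NumPy; the value of k is hypothetical):

```python
# Check that A = sqrt(k/pi) normalizes exp(-k*x^2) for any k > 0.
from scipy.integrate import quad
import numpy as np

k = 0.7  # hypothetical positive constant
A = np.sqrt(k / np.pi)
integral, _ = quad(lambda t: np.exp(-k * t**2), -np.inf, np.inf)
print(A * integral)  # ~1.0
```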

Last, to find k, we can use the constraint \int_{-\infty}^{\infty}xP(x)\,dx=\mu or \int_{-\infty}^{\infty}x^2P(x)\,dx=\sigma^2 (for zero mean). The mean constraint is automatically satisfied by symmetry, so we will use the variance constraint. Here we go.

\int_{-\infty}^{\infty}x^2P(x)\,dx=\sigma^2\\\\  \int_{-\infty}^{\infty}x^2\sqrt{\frac{k}{\pi}}e^{-kx^2}\,dx=\sigma^2\\\\  \sqrt{\frac{k}{\pi}}\int_{-\infty}^{\infty}x^2e^{-kx^2}\,dx=\sigma^2

Next, we will solve the equation above using integration by parts (\int_{a}^{b} u\,dv = uv\big|_a^b-\int_{a}^{b} v\,du). Let u=x, thus du = dx. Let dv = xe^{-kx^2}dx.

\int dv=\int xe^{-kx^2}dx

Let y = kx^2 \rightarrow x = \sqrt{\frac{y}{k}},

\frac{dy}{dx}=2kx\\\\  dx=\frac{1}{2kx}dy\\\\  dx=\frac{1}{2k\sqrt{\frac{y}{k}}}dy

Thus,

\int dv=\int \sqrt{\frac{y}{k}}\,e^{-y}\,\frac{1}{2k\sqrt{\frac{y}{k}}}\,dy\\\\  \int dv=\int \frac{1}{2k}e^{-y}\,dy\\\\  \int dv=\frac{1}{2k}\int e^{-y}\,dy\\\\  \int dv=-\frac{1}{2k}e^{-y}=-\frac{1}{2k}e^{-kx^2}\\\\ v=-\frac{1}{2k}e^{-kx^2}

Let’s go back to our integration by parts.

\sqrt{\frac{k}{\pi}}\int_{-\infty}^{\infty}x^2e^{-kx^2}\,dx=\sigma^2\\\\  \sqrt{\frac{k}{\pi}}\left(-x\frac{1}{2k}e^{-kx^2}\Big|_{-\infty}^\infty-\int_{-\infty}^{\infty}\frac{-1}{2k}e^{-kx^2}\,dx\right)=\sigma^2\\\\  \sqrt{\frac{k}{\pi}}\left(-x\frac{1}{2k}e^{-kx^2}\Big|_{-\infty}^\infty+\frac{1}{2k}\int_{-\infty}^{\infty}e^{-kx^2}\,dx\right)=\sigma^2

We know from our earlier result that \int_{-\infty}^{\infty}e^{-kx^2}\,dx=\frac{1}{A}=\sqrt{\frac{\pi}{k}}, and the boundary term vanishes, thus:

\sqrt{\frac{k}{\pi}}(0+\frac{1}{2k}\sqrt{\frac{\pi}{k}})=\sigma^2\\\\  \frac{1}{2k}=\sigma^2\\\\  k=\frac{1}{2\sigma^2}

Hooray! We have successfully calculated everything we need to derive the Gaussian distribution. Finally, we just plug A and k into our P(x). Here we go.

P(x)=Ae^{-kx^2}\\\\  P(x)=\sqrt{\frac{k}{\pi}}e^{-kx^2}\\\\  P(x)=\sqrt{\frac{1}{2\pi\sigma^2}}\,e^{-\frac{1}{2\sigma^2}x^2}\\\\  P(x)=\frac{1}{\sigma}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}x^2}\\\\  \boxed{P(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}x^2}}

The equation above is for mean \mu=0. For nonzero \mu, we replace x with x-\mu:

\boxed{P(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}}

Up to this point, we have successfully derived the Gaussian distribution function. Congratulations!
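As a final sanity check on the derivation, here is a sketch (assuming SciPy and NumPy; \sigma=1.5 is a hypothetical choice) confirming that the derived P(x) integrates to 1 and has second moment \sigma^2:

```python
# Verify the derived zero-mean Gaussian PDF numerically.
from scipy.integrate import quad
import numpy as np

sigma = 1.5  # hypothetical standard deviation

def pdf(t):
    return np.exp(-t**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

total, _ = quad(pdf, -np.inf, np.inf)
variance, _ = quad(lambda t: t**2 * pdf(t), -np.inf, np.inf)
print(total)     # ~1.0
print(variance)  # ~2.25, i.e. sigma**2
```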

 

(3) Integrating the Gaussian function using polar coordinates

We will integrate the simple Gaussian function e^{-x^2}. In this case, we ignore the constant \frac{1}{\sqrt{2\pi}\sigma} and use zero mean. Here we go.

\int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\left(\int_{-\infty}^{\infty}e^{-x^2}dx\right)^2}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\int_{-\infty}^{\infty}e^{-x^2}dx\int_{-\infty}^{\infty}e^{-y^2}dy}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^2+y^2)}\,dx\,dy}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^2}|J|\,dr\,d\theta}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^2}\,r\,dr\,d\theta}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{2\pi\int_{0}^{\infty}e^{-r^2}\,r\,dr}

Let u=r^2, thus \frac{du}{dr}=2r \rightarrow dr = \frac{1}{2r}du. Plugging this back into our last equation, we get:

\int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{2\pi\int_{0}^{\infty}e^{-u}\,r\,\frac{du}{2r}}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\pi\int_{0}^{\infty}e^{-u}\,du}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\pi\left(-e^{-u}\right)\Big|_{0}^\infty}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{-\pi(e^{-\infty}-e^{0})}\\\\  \int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{-\pi(0-1)}\\\\  \boxed{\int_{-\infty}^{\infty}e^{-x^2}dx=\sqrt{\pi}}

We’ve made it! We have successfully computed the integral of the Gaussian function.
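The same result can be confirmed symbolically. A one-line sketch, assuming SymPy:

```python
# Symbolic check of the Gaussian integral.
import sympy as sp

x = sp.symbols('x')
print(sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo)))  # sqrt(pi)
```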

(4) Maximum Likelihood Estimation of the Gaussian distribution

Like we already did before for the Bernoulli and Beta distributions here, we will also do MLE for the Gaussian distribution. We will try to estimate the mean (\mu) and variance (\sigma^2) that maximize the likelihood.

Let’s say we have trial results D=\{x_1, x_2, x_3, \dots, x_n\}. What are the \mu and \sigma^2 that maximize the likelihood? Let’s find out.

The likelihood of the Gaussian distribution is defined below.

P(D|\theta)=\prod_{i=1}^{n}P(x_i|\theta)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}

To maximize this, as usual, we take the first derivative and set it equal to zero. But before that, to make things easier, let’s work in log form.

\ln P(D|\theta)=\sum_{i=1}^{n}\ln\left(\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}\right)\\\\  \ln P(D|\theta)=\sum_{i=1}^{n}\ln\left(\frac{1}{\sqrt{2\pi}\sigma}\right)+\sum_{i=1}^{n}\ln\left(e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}\right)\\\\  \ln P(D|\theta)=n\,\ln(2\pi\sigma^2)^{-\frac{1}{2}}-\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\\\\  \ln P(D|\theta)=-\frac{n}{2}\,\ln(2\pi\sigma^2)-\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}
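For reference, this final log-likelihood expression translates directly into code. A minimal sketch (assuming NumPy; the function name is my own):

```python
# Log-likelihood of data under a Gaussian with parameters mu and sigma2.
import numpy as np

def gaussian_log_likelihood(data, mu, sigma2):
    # -n/2 * ln(2*pi*sigma^2) - sum((x_i - mu)^2) / (2*sigma^2)
    data = np.asarray(data)
    n = len(data)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - np.sum((data - mu) ** 2) / (2 * sigma2))
```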

First, let’s derive the \mu that maximizes the likelihood. To do this, we take the first derivative of the log likelihood w.r.t. \mu and set it equal to zero. Here we go.

\mu_{_{MLE}} \rightarrow \frac{\partial [\ln P(D|\theta)]}{\partial \mu}=\frac{\partial\left[-\frac{n}{2}\,\ln(2\pi\sigma^2)\right]}{\partial \mu}-\frac{\partial\left[\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right]}{\partial \mu}=0\\\\  0-\frac{1}{2\sigma^2}\frac{\partial\left[\sum_{i=1}^{n}(x_i^2-2x_i\mu+\mu^2)\right]}{\partial \mu}=0\\\\  \sum_{i=1}^{n}(2\mu-2x_i)=0\\\\  \boxed{\mu_{_{MLE}}=\frac{\sum_{i=1}^{n}x_i}{n}}

See, the equation above is exactly the familiar formula for the mean. Next, we will calculate the variance. To do this, we take the first derivative w.r.t. \sigma^2 and set it equal to zero. Let’s do that.

\sigma^2_{_{MLE}} \rightarrow \frac{\partial [\ln P(D|\theta)]}{\partial \sigma^2}=\frac{\partial\left[-\frac{n}{2}\,\ln(2\pi\sigma^2)\right]}{\partial \sigma^2}-\frac{\partial\left[\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right]}{\partial \sigma^2}=0\\\\  \frac{\partial\left[-\frac{n}{2}\,\ln(2\pi)\right]}{\partial \sigma^2}+\frac{\partial\left[-\frac{n}{2}\,\ln(\sigma^2)\right]}{\partial \sigma^2}-\frac{\partial\left[(\sigma^2)^{-1}\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2}\right]}{\partial \sigma^2}=0\\\\  0-\frac{n}{2}(\sigma^2)^{-1}+(\sigma^2)^{-2}\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2}=0

Multiplying both sides by (\sigma^2)^2, we get:

-\frac{n}{2}\sigma^2+\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2}=0\\\\  \frac{n}{2}\sigma^2=\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2}\\\\  \boxed{\sigma^2_{_{MLE}}=\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}

The variance formula we get via MLE is also exactly the same as the variance formula we already know. At this point, we know that given trial data from an experiment, we can fit a Gaussian distribution with parameters \mu and \sigma^2 using the formulas we derived via MLE in this post.
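To close, here is a sketch (assuming NumPy; the sample parameters are hypothetical) that applies both MLE formulas to simulated data and compares them against NumPy’s built-in estimators:

```python
# MLE for mu and sigma^2 on simulated Gaussian data.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=2.0, scale=1.5, size=100_000)  # hypothetical sample

n = len(data)
mu_mle = data.sum() / n                        # sum(x_i) / n
sigma2_mle = ((data - mu_mle) ** 2).sum() / n  # sum((x_i - mu)^2) / n

print(mu_mle, sigma2_mle)           # close to 2.0 and 2.25
print(np.mean(data), np.var(data))  # np.var uses 1/n by default, matching MLE
```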

 

 
