Deriving Gaussian Distribution

The Gaussian is a very important distribution. In this post, we will discuss the Gaussian distribution in detail by deriving it, calculating its integral, and doing MLE (Maximum Likelihood Estimation). Deriving the Gaussian distribution is more difficult in cartesian coordinates, so we will use polar coordinates. Before we derive the Gaussian in polar coordinates, let’s first talk about how to change the coordinate system from cartesian to polar.

(1) Changing the coordinate system from cartesian to polar coordinates

Changing the coordinate system from cartesian to polar coordinates is useful, for example when we calculate the integral of a function that is much easier to integrate in polar coordinates. To do that, we can use the Jacobian matrix, which collects the partial derivatives of one vector with respect to another vector. In our case, the Jacobian matrix of (x,y) in cartesian coordinates with respect to (r,\theta) in polar coordinates is:

J = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{bmatrix}
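To make the Jacobian matrix above concrete, here is a minimal Python sketch. It assumes the standard polar mapping x = r\cos\theta, y = r\sin\theta, and the function names (polar_to_cartesian, jacobian, numeric_jacobian, det2) are made up for this illustration. It fills in the four partial derivatives by hand, checks them against a finite-difference approximation, and shows that the determinant comes out as r.

```python
import math

def polar_to_cartesian(r, theta):
    """Map polar (r, theta) to cartesian (x, y), assuming x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian(r, theta):
    """Analytic Jacobian of (x, y) with respect to (r, theta)."""
    return [
        [math.cos(theta), -r * math.sin(theta)],  # dx/dr, dx/dtheta
        [math.sin(theta),  r * math.cos(theta)],  # dy/dr, dy/dtheta
    ]

def numeric_jacobian(r, theta, h=1e-6):
    """Central finite-difference approximation of the same matrix."""
    x_rp, y_rp = polar_to_cartesian(r + h, theta)
    x_rm, y_rm = polar_to_cartesian(r - h, theta)
    x_tp, y_tp = polar_to_cartesian(r, theta + h)
    x_tm, y_tm = polar_to_cartesian(r, theta - h)
    return [
        [(x_rp - x_rm) / (2 * h), (x_tp - x_tm) / (2 * h)],
        [(y_rp - y_rm) / (2 * h), (y_tp - y_tm) / (2 * h)],
    ]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

if __name__ == "__main__":
    r, theta = 2.0, 0.7  # arbitrary test point
    print(det2(jacobian(r, theta)))          # very close to r
    print(det2(numeric_jacobian(r, theta)))  # approximately r
```

The determinant being r is what turns dx\,dy into r\,dr\,d\theta when an integral is rewritten in polar coordinates.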

Understanding Online/Sequential Learning in Bayesian Inference

Now that we understand the concepts of the Bernoulli, Binomial, and Beta distributions discussed earlier, we are ready to understand online learning as used in Bayesian inference. From Bayes' theorem, also discussed earlier, we have the equation below.

P(B|A)=\frac{P(A|B)P(B)}{P(A)}

And for multiple classes c_1, \,c_2, ..., \,c_m with multiple attributes \theta_1, \,\theta_2,...,\,\theta_n, we can write it as follows.

P(c_i|\theta_1,\,\theta_2,\,...,\,\theta_n)=\frac{P(\theta_1,\,\theta_2,\,...,\,\theta_n|c_i)P(c_i)}{P(\theta_1,\,\theta_2,\,...,\,\theta_n)}

Using the sum rule, with m the number of classes, the denominator becomes:

P(c_i|\theta_1,\,\theta_2,\,...,\,\theta_n)=\frac{P(\theta_1,\,\theta_2,\,...,\,\theta_n|c_i)P(c_i)}{\sum_{j=1}^{m}P(\theta_1,\,\theta_2,\,...,\,\theta_n|c_j)P(c_j)},  for a discrete system

P(c_i|\theta_1,\,\theta_2,\,...,\,\theta_n)=\frac{P(\theta_1,\,\theta_2,\,...,\,\theta_n|c_i)P(c_i)}{\int_{}^{}P(\theta_1,\,\theta_2,\,...,\,\theta_n|c)P(c)dc},  for a continuous system

Here, P(c_i|\theta_1,\,\theta_2,\,...,\,\theta_n) is the posterior probability, P(\theta_1,\,\theta_2,\,...,\,\theta_n|c_i) is the likelihood, P(c_i) is the prior probability, and \sum_{j=1}^{m}P(\theta_1,\,\theta_2,\,...,\,\theta_n|c_j)P(c_j) is the evidence, or marginal probability.
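As a rough illustration of the discrete case, the sketch below computes the posterior of every class by multiplying each likelihood by its prior and dividing by the evidence (the sum of those products over all classes). The function name posterior and the numbers are hypothetical, chosen only to show the arithmetic.

```python
def posterior(likelihoods, priors):
    """Return P(c_j | data) for every class j.

    likelihoods[j] is P(data | c_j) and priors[j] is P(c_j).
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # marginal probability of the data
    return [j / evidence for j in joint]

if __name__ == "__main__":
    likelihoods = [0.20, 0.05, 0.10]  # hypothetical P(data | c_j)
    priors = [0.5, 0.3, 0.2]          # hypothetical P(c_j), sums to 1
    print(posterior(likelihoods, priors))
```

Because every class is divided by the same evidence, the posteriors always sum to 1, and the evidence does not change which class is the most probable.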

In online learning, we update our prior probability each time we run new trials. For example, in the first stage we toss a coin several times, and we model the prior probability as P_1(X). At this point, we use the prior P_1(X) to estimate our posterior probability; call this posterior P_2(X). In the next stage of trials, we use P_2(X) as our prior to estimate the next posterior. We repeat this every time we run more trials. That’s why we call this online learning. Some references also call it sequential learning.
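The coin-tossing procedure can be sketched in Python as follows. This is only an illustration under an assumption: the prior is taken to be a Beta(a, b) distribution with a Bernoulli likelihood, so each update just adds the observed heads and tails to the parameters. The function names (update_beta, beta_mean) and the toss data are made up for the example.

```python
def update_beta(a, b, tosses):
    """Update a Beta(a, b) prior with coin tosses (1 = head, 0 = tail).

    Returns the parameters of the Beta posterior.
    """
    heads = sum(tosses)
    tails = len(tosses) - heads
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, i.e. the estimated P(head)."""
    return a / (a + b)

if __name__ == "__main__":
    stage1 = [1, 0, 1, 1]  # hypothetical first batch of tosses
    stage2 = [0, 1, 1]     # hypothetical second batch

    a, b = 1, 1                       # assumed uniform prior, P_1
    a, b = update_beta(a, b, stage1)  # posterior P_2, used as the next prior
    a, b = update_beta(a, b, stage2)  # next posterior
    print(a, b, beta_mean(a, b))

    # Processing all tosses at once gives the same parameters.
    print(update_beta(1, 1, stage1 + stage2))
```

Under this assumption, updating stage by stage and updating once with all the data end in the same posterior, which is why the previous posterior can safely serve as the next prior.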