Using Gaussian Distribution for Online Learning/Sequential Learning in Bayesian Inference

We already discussed how online learning works here, using conjugate distributions with the Binomial distribution as the likelihood and the Beta distribution as the conjugate prior. In this post, we will use the Gaussian distribution for online learning in Bayesian inference. The conjugate prior of a Gaussian distribution is a Gaussian itself, which is why we call the Gaussian distribution self-conjugate. Let’s try to derive it.

Given the trial results D=\{ x_1,\,x_2,\,x_3,\,... ,\,x_n \}, from Bayes’ formula we get:

P(\theta_{new}|D)=\frac{P(D|\theta_{old})P(\theta_{old})}{P(D)}

We will try to derive the posterior P(\theta_{new}|D), given the likelihood P(D|\theta_{old}) and the prior distribution P(\theta_{old}). The parameters \theta of a Gaussian in this case are \mu and \sigma^2. In this post, we will demonstrate how to calculate the posterior P(\theta_{new}|D) under the assumption that \sigma_{new}^2, \mu_{old}, and \sigma_{old}^2 are known. Thus, we will only learn the parameter \mu. We will ignore the marginal probability P(D) for now, since it is only a constant used for normalization. Proceeding from the formula above, we can work out the posterior as follows.
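The step-by-step algebra continues in the full post. As a preview, here is a minimal sketch of the standard known-variance result this kind of derivation arrives at, assuming the data (likelihood) variance is the quantity that is known; the function and variable names (update_gaussian_mean, sigma_lik_sq) are illustrative, not from the original post.

```python
import numpy as np

def update_gaussian_mean(mu_old, sigma_old_sq, data, sigma_lik_sq):
    """Conjugate update for the mean of a Gaussian with known data variance.

    Prior:      mu ~ N(mu_old, sigma_old_sq)
    Likelihood: x_i ~ N(mu, sigma_lik_sq), with sigma_lik_sq assumed known.
    Returns the posterior parameters (mu_new, sigma_new_sq).
    """
    n = len(data)
    # Precisions (inverse variances) add.
    post_precision = 1.0 / sigma_old_sq + n / sigma_lik_sq
    sigma_new_sq = 1.0 / post_precision
    # Posterior mean: precision-weighted combination of prior mean and data.
    mu_new = sigma_new_sq * (mu_old / sigma_old_sq + np.sum(data) / sigma_lik_sq)
    return mu_new, sigma_new_sq

# Online use: the posterior of one batch becomes the prior of the next.
rng = np.random.default_rng(0)
mu, sigma_sq = 0.0, 10.0            # vague prior belief about mu
true_mu, sigma_lik_sq = 3.0, 1.0    # hypothetical data-generating process, variance assumed known
for batch in range(5):
    data = rng.normal(true_mu, np.sqrt(sigma_lik_sq), size=20)
    mu, sigma_sq = update_gaussian_mean(mu, sigma_sq, data, sigma_lik_sq)
    print(f"after batch {batch + 1}: mu = {mu:.3f}, sigma^2 = {sigma_sq:.5f}")
```

Each call shrinks the posterior variance and pulls the mean toward the data, which is exactly the sequential behaviour the derivation is meant to justify.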

Understanding Online/Sequential Learning in Bayesian Inference

After understanding the concepts of the Bernoulli, Binomial, and Beta distributions, which we discussed here, we are now ready to understand online learning as used in Bayesian inference. From Bayes’ theorem, which we also discussed here, we have the equation below.

P(B|A)=\frac{P(A|B)P(B)}{P(A)}
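As a quick numerical illustration of this formula (all the numbers below are made up), suppose B is “the coin is biased” and A is “we observe heads”:

```python
# Toy numbers, purely illustrative: B = "coin is biased", A = "we observe heads".
p_b = 0.3            # prior P(B)
p_a_given_b = 0.9    # likelihood P(A|B)
p_a_given_not_b = 0.5

# Evidence P(A) by the rule of sum over the two cases.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)   # ~0.435
```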

For multiple classes c_1, \,c_2, \,... , \,c_m with multiple attributes \theta_1, \,\theta_2,\,... ,\,\theta_n, we can write it as follows.

P(c_i|\theta_1,\,\theta_2,\,... ,\,\theta_n)=\frac{P(\theta_1,\,\theta_2,\,... ,\,\theta_n|c_i)P(c_i)}{P(\theta_1,\,\theta_2,\,... ,\,\theta_n)}

Using the rule of sum (the law of total probability), with m the number of classes, we can rewrite the denominator as follows:

P(c_i|\theta_1,\,\theta_2,\,... ,\,\theta_n)=\frac{P(\theta_1,\,\theta_2,\,... ,\,\theta_n|c_i)P(c_i)}{\sum_{j=1}^{m}P(\theta_1,\,\theta_2,\,... ,\,\theta_n|c_j)P(c_j)},  for the discrete case

P(c_i|\theta_1,\,\theta_2,\,... ,\,\theta_n)=\frac{P(\theta_1,\,\theta_2,\,... ,\,\theta_n|c_i)P(c_i)}{\int_{}^{}P(\theta_1,\,\theta_2,\,... ,\,\theta_n|c)P(c)\,dc},  for the continuous case

Here, we can say P(c_i|\theta_1,\,\theta_2,\,... ,\,\theta_n) is the posterior probability, P(\theta_1,\,\theta_2,\,... ,\,\theta_n|c_i) is the likelihood, P(c_i) is the prior probability, and \sum_{j=1}^{m}P(\theta_1,\,\theta_2,\,... ,\,\theta_n|c_j)P(c_j) is the evidence or marginal probability.
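To make the roles of these terms concrete, here is a small sketch with made-up likelihoods and priors for m = 3 classes; the denominator is exactly the sum over classes written above.

```python
import numpy as np

# Hypothetical numbers: likelihood P(theta_1,...,theta_n | c_i), already evaluated
# at the observed attribute values, plus the class priors P(c_i).
likelihoods = np.array([0.02, 0.10, 0.05])  # P(attributes | c_i)
priors      = np.array([0.50, 0.30, 0.20])  # P(c_i)

# Evidence (marginal probability): sum over classes of likelihood * prior.
evidence = np.sum(likelihoods * priors)

# Posterior P(c_i | attributes) for every class.
posteriors = likelihoods * priors / evidence
print(posteriors, posteriors.sum())  # the posteriors sum to 1
```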

In online learning, we update our prior probability whenever we perform new trials. For example, in the first stage, we toss a coin a number of times and model the prior probability with P_1(X). At this point, we use the prior probability P_1(X) to estimate our posterior probability; let this posterior probability be P_2(X). In the next stage of trials, we use P_2(X) as our prior probability to estimate the next posterior probability, and we continue in this way whenever we perform more trials. That is why we call this online learning; some references also call it sequential learning.
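The coin-tossing loop described above can be written down directly with the Beta–Binomial pair from the earlier post; in the sketch below (the batch size and the true bias are made-up values), the posterior of one stage is reused as the prior of the next.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.0, 1.0      # Beta(1, 1) prior on the head probability, i.e. uniform
true_p = 0.7         # hypothetical true bias of the coin

for stage in range(1, 6):
    tosses = rng.random(10) < true_p          # 10 new coin tosses
    heads, tails = tosses.sum(), (~tosses).sum()
    # Conjugate update: Beta prior + Binomial likelihood -> Beta posterior.
    a, b = a + heads, b + tails
    # This posterior becomes the prior for the next stage (online learning).
    print(f"stage {stage}: posterior Beta({a:.0f}, {b:.0f}), mean = {a / (a + b):.3f}")
```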