We already derived the posterior update formula for Bayesian regression here. It gives us the distribution of the regression parameters $\mathbf{w}$ given the data set $\mathcal{D}$, that is, $p(\mathbf{w} \mid \mathcal{D})$. However, we are not directly interested in the value of $\mathbf{w}$. We are interested in the value of the output $y^*$ for a new input $x^*$. This is exactly the regression problem: given a new input $x^*$, we want to predict the output $y^*$, which is continuous-valued. We already solved the linear regression problem using LSE (Least Squares Estimation) here. In this post, we do regression from the Bayesian point of view. Doing regression the Bayesian way brings an additional benefit, which we will see at the end of this post.
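From #Part1 here, we already have the posterior over the parameters $\mathbf{w}$ given the data set $\mathcal{D}$, $p(\mathbf{w} \mid \mathcal{D})$. To do regression from the Bayesian point of view, we have to derive the predictive distribution, so that we have the probability of a new output $y^*$ given a new input $x^*$ and the data, $p(y^* \mid x^*, \mathcal{D})$. We can obtain it by marginalizing over the parameters. Here we go.
$$p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, \mathbf{w})\, p(\mathbf{w} \mid \mathcal{D})\, d\mathbf{w}$$
where $p(y^* \mid x^*, \mathbf{w})$ is the likelihood and $p(\mathbf{w} \mid \mathcal{D})$ is the posterior we derived here. Continue reading “Bayesian Linear / Polynomial Regression #Part2: Deriving Predictive Distribution”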
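The marginalization described above has a closed form when both the prior and the noise are Gaussian. The sketch below is one way to compute it, not the post's own derivation. It assumes a zero-mean Gaussian prior with precision `alpha` and a known noise precision `beta`; both values, the helper names, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def design_matrix(x, degree):
    """Polynomial design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def posterior(Phi, y, alpha, beta):
    """Posterior N(w | m_N, S_N), assuming prior N(0, I/alpha) and noise precision beta."""
    S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ y
    return m_N, S_N

def predictive(x_new, m_N, S_N, beta, degree):
    """Mean and variance of the Gaussian predictive distribution at each new input."""
    Phi_new = design_matrix(x_new, degree)
    mean = Phi_new @ m_N
    # Noise variance plus the variance that comes from parameter uncertainty.
    var = 1.0 / beta + np.sum((Phi_new @ S_N) * Phi_new, axis=1)
    return mean, var

# Synthetic data (assumption): y = 0.5 + 2x plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
y = 0.5 + 2.0 * x + rng.normal(scale=0.2, size=x.size)

alpha, beta, degree = 1.0, 25.0, 1
m_N, S_N = posterior(design_matrix(x, degree), y, alpha, beta)
mean, var = predictive([0.0, 3.0], m_N, S_N, beta, degree)
print("predictive mean:", mean)
print("predictive variance:", var)
```

The predictive variance is larger at `x = 3.0`, far from the training inputs, than at `x = 0.0`. This per-input uncertainty is something a plain least-squares fit does not provide.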
At the beginning of our article series, we discussed how to derive polynomial regression using LSE (Least Squares Estimation) here. In this post, we discuss linear regression from the Bayesian point of view. Note that linear and polynomial regression are derived in the same way; the only difference is the design matrix. You may want to revisit our previous two articles here and here.
As I mentioned at the beginning, the final result of deriving regression using LSE is equal to the result of deriving linear regression using MLE (Maximum Likelihood Estimation) in the Bayesian method. Furthermore, the result of deriving regression using LSE with regularization is equal to the result of deriving it using MAP (Maximum A Posteriori) estimation in the Bayesian method. In this post, we will prove both claims. We will then derive the posterior update formula for online learning using a conjugate prior.
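Before the proof, both claims can be checked numerically. This is a minimal sketch under stated assumptions: Gaussian noise with precision `beta`, a zero-mean Gaussian prior with precision `alpha`, and a regularization weight `lam = alpha / beta`. The data and the specific values are made up for illustration.

```python
import numpy as np

# Synthetic data (assumption): y = 1 - 3x plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=30)
y = 1.0 - 3.0 * x + rng.normal(scale=0.3, size=x.size)
Phi = np.column_stack([np.ones_like(x), x])  # design matrix for a straight line

alpha, beta = 2.0, 10.0  # assumed prior precision and noise precision
lam = alpha / beta       # matching regularization weight

# Claim 1: the LSE solution is a stationary point of the Gaussian
# negative log-likelihood, so it is also the MLE.
w_lse = np.linalg.lstsq(Phi, y, rcond=None)[0]
grad_nll = -beta * Phi.T @ (y - Phi @ w_lse)

# Claim 2: regularized LSE equals the MAP estimate, which for a Gaussian
# posterior is the posterior mean.
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
w_map = beta * S_N @ Phi.T @ y

print("gradient of NLL at w_lse:", grad_nll)
print("regularized LSE:", w_ridge)
print("MAP estimate:   ", w_map)
```

The gradient is zero up to floating-point error, and the last two printed vectors agree. The check depends on choosing `lam = alpha / beta`; any other regularization weight corresponds to a different prior precision.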
(1) Is regression using LSE equal to Bayesian MLE?
See picture below.
Continue reading “Bayesian Linear / Polynomial Regression #Part1: Prove LSE vs Bayesian Regression and Derive Posterior Update”
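Now that we understand the concepts of the Bernoulli, Binomial and Beta distributions we discussed here, we are ready to understand online learning as used in Bayesian inference. From the Bayes theorem we discussed here, we have the equation below.
$$P(C \mid x) = \frac{P(x \mid C)\, P(C)}{P(x)}$$
For multiple classes $C_k$ with multiple attributes $\mathbf{x} = (x_1, \dots, x_d)$, we can write it as follows.
$$P(C_k \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_k)\, P(C_k)}{P(\mathbf{x})}$$
Using the sum rule, with $K$ the number of classes, the denominator becomes:
$P(\mathbf{x}) = \sum_{k=1}^{K} P(\mathbf{x} \mid C_k)\, P(C_k)$, for a discrete system
$p(\mathbf{x}) = \int p(\mathbf{x} \mid c)\, p(c)\, dc$, for a continuous system
Here, $P(C_k \mid \mathbf{x})$ is the posterior probability, $P(\mathbf{x} \mid C_k)$ is the likelihood, $P(C_k)$ is the prior probability, and $P(\mathbf{x})$ is the evidence, or marginal probability.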
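To make the roles of the four terms concrete, here is a small sketch of Bayes' theorem for the discrete case. The evidence is computed with the sum rule over the classes. The three classes, their priors, and their likelihood values are made-up numbers, and `class_posterior` is a hypothetical helper name.

```python
def class_posterior(priors, likelihoods):
    """Return the posterior over classes and the evidence for one observation.

    priors[k] is the prior of class k; likelihoods[k] is the likelihood of
    the observation under class k.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # sum rule over all classes
    posterior = [j / evidence for j in joint]
    return posterior, evidence

# Illustrative numbers (assumption): three classes and one observation.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.40, 0.70]

post, evidence = class_posterior(priors, likelihoods)
print("evidence:", evidence)
print("posterior:", post)
```

The third class has the smallest prior but ends up with the largest posterior, because the observation is much more likely under it.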
In online learning, we update our prior probability whenever we run new trials. For example, in the first stage we toss a coin several times and model the prior probability of the coin's parameter $\theta$ with $p(\theta)$. At this point, we use the prior probability to estimate our posterior probability. Let this posterior probability be $p(\theta \mid \mathcal{D}_1)$. In the next stage of trials, we use $p(\theta \mid \mathcal{D}_1)$ as our prior probability to estimate the next posterior probability. We keep doing this every time we run more trials. That is why we call this online learning. Some references also call it sequential learning. Continue reading “Understanding Online/Sequential Learning in Bayesian Inference”
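The coin-tossing procedure above can be sketched with a Beta prior, which is conjugate to the Bernoulli likelihood, so each posterior is again a Beta distribution and can serve as the next prior. The choice of a Beta prior, the starting values `a = 1, b = 1`, and the toss sequences are assumptions for illustration.

```python
def update(a, b, tosses):
    """Update a Beta(a, b) belief with a list of tosses (1 = heads, 0 = tails)."""
    heads = sum(tosses)
    tails = len(tosses) - heads
    return a + heads, b + tails

a, b = 1, 1                  # assumed uniform Beta(1, 1) prior
stage1 = [1, 0, 1, 1]        # first stage of trials
stage2 = [0, 1, 1, 1, 0, 1]  # second stage of trials

a1, b1 = update(a, b, stage1)    # posterior after stage 1
a2, b2 = update(a1, b1, stage2)  # stage-1 posterior reused as the prior
batch = update(a, b, stage1 + stage2)

print("after stage 1:", (a1, b1))
print("after stage 2:", (a2, b2))
print("all data at once:", batch)
print("posterior mean of heads probability:", a2 / (a2 + b2))
```

Updating stage by stage gives the same Beta parameters as processing all tosses at once. This is why the earlier data can be discarded once the posterior has been updated.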