We already derived the posterior update formula for Bayesian regression here, which gives us $p(\mathbf{w} \mid \mathbf{t})$, the distribution of our regression parameter $\mathbf{w}$ given the data set $\mathbf{t}$. But we are not directly interested in the value of $\mathbf{w}$; we are interested in the value of $t$ itself given a new input $x$. This is exactly the regression problem: given a new value $x$, we want to predict the output value $t$, which is continuous. We already solved the linear regression problem using LSE (Least Squares Error) here. In this post, we will do regression from the Bayesian point of view. Using the Bayesian approach in regression gives us an additional benefit, which we will see at the end of this post.
From #Part1 here, we already have the posterior $p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$. To do regression from the Bayesian point of view, we have to derive the predictive distribution, so that we have the probability of $t$, $p(t \mid \mathbf{t})$. We can achieve that by marginalizing over $\mathbf{w}$. Here we go.

$$p(t \mid \mathbf{t}) = \int p(t \mid \mathbf{w}) \, p(\mathbf{w} \mid \mathbf{t}) \, d\mathbf{w}$$
where $p(t \mid \mathbf{w})$ is the likelihood and $p(\mathbf{w} \mid \mathbf{t})$ is the posterior we derived here.
The equation above is the rule of sum (the term used in Bishop's text, also called the law of total probability) in Bayesian form. The parameters of our likelihood are $\mathbf{w}$ and $\beta$, and those of our posterior are $\mathbf{m}_N$ and $\mathbf{S}_N$. Let's write them in our equation first.

$$p(t \mid \mathbf{t}) = \int \mathcal{N}\big(t \mid \mathbf{w}^\top \boldsymbol{\phi}, \beta^{-1}\big) \, \mathcal{N}\big(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N\big) \, d\mathbf{w}$$

where $\boldsymbol{\phi} = \boldsymbol{\phi}(x)$ is the feature vector of the new input $x$ and $\beta$ is the noise precision.
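To make the two factors in this integrand concrete, here is a minimal numpy sketch (my illustration, not the original post's code); it assumes `phi`, `beta`, `m_N`, and `S_N` have already been computed as in Part 1.

```python
import numpy as np

def likelihood(t, w, phi, beta):
    """p(t | w) = N(t | w^T phi, 1/beta): Gaussian over the scalar target t."""
    var = 1.0 / beta
    return np.exp(-0.5 * (t - w @ phi) ** 2 / var) / np.sqrt(2 * np.pi * var)

def posterior(w, m_N, S_N):
    """p(w | t) = N(w | m_N, S_N): multivariate Gaussian over the weights."""
    d = w - m_N
    norm = np.sqrt((2 * np.pi) ** len(m_N) * np.linalg.det(S_N))
    return np.exp(-0.5 * d @ np.linalg.solve(S_N, d)) / norm

# The predictive density p(t | data) integrates likelihood * posterior over w.
```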
Starting from this integral, let's complete our derivation. Writing both Gaussians out in exponential form, with $\frac{1}{Z}$ collecting their normalization constants (which depend on neither $t$ nor $\mathbf{w}$):

$$p(t \mid \mathbf{t}) = \frac{1}{Z}\int \exp\!\Big(-\frac{\beta}{2}\big(t - \mathbf{w}^\top\boldsymbol{\phi}\big)^2\Big)\,\exp\!\Big(-\frac{1}{2}(\mathbf{w}-\mathbf{m}_N)^\top\mathbf{S}_N^{-1}(\mathbf{w}-\mathbf{m}_N)\Big)\, d\mathbf{w}$$
We will use a technique similar to the one we already used before when multiplying two Gaussians, which is "completing the square". But, because the probability we want is $p(t)$, we need to collect the terms with $t$ coefficients, not $\mathbf{w}$. To do this, we have to modify the exponent a little bit. Let

$$\mathbf{A} = \mathbf{S}_N^{-1} + \beta\,\boldsymbol{\phi}\boldsymbol{\phi}^\top, \qquad \mathbf{b} = \mathbf{S}_N^{-1}\mathbf{m}_N + \beta t\,\boldsymbol{\phi},$$

so the terms in the exponent that involve $\mathbf{w}$ become $-\frac{1}{2}\mathbf{w}^\top\mathbf{A}\mathbf{w} + \mathbf{b}^\top\mathbf{w}$. Completing the square in $\mathbf{w}$ then turns the integrand into a Gaussian $\mathcal{N}(\mathbf{w} \mid \mathbf{A}^{-1}\mathbf{b}, \mathbf{A}^{-1})$ times a factor that depends only on $t$.
We know that $\int \mathcal{N}(\mathbf{w} \mid \mathbf{A}^{-1}\mathbf{b}, \mathbf{A}^{-1})\, d\mathbf{w} = 1$, since it's a Gaussian probability distribution. Thus, our last formula becomes:

$$p(t \mid \mathbf{t}) = \frac{1}{Z'}\exp\!\Big(-\frac{\beta}{2}t^2 - \frac{1}{2}\mathbf{m}_N^\top\mathbf{S}_N^{-1}\mathbf{m}_N + \frac{1}{2}\mathbf{b}^\top\mathbf{A}^{-1}\mathbf{b}\Big)$$

where $\frac{1}{Z'}$ again collects everything that does not depend on $t$.
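Before continuing, a quick numerical sanity check (an aside of mine, not part of the original derivation): in the scalar-weight case we can integrate the product of the two Gaussians over $w$ on a fine grid and compare it with the closed-form Gaussian in $t$ that we will arrive at below. The toy numbers are arbitrary.

```python
import numpy as np

beta, phi, m_N, S_N = 2.0, 0.7, 0.3, 0.5      # illustrative scalar values
t = 1.2                                        # query point

w = np.linspace(-20.0, 20.0, 200001)           # fine grid over the weight
lik = np.sqrt(beta / (2 * np.pi)) * np.exp(-0.5 * beta * (t - w * phi) ** 2)
post = np.exp(-0.5 * (w - m_N) ** 2 / S_N) / np.sqrt(2 * np.pi * S_N)
numeric = np.sum(lik * post) * (w[1] - w[0])   # Riemann-sum marginalization

var = 1.0 / beta + phi ** 2 * S_N              # the result derived below
closed = np.exp(-0.5 * (t - m_N * phi) ** 2 / var) / np.sqrt(2 * np.pi * var)
print(numeric, closed)                         # both ≈ 0.2394
```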
We will remove $\frac{1}{Z'}$ since we don't really care about constant factors in a Gaussian probability. The only parameters we care about are the mean and variance, and once we have them, the Gaussian function is already normalized (integrates to 1). Putting in the $\mathbf{A}$ and $\mathbf{b}$ we defined before, we get:

$$\frac{1}{2}\mathbf{b}^\top\mathbf{A}^{-1}\mathbf{b} = \frac{1}{2}\big(\mathbf{S}_N^{-1}\mathbf{m}_N + \beta t\,\boldsymbol{\phi}\big)^\top\mathbf{A}^{-1}\big(\mathbf{S}_N^{-1}\mathbf{m}_N + \beta t\,\boldsymbol{\phi}\big)$$

$$= \frac{\beta^2 t^2}{2}\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\boldsymbol{\phi} + \beta t\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\mathbf{S}_N^{-1}\mathbf{m}_N + \frac{1}{2}\mathbf{m}_N^\top\mathbf{S}_N^{-1}\mathbf{A}^{-1}\mathbf{S}_N^{-1}\mathbf{m}_N$$
We get the last line because $\mathbf{A}^{-1}$ is symmetric, so that $(\mathbf{A}^{-1})^\top = \mathbf{A}^{-1}$. And $(\mathbf{S}_N^{-1}\mathbf{m}_N)^\top\mathbf{A}^{-1}\boldsymbol{\phi} = \boldsymbol{\phi}^\top\mathbf{A}^{-1}\mathbf{S}_N^{-1}\mathbf{m}_N$, since a scalar equals its own transpose, which lets the two cross terms merge. Proceeding with our derivation, we get:

$$p(t \mid \mathbf{t}) \propto \exp\!\Big(-\frac{1}{2}\big(\beta - \beta^2\boldsymbol{\phi}^\top\mathbf{A}^{-1}\boldsymbol{\phi}\big)\,t^2 + \beta\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\mathbf{S}_N^{-1}\mathbf{m}_N\; t + \text{const}\Big)$$
In the last formula above we have successfully gathered the coefficients of the $t$ terms. Let's do "completing the square" now. Our new probability is $p(t \mid \mathbf{t}) = \mathcal{N}(t \mid \mu, \sigma^2)$.
By comparing the coefficients of $t^2$, we can get our variance $\sigma^2$:

$$\frac{1}{\sigma^2} = \beta - \beta^2\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\boldsymbol{\phi} \quad\Longrightarrow\quad \sigma^2 = \big(\beta - \beta^2\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\boldsymbol{\phi}\big)^{-1}$$
And by comparing the coefficients of $t$, we can get our mean $\mu$:

$$\frac{\mu}{\sigma^2} = \beta\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\mathbf{S}_N^{-1}\mathbf{m}_N \quad\Longrightarrow\quad \mu = \sigma^2\,\beta\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\mathbf{S}_N^{-1}\mathbf{m}_N$$
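The coefficient matching in these two steps is just the generic pattern for reading Gaussian parameters off an exponent; written out once for reference:

$$\exp\!\Big(-\frac{a}{2}t^2 + bt + \text{const}\Big) \;\propto\; \mathcal{N}(t \mid \mu, \sigma^2), \qquad \sigma^2 = \frac{1}{a}, \qquad \mu = \frac{b}{a} = \sigma^2 b,$$

here with $a = \beta - \beta^2\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\boldsymbol{\phi}$ and $b = \beta\,\boldsymbol{\phi}^\top\mathbf{A}^{-1}\mathbf{S}_N^{-1}\mathbf{m}_N$.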
We still have to calculate $\mathbf{A}^{-1}$, where $\mathbf{A} = \mathbf{S}_N^{-1} + \beta\,\boldsymbol{\phi}\boldsymbol{\phi}^\top$. By using the Sherman-Morrison formula,

$$\mathbf{A}^{-1} = \mathbf{S}_N - \frac{\beta\,\mathbf{S}_N\boldsymbol{\phi}\boldsymbol{\phi}^\top\mathbf{S}_N}{1 + \beta\,\boldsymbol{\phi}^\top\mathbf{S}_N\boldsymbol{\phi}}.$$

Doing some algebra operations, our $\mu$ and $\sigma^2$ become:

$$\mu = \mathbf{m}_N^\top\boldsymbol{\phi}(x), \qquad \sigma^2 = \frac{1}{\beta} + \boldsymbol{\phi}(x)^\top\mathbf{S}_N\,\boldsymbol{\phi}(x)$$

Notice that, unlike LSE, we get not only a point prediction $\mu$ but also an input-dependent variance $\sigma^2$, combining the observation noise $\frac{1}{\beta}$ with our remaining uncertainty about $\mathbf{w}$. This is the additional benefit of the Bayesian treatment mentioned at the beginning of this post.
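To tie everything together, here is a minimal numpy sketch (my own, with illustrative names and values, assuming a Gaussian prior $\mathcal{N}(\mathbf{0}, \alpha^{-1}\mathbf{I})$ as in Part 1) that computes the predictive mean and variance both the long way, via $\mathbf{A}^{-1}$ with Sherman-Morrison, and via the final closed form. If the derivation above is right, the two routes must agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: assumed prior precision alpha and noise precision beta.
alpha, beta = 2.0, 25.0
X = rng.uniform(-1, 1, size=20)
t = np.sin(np.pi * X) + rng.normal(0, 1 / np.sqrt(beta), size=20)

def features(x, degree=5):
    """Polynomial basis phi(x) = (1, x, ..., x^degree)."""
    return np.power(np.asarray(x)[..., None], np.arange(degree + 1))

Phi = features(X)                              # design matrix
S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)                   # posterior covariance
m_N = beta * S_N @ Phi.T @ t                   # posterior mean

phi = features(0.3)                            # feature vector of a new input

# Long way: A = S_N^{-1} + beta*phi*phi^T, inverted via Sherman-Morrison.
A_inv = S_N - (beta * np.outer(S_N @ phi, phi @ S_N)) / (1 + beta * phi @ S_N @ phi)
var_long = 1.0 / (beta - beta**2 * phi @ A_inv @ phi)
mean_long = var_long * beta * phi @ A_inv @ S_N_inv @ m_N

# Closed form: mu = m_N^T phi, sigma^2 = 1/beta + phi^T S_N phi.
mean_closed = m_N @ phi
var_closed = 1.0 / beta + phi @ S_N @ phi

print(np.allclose([mean_long, var_long], [mean_closed, var_closed]))  # True
```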