Bayesian Linear / Polynomial Regression #Part2: Deriving Predictive Distribution

We already derived the posterior update formula P(W|D) for Bayesian regression here; it is the distribution of our regression parameters \textbf{W} given the data set D. However, we are not directly interested in the value of W itself; we are interested in the value of Y given a new input x. This is exactly the regression problem: given a new value x, predict the continuous output value Y. We already solved linear regression using LSE (Least Square Error) here. In this post, we will do regression from the Bayesian point of view, which brings an additional benefit that we will see at the end of this post.

From #Part1 here, we already have P(W|D). To do regression from the Bayesian point of view, we have to derive the predictive distribution, so that we get the probability of Y, P(Y|D). We can achieve that by marginalizing over W. Here we go.

P(Y|D)=\int P(Y|W)\,P(W|D)\,dW

where P(Y|W) is the likelihood and P(W|D) is the posterior we derived here.
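
For reference, here is the standard closed-form result this marginalization leads to when the likelihood is Gaussian with noise precision \beta and the posterior from #Part1 is Gaussian, P(W|D) = \mathcal{N}(W \mid m_N, S_N). The symbols m_N, S_N, \beta, and the basis vector \phi(x) are my own notation for this sketch, not necessarily the ones used in #Part1:

P(Y|x, D) = \mathcal{N}\left(Y \mid m_N^{T}\phi(x),\; \frac{1}{\beta} + \phi(x)^{T} S_N \phi(x)\right)

The first term of the variance is the observation noise, and the second term reflects the remaining uncertainty in W at that particular input x.

As a quick numerical illustration, here is a minimal sketch (not the post's code) that computes this predictive mean and variance for polynomial features. The prior precision alpha, noise precision beta, polynomial degree, and toy data are assumed values chosen only for illustration:

```python
import numpy as np

def design_matrix(x, degree):
    """Polynomial basis phi(x) = [1, x, x^2, ..., x^degree] for each input."""
    return np.vander(x, degree + 1, increasing=True)

def posterior(Phi, y, alpha, beta):
    """Posterior N(W | m_N, S_N) under a zero-mean isotropic Gaussian prior."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ y
    return m_N, S_N

def predictive(x_new, m_N, S_N, beta, degree):
    """Predictive mean m_N^T phi(x) and variance 1/beta + phi(x)^T S_N phi(x)."""
    Phi_new = design_matrix(x_new, degree)
    mean = Phi_new @ m_N
    var = 1.0 / beta + np.sum((Phi_new @ S_N) * Phi_new, axis=1)
    return mean, var

# Toy usage: noisy sine data with cubic features (all values assumed).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 20)
Phi = design_matrix(x, degree=3)
m_N, S_N = posterior(Phi, y, alpha=2.0, beta=25.0)
mean, var = predictive(np.linspace(0, 1, 5), m_N, S_N, beta=25.0, degree=3)
print(mean, np.sqrt(var))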