(1) Introduction
Probability plays an important role in machine learning. In this article, we introduce probability and random variables. Suppose we have one trial of tossing a coin, resulting in H = head or T = tail. Our sample space is then $\Omega = \{H, T\}$, where $\Omega$ is the universe of all possible outcomes. A subset of the outcomes is called an event. For instance, in our single coin toss, one event is a trial that gives H as a result, and so on.
(2) What is a random variable?
So, what is a random variable? Actually, a random variable is not a variable. It is a function that maps each outcome to a number. For example, from two trials of tossing a coin, we have $\Omega = \{HH, HT, TH, TT\}$. We can map it using a random variable $X$ however we want, e.g. $X \in \{1, 2, 3, 4\}$, where $X = 1$ represents HH, $X = 2$ represents HT, and so on.
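To make the "function, not variable" idea concrete, here is a minimal Python sketch. It assumes the outcomes are numbered 1 to 4 in the order HH, HT, TH, TT; that labelling and the names `sample_space` and `X` are illustrative choices, not something fixed by the theory.

```python
from itertools import product

# Sample space for two coin tosses: HH, HT, TH, TT.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Assumed labelling: number the outcomes 1..4 in the order listed above.
labels = {outcome: i for i, outcome in enumerate(sample_space, start=1)}


def X(outcome):
    """Random variable: a function that maps an outcome to a number."""
    return labels[outcome]


print(sample_space)
print(X("HH"), X("HT"), X("TH"), X("TT"))
```

Any other mapping, such as counting the number of heads, would be an equally valid random variable on the same sample space.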
(3) Define probability in random variable notation
As for what probability is, we are already familiar with it: the frequency of an event divided by the number of members of the universe. Let's write it using random variable notation. Using the case of two coin tosses above, with $X = 1$ representing HH, $P(X = 1) = \frac{1}{4}$, and likewise for the other events. If the trials have already occurred, we can use the total number of trials as the denominator instead of the number of members of the universe.
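The two denominators mentioned above can be compared in a short Python sketch. It assumes a fair coin, so all four outcomes are equally likely; the function names and the fixed seed are hypothetical choices made only for this illustration.

```python
import random
from fractions import Fraction

sample_space = ["HH", "HT", "TH", "TT"]


def theoretical_probability(event):
    """Outcomes in the event divided by the size of the sample space (fair coin assumed)."""
    return Fraction(len(event), len(sample_space))


def empirical_probability(event, trials, seed=0):
    """Times the event occurred divided by the total number of trials."""
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    hits = sum(rng.choice(sample_space) in event for _ in range(trials))
    return hits / trials


p_hh = theoretical_probability({"HH"})
print(p_hh)
print(empirical_probability({"HH"}, trials=10_000))
```

With enough trials the empirical value should settle close to the theoretical one.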
(4) Probability distribution and cumulative distribution
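Let's say we run 100 trials and count how many times each value of $X$ occurs. Dividing each count by 100 gives the estimates $P(X = 1)$, $P(X = 2)$, $P(X = 3)$, and $P(X = 4)$. We can then draw the probability distribution by plotting these probabilities against the values of $X$.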
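A minimal Python sketch of such an experiment is shown here. It assumes a fair coin and that the four outcomes HH, HT, TH, TT are labelled 1 to 4; the seed is arbitrary, so the resulting counts are only illustrative.

```python
import random
from collections import Counter

rng = random.Random(42)  # arbitrary fixed seed for reproducibility
trials = 100
values = [1, 2, 3, 4]    # assumed labels for HH, HT, TH, TT

# Simulate the trials and count how often each value occurs.
samples = [rng.choice(values) for _ in range(trials)]
counts = Counter(samples)

# Estimated probability of each value: count divided by number of trials.
pmf = {x: counts[x] / trials for x in values}

# Crude text bar chart of the distribution.
for x in values:
    print(x, pmf[x], "#" * counts[x])
```

The estimated probabilities always sum to 1, because every trial lands on exactly one value.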
This discrete probability distribution is called a PMF (Probability Mass Function). If the random variable can take values in a continuous range and the number of trials tends to infinity, we get a continuous distribution, for example like the picture below.
This continuous probability distribution is called a PDF (Probability Density Function), denoted as $f(x)$. If we take the integral of $f(x)$, $F(x) = \int_{-\infty}^{x} f(t)\,dt$, we get the so-called CDF (Cumulative Distribution Function). Here is a sample picture of a CDF.
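The link between the two functions can be checked numerically. The sketch below assumes a standard normal density as the example PDF (the text does not specify one) and approximates the integral with the trapezoidal rule, starting from a lower bound of -10 in place of minus infinity.

```python
import math


def pdf(x):
    """Standard normal density, used here only as an example PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x, lower=-10.0, steps=10_000):
    """Approximate the integral of pdf from -infinity to x (trapezoidal rule)."""
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    total += sum(pdf(lower + i * h) for i in range(1, steps))
    return total * h


print(cdf(0.0))   # about half of the probability mass lies below the mean
print(cdf(10.0))  # nearly all of the probability mass
```

The CDF is non-decreasing and approaches 1 as x grows, which is what the integral of a density should do.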
(5) Joint probability
For example, suppose we have two events, A and B, shown in the picture below. The shaded area is the region where events A and B occur together. The probability that events A and B occur together is called the joint probability, denoted by $P(A \cap B)$.
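As a small illustration, the sketch below reuses the two-toss sample space and picks two hypothetical events: A, "the first toss is H", and B, "the second toss is H". It assumes a fair coin, so the probability of an event is its size divided by the size of the sample space.

```python
from fractions import Fraction

sample_space = {"HH", "HT", "TH", "TT"}

# Hypothetical events chosen for this example.
A = {o for o in sample_space if o[0] == "H"}  # first toss is H
B = {o for o in sample_space if o[1] == "H"}  # second toss is H


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


joint = prob(A & B)  # set intersection: outcomes where A and B both occur
print(A & B, joint)
```

The set intersection `A & B` plays the role of the shaded overlap region in a Venn diagram.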
(6) Expectation
The expected value of a random variable $X$, denoted as $E[X]$, is the long-run average value over repeated trials. We can say it is the mean ($\mu$) value of the trials, which is $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$, where $x_i$ is the $i$-th value of the random variable $X$ and $N$ is the number of trials. This holds if we assume the distribution is uniform, so that every value carries the same weight. If the distribution is not uniform, the expected value is defined as follows.
$E[X] = \sum_{i} x_i P(x_i)$, for a discrete system
$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$, for a continuous system
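The discrete formula can be tried out in a few lines of Python. The two distributions below are made-up examples over the values 1 to 4: one uniform, one deliberately skewed towards larger values.

```python
def expectation(pmf):
    """Discrete expected value: sum of each value times its probability."""
    return sum(x * p for x, p in pmf.items())


# Made-up example distributions over the values 1..4.
uniform = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
skewed = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

plain_mean = sum(uniform) / len(uniform)  # simple average of the values

print(expectation(uniform), plain_mean)
print(expectation(skewed))
```

For the uniform case the weighted sum matches the plain average of the values, while the skewed case is pulled towards the more probable values.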
(7) Variance
Variance is a parameter that determines the dispersion of a probability distribution. See the picture below. The larger the variance $\sigma^2$, the more dispersed the distribution is.

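Mathematically, variance is defined as follows.
$$\mathrm{Var}(X) = \sigma^2 = E[(X - \mu)^2], \quad \mu = E[X]$$
With some algebra, we can simplify it. Here we go.
$$\sigma^2 = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2 = E[X^2] - (E[X])^2$$
The benefit of the last equation is that we can calculate the variance in real time, by updating running sums of $x$ and $x^2$ as each new value arrives. If we use the original definition of variance, we cannot do this, because we have to calculate the mean ($\mu$) first, and only afterward can we calculate the variance.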
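A minimal sketch of the real-time calculation, based on the identity that the variance equals the mean of the squares minus the square of the mean. The class name `RunningVariance` and the sample data are hypothetical, and the result is the population variance (dividing by the number of values).

```python
class RunningVariance:
    """Keeps running sums so the variance is available after every new value."""

    def __init__(self):
        self.n = 0          # number of values seen so far
        self.sum_x = 0.0    # running sum of x
        self.sum_x2 = 0.0   # running sum of x squared

    def update(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def variance(self):
        """Mean of squares minus squared mean (population variance)."""
        mean = self.sum_x / self.n
        return self.sum_x2 / self.n - mean * mean


rv = RunningVariance()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    rv.update(value)
    print(value, rv.variance())
```

One design caveat: when the mean is very large relative to the spread, subtracting two nearly equal numbers loses precision, and a more stable update such as Welford's algorithm may be preferable.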
(8) Covariance
Covariance is useful for measuring how strong the correlation is between two or more sets of random variables. If we write the variance as $\mathrm{Var}(X) = E[(X - E[X])(X - E[X])]$, then to easily remember the covariance formula, we can just change the variable $X$ in the second term to another variable $Y$, and the equation becomes as follows.
$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$
The equation above measures the covariance between random variable $X$ and random variable $Y$.
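The "swap the second variable" idea can be checked with a short Python sketch. It computes the population covariance of two equal-length samples; the data are made up, and passing the same sample twice should give back its variance.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Average of (x - mean of xs) * (y - mean of ys) over paired samples."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Made-up paired samples.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(covariance(xs, ys))        # positive: ys grows when xs grows
print(covariance(xs, ys[::-1]))  # negative: ys shrinks when xs grows
print(covariance(xs, xs))        # covariance of xs with itself is its variance
```

A positive value means the two variables tend to move in the same direction, and a negative value means they tend to move in opposite directions.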