# Introduction to Probability and Random Variables

## (1) Introduction

Probability plays an important role in machine learning, so in this article we will introduce probability and random variables. Suppose we have one trial of tossing a coin, with possible results H (head) and T (tail). Our sample space is then $u=\begin{Bmatrix}H, T\end{Bmatrix}$, where $u$ is the universe. A subset of the outcomes is called an event. For instance, in our single coin toss, one event is the trial that gives H as a result, and so on.

## (2) What is a random variable?

Then, what is a random variable? Despite the name, a random variable is not really a variable: it is a function that maps events to numbers. For example, from two trials of tossing a coin we have $u=\begin{Bmatrix}{HH, HT, TH, TT}\end{Bmatrix}$. We can map these outcomes to numbers however we want, e.g. $X=\begin{Bmatrix}{0,1,2,4}\end{Bmatrix}$, where $X=0$ represents HH, $X=1$ represents HT, and so on.
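Since a random variable is just a function from outcomes to numbers, a plain dictionary is enough to sketch it. This is a minimal illustration using the arbitrary mapping from the text:

```python
# A random variable is a mapping from outcomes to numbers.
# This follows the text's arbitrary choice X = {0, 1, 2, 4}.
X = {"HH": 0, "HT": 1, "TH": 2, "TT": 4}

print(X["HH"])  # 0
print(X["TT"])  # 4
```

Any other numbering would work just as well; the mapping itself is our choice.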

## (3) Define probability in random variable notation

We are already familiar with what probability is: the frequency of an event divided by the number of members of the universe. Let us now write it in random variable notation. Using the two-coin-toss case above, $P(X=0)=\frac{\text{the number of event } HH}{\text{the number of universe members}}=\frac{1}{4}$. Likewise for the other events. Alternatively, if the trials have already occurred, we can use the total number of trials as the denominator instead of the number of universe members.
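The counting definition above can be sketched in a few lines; the coin outcomes are represented here as plain strings, which is just an implementation choice:

```python
from itertools import product

# Sample space for two coin tosses: HH, HT, TH, TT.
universe = ["".join(p) for p in product("HT", repeat=2)]

# P(X = 0) corresponds to the event {HH}: count the matching outcomes
# and divide by the size of the universe.
p_hh = sum(1 for outcome in universe if outcome == "HH") / len(universe)
print(p_hh)  # 0.25
```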

## (4) Probability distribution and cumulative distribution

Let’s say we perform 100 trials and get, for example, $P(X=0)=0.2$, $P(X=1)=0.3$, $P(X=3)=0.4$, and $P(X=4)=0.1$. We can draw the probability distribution as follows. This discrete probability distribution is called a PMF (Probability Mass Function). If we let the number of trials grow to infinity (and the variable take continuous values), we get a continuous distribution, as in the picture below. This continuous probability distribution is called a PDF (Probability Density Function), denoted $P(x)$. If we integrate $P(x)$, we get the so-called CDF (Cumulative Distribution Function). Here is a sample CDF picture.

$\boxed{CDF = F_X(x)=\int_{-\infty}^{x}P(t)\,dt}$
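For a discrete random variable, the integral becomes a running sum over the PMF. This sketch reuses the example probabilities from the 100-trial illustration above:

```python
# PMF from the 100-trial example: values of X and their probabilities.
pmf = {0: 0.2, 1: 0.3, 3: 0.4, 4: 0.1}

# The discrete CDF at x is the sum of PMF values for all outcomes <= x.
def cdf(x):
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(1))  # P(X <= 1) = 0.2 + 0.3 = 0.5
print(cdf(4))  # the CDF always reaches 1 at the largest value
```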

## (5) Joint probability

For example, we have two events, A and B, shown in the picture below. The shaded area is where events A and B occur together. The probability that events A and B occur together is called the joint probability, denoted by $P(A \cap B)$.

## (6) Expectation

The expected value of a random variable $X$, denoted $E[X]$, is the long-run average value over many repetitions of a trial. We can say it is the mean ($\mu$) value of the trials, $\mu=\frac{\sum_{i=1}^{n}x_i}{n}$, where $x_i$ is the value of the $i^{th}$ random variable and $n$ is the number of trials. This simple average is justified if we assume the distribution is uniform; if it is not, the expected value is defined as follows.

$\boxed{E[X]=\sum_{i}x_iP(x_i)}$, for a discrete random variable

$\boxed{E[X]=\int_{-\infty}^{+\infty}xP(x)\,dx}$, for a continuous random variable
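As a quick sketch of the discrete formula, reusing the example PMF from the earlier section:

```python
# Discrete expected value: E[X] = sum of x * P(x) over all values of X.
pmf = {0: 0.2, 1: 0.3, 3: 0.4, 4: 0.1}

expected = sum(x * p for x, p in pmf.items())
print(expected)  # ≈ 1.9
```

Note that the probability-weighted sum (1.9) differs from the plain average of the values (2.0), which is exactly the point: the uniform-distribution shortcut only works when all outcomes are equally likely.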

## (7) Variance

Variance is a parameter that determines the dispersion of a probability distribution. See the picture below. The larger the variance $\sigma^2$, the more dispersed the distribution is.


Mathematically, variance is defined as follows.

$\boxed {\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-E[X])^2}$

By doing some algebra, we can actually simplify it. Here we go.

$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-E[X])^2\\\\ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i^2-2x_iE[X]+E^2[X])\\\\ \sigma^2 =\frac{1}{n}\sum_{i=1}^{n}x_i^2-2E[X]\frac{\sum_{i=1}^{n}x_i}{n}+\frac{1}{n}\sum_{i=1}^{n}E^2[X]\\\\ \sigma^2 =E[X^2]-2E[X]E[X]+\frac{1}{\not n}\not nE^2[X]\\\\ \sigma^2 =E[X^2]-2E^2[X]+E^2[X]\\\\ \boxed {\sigma^2 =E[X^2]-E^2[X]}$

Our last equation has the benefit that we can compute the variance in real time: we only need running sums of $x_i$ and $x_i^2$, updated as each sample arrives. With the original definition, we cannot do this in one pass, because we must first compute the mean ($E[X]$) over all the data, and only afterward compute the variance.
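The one-pass idea can be sketched as follows; the function name and sample data are made up for illustration:

```python
# One-pass (real-time) variance using sigma^2 = E[X^2] - E[X]^2:
# keep running sums of x and x^2, so no second pass over the data is needed.
def running_variance(samples):
    n = 0
    total = 0.0     # running sum of x
    total_sq = 0.0  # running sum of x^2
    for x in samples:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq / n - mean * mean

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(running_variance(data))  # 4.0
```

One caveat worth knowing: in floating point, this formula can lose precision when the variance is tiny compared to the mean (the two large terms nearly cancel); Welford's online algorithm is the numerically stable alternative when that matters.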

## (8) Covariance

Covariance is useful for measuring how strongly two sets of random variables vary together. If we write the variance as $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-E[X])^2 =\frac{1}{n}\sum_{i=1}^{n}(x_i-E[X])(x_i-E[X])$, then, to remember the covariance formula easily, we can simply change the variable $x$ in the second factor to another variable, and the equation becomes the following.

$\boxed {Cov(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-E[X])(y_i-E[Y])}$

The equation above measures the covariance between the random variables $X$ and $Y$.
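The boxed formula translates directly to code. This is a minimal sketch with invented sample data (note it divides by $n$, matching the formula above, rather than the $n-1$ of the unbiased sample covariance):

```python
# Covariance between two equal-length samples, following the boxed formula:
# Cov(X, Y) = (1/n) * sum over i of (x_i - E[X]) * (y_i - E[Y]).
def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(covariance(xs, ys))  # 2.5
```

Here `ys` grows in lockstep with `xs`, so the covariance is positive; independent or oppositely-moving data would give a covariance near zero or negative, respectively.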