Probability and Sampling Fundamentals
Week 8: The multivariate normal distribution and large sample theory
In Week 7 we learned about bivariate continuous random variables with a brief discussion on how to extend these ideas to a continuous random vector. The first part of this week's material will introduce the most commonly encountered standard distribution for a continuous random vector, the multivariate normal distribution, with a particular focus on the bivariate case. The second part will introduce some large sample theory including the law of large numbers and one of the most important theorems in probability, the central limit theorem.
Week 8 learning material aims
The material in week 8 covers:
the multivariate normal distribution;
calculating marginal and conditional distributions for the bivariate normal;
the weak law of large numbers;
the central limit theorem;
the normal approximation to the binomial and Poisson distribution.
Vector-matrix notation for expected value and variance
In Week 4 and Week 7 we have looked at the expected value ("mean"), variance and covariance of random variables. In this section we will introduce the vector-matrix notation for the mean and (co)variance.
Suppose we have two random variables $X_1$ and $X_2$ and their expected values are given by $E(X_1) = \mu_1$ and $E(X_2) = \mu_2$.
We can think of $X_1$ and $X_2$ as being part of a vector $\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$. Similarly, we can arrange the expected values into a vector
$$E(\boldsymbol{X}) = \boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}.$$
We can arrange the variances $\operatorname{Var}(X_1) = \sigma_1^2$ and $\operatorname{Var}(X_2) = \sigma_2^2$ as well as the covariance $\operatorname{Cov}(X_1, X_2) = \sigma_{12}$ into a symmetric matrix, called the variance matrix or covariance matrix (or even sometimes variance-covariance matrix):
$$\operatorname{Var}(\boldsymbol{X}) = \boldsymbol{\Sigma} = \begin{pmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}.$$
Notice that the values in the top right and bottom left of the matrix are the same; this is because the order of arguments does not matter for covariance, i.e. $\operatorname{Cov}(X_1, X_2) = \operatorname{Cov}(X_2, X_1)$.
This vector-matrix notation comes in especially handy when we look at linear transformations of random vectors.
In Week 4 we have seen that for univariate random variables $X$ and $Y$, the linear function $Y = aX + b$ has expected value and variance
$$E(Y) = aE(X) + b, \qquad \operatorname{Var}(Y) = a\operatorname{Var}(X)a,$$
where $a$ and $b$ are constants. You might, at this stage, be wondering why we have written the variance of $Y$ as $a\operatorname{Var}(X)a$ rather than $a^2\operatorname{Var}(X)$, but we will see that the formula for the multivariate case has exactly that form.
Let's go back to the vector $\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ and define a matrix $\boldsymbol{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ and a vector $\boldsymbol{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$.
We can then define a random vector $\boldsymbol{Y} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$ as a linear function
$$\boldsymbol{Y} = \boldsymbol{A}\boldsymbol{X} + \boldsymbol{b},$$
or, equivalently,
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} a_{11}X_1 + a_{12}X_2 + b_1 \\ a_{21}X_1 + a_{22}X_2 + b_2 \end{pmatrix}.$$
Then the expected value and variance are given by
$$E(\boldsymbol{Y}) = \boldsymbol{A}E(\boldsymbol{X}) + \boldsymbol{b} = \boldsymbol{A}\boldsymbol{\mu} + \boldsymbol{b}, \qquad \operatorname{Var}(\boldsymbol{Y}) = \boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top.$$
The above formulae hold for vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ of any dimension, not just bivariate vectors.
Note that matrix multiplication is not commutative (i.e. the order in the multiplication matters), so we cannot write the covariance matrix as $\boldsymbol{A}^2\boldsymbol{\Sigma}$, which would be more similar to the formula for the univariate case.
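For example, we can check these formulae numerically with a short Python sketch using numpy; the particular choices of $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, $\boldsymbol{A}$ and $\boldsymbol{b}$ below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choices of mean vector, covariance matrix, A and b
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [3.0, -2.0]])
b = np.array([0.0, 1.0])

# Simulate a large sample of X and transform it to Y = AX + b
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b

# Empirical mean and covariance of Y ...
print(Y.mean(axis=0))           # close to A @ mu + b
print(np.cov(Y, rowvar=False))  # close to A @ Sigma @ A.T

# ... compared with the theoretical values E(Y) = A mu + b and Var(Y) = A Sigma A^T
print(A @ mu + b)
print(A @ Sigma @ A.T)
```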
Supplement 1
Derivation of the formulae
We can work out the expected value, variances and covariance of $Y_1$ and $Y_2$ using the rules we have learned in Week 7.
For the expected values,
$$E(Y_1) = E(a_{11}X_1 + a_{12}X_2 + b_1) = a_{11}\mu_1 + a_{12}\mu_2 + b_1,$$
$$E(Y_2) = E(a_{21}X_1 + a_{22}X_2 + b_2) = a_{21}\mu_1 + a_{22}\mu_2 + b_2.$$
We can recognise that this is the same as
$$\begin{pmatrix} a_{11}\mu_1 + a_{12}\mu_2 + b_1 \\ a_{21}\mu_1 + a_{22}\mu_2 + b_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},$$
which means nothing other than
$$E(\boldsymbol{Y}) = \boldsymbol{A}E(\boldsymbol{X}) + \boldsymbol{b} = \boldsymbol{A}\boldsymbol{\mu} + \boldsymbol{b}.$$
Let's now turn to the variances and covariances. These are a lot more complicated, so let's only work out one variance:
$$\operatorname{Var}(Y_1) = \operatorname{Var}(a_{11}X_1 + a_{12}X_2 + b_1) = a_{11}^2\sigma_1^2 + a_{12}^2\sigma_2^2 + 2a_{11}a_{12}\sigma_{12}.$$
If we compare this to the top-left entry of
$$\boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}\begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix},$$
which is $a_{11}^2\sigma_1^2 + a_{12}^2\sigma_2^2 + 2a_{11}a_{12}\sigma_{12}$, we can observe that these are the same. Hence,
$$\operatorname{Var}(\boldsymbol{Y}) = \boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top.$$
The multivariate normal distribution
In Week 6 we learned about the univariate normal distribution. There is also a multivariate family of normal distributions. To begin with, let's consider the bivariate case. A univariate normal distribution has one random variable; a bivariate normal distribution is made up of two random variables. The two variables in the bivariate normal are both normally distributed, and they have a normal distribution when they are added together. Let's consider the bivariate normal distribution using the following example.
Example 1
Karl Pearson, a very famous statistician, analysed 1078 pairs of heights (in inches) of fathers and their adult sons. Let $X_1$ be the height of a father and $X_2$ the height of his adult son.
Let's assume that a bivariate normal distribution is appropriate for these data and the corresponding parameters are
$$\mu_1 = 67.7, \quad \mu_2 = 68.7, \quad \sigma_1 = 2.74, \quad \sigma_2 = 2.81,$$
where $\mu_1$ is the mean height for fathers, $\mu_2$ is the mean height for adult sons, $\sigma_1$ is the standard deviation for the height of fathers and $\sigma_2$ is the standard deviation for the height of adult sons.
In general, to characterise the bivariate normal distribution, we need the following parameters:
the mean and variance for $X_1$ and $X_2$. These can be denoted as $\mu_1$, $\mu_2$, $\sigma_1^2$ and $\sigma_2^2$;
the covariance between $X_1$ and $X_2$, denoted as $\sigma_{12} = \operatorname{Cov}(X_1, X_2)$.
So we need a total of five parameters; however, only one of these parameters, the covariance $\sigma_{12}$, is needed to specify the dependence between the two random variables.
Rather than list all these parameters separately, it is more convenient and useful for calculations to write these in the vector-matrix notation we have just seen, where we have a mean vector and a covariance matrix
$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}.$$
The covariance matrix can also be written in terms of the correlation $\rho_{12}$. In Week 4 we defined correlation as
$$\rho_{12} = \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)\operatorname{Var}(X_2)}},$$
which is equivalent to
$$\rho_{12} = \frac{\sigma_{12}}{\sigma_1\sigma_2}.$$
We can rearrange this to be
$$\sigma_{12} = \rho_{12}\sigma_1\sigma_2.$$
We can now write that
$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$
If the random vector $\boldsymbol{X}$ follows a bivariate (or multivariate) normal with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, we write $\boldsymbol{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
Example 2
In Example 1, say we are also told that $\rho_{12} = 0.50$. So the mean vector and the covariance matrix in this example would be
$$\boldsymbol{\mu} = \begin{pmatrix} 67.7 \\ 68.7 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} 2.74^2 & 3.85 \\ 3.85 & 2.81^2 \end{pmatrix},$$
since $\sigma_{12} = \rho_{12}\sigma_1\sigma_2 = 0.50 \cdot 2.74 \cdot 2.81 = 3.85$.
The vector-matrix notation shown above makes it easier to generalise to more than two random variables. In general, the multivariate normal distribution (MVN) is made up of $p$ random variables and has some generic $p$-dimensional mean vector $\boldsymbol{\mu}$ and $p \times p$ covariance matrix $\boldsymbol{\Sigma}$.
In Week 6 we discussed the characteristic bell-shaped curve of the normal density curve. The contour lines of the joint density of multivariate normal distributions have a characteristic elliptical shape to them. Below are some examples of bivariate random variables where $X_1$ and $X_2$ both follow standard normal distributions ($N(0, 1)$) with varying amounts of correlation. The contours on the plot are in fact ellipses (for a two-dimensional MVN) centred on $\boldsymbol{\mu} = (0, 0)$ (in red). The elliptical regions moving outwards from the centre contain, respectively, 50%, 90%, 95% and 99% of the total probability.
If we plot the data in Example 1 we can see the characteristic elliptical shape. Here, there is clearly a positive relationship between the father's and son's heights.
Probability density function of multivariate normal
Let's now define the probability density function for a multivariate normal distribution.
Definition 1
Multivariate normal p.d.f.
Suppose that the random vector $\boldsymbol{X}$ can take any value in $\mathbb{R}^p$ and that $\boldsymbol{X}$ has the p.d.f.
$$f_{\boldsymbol{X}}(\boldsymbol{x}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left(-\frac{(\boldsymbol{x} - \boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\boldsymbol{x} - \boldsymbol{\mu})}{2}\right)$$
for all $\boldsymbol{x} \in \mathbb{R}^p$; then $\boldsymbol{X}$ is said to have a multivariate normal distribution, with mean $E(\boldsymbol{X}) = \boldsymbol{\mu}$ and (co)variance matrix $\operatorname{Var}(\boldsymbol{X}) = \boldsymbol{\Sigma}$, written
$$\boldsymbol{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}).$$
Note:
$|\boldsymbol{\Sigma}|$ corresponds to the determinant of $\boldsymbol{\Sigma}$ and
$\boldsymbol{\Sigma}^{-1}$ refers to the inverse of $\boldsymbol{\Sigma}$.
Example 4
Suppose that $\boldsymbol{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with
$$\boldsymbol{\mu} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} 5 & 2 \\ 2 & 4 \end{pmatrix}.$$
What is the p.d.f. of $\boldsymbol{X}$?
Answer:
First, $|\boldsymbol{\Sigma}| = 5 \cdot 4 - 2 \cdot 2 = 16$ and
$$\boldsymbol{\Sigma}^{-1} = \frac{1}{16}\begin{pmatrix} 4 & -2 \\ -2 & 5 \end{pmatrix}.$$
Then
$$\begin{aligned} f_{\boldsymbol{X}}(\boldsymbol{x}) &= (2\pi)^{-1}(16)^{-1/2}\exp\!\left[-\tfrac{1}{2}\begin{pmatrix} x_1 - 1 & x_2 - 2 \end{pmatrix}\frac{1}{16}\begin{pmatrix} 4 & -2 \\ -2 & 5 \end{pmatrix}\begin{pmatrix} x_1 - 1 \\ x_2 - 2 \end{pmatrix}\right] \\ &= \frac{1}{8\pi}\exp\!\left[-\tfrac{1}{32}\left(4[x_1 - 1]^2 + 5[x_2 - 2]^2 - 4[x_1 - 1][x_2 - 2]\right)\right] \\ &= \frac{1}{8\pi}\exp\!\left[-\tfrac{1}{32}\left(4x_1^2 + 5x_2^2 - 16x_2 - 4x_1x_2 + 16\right)\right]. \end{aligned}$$
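To check this calculation numerically, here is a short Python sketch (using scipy's multivariate_normal; the evaluation point is an arbitrary choice for illustration) comparing the density from the formula above with the library's value.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[5.0, 2.0],
                  [2.0, 4.0]])

x1, x2 = 0.5, 3.0  # arbitrary evaluation point

# Density from the formula derived in Example 4
manual = (1 / (8 * np.pi)) * np.exp(
    -(4 * x1**2 + 5 * x2**2 - 16 * x2 - 4 * x1 * x2 + 16) / 32
)

# Density from scipy's implementation of the MVN p.d.f.
library = multivariate_normal(mean=mu, cov=Sigma).pdf([x1, x2])

print(manual, library)  # the two values agree
```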
Linear functions
In Week 6 we have seen that if $X$ has a univariate normal distribution, $X \sim N(\mu, \sigma^2)$, then $aX + b \sim N(a\mu + b, a^2\sigma^2)$. A similar property holds for the multivariate normal distribution.
Proposition 1
Linear functions of MVN
Suppose $\boldsymbol{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$; then
$$\boldsymbol{A}\boldsymbol{X} + \boldsymbol{b} \sim N(\boldsymbol{A}\boldsymbol{\mu} + \boldsymbol{b},\ \boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top).$$
In other words, all this means is that linear functions of a normally distributed random vector are again normally distributed, just like in the univariate case.
Again just like in the univariate case, we can standardise a multivariate normal distribution.
If we choose $\boldsymbol{\Sigma}^{-1/2}$ to be the inverse matrix square root of $\boldsymbol{\Sigma}$, i.e. $\boldsymbol{\Sigma}^{-1} = \boldsymbol{\Sigma}^{-1/2}\left(\boldsymbol{\Sigma}^{-1/2}\right)^\top$, then if
$\boldsymbol{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ we can standardise $\boldsymbol{X}$ as follows:
$$\boldsymbol{\Sigma}^{-1/2}(\boldsymbol{X} - \boldsymbol{\mu}) \sim N(\boldsymbol{0}, \boldsymbol{I}).$$
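As an illustration, the sketch below standardises simulated draws from a bivariate normal using the symmetric inverse square root of $\boldsymbol{\Sigma}$ (computed here from an eigendecomposition; the mean and covariance are illustrative choices) and checks that the result has mean $\boldsymbol{0}$ and identity covariance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative bivariate normal
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])

# Symmetric inverse square root of Sigma via its eigendecomposition
vals, vecs = np.linalg.eigh(Sigma)
Sigma_inv_half = vecs @ np.diag(1 / np.sqrt(vals)) @ vecs.T

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Z = (X - mu) @ Sigma_inv_half.T   # standardised vectors, one per row

print(Z.mean(axis=0))             # close to (0, 0)
print(np.cov(Z, rowvar=False))    # close to the identity matrix
```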
Task 1
Suppose that the continuous random vector
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}\right).$$
Identify the distributions of:
(i) $X_1 + X_2$,
(ii) $X_1 - X_2$,
(iii) $X_2 - X_1$,
(iv) $3X_1 - 2X_2 + 1$.
Show answer
(i) $X_1 + X_2 = \begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, so
$$E(X_1 + X_2) = \begin{pmatrix} 1 & 1 \end{pmatrix}\boldsymbol{\mu} = \begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0$$
$$\operatorname{Var}(X_1 + X_2) = \begin{pmatrix} 1 & 1 \end{pmatrix}\boldsymbol{\Sigma}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 1$$
Using Proposition 1, then, $X_1 + X_2 \sim N(0, 1)$.
(ii) $X_1 - X_2 = \begin{pmatrix} 1 & -1 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, so
$$E(X_1 - X_2) = \begin{pmatrix} 1 & -1 \end{pmatrix}\boldsymbol{\mu} = \begin{pmatrix} 1 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0$$
$$\operatorname{Var}(X_1 - X_2) = \begin{pmatrix} 1 & -1 \end{pmatrix}\boldsymbol{\Sigma}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = 3$$
Using Proposition 1, then, $X_1 - X_2 \sim N(0, 3)$.
(iii) $X_2 - X_1 = \begin{pmatrix} -1 & 1 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, so
$$E(X_2 - X_1) = \begin{pmatrix} -1 & 1 \end{pmatrix}\boldsymbol{\mu} = \begin{pmatrix} -1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0$$
$$\operatorname{Var}(X_2 - X_1) = \begin{pmatrix} -1 & 1 \end{pmatrix}\boldsymbol{\Sigma}\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}\begin{pmatrix} -1 \\ 1 \end{pmatrix} = 3$$
Using Proposition 1, then, $X_2 - X_1 \sim N(0, 3)$.
(iv) $3X_1 - 2X_2 + 1 = \begin{pmatrix} 3 & -2 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} + 1$, so
$$E(3X_1 - 2X_2 + 1) = \begin{pmatrix} 3 & -2 \end{pmatrix}\boldsymbol{\mu} + 1 = \begin{pmatrix} 3 & -2 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \end{pmatrix} + 1 = 1$$
$$\operatorname{Var}(3X_1 - 2X_2 + 1) = \begin{pmatrix} 3 & -2 \end{pmatrix}\boldsymbol{\Sigma}\begin{pmatrix} 3 \\ -2 \end{pmatrix} = \begin{pmatrix} 3 & -2 \end{pmatrix}\begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}\begin{pmatrix} 3 \\ -2 \end{pmatrix} = 19$$
Using Proposition 1, then, $3X_1 - 2X_2 + 1 \sim N(1, 19)$.
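The variances above are just the quadratic forms $\boldsymbol{a}\boldsymbol{\Sigma}\boldsymbol{a}^\top$ for the different coefficient vectors $\boldsymbol{a}$; a few lines of Python reproduce them.

```python
import numpy as np

Sigma = np.array([[1.0, -0.5],
                  [-0.5, 1.0]])

# Coefficient vectors for (i) X1+X2, (ii) X1-X2, (iii) X2-X1, (iv) 3X1-2X2 (+1)
for a in ([1, 1], [1, -1], [-1, 1], [3, -2]):
    a = np.array(a, dtype=float)
    print(a, a @ Sigma @ a)   # variances: 1, 3, 3, 19
```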
Marginal distributions
The marginal distribution of a subset of variables in an MVN can be found by simply taking the relevant subset of the means and the relevant subset of the covariance matrix for the variables you are interested in.
Proposition 2
Marginal distributions for bivariate normal
Let $X_1$ and $X_2$ be bivariate normal random variables, and suppose
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right).$$
Then
$$X_1 \sim N(\mu_1, \sigma_1^2),$$
$$X_2 \sim N(\mu_2, \sigma_2^2).$$
An important consequence of this property is that the marginal distribution of every single variable of a multivariate normal random vector is again normal.
Example 5
In Example 1, calculate the marginal distributions of $X_1$ and $X_2$.
Answer:
We know
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} 67.7 \\ 68.7 \end{pmatrix}, \begin{pmatrix} 2.74^2 & 3.85 \\ 3.85 & 2.81^2 \end{pmatrix}\right).$$
Therefore
$$X_1 \sim N(67.7, 2.74^2) \quad \text{and} \quad X_2 \sim N(68.7, 2.81^2).$$
If we plot these variables separately we can see that both variables have the typical bell-shaped curve as we would expect for data which follows a normal distribution.
[Figure: marginal probability density curves for the heights of fathers (inches) and the heights of sons (inches).]
Suppose that the continuous random vector
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}\right).$$
Identify the marginal distributions of $X_1$ and $X_2$.
Find
(i) $E(X_1)$ and $\operatorname{Var}(X_1)$,
(ii) $E(X_2)$ and $\operatorname{Var}(X_2)$,
(iii) $\operatorname{Cov}(X_1, X_2)$ and $\rho(X_1, X_2)$.
Show answer
$X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 1)$
(i) $E(X_1) = 0$, $\operatorname{Var}(X_1) = 1$,
(ii) $E(X_2) = 0$, $\operatorname{Var}(X_2) = 1$,
(iii) $\operatorname{Cov}(X_1, X_2) = -\tfrac{1}{2}$ and
$$\rho(X_1, X_2) = \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)\cdot\operatorname{Var}(X_2)}} = \frac{-\tfrac{1}{2}}{\sqrt{1 \cdot 1}} = -\tfrac{1}{2}.$$
Conditional distributions
Another important property of the MVN distribution is that if $X_1$ and $X_2$ have a multivariate normal distribution, then the conditional distribution of $X_1$ given that $X_2 = x_2$ also has a normal distribution.
Proposition 3
Conditional distributions for bivariate normal
Let $X_1$ and $X_2$ be bivariate normal random variables, and suppose
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right).$$
Then
$$X_1 \mid X_2 = x_2 \sim N\!\left(\mu_1 + \frac{\sigma_1}{\sigma_2}\rho_{12}(x_2 - \mu_2),\ (1 - \rho_{12}^2)\sigma_1^2\right)$$
and
$$X_2 \mid X_1 = x_1 \sim N\!\left(\mu_2 + \frac{\sigma_2}{\sigma_1}\rho_{12}(x_1 - \mu_1),\ (1 - \rho_{12}^2)\sigma_2^2\right).$$
Let's take a moment to try and understand what is going on here, focusing on the conditional distribution of $X_1 \mid X_2 = x_2$:
the conditional mean is equal to the mean of $X_1$ ($\mu_1$) plus a term which is positive if the observed value $x_2$ is larger than the mean of $X_2$, and negative if $x_2$ is smaller than the mean of $X_2$ (assuming $\rho_{12}$ is positive; if $\rho_{12}$ is negative the opposite is true).
the conditional variance $(1 - \rho_{12}^2)\sigma_1^2$ is smaller than the marginal variance $\sigma_1^2$, and gets smaller as the correlation gets stronger.
Example 6
In Example 1, calculate the conditional distribution of fathers' heights given that a son's height is equal to 65 inches.
Answer:
We know
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} 67.7 \\ 68.7 \end{pmatrix}, \begin{pmatrix} 2.74^2 & 3.85 \\ 3.85 & 2.81^2 \end{pmatrix}\right).$$
We want the conditional distribution of $X_1 \mid X_2 = 65$. Looking at Proposition 3 we can pull out all of the relevant pieces of information we need to calculate the conditional mean and variance:
$$\mu_1 = 67.7, \quad \mu_2 = 68.7, \quad \sigma_1 = 2.74, \quad \sigma_2 = 2.81, \quad \rho_{12} = 0.50.$$
We can then substitute these values into the formulae from Proposition 3 to get
$$\begin{aligned} E(X_1 \mid X_2 = 65) &= \mu_1 + \frac{\sigma_1}{\sigma_2}\rho_{12}(x_2 - \mu_2) \\ &= 67.7 + \frac{2.74}{2.81} \cdot 0.50 \cdot (65 - 68.7) \\ &= 65.90 \end{aligned}$$
$$\begin{aligned} \operatorname{Var}(X_1 \mid X_2 = 65) &= (1 - \rho_{12}^2)\sigma_1^2 \\ &= (1 - 0.50^2) \cdot 2.74^2 \\ &= 5.63 \end{aligned}$$
Therefore, $X_1 \mid X_2 = 65 \sim N(65.90, 5.63)$.
So the conditional mean (65.90) is smaller than the mean for fathers ($\mu_1 = 67.7$) since we know that the son's height is smaller than the mean height for sons ($\mu_2 = 68.7$). We can also see that the conditional variance (5.63) is smaller than the marginal variance for fathers ($2.74^2 = 7.51$).
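The conditional mean and variance in Example 6 can be reproduced with a few lines of Python, which also makes it easy to repeat the calculation for other observed heights.

```python
# Conditional distribution of X1 (father) given X2 = x2 (son), as in Proposition 3
mu1, mu2 = 67.7, 68.7
sigma1, sigma2 = 2.74, 2.81
rho = 0.50

x2 = 65.0
cond_mean = mu1 + (sigma1 / sigma2) * rho * (x2 - mu2)   # about 65.90
cond_var = (1 - rho**2) * sigma1**2                      # about 5.63

print(cond_mean, cond_var)
```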
This video discusses the bivariate normal distribution using Example 1. Apologies for the poor sound quality.
Suppose that the continuous random vector
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}\right).$$
Identify the conditional distribution of $X_1$ given $X_2 = x_2$.
Show answer
We have
$$\mu_1 = 0, \quad \mu_2 = 0, \quad \sigma_1 = 1, \quad \sigma_2 = 1, \quad \rho_{12} = -\tfrac{1}{2}.$$
We can then substitute these values into the formulae from Proposition 3 to get
$$\begin{aligned} E(X_1 \mid X_2 = x_2) &= \mu_1 + \frac{\sigma_1}{\sigma_2}\rho_{12}(x_2 - \mu_2) \\ &= 0 + 1 \cdot \left(-\tfrac{1}{2}\right) \cdot (x_2 - 0) \\ &= -\frac{x_2}{2} \end{aligned}$$
$$\begin{aligned} \operatorname{Var}(X_1 \mid X_2 = x_2) &= (1 - \rho_{12}^2)\sigma_1^2 \\ &= \left(1 - \left(\tfrac{1}{2}\right)^2\right) \cdot 1 \\ &= \tfrac{3}{4} \end{aligned}$$
Here is a video worked solution for all of Task 1.
The results above for the marginal and conditional distributions are for the bivariate case. These results can be generalised to the multivariate normal as shown below.
Marginal distributions
Let the random vector $\boldsymbol{X}$ be split into two blocks, $\boldsymbol{X}_1$ and $\boldsymbol{X}_2$, and suppose
$$\boldsymbol{X} = \begin{pmatrix} \boldsymbol{X}_1 \\ \boldsymbol{X}_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{12}^\top & \boldsymbol{\Sigma}_{22} \end{pmatrix}\right).$$
Then
$$\boldsymbol{X}_1 \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}),$$
$$\boldsymbol{X}_2 \sim N(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}).$$
Conditional distributions
Let the random vector $\boldsymbol{X}$ be split into two blocks, $\boldsymbol{X}_1$ and $\boldsymbol{X}_2$, and suppose
$$\boldsymbol{X} = \begin{pmatrix} \boldsymbol{X}_1 \\ \boldsymbol{X}_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{12}^\top & \boldsymbol{\Sigma}_{22} \end{pmatrix}\right).$$
Then
$$\boldsymbol{X}_1 \mid \boldsymbol{X}_2 = \boldsymbol{x}_2 \sim N\!\left(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2),\ \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{12}^\top\right)$$
and
$$\boldsymbol{X}_2 \mid \boldsymbol{X}_1 = \boldsymbol{x}_1 \sim N\!\left(\boldsymbol{\mu}_2 + \boldsymbol{\Sigma}_{12}^\top\boldsymbol{\Sigma}_{11}^{-1}(\boldsymbol{x}_1 - \boldsymbol{\mu}_1),\ \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}^\top\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\right).$$
Notice that
the conditional mean is linear in $\boldsymbol{x}$: it passes through the mean $(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$ and has a steeper slope the stronger the correlation.
the conditional variance is smaller than the marginal variance, and gets smaller as the correlation gets stronger.
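These block formulae are straightforward to evaluate numerically. The sketch below applies them to an illustrative three-dimensional normal (the mean vector and covariance matrix are made up for the example), conditioning the first two components on the third.

```python
import numpy as np

# Illustrative 3-dimensional normal, split as X1 = (first two components), X2 = (third)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.5],
                  [0.6, 1.5, 0.4],
                  [0.5, 0.4, 1.0]])

idx1, idx2 = [0, 1], [2]
mu1, mu2 = mu[idx1], mu[idx2]
S11 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S22 = Sigma[np.ix_(idx2, idx2)]

x2 = np.array([3.0])   # observed value of the conditioning block

# Conditional mean and covariance of X1 | X2 = x2 (the formulae above)
S22_inv = np.linalg.inv(S22)
cond_mean = mu1 + S12 @ S22_inv @ (x2 - mu2)
cond_cov = S11 - S12 @ S22_inv @ S12.T

print(cond_mean)
print(cond_cov)
```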
Independence
We have seen that uncorrelated random variables are not necessarily independent: their relationship might be entirely non-linear.
The multivariate normal distribution is an exception to this. For the multivariate normal distribution absence of correlation and independence are one and the same thing. The reason for this is that the multivariate normal distribution only allows for linear dependency between its components, as we have seen when we have looked at the conditional distributions.
Large sample theory
In probability, we study limits to understand the long-term behaviour of random processes and sequences of random variables. In general, a limit tells us the value that a function approaches as that function's inputs get closer and closer to some number (often infinity). This may not, on the face of it, seem particularly useful. However, studying limits can often lead to simplified formulas for otherwise unsolvable probability models, which can lead to insights into complex problems.
In Week 1, we discussed the concept of relative frequency when interpreting a probability, which is an intuitive way of interpreting a probability as simply the frequency with which that outcome occurs in the long run, when the experiment is repeated a large number of times. This idea is illustrated in the example below.
Example 7
Real-world example
John Kerrich's famous experiment
Whilst visiting relatives in Copenhagen in 1940, John Kerrich, a British mathematician, was caught up in the Nazi invasion and interned in a prisoner of war camp. During his time in the camp, Kerrich conducted an experiment tossing a coin 10,000 times and recording the number of heads obtained. The following graph shows the proportion of heads for 0 - 2000 tosses using the data recorded by Kerrich.
[Figure: proportion of heads against the number of coin tosses (0 to 2000) in Kerrich's data.]
The figure shows wide fluctuations in the proportion of heads at the beginning of the experiment which eventually settle down close to the proportion we would expect of 0.5.
This example illustrates the Law of Large Numbers, which justifies the use of simulation to approximate the probability $P(A)$ of an event $A$ occurring. A consequence of the Law of Large Numbers is that in repeated trials of a random experiment the proportion of trials in which $A$ occurs converges to $P(A)$.
Let $X_1, X_2, \ldots$ be an independent and identically distributed sequence of random variables with finite expectation $\mu$. For $n = 1, 2, \ldots$, let
$$S_n = X_1 + \cdots + X_n.$$
Then the Law of Large Numbers says that the average, or cumulative mean, $\bar{X}_n = S_n / n$, converges to $\mu$ as $n \to \infty$.
Example 8
In the Kerrich experiment, $A$ is the event
$$A = \{\text{the coin toss results in a head}\},$$
and using what we have learned so far we can identify this experiment as a series of Bernoulli trials in which the probability of tossing a head is equal to $\tfrac{1}{2}$.
Let
$$X_k = \begin{cases} 1, & \text{if } A \text{ occurs on the } k\text{th toss}, \\ 0, & \text{otherwise.} \end{cases}$$
Then $X_k \sim \operatorname{Ber}\!\left(\tfrac{1}{2}\right)$ and therefore $\mu = E(X_k) = \tfrac{1}{2}$.
From the figure we can see that as the number of trials increases, the average $\bar{X}_n = S_n/n = (X_1 + \cdots + X_n)/n$, which is just the proportion of trials in which $A$ occurs, tends towards $P(A) = \mu = \tfrac{1}{2}$.
That is, the proportion of $n$ trials in which $A$ (heads) occurs converges to $P(A)$ as $n \to \infty$.
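A simulation along the lines of Kerrich's experiment takes only a few lines of Python: toss a fair coin many times and track the running proportion of heads.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 2000
tosses = rng.integers(0, 2, size=n)                    # 1 = head, 0 = tail, fair coin
running_prop = np.cumsum(tosses) / np.arange(1, n + 1)

# Proportions after 10, 100, 1000 and 2000 tosses: they fluctuate early on
# and settle close to 0.5, as in the figure above.
print(running_prop[[9, 99, 999, 1999]])
```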
To discuss the Law of Large Numbers more formally, let's define what convergence in probability means.
Definition 2
Convergence in probability
Let $X_1, X_2, \ldots$ be a sequence of random variables defined on a sample space $S$. The sequence $X_n$ is said to converge in probability to a constant $\mu$ if, for every $\epsilon > 0$,
$$P(|X_n - \mu| < \epsilon) \to 1 \quad \text{as } n \to \infty.$$
For most of the results we will study in this chapter, the key quantity of interest will be the cumulative mean, i.e. we study the sample mean
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
as the sample size $n$ increases to infinity, i.e. $n \to \infty$.
The figures below illustrate this definition. They show the distribution of $\bar{X}_n$ for $n = 1$, $n = 10$ and $n = 100$. We are considering the probability that $\bar{X}_n$ is within a "tube" of width $2\epsilon$ around $\mu$. We can see that as we increase $n$, the probability that $\bar{X}_n$ lies in the interval $(\mu - \epsilon, \mu + \epsilon)$ becomes higher. If we were to keep increasing $n$ then this probability would become 1, as mandated by the definition. This can be seen by noting that in the final figure, the whole distribution is contained within an $\epsilon$ distance of the actual mean $\mu$.
In Example 7, we were interested in the average $\bar{X}_n = S_n/n = (X_1 + \cdots + X_n)/n$, which was just the proportion of trials in which the tossed coin resulted in heads, as the number of tosses increased.
Let's now state the Weak Law of Large Numbers, which shows that the sample mean of an independent sample of size $n$, drawn from any arbitrary distribution (as long as this distribution does not have too heavy tails), is increasingly concentrated around its mean as $n$ grows.
Theorem 1
The weak law of large numbers
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables, each with finite expected value $\mu$. For $n = 1, 2, \ldots$, let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Then for any $\epsilon > 0$,
$$P(|\bar{X}_n - \mu| < \epsilon) \to 1 \quad \text{as } n \to \infty.$$
In other words, the probability that the absolute value of the difference between the sample mean $\bar{X}_n$ and the expected value $\mu$ is less than some very small number $\epsilon$ tends towards 1 as $n$ goes to infinity.
The proof of this theorem is beyond the scope of this course but is provided as supplementary material.
Supplement 3
A proof of the weak law
One proof uses a result called Chebyshev's inequality (although this proof only works when the variance also exists).
A key part of the proof of the Weak Law of Large Numbers is the so-called Chebyshev inequality.
Let $X$ be a random variable with finite expected value $\mu$.
If $c$ is a real constant such that $E[(X - c)^2]$ is finite, then for any value $\epsilon > 0$,
$$P(|X - c| < \epsilon) \ge 1 - \frac{1}{\epsilon^2}E[(X - c)^2].$$
In particular, if $X$ has finite variance, $\sigma^2 = E[(X - \mu)^2]$, then for any value $\epsilon > 0$,
$$P(|X - \mu| < \epsilon) \ge 1 - \frac{\sigma^2}{\epsilon^2}.$$
This result is easily proved when $X$ is a continuous random variable with probability density function $f_X(x)$. In this case,
$$\begin{aligned} E[(X - c)^2] &= \int_{-\infty}^{\infty} (x - c)^2 f_X(x)\,dx \\ &\ge \int_{-\infty}^{c-\epsilon} (x - c)^2 f_X(x)\,dx + \int_{c+\epsilon}^{\infty} (x - c)^2 f_X(x)\,dx \\ &\ge \int_{-\infty}^{c-\epsilon} \epsilon^2 f_X(x)\,dx + \int_{c+\epsilon}^{\infty} \epsilon^2 f_X(x)\,dx \\ &= \epsilon^2\left[\int_{-\infty}^{c-\epsilon} f_X(x)\,dx + \int_{c+\epsilon}^{\infty} f_X(x)\,dx\right] \\ &= \epsilon^2\left[1 - P(|X - c| < \epsilon)\right] \\ \Longrightarrow \quad P(|X - c| < \epsilon) &\ge 1 - \frac{1}{\epsilon^2}E[(X - c)^2]. \end{aligned}$$
The first line is a definition. The second line removes a non-negative contribution to the integral from $c - \epsilon$ to $c + \epsilon$. The third line replaces $(x - c)^2$ by its smallest value, $\epsilon^2$, on each remaining region of integration. The fourth line tidies up. The fifth line identifies the integrals with the probability of a particular event and the last line rearranges.
The second part of the theorem follows immediately from the first by putting $c = \mu$.
We can now prove the Weak Law of Large Numbers. We will prove this result for the simple case where the random variables have finite variance sigma squared; however this assumption is not required for the weak law to hold.
For all $n \ge 1$, $\bar{X}_n$ has expected value $\mu$ and variance $\sigma^2/n$. By Chebyshev's inequality, for any $\epsilon > 0$,
$$P(|\bar{X}_n - \mu| < \epsilon) \ge 1 - \frac{\sigma^2/n}{\epsilon^2} \to 1 \quad \text{as } n \to \infty.$$
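As a numerical illustration of this bound, the sketch below estimates $P(|\bar{X}_n - \mu| < \epsilon)$ by simulation for uniform random variables and compares it with the Chebyshev lower bound $1 - \sigma^2/(n\epsilon^2)$; the choice of distribution, $\epsilon$ and sample sizes is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

# X_i ~ Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12
mu, var = 0.5, 1 / 12
eps = 0.05

for n in (10, 100, 1000):
    xbar = rng.uniform(0, 1, size=(5000, n)).mean(axis=1)
    est = np.mean(np.abs(xbar - mu) < eps)      # simulated P(|Xbar_n - mu| < eps)
    bound = max(0.0, 1 - var / (n * eps**2))    # Chebyshev lower bound
    print(n, est, bound)                        # both tend to 1 as n grows
```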
Informally, the laws of large numbers (there is also a Strong Law, which involves another form of stochastic convergence and which we will say no more about) tell us that the probability distribution of $\bar{X}_n$ becomes more and more concentrated at its expected value $\mu$ as $n \to \infty$. Although interesting and important, this does not help us to calculate probabilities of interest associated with $\bar{X}_n$ since it does not tell us how close $\bar{X}_n$ is to $\mu$ for a given value of $n$. The central limit theorem (CLT) provides a means of doing this, at least approximately.
The central limit theorem
Theorem 2
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables, each with a finite mean $\mu$ and a finite variance $\sigma^2$. Then
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \longrightarrow N(0, 1) \quad \text{as } n \to \infty,$$
in the sense that the c.d.f. of the left-hand side tends to the c.d.f. of the standard normal distribution.
The central limit theorem is often used in one of the following two equivalent forms, which can be obtained by re-arranging the terms.
1. $\sum_{i=1}^{n} X_i$ approximately follows the $N(n\mu, n\sigma^2)$ distribution for 'sufficiently large' $n$.
2. $\bar{X}_n$ approximately follows the $N\!\left(\mu, \frac{\sigma^2}{n}\right)$ distribution for 'sufficiently large' $n$.
Let's focus on 2. for now. Whatever the value of $n$, the rules for the expected value and variance of a linear function tell us that $E(\bar{X}_n) = \mu$ and $\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$. What the central limit theorem tells us is that the shape of the distribution of $\bar{X}_n$ tends to the normal distribution.
The useful and counter-intuitive thing about the central limit theorem is that this happens no matter what the shape of the original distribution is (unless it has too heavy tails and no finite variance). For most distributions, a normal distribution is approached very quickly as n increases.
Task 4
Verify that $E(\bar{X}_n) = \mu$ and $\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$.
Show answer
Using that $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ and that the $X_i$ are independent,
$$\begin{aligned} E(\bar{X}_n) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\underbrace{\sum_{i=1}^{n}\underbrace{E(X_i)}_{=\mu}}_{=n\mu} = \mu, \\ \operatorname{Var}(\bar{X}_n) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\underbrace{\sum_{i=1}^{n}\underbrace{\operatorname{Var}(X_i)}_{=\sigma^2}}_{=n\sigma^2} = \frac{\sigma^2}{n}. \end{aligned}$$
The simplest way to illustrate the central limit theorem is using a graphical example.
Example 10
Exponential to normal
Suppose the random variable $X \sim \operatorname{Exp}(1.5)$. The figure below shows a sample of 100 points from this distribution.
You can see very clearly from this plot that this is a highly skewed distribution and therefore non-normal.
A further simulation was carried out from the same model. This time, a sample of $n = 5$ simulated values was obtained and the sample mean calculated. This was done 1,000 times and the sample means are displayed in histogram (i) below. Though skewed, the distribution of sample means is a lot less skewed than that of the original data. The remaining histograms repeat the simulation for even larger sample sizes, (ii) $n = 10$, (iii) $n = 25$ and (iv) $n = 100$. Clearly, as $n$ increases, the distribution of the sample means becomes more symmetric and looks more and more like the bell-shaped curve of the normal distribution. It can also be seen that as $n$ increases the spread of the distribution decreases (note that the scale on the horizontal axes differs between these plots).
[Figure: histograms of 1,000 sample means from the Exp(1.5) distribution for (i) n = 5, (ii) n = 10, (iii) n = 25 and (iv) n = 100.]
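The simulation described in Example 10 can be reproduced with a short Python sketch: draw repeated samples from an Exp(1.5) distribution and look at how the sample means behave as $n$ grows (the number of repetitions matches the 1,000 used above).

```python
import numpy as np

rng = np.random.default_rng(5)

rate = 1.5        # X ~ Exp(1.5), so E(X) = 1/1.5 and Var(X) = 1/1.5**2
reps = 1000       # number of sample means per sample size

for n in (5, 10, 25, 100):
    samples = rng.exponential(scale=1 / rate, size=(reps, n))
    means = samples.mean(axis=1)
    # As n grows, the sample means concentrate around 1/1.5 ~= 0.667,
    # their spread shrinks, and their histogram looks increasingly normal.
    print(n, means.mean().round(3), means.std().round(3))
```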
Example 11
It is usual for even very large financial transactions, to the value of hundreds of thousands of pounds, to be settled to fractions of pence. Suppose, instead, that financial institutions agreed to round all settlements of transactions between them to the nearest whole £1. In one year, a certain institution makes 1500 transactions. What is the probability that this institution will lose more than £5 over the course of the year?
To answer this question, let's begin by defining $X_i$ to be the difference in cost (in £) between the computed cost of the $i$-th transaction and its true cost ($i = 1, 2, \ldots, 1500$). For each transaction the most that an institution can lose is 50 pence (£0.50) and the most that it can gain is also 50 pence (£0.50), since a transaction will either be rounded up or rounded down to the nearest pound. We can therefore assume that $X_i \sim \operatorname{Un}(-0.5, 0.5)$.
We are not interested in the amount lost for a single transaction, but rather the total amount lost in the year over all 1500 transactions. The difference between the total cost of the 1500 transactions and the computed cost is
$$S_{1500} = X_1 + \cdots + X_{1500}.$$
Although we know the distribution for each $X_i$, we don't know the distribution of $S_{1500}$. We can, however, approximate it using the central limit theorem.
Using the results from Week 6 we can calculate $\mu = E(X_i) = \frac{-0.5 + 0.5}{2} = 0$ and $\sigma^2 = \operatorname{Var}(X_i) = \frac{(0.5 - (-0.5))^2}{12} = \frac{1}{12}$.
Then, using the central limit theorem,
$$S_n = \sum_{i=1}^{n} X_i \overset{\text{approx}}{\sim} N(n\mu, n\sigma^2), \qquad \text{so} \qquad S_{1500} \overset{\text{approx}}{\sim} N\!\left(0, \frac{1500}{12}\right).$$
We can then follow the same process as in Week 6 to find that the probability that the institution loses more than £5 in total is
$$\begin{aligned} P(S_{1500} < -5) &= 1 - P(S_{1500} < 5) \quad \text{(by symmetry of the normal distribution about 0)} \\ &= 1 - P\!\left(\frac{S_{1500} - 0}{\sqrt{1500/12}} < \frac{5 - 0}{\sqrt{1500/12}}\right) \\ &= 1 - P(Z < 0.45) \\ &= 1 - \Phi(0.45) \\ &= 1 - 0.674 \\ &= 0.326 \end{aligned}$$
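The final probability can be checked with scipy: standardising is equivalent to evaluating a normal c.d.f. with mean 0 and standard deviation $\sqrt{1500/12}$.

```python
from scipy.stats import norm

# S_1500 is approximately N(0, 1500/12)
sd = (1500 / 12) ** 0.5
prob = norm.cdf(-5, loc=0, scale=sd)   # P(S_1500 < -5)

print(prob)  # about 0.327 (the 0.326 above comes from rounding z to 0.45)
```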
Task 5
The life-times of video projector light bulbs are known to follow an exponential distribution with a mean life-time of $\frac{1}{\lambda} = 90$ hours. The university uses projectors for 8500 hours per semester. What is the probability that 100 light bulbs will be sufficient for the semester?
Show answer
Let $X_i$ be the life-time of the $i$-th light bulb.
Then the total life-time of the 100 bulbs is $S_{100} = X_1 + \cdots + X_{100}$.
Since the life-time of a light bulb is exponentially distributed, the mean and standard deviation of an individual life-time are $\mu = \sigma = 90$.
Then, using the central limit theorem,
$$S_{100} \overset{\text{approx}}{\sim} N(100 \cdot 90,\ 100 \cdot 90^2).$$
We are looking for
$$\begin{aligned} P(S_{100} > 8500) &= 1 - P(S_{100} < 8500) \\ &= 1 - P\!\left(\frac{S_{100} - 9000}{\sqrt{90^2 \cdot 100}} < \frac{8500 - 9000}{\sqrt{90^2 \cdot 100}}\right) \\ &= 1 - P(Z < -0.56) \\ &= 1 - (1 - P(Z < 0.56)) \\ &= 0.7123. \end{aligned}$$
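The same calculation can be done directly with scipy (without rounding the z-value).

```python
from scipy.stats import norm

# S_100 is approximately N(9000, 900^2)
prob = norm.sf(8500, loc=9000, scale=900)   # P(S_100 > 8500)

print(prob)  # about 0.711 (the 0.7123 above uses z rounded to 0.56)
```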
Consider the Binomial distribution $X \sim \operatorname{Bi}(1000, 0.4)$ and suppose you wish to calculate $P(X > 661)$. The shortest way to calculate this directly is
$$P(X > 661) = P(X = 662) + P(X = 663) + \cdots + P(X = 1000),$$
which involves 339 separate calculations! The central limit theorem allows us to make an approximation using a normal distribution.
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identical $\operatorname{Bern}(\theta)$ random variables. From Week 3 we know that
$$E(X_i) = \theta \quad \text{and} \quad \operatorname{Var}(X_i) = \theta(1 - \theta).$$
The sum of these variables is $X = \sum_{i=1}^{n} X_i \sim \operatorname{Bi}(n, \theta)$.
The central limit theorem tells us that $X = \sum_{i=1}^{n} X_i$ approximately follows a $N(n\theta, n\theta(1 - \theta))$ distribution, which in turn means that
$$\operatorname{Bi}(n, \theta) \approx N(n\theta, n\theta(1 - \theta)),$$
providing $n$ is large enough and $\theta$ is not too close to zero or one. Therefore, to calculate $P(X > 661)$ we simply approximate the binomial distribution with a normal and use the normal tables to calculate the probability.
Continuity correction
However in moving from a discrete binomial distribution to a continuous normal approximation we encounter the following problem.
Let $X \sim \operatorname{Bi}(100, 0.3)$, and consider calculating
$P(X < 50)$,
$P(X \le 50)$.
As the binomial is a discrete distribution these two probabilities are different. However, if we apply the central limit theorem and approximate $X \sim \operatorname{Bi}(100, 0.3)$ with $X \sim N(100 \cdot 0.3,\ 100 \cdot 0.3 \cdot 0.7) = N(30, 21)$ (its normal approximation), we have a problem. Using the normal approximation, $P(X < 50)$ and $P(X \le 50)$ give the same probability, because for a continuous distribution $P(X = 50) = 0$ (it's a single outcome). However, as $X$ has a discrete distribution, $P(X = 50) > 0$.
Therefore each time we approximate a discrete distribution with a continuous one we make the following continuity correction. The correction works by adding or subtracting 0.5 to the outcome as follows:
$P(X > x)$ is replaced with $P(X > x + 0.5)$.
$P(X \ge x)$ is replaced with $P(X \ge x - 0.5)$.
$P(X < x)$ is replaced with $P(X < x - 0.5)$.
$P(X \le x)$ is replaced with $P(X \le x + 0.5)$.
Essentially, if the probability to be calculated involves a strict inequality ($<$ or $>$), we shift $x$ by 0.5 in the direction that makes the calculated probability smaller than it would otherwise have been. In contrast, if it involves a non-strict inequality ($\le$ or $\ge$), we shift $x$ by 0.5 in the direction that makes the calculated probability larger than it would otherwise have been.
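As a computational illustration (my own sketch, not part of the notes), the helper below applies these four rules and then evaluates the normal approximation with `scipy`; the function name `binom_normal_approx` is just a convenient label.

```python
# Continuity-corrected normal approximation for a Bi(n, theta) probability.
from scipy import stats
import numpy as np

def binom_normal_approx(n, theta, x, op):
    """Approximate P(X op x) for X ~ Bi(n, theta) using N(n*theta, n*theta*(1 - theta))."""
    mu = n * theta
    sd = np.sqrt(n * theta * (1 - theta))
    if op == ">":
        return 1 - stats.norm.cdf(x + 0.5, mu, sd)
    if op == ">=":
        return 1 - stats.norm.cdf(x - 0.5, mu, sd)
    if op == "<":
        return stats.norm.cdf(x - 0.5, mu, sd)
    if op == "<=":
        return stats.norm.cdf(x + 0.5, mu, sd)
    raise ValueError("op must be one of >, >=, <, <=")

# P(X < 50) for X ~ Bi(100, 0.3), alongside the exact value P(X <= 49)
print(binom_normal_approx(100, 0.3, 50, "<"), stats.binom.cdf(49, 100, 0.3))
```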
Example 12
Let $X \sim \mathrm{Bi}(10000, 0.005)$. Using the normal distribution, calculate $P(X < 70)$.
Answer:
As $X \sim \mathrm{Bi}(10000, 0.005)$, then $X \overset{\text{approx}}{\sim} N(50, 49.75)$, so that $\sigma = \sqrt{49.75}$. Therefore
$$\begin{aligned}
P(X < 70) &= P\left(\frac{X - 50}{\sqrt{49.75}} < \frac{69.5 - 50}{\sqrt{49.75}}\right) \\
&= P(Z < 2.76) \quad \text{where } Z \sim N(0, 1) \\
&= \Phi(2.76) = 0.9971.
\end{aligned}$$
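If you would like to check this numerically, a minimal verification sketch (assuming `scipy` is available) is:

```python
# Example 12 check: exact binomial probability versus the continuity-corrected approximation.
from scipy import stats
import numpy as np

n, theta = 10000, 0.005
exact = stats.binom.cdf(69, n, theta)                        # P(X < 70) = P(X <= 69)
approx = stats.norm.cdf(69.5, loc=50, scale=np.sqrt(49.75))  # continuity-corrected normal value
print(exact, approx)
```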
Task 6
Let $X \sim \mathrm{Bi}(100, 0.4)$. Using the normal distribution calculate
(a) $P(X < 51)$,
(b) $P(X \ge 33)$,
(c) $P(35 \le X \le 41)$,
(d) $P(X = 38)$.
Show answer
As $X \sim \mathrm{Bi}(100, 0.4)$, then $X \overset{\text{approx}}{\sim} N(40, 24)$, so that $\sigma = \sqrt{24}$.
(a)
$$\begin{aligned}
P(X < 51) &= P\left(\frac{X - 40}{\sqrt{24}} < \frac{50.5 - 40}{\sqrt{24}}\right) \\
&= P(Z < 2.14) \quad \text{where } Z \sim N(0, 1) \\
&= \Phi(2.14) = 0.9838.
\end{aligned}$$
(b)
$$\begin{aligned}
P(X \ge 33) &= P\left(\frac{X - 40}{\sqrt{24}} \ge \frac{32.5 - 40}{\sqrt{24}}\right) \\
&= P(Z \ge -1.53) \\
&= P(Z \le 1.53) = 0.937.
\end{aligned}$$
(c)
$$\begin{aligned}
P(35 \le X \le 41) &= P\left(\frac{34.5 - 40}{\sqrt{24}} \le \frac{X - 40}{\sqrt{24}} \le \frac{41.5 - 40}{\sqrt{24}}\right) \\
&= P(-1.12 \le Z \le 0.31) \\
&= \Phi(0.31) - \Phi(-1.12) = 0.6217 - 0.1314 = 0.4903.
\end{aligned}$$
(d)
$$\begin{aligned}
P(X = 38) &= P\left(\frac{37.5 - 40}{\sqrt{24}} \le \frac{X - 40}{\sqrt{24}} \le \frac{38.5 - 40}{\sqrt{24}}\right) \\
&= P(-0.51 \le Z \le -0.31) \\
&= \Phi(-0.31) - \Phi(-0.51) = 0.3783 - 0.3050 = 0.0733.
\end{aligned}$$
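The answers above can be checked against the exact binomial probabilities; for instance, a minimal sketch for part (d), assuming `scipy` is available:

```python
# Task 6(d) check: exact P(X = 38) versus the continuity-corrected normal approximation.
from scipy import stats
import numpy as np

mu, sd = 40, np.sqrt(24)
exact = stats.binom.pmf(38, 100, 0.4)                                 # P(X = 38) exactly
approx = stats.norm.cdf(38.5, mu, sd) - stats.norm.cdf(37.5, mu, sd)  # P(37.5 <= Y <= 38.5)
print(exact, approx)
```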
Let $X_1, X_2, \dots, X_n$ be a sequence of independent and identical $\mathrm{Pois}(1)$ random variables. From Week 3 we know that
$$E(X_i) = 1 \quad \text{and} \quad \mathrm{Var}(X_i) = 1.$$
An important property of the Poisson distribution is that the sum of independent Poisson random variables has a Poisson distribution:
If $X \sim \mathrm{Pois}(\lambda)$ and $Y \sim \mathrm{Pois}(\mu)$ then $X + Y \sim \mathrm{Pois}(\lambda + \mu)$.
So if $X \sim \mathrm{Pois}(1)$ and $Y \sim \mathrm{Pois}(1)$ then $X + Y \sim \mathrm{Pois}(2)$.
Therefore $X = \sum_{i=1}^{n} X_i \sim \mathrm{Pois}(n)$.
Now the central limit theorem tells us that $X = \sum_{i=1}^{n} X_i$ approximately follows a $N(n, n)$ distribution, which in turn means
$$\mathrm{Pois}(n) \approx N(n, n),$$
providing $n$ is large enough. As with the binomial approximation we have to apply a continuity correction, as we are moving from a discrete to a continuous distribution.
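A short sketch (my own illustration, assuming `scipy` is available) of how the $\mathrm{Pois}(n)$ and $N(n, n)$ probabilities compare as $n$ grows:

```python
# Compare a Poisson tail probability with its continuity-corrected normal approximation.
from scipy import stats
import numpy as np

for n in (5, 50, 500):
    x = n + int(np.sqrt(n))                                    # a point about one s.d. above the mean
    exact = stats.poisson.cdf(x, n)                            # P(X <= x) for X ~ Pois(n)
    approx = stats.norm.cdf(x + 0.5, loc=n, scale=np.sqrt(n))  # continuity-corrected N(n, n) value
    print(n, exact, approx)
```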
Example 13
Let $X \sim \mathrm{Pois}(50)$ and calculate
(a) $P(X < 60)$,
(b) $P(50 \le X < 60)$.
Answer:
As $X \sim \mathrm{Pois}(50)$, then $X \overset{\text{approx}}{\sim} N(50, 50)$, so that $\sigma = \sqrt{50}$.
(a)
$$\begin{aligned}
P(X < 60) &= P\left(\frac{X - 50}{\sqrt{50}} < \frac{59.5 - 50}{\sqrt{50}}\right) \\
&= P(Z < 1.34) = 0.9099.
\end{aligned}$$
(b)
$$\begin{aligned}
P(50 \le X < 60) &= P\left(\frac{49.5 - 50}{\sqrt{50}} \le \frac{X - 50}{\sqrt{50}} < \frac{59.5 - 50}{\sqrt{50}}\right) \\
&= P(-0.07 \le Z < 1.34) \\
&= \Phi(1.34) - \Phi(-0.07) = 0.9099 - (1 - 0.5279) = 0.4378.
\end{aligned}$$
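A minimal verification sketch for part (b), assuming `scipy` is available:

```python
# Example 13(b) check: exact Poisson probability versus the normal approximation.
from scipy import stats
import numpy as np

exact = stats.poisson.cdf(59, 50) - stats.poisson.cdf(49, 50)         # P(50 <= X < 60) exactly
sd = np.sqrt(50)
approx = stats.norm.cdf(59.5, 50, sd) - stats.norm.cdf(49.5, 50, sd)  # continuity-corrected value
print(exact, approx)   # the approximation should be close to the 0.4378 found above
```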
Learning outcomes for week 8
By the end of week 8, you should be able to:
calculate linear functions of the multivariate normal distribution;
calculate marginal and conditional distributions of the multivariate normal (for the bivariate case);
state and use the central limit theorem;
calculate normal approximations to the binomial and Poisson distributions.
A summary of the most important concepts and written answers to all tasks are provided overleaf.
Week 8 summary
The multivariate normal distribution
Linear functions of MVN
Suppose $\boldsymbol{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then
$$\boldsymbol{A}\boldsymbol{X} + \boldsymbol{b} \sim N\!\left(\boldsymbol{A}\boldsymbol{\mu} + \boldsymbol{b},\, \boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top\right).$$
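The rule can be illustrated numerically. The sketch below (my own, with made-up values of $\boldsymbol{A}$, $\boldsymbol{b}$, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, and assuming `numpy` is available) computes the transformed parameters and compares them with a simulation.

```python
# Linear transformation of a multivariate normal: AX + b ~ N(A mu + b, A Sigma A^T).
import numpy as np

mu = np.array([1.0, 2.0])            # illustrative mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])       # illustrative covariance matrix
A = np.array([[1.0, 1.0],
              [2.0, -1.0]])
b = np.array([0.0, 3.0])

new_mean = A @ mu + b                # mean of AX + b
new_cov = A @ Sigma @ A.T            # covariance matrix of AX + b

# Empirical check by simulation
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Y = X @ A.T + b                      # apply the linear transformation to each row
print(new_mean, Y.mean(axis=0))      # the two should be close
print(new_cov)
print(np.cov(Y, rowvar=False))       # should be close to new_cov
```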
Probability density function
Suppose that the random vector $\boldsymbol{X}$ can take any value in $\mathbb{R}^p$ and that $\boldsymbol{X}$ has the p.d.f.
$$f_{\boldsymbol{X}}(\boldsymbol{x}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{(\boldsymbol{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})}{2}\right)$$
for all $\boldsymbol{x} \in \mathbb{R}^p$; then $\boldsymbol{X}$ is said to have a multivariate normal distribution, with mean $E(\boldsymbol{X}) = \boldsymbol{\mu}$ and (co)variance matrix $\mathrm{Var}(\boldsymbol{X}) = \boldsymbol{\Sigma}$, written
$$\boldsymbol{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}).$$
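As a quick check (my own sketch, with illustrative parameter values, assuming `scipy` is available), the density formula can be evaluated directly and compared with `scipy.stats.multivariate_normal`:

```python
# Evaluate the multivariate normal p.d.f. directly and via scipy (p = 2).
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])            # illustrative mean vector
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])       # illustrative covariance matrix
x = np.array([0.5, 0.5])

p = len(mu)
diff = x - mu
dens = np.exp(-diff @ np.linalg.inv(Sigma) @ diff / 2) / (
    (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

print(dens, multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # the two values should agree
```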
Marginal distributions
Let $X_1$ and $X_2$ be bivariate normal random variables, and suppose
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right).$$
Then
$$X_1 \sim N(\mu_1, \sigma_1^2),$$
$$X_2 \sim N(\mu_2, \sigma_2^2).$$
Conditional distributions
Let $X_1$ and $X_2$ be bivariate normal random variables, and suppose
$$\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right).$$
Then
$$X_1 \mid X_2 = x_2 \sim N\!\left(\mu_1 + \frac{\sigma_1}{\sigma_2}\rho_{12}(x_2 - \mu_2),\, (1 - \rho_{12}^2)\sigma_1^2\right)$$
and
$$X_2 \mid X_1 = x_1 \sim N\!\left(\mu_2 + \frac{\sigma_2}{\sigma_1}\rho_{12}(x_1 - \mu_1),\, (1 - \rho_{12}^2)\sigma_2^2\right).$$
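The conditional-distribution formula is easy to evaluate numerically; the sketch below (my own, with made-up parameter values, assuming `numpy` is available) computes the conditional mean and variance of $X_1$ given $X_2 = x_2$ and checks them by simulation.

```python
# Conditional mean and variance of X1 given X2 = x2 for a bivariate normal.
import numpy as np

mu1, mu2 = 0.0, 1.0
sigma1, sigma2 = 2.0, 1.5
rho = 0.6
x2 = 2.0

cond_mean = mu1 + (sigma1 / sigma2) * rho * (x2 - mu2)   # E(X1 | X2 = x2) = 0.8
cond_var = (1 - rho ** 2) * sigma1 ** 2                  # Var(X1 | X2 = x2) = 2.56

# Empirical check: keep simulated draws whose X2 value lies close to x2
rng = np.random.default_rng(3)
cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])
sample = rng.multivariate_normal([mu1, mu2], cov, size=500_000)
near = sample[np.abs(sample[:, 1] - x2) < 0.05]
print(cond_mean, near[:, 0].mean())                      # roughly equal
print(cond_var, near[:, 0].var())                        # roughly equal
```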
Large sample theory
The weak law of large numbers
Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables, each with finite expected value $\mu$. For $n = 1, 2, \dots$, let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Then for any $\epsilon > 0$,
$$P\!\left(\left|\bar{X}_n - \mu\right| < \epsilon\right) \to 1 \quad \text{as } n \to \infty.$$
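A simulation sketch of the weak law (my own illustration; the exponential distribution with mean 3 is an arbitrary choice, and `numpy` is assumed to be available):

```python
# Running sample mean of i.i.d. draws settles near the true mean as n grows.
import numpy as np

rng = np.random.default_rng(1)
mu = 3.0
x = rng.exponential(scale=mu, size=100_000)   # any distribution with a finite mean would do
for n in (10, 1_000, 100_000):
    print(n, x[:n].mean())                    # tends to settle close to mu = 3 as n grows
```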
The central limit theorem
Let $X_1, X_2, \dots, X_n$ be a sequence of independent and identically distributed random variables, each with a finite mean $\mu$ and a finite variance $\sigma^2$. Then for sufficiently large $n$ we have that
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \to N(0, 1),$$
in the sense that the c.d.f. of the left-hand side tends to the c.d.f. of the standard normal distribution.
The central limit theorem is often used in one of the following two equivalent forms:
$\sum_{i=1}^{n} X_i$ approximately follows the $N(n\mu, n\sigma^2)$ distribution for 'sufficiently large' $n$.
$\bar{X}$ approximately follows the $N\!\left(\mu, \frac{\sigma^2}{n}\right)$ distribution for 'sufficiently large' $n$.
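A simulation sketch of the central limit theorem (my own illustration; uniform random variables and the sample sizes are arbitrary choices, and `numpy` is assumed to be available):

```python
# Standardised sample means of i.i.d. Uniform(0, 1) draws look standard normal.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 20_000
mu, sigma = 0.5, np.sqrt(1 / 12)          # mean and standard deviation of Uniform(0, 1)

xbar = rng.uniform(size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma      # standardise as in the CLT statement
print(z.mean(), z.std())                  # should be close to 0 and 1
print(np.mean(z <= 1.96))                 # should be close to Phi(1.96) = 0.975
```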