Statistical Inference - Course Project 1
Chan Chee-Foong
May 14, 2016
Overview
In this assignment, we will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution will be simulated in R with rexp(n, λ), where λ is the rate parameter. The mean of the exponential distribution is 1/λ and its standard deviation is also 1/λ.
For all simulations, unless otherwise stated, the following parameters are set:
1. lambda = λ = 0.2
2. noOfSim = number of simulations = 1000
3. n = number of exponential random variables per simulation = 40
4. Random seed = 2016
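The report does not show the chunk that sets these parameters. A minimal sketch of it, assuming they are plain global variables with the names used in the later code (theoMean and theoSd are hypothetical names added here for the theoretical values). The plotting chunks additionally need the ggplot2 package, and the summary statistics the moments package, as listed at the end of the report.

```r
# Setup sketch: reconstructed from the parameter list above (an assumption,
# since the report's own setup chunk is not shown)
set.seed(2016)       # random seed used for all simulations

lambda  <- 0.2       # rate parameter of the exponential distribution
noOfSim <- 1000      # number of simulations
n       <- 40        # exponential random variables per simulation

# Theoretical mean and standard deviation of Exp(lambda): both equal 1/lambda
# (theoMean / theoSd are hypothetical names, not from the report)
theoMean <- 1 / lambda
theoSd   <- 1 / lambda
c(mean = theoMean, sd = theoSd)
```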
Exponential Distribution
Let us generate 1000 × 40 random numbers from an exponential distribution with λ = 0.2 and take a look at the distribution and its properties.
library(ggplot2)
expSample <- rexp(noOfSim * n, lambda)
dfExpSample <- data.frame(sample = expSample)
ggplot(data = dfExpSample, aes(x = sample)) +
    geom_histogram(binwidth = 0.2, col = 'blue', fill = 'purple') +
    ylab('Frequency') + xlab('') + labs(title = 'Histogram\n')
[Figure: Histogram of the 40,000 simulated exponential values; y-axis: Frequency]
Properties of the exponential distribution generated are as follows:
1. Mean : 4.98
2. Standard Deviation : 4.99
3. Skewness : 1.97
4. Excess Kurtosis : 5.73
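The report lists the moments package but does not show how these four numbers were computed. A minimal base R sketch, assuming moment-based estimators; these should agree with moments::skewness(x) and moments::kurtosis(x) - 3, since kurtosis() in that package returns the non-excess value. distProperties() is a hypothetical helper name, not part of the report.

```r
# Regenerate the same kind of sample as above so this block runs on its own
set.seed(2016)
lambda  <- 0.2
noOfSim <- 1000
n       <- 40
expSample <- rexp(noOfSim * n, lambda)

# Hypothetical helper: mean, SD, skewness and excess kurtosis of a vector
distProperties <- function(x) {
  d  <- x - mean(x)
  m2 <- mean(d^2)   # second central moment (divides by length, not length - 1)
  m3 <- mean(d^3)   # third central moment
  m4 <- mean(d^4)   # fourth central moment
  c(mean            = mean(x),
    sd              = sd(x),          # sample SD with n - 1 denominator
    skewness        = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3)  # subtract 3 so a normal gives 0
}

round(distProperties(expSample), 2)
```

A symmetric vector such as c(1, 2, 3, 4, 5) gives a skewness of 0, which is a quick sanity check on the helper.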
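Notice that the mean and standard deviation are close to 1/λ = 1/0.2 = 5. The distribution is not normal, because its skewness and excess kurtosis are far from 0 (both are 0 for a normal distribution, while the theoretical values for an exponential distribution are 2 and 6). The QQ plot below also shows that the distribution is not normal.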
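The code for the QQ plot is not shown in the report, and its own plot was made with ggplot2. A minimal base R sketch of an equivalent normal QQ plot follows; the correlation of the QQ points (qqCor, a hypothetical name) is an added numeric summary, not something the report computes.

```r
# Regenerate the exponential sample so this block runs on its own
set.seed(2016)
lambda    <- 0.2
expSample <- rexp(1000 * 40, lambda)

# qqnorm() plots the sample values against standard normal quantiles and
# invisibly returns the coordinates as list(x = theoretical, y = sample)
qq <- qqnorm(expSample, main = "QQ Plot")
qqline(expSample, col = "red")   # reference line through the quartiles

# Added summary: the correlation is close to 1 only if the points lie on a line
qqCor <- cor(qq$x, qq$y)
qqCor
```

For a strongly right-skewed sample like this one, the points bend away from the reference line in the upper tail, and the correlation stays noticeably below 1.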
[Figure: QQ Plot of the simulated exponential values]
Studying the distribution of the mean of n exponentially generated random variables
Simulations
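Let us instead simulate 1000 samples, each of 40 exponential random variables, and calculate the mean of each sample. The expected value, variance and standard error of the sample mean X̄ are as follows:

$$E[\bar{X}] = \frac{1}{\lambda} = \frac{1}{0.2} = 5$$
$$\operatorname{Var}[\bar{X}] = \frac{1}{\lambda^2} \cdot \frac{1}{n} = \frac{1}{0.2^2} \cdot \frac{1}{40} = 0.625$$
$$\operatorname{SE}[\bar{X}] = \sqrt{\operatorname{Var}[\bar{X}]} = \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} = 0.79057$$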
# each row is one simulation of n exponential random variables
simSample <- matrix(rexp(n * noOfSim, lambda), noOfSim, n)
# theoretical mean and standard error of the sample mean
expMean <- 1/lambda
stdError <- 1/lambda/sqrt(n)
sampleMean <- apply(simSample, 1, mean)
dfSampleMean <- data.frame(sample = sampleMean)
ggplot(data = dfSampleMean, aes(x = sample)) +
    geom_histogram(binwidth = 0.2, col = 'blue', fill = 'purple') +
    ylab('Frequency') + xlab('') + labs(title = 'Histogram\n') +
    geom_vline(xintercept = mean(sampleMean), color = 'red', size = 1.5)
[Figure: Histogram of the 1000 sample means, with their mean marked by a red vertical line; y-axis: Frequency]
Sample Mean vs Theoretical Mean
mean(sampleMean)
## [1] 5.009748
The mean of the sample means is 5.00975, indicated by the red vertical line on the histogram. This value is close to the theoretical mean of 1/λ = 1/0.2 = 5.
Sample Variance vs Theoretical Variance
var(sampleMean)
## [1] 0.6194703
The variance of the sample means is 0.61947. This value is close to the theoretical variance of 1/λ² × 1/n = 1/0.2² × 1/40 = 0.625.
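To put "close to" on a concrete scale, one can compare the gap with the Monte Carlo standard error: with noOfSim simulated means, the mean of the sample means has standard error SE/√noOfSim = 0.79057/√1000 ≈ 0.025. The sketch below is an addition, not from the report; it uses rowMeans() in place of apply(..., 1, mean) (same result, faster), and because it draws its own random numbers its values will differ from the report's. theoVar, mcSE and zMean are hypothetical names.

```r
set.seed(2016)
lambda  <- 0.2
noOfSim <- 1000
n       <- 40

simSample  <- matrix(rexp(n * noOfSim, lambda), noOfSim, n)
sampleMean <- rowMeans(simSample)     # same as apply(simSample, 1, mean)

theoMean <- 1 / lambda                # 5
theoVar  <- 1 / (lambda^2 * n)        # 0.625
mcSE     <- sqrt(theoVar / noOfSim)   # Monte Carlo SE of mean(sampleMean), 0.025

# Distance of the simulated mean from theory, in Monte Carlo standard errors
zMean <- (mean(sampleMean) - theoMean) / mcSE
c(simMean = mean(sampleMean), simVar = var(sampleMean), zMean = zMean)
```

A |zMean| of about 2 or less is what simulation noise alone would typically produce.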
Distribution
Let us study the distribution of the sample means to see whether it follows the Central Limit Theorem, which states that the distribution of averages of iid variables, properly normalised, approaches that of a standard normal as the sample size grows.
To standardise the sample means, we subtract the expected mean from each sample mean and divide by the standard error.
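Written out, this standardisation is the usual Central Limit Theorem statistic. Since the exponential distribution has mean and standard deviation both equal to 1/λ, the standardised sample mean is

$$Z = \frac{\bar{X} - 1/\lambda}{(1/\lambda)/\sqrt{n}} = \sqrt{n}\,(\lambda \bar{X} - 1)$$

and the Central Limit Theorem says that the distribution of Z approaches N(0, 1) as n → ∞. With λ = 0.2 and n = 40 this is Z = √40 (0.2 X̄ − 1).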
stdSampleMean <- (sampleMean - expMean)/stdError
dfStdSampleMean <- data.frame(sample = stdSampleMean)
We plot the standardised sample means as a histogram on the density scale. Notice that the kernel density estimate of the sample (red) is very close to the standard normal density (yellow).
ggplot(data = dfStdSampleMean, aes(x = sample)) +
    geom_histogram(aes(y = ..density..), binwidth = 0.2, col = 'blue', fill = 'purple') +
    geom_density(col = 'red', size = 1.5) +
    stat_function(fun = dnorm, colour = "yellow", size = 1.5) +
    ylab('Density') + xlab('') + labs(title = 'Density Plot of the Standardised Sample Mean\n')
[Figure: Density Plot of the Standardised Sample Mean; y-axis: Density]
Properties of the standardised sample means are as follows:
1. Mean : 0.01
2. Standard Deviation : 1
3. Skewness : 0.28
4. Excess Kurtosis : 0.23
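For finite n the targets are not exactly 0. The mean of n iid Exp(λ) variables follows a Gamma distribution with shape n and rate nλ, and standardising does not change skewness or kurtosis, so the exact values are 2/√n ≈ 0.316 for skewness and 6/n = 0.15 for excess kurtosis. The sketch below (an addition, not from the report) computes these and cross-checks them with 1e5 replications, a hypothetical choice larger than the report's 1000 to reduce Monte Carlo noise; theoSkew, theoExKurt and simStats are hypothetical names.

```r
n      <- 40
lambda <- 0.2
theoSkew   <- 2 / sqrt(n)   # Gamma skewness 2 / sqrt(shape), about 0.316
theoExKurt <- 6 / n         # Gamma excess kurtosis 6 / shape, 0.15

# Simulated standardised means Z = sqrt(n) * (lambda * Xbar - 1)
set.seed(2016)
nRep <- 1e5
z <- sqrt(n) * (lambda * rowMeans(matrix(rexp(n * nRep, lambda), nRep, n)) - 1)
d <- z - mean(z)
simStats <- c(skewness        = mean(d^3) / mean(d^2)^1.5,
              excess_kurtosis = mean(d^4) / mean(d^2)^2 - 3)

round(rbind(theoretical = c(skewness = theoSkew, excess_kurtosis = theoExKurt),
            simulated   = simStats), 3)
```

The report's 0.28 and 0.23, based on only 1000 means, sit near these targets once sampling noise is taken into account.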
Notice that the mean and standard deviation are close to 0 and 1 respectively, those of a standard normal distribution. The distribution is approximately normal: its skewness and excess kurtosis are close to 0, far smaller than the values of 1.97 and 5.73 found for the raw exponential sample. The QQ plot below also shows that the distribution is close to normal.
[Figure: QQ Plot of the standardised sample means]
Conclusion
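We have shown by simulation that the standardised sample means of random variables generated from the exponential distribution have a distribution close to that of a standard normal, even though the exponential distribution itself is strongly skewed. With n = 40 the approximation is already good, as the Central Limit Theorem predicts for large n.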
Libraries required for this assignment project: ggplot2, moments