IC102 2011 Lecture End of Two Weeks Ending 18th

The Chi-Squared Distribution

Let $\nu$ be a positive integer. Then a random variable X is said to have a chi-squared distribution with parameter $\nu$ if the pdf of X is

\[ f(x;\nu) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1} e^{-x/2}, & x > 0 \\ 0, & x \le 0 \end{cases} \]

6/12/11 01:22:08 PM IIT Bombay (IC102)

The Chi-Squared Distribution

The parameter $\nu$ is called the number of degrees of freedom (df) of X. The symbol $\chi^2$ is often used in place of "chi-squared".

If $X_1, X_2, \ldots, X_n$ are n i.i.d. r.v. following $N(\mu, \sigma^2)$, then

\[ \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n \]

Chi-squared Critical Value

Let $\chi^2_{\alpha,n}$, called a chi-squared critical value, denote the number on the measurement axis such that $\alpha$ of the area under the chi-squared curve with n d.f. lies to the right of $\chi^2_{\alpha,n}$.

Notation Illustrated

[Figure: $\chi^2_n$ pdf; the shaded area to the right of $\chi^2_{\alpha,n}$ equals $\alpha$.]

If X is a chi-squared r.v. with n d.f., then for $\alpha \in (0,1)$, the quantity $\chi^2_{\alpha,n}$ is defined to be such that

\[ P\{X \ge \chi^2_{\alpha,n}\} = \alpha \]

Continuous distributions

1. Uniform
2. Normal
3. Gamma
4. Exponential
5. Chi-squared
6. t-distribution
7. F-distribution

Random Samples

Data from random samples are used for inferring certain population characteristics of interest. The distribution of the population variable is usually known, except for some unknown population parameters. Problems in which the form of the underlying distribution is specified up to a set of unknown parameters are called parametric inference problems.

Random Samples

The r.v.s $X_1, \ldots, X_n$ are said to form a simple random sample of size n if

1. The $X_i$'s are independent r.v.s.
2. Every $X_i$ has the same (identical) probability distribution.

That is, if $X_1, \ldots, X_n$ are independent r.v.s having a common (identical) distribution F, then we say that they are an i.i.d. random sample of size n from the distribution F.
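The pdf and the critical-value definition above can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names, the integration cutoff `UPPER`, the midpoint-rule integration, and the choice $\alpha = 0.05$, n = 5, $\mu = 5.4$, $\sigma = 0.2$ are all assumptions made for illustration. It finds $\chi^2_{\alpha,n}$ by bisection on the upper-tail area and then simulates the sum of squared standardized normals to see whether roughly a fraction $\alpha$ of the sums land to the right of that value.

```python
import math
import random

UPPER = 100.0  # assumed integration cutoff; the tail beyond it is negligible for small v


def chi2_pdf(x, v):
    """Chi-squared pdf with v degrees of freedom (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (v / 2 - 1) * math.exp(-x / 2) / (2 ** (v / 2) * math.gamma(v / 2))


def upper_tail_area(c, v, steps=4000):
    """Approximate P{X >= c} with the midpoint rule on [c, UPPER]."""
    h = (UPPER - c) / steps
    return h * sum(chi2_pdf(c + (k + 0.5) * h, v) for k in range(steps))


def chi2_critical_value(alpha, v):
    """Bisection search for c such that P{X >= c} = alpha."""
    lo, hi = 0.0, UPPER
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, v) > alpha:
            lo = mid  # too much area to the right of mid: move c up
        else:
            hi = mid
    return (lo + hi) / 2


def simulated_tail_fraction(c, n, mu, sigma, reps=20000, seed=0):
    """Fraction of simulated sums of n squared standardized normals that are >= c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n))
        if total >= c:
            hits += 1
    return hits / reps


crit = chi2_critical_value(0.05, 5)  # illustrative alpha and degrees of freedom
frac = simulated_tail_fraction(crit, n=5, mu=5.4, sigma=0.2)
print(f"critical value: {crit:.3f}, simulated tail fraction: {frac:.3f}")
```

The computed critical value should agree with a printed chi-squared table (about 11.07 for $\alpha = 0.05$ and 5 d.f.), and the simulated tail fraction should be close to 0.05.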
Distribution of a Linear Combination of Random Variables

Linear Combination

Given a collection of n random variables $X_1, \ldots, X_n$ and n numerical constants $a_1, \ldots, a_n$, the r.v.

\[ Y = a_1 X_1 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i \]

is called a linear combination of the $X_i$'s.

Expected Value of a Linear Combination

Let $X_1, \ldots, X_n$ have mean values $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively. Whether or not the $X_i$'s are independent,

\[ E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n) = a_1 \mu_1 + \cdots + a_n \mu_n \]

For identically distributed $X_i$'s and $a_i = 1/n$, we get $E(\bar{X}) = \mu$.

Variance of a Linear Combination

If $X_1, \ldots, X_n$ are independent,

\[ V(a_1 X_1 + \cdots + a_n X_n) = a_1^2 V(X_1) + \cdots + a_n^2 V(X_n) = a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2 \]

and

\[ \sigma_{a_1 X_1 + \cdots + a_n X_n} = \sqrt{a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2} \]

For i.i.d. $X_i$'s and $a_i = 1/n$, we get $V(\bar{X}) = \sigma^2/n$.

Variance of a Linear Combination

For any $X_1, \ldots, X_n$,

\[ V(a_1 X_1 + \cdots + a_n X_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \,\mathrm{Cov}(X_i, X_j) \]

Difference Between Two Random Variables

\[ E(X_1 - X_2) = E(X_1) - E(X_2) \]

and, if $X_1$ and $X_2$ are independent,

\[ V(X_1 - X_2) = V(X_1) + V(X_2) \]

If $X_1, X_2, \ldots, X_n$ are independent and normally distributed r.v.s, then any linear combination of the $X_i$'s also has a normal distribution. The difference $X_1 - X_2$ between two independent, normally distributed variables is itself normally distributed. Also, $\bar{X}$ is normally distributed.

Statistics and their Distributions

Statistic

A statistic is any quantity whose value can be calculated from sample data, e.g. sample mean, sample variance, sample range, sample median, etc. Prior to obtaining data, there is uncertainty as to what value is taken by any particular statistic. Thus, a statistic is itself a random variable, and its probability distribution is referred to as its sampling distribution.
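The mean and variance rules for a linear combination can be verified with a few lines of arithmetic. The sketch below is illustrative only: the helper names `lincomb_mean` and `lincomb_var` are hypothetical, and the numbers (n = 25, mean 5.4, SD 0.2, and the variances 4 and 9 in the difference example) are assumed values chosen for the check. It evaluates the general double-sum covariance formula and confirms that, for independent variables, it reduces to the sum of $a_i^2 \sigma_i^2$.

```python
def lincomb_mean(a, mu):
    """E(a1*X1 + ... + an*Xn) = a1*mu1 + ... + an*mun (no independence needed)."""
    return sum(ai * mi for ai, mi in zip(a, mu))


def lincomb_var(a, cov):
    """V(a1*X1 + ... + an*Xn) = sum_i sum_j a_i * a_j * Cov(X_i, X_j)."""
    n = len(a)
    return sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n))


# Sample mean of n i.i.d. variables: a_i = 1/n, diagonal covariance matrix.
n, mu, sigma = 25, 5.4, 0.2  # assumed illustrative values
a = [1 / n] * n
cov = [[sigma ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]

mean_xbar = lincomb_mean(a, [mu] * n)  # should equal mu
var_xbar = lincomb_var(a, cov)         # should equal sigma^2 / n

# Difference X1 - X2 of two independent variables with variances 4 and 9:
# the coefficients are (1, -1), yet the variances add.
var_diff = lincomb_var([1, -1], [[4.0, 0.0], [0.0, 9.0]])

print(mean_xbar, var_xbar, var_diff)
```

Passing a covariance matrix with non-zero off-diagonal entries to `lincomb_var` covers the dependent case, where the simpler independent-case formula no longer applies.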
Simulation Experiments

Draw 300 random samples, each of size n = 25, from a normal (size N) population with mean $\mu = 5.4$ and SD $\sigma = 0.2$. So we get sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{300}$. Draw the histogram of $\bar{x}_i$, $i = 1, \ldots, 300$. This gives a good approximation of the sampling distribution of $\bar{X}$. In order to find the exact sampling distribution of $\bar{X}$, we need the distribution based on all the possible samples.

The Distribution of the Sample Mean

General Properties of the Sample Mean

Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then

1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$
2. $V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2/n$

In addition, with $T_o = X_1 + \cdots + X_n$,

\[ E(T_o) = n\mu, \quad V(T_o) = n\sigma^2, \quad \text{and} \quad \sigma_{T_o} = \sqrt{n}\,\sigma \]

Case: when the population is normally distributed

Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with mean value $\mu$ and standard deviation $\sigma$. Then for any n, $\bar{X}$ is normally distributed with $(\mu, \sigma^2/n)$, and $T_o$ is normally distributed with $(n\mu, n\sigma^2)$.

Case: when the population is not normally distributed

The Central Limit Theorem (CLT)

Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and variance $\sigma^2$. Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with

\[ \mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}, \]

and $T_o$ also has approximately a normal distribution with

\[ \mu_{T_o} = n\mu, \quad \sigma^2_{T_o} = n\sigma^2. \]

The larger the value of n, the better the approximation.

CLT: The Central Limit Theorem

[Figure: the population distribution, the sampling distribution of $\bar{X}$ for small to moderate n, and the sampling distribution of $\bar{X}$ for large n.]

Rule of Thumb

If n > 30, the Central Limit Theorem can be used.

Points to note:

1. When n is large, the sampling distribution of the sample mean is well approximated by a normal curve, even when the population distribution is not itself normal.
2. A sample mean based on a large n will tend to be closer to the population mean than will a sample mean based on a small n.
3. The sampling distribution of $\bar{X}$ tends to be centered at the value of the population mean.
4. The spread of the sampling distribution of $\bar{X}$ tends to grow smaller as the sample size n increases.
5. As n increases, the sampling distribution of $\bar{X}$ tends to a normal distribution with mean $\mu_{\bar{X}} = \mu$ and SD $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

When a sample $X_1, \ldots, X_n$ is drawn from a population with mean $\mu$ and SD $\sigma$, and when n is large (CLT) or when the population has a normal distribution,

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]

(approximately so in the large-n case, exactly so for a normal population).

Point estimate for the population mean $\mu$

A point estimate is a single number (statistic) based on sample data that represents our best guess for the value of the population mean (e.g., sample mean, sample median, sample mode).

A statistic whose mean value is equal to $\mu$ is said to be an unbiased statistic (e.g., sample mean; $E(\bar{X}) = \mu$).

The point estimate (say, 5.5 feet) says nothing about how close it might be to the true population mean $\mu$. As an alternative, we might report an entire interval of plausible values for the population mean $\mu$.

Unbiasedness (Illustration)

(Methods of point estimation: Method of Moments; Method of Maximum Likelihood)

Card-Holders Survey on Payment Cards
www.math.iitb.ac.in/~udai/card-holders_survey2011.html

Confidence Intervals

An alternative to reporting a single value for the parameter being estimated is to calculate and report an entire interval of plausible values: a confidence interval (CI). A CI is always calculated by first selecting a confidence level, which is a measure of the degree of reliability of the interval to have captured the true population mean $\mu$.

A confidence level of 95% implies that 95% of all samples would give an interval that includes $\mu$, and only 5% of all samples would yield an erroneous interval.

95% Confidence Interval for $\mu$ when $\sigma$ is known

Consider a random sample. If, after observing $X_1 = x_1, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$, then a 95% confidence interval for $\mu$ can be expressed as

\[ \left( \bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \right) \]

Given that the population SD is 0.2, a 95% CI for the population mean height (when the sample mean, based on n = 25, is say 5.45) is

\[ \left( 5.45 - 1.96\,\frac{0.2}{\sqrt{25}},\ 5.45 + 1.96\,\frac{0.2}{\sqrt{25}} \right) = (5.45 - 0.08,\ 5.45 + 0.08) = (5.37,\ 5.53) \]

Other Levels of Confidence

\[ P\left( -z_{\alpha/2} \le Z < z_{\alpha/2} \right) = 1 - \alpha \]

[Figure: standard normal (z) curve; the shaded area between $-z_{\alpha/2}$ and $z_{\alpha/2}$ equals $1 - \alpha$.]

A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by

\[ \left( \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right) \]

Given that the population SD is 0.2, a 99% CI for the population mean height (when the sample mean, based on n = 25, is say 5.45) is

\[ \left( 5.45 - 2.58\,\frac{0.2}{\sqrt{25}},\ 5.45 + 2.58\,\frac{0.2}{\sqrt{25}} \right) = (5.45 - 0.10,\ 5.45 + 0.10) = (5.35,\ 5.55) \]
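The two interval calculations above, and the simulation experiment described earlier in the lecture, can be reproduced with the Python standard library. This is a sketch under stated assumptions rather than the lecturer's own code: the function name `z_interval`, the seed, and the use of `statistics.NormalDist` to obtain $z_{\alpha/2}$ (instead of the rounded table values 1.96 and 2.58) are choices made here for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def z_interval(xbar, sigma, n, level=0.95):
    """100*level % CI for the mean when the population SD sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# The worked examples: sample mean 5.45, sigma = 0.2, n = 25.
lo95, hi95 = z_interval(5.45, 0.2, 25, 0.95)
lo99, hi99 = z_interval(5.45, 0.2, 25, 0.99)
print(round(lo95, 2), round(hi95, 2))  # 5.37 5.53
print(round(lo99, 2), round(hi99, 2))  # 5.35 5.55

# Simulation experiment: 300 samples of size 25 from N(5.4, 0.2^2).
MU, SIGMA, N, SAMPLES = 5.4, 0.2, 25, 300
rng = random.Random(2011)  # arbitrary seed, assumed only for reproducibility
means = [fmean(rng.gauss(MU, SIGMA) for _ in range(N)) for _ in range(SAMPLES)]

# The sample means should centre near MU with SD near SIGMA / sqrt(N) = 0.04.
print(f"mean of sample means: {fmean(means):.3f}, SD: {stdev(means):.3f}")

# Fraction of the 300 intervals (95% level) that capture the true mean.
covered = sum(lo <= MU <= hi for lo, hi in (z_interval(m, SIGMA, N) for m in means))
coverage = covered / SAMPLES
print(f"coverage of 95% intervals: {coverage:.3f}")
```

With only 300 samples the observed coverage will fluctuate around 0.95 rather than equal it exactly, which matches the reading of a confidence level as a long-run proportion over repeated samples.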