Types of Data - Oats and Sugar Web viewProbability of the right tail of the exponential...



26133: BUSINESS STATISTICS EXAM NOTES

1 TYPES OF DATA
1.1 Data Quality (Nominal, Ordinal, Interval, Ratio)
1.2 Method of data collection
1.3 Types of graphs

2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES
2.1 Numerical Data Summaries
2.2 Finding Outliers

3 PROBABILITY [1]
3.1 Miscellaneous Laws

4 DEPENDENCE (CHI² TEST)

5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS
5.1 Binomial Distribution
5.2 Poisson Distributions

6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS
6.1 Uniform Distribution
6.2 Normal Distribution
6.3 Exponential Distributions

7 SAMPLING AND SAMPLING DISTRIBUTIONS
7.1 Can the sample be assumed to be normal?
7.2 Standard error of a sample mean
7.3 Finite correction factor

8 INTERVAL ESTIMATION
8.1 Estimating the population mean with a large N, using “z”
8.2 Estimating the population mean, using “t-statistic” (σ unknown)
8.3 Estimating the population proportion
8.4 Estimating population variance
8.5 Estimating sample size

9 HYPOTHESIS TESTING [1 POPULATION]
9.1 Methodology
9.2 Rejection and non-rejection regions
9.3 Types of questions

10 HYPOTHESIS TESTING [2+ POPULATIONS]

11 REGRESSION [1]

12 REGRESSION [2]


1 TYPES OF DATA

1.1 DATA QUALITY (NOMINAL, ORDINAL, INTERVAL, RATIO)

• Nominal (purely descriptive labels, no order)
• Ordinal (ordered categories)
• Interval (equal-sized steps between values, no true zero)
• Ratio (equal-sized steps and a true zero point)

1.2 METHOD OF DATA COLLECTION

• Sampling (small group to represent population)
  o Cheap
• Population (everyone)
  o Thorough
• Time-series (over time)
  o Shows change
• Cross-sectional (once/a snapshot)
  o Cheap/where time is irrelevant

1.3 TYPES OF GRAPHS

• Bar chart
  o Sectional comparison/growth
• Line graph
• Ogive
  o Cumulative frequency (percentage less than)
• Pie chart
  o Percentages
• Scatter plot
  o Infer trends


2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES

2.1 NUMERICAL DATA SUMMARIES

2.1.1 Mode

Most frequently occurring value

2.1.2 Median

Middle value when the observations are sorted

2.1.3 Mean

$$\text{Mean} = \mu = \bar{x} = \frac{\sum \text{observations}}{\text{number of observations}}$$

1. SD mode: [MODE], [MODE], [1].
2. Stat clear: [SHIFT], [MODE], [1].
3. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
4. Calculate: [SHIFT], [2] (S-VAR), [1] (x̄), [=].

2.1.4 Variance

Population variance (divide by n):

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

Sample variance (divide by n − 1):

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn) OR [3] (xσn-1), [=].
4. Square for variance: [x²], [=].

2.1.5 Standard Deviation

Population: $$\sigma = \sqrt{\sigma^2}$$

Sample: $$s = \sqrt{s^2}$$

1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn) OR [3] (xσn-1), [=].


2.1.6 Coefficient of Variation

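Measures the spread of the data relative to its mean; most useful where all values in the data set are positive.

For a sample: $$\text{Coefficient of Variation} = \frac{s}{\bar{x}}(100)$$

For a population: $$\text{Coefficient of Variation} = \frac{\sigma}{\mu}(100)$$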
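The summaries in section 2.1 can be cross-checked without the calculator. The following is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mode = statistics.mode(data)                  # most frequently occurring value
median = statistics.median(data)              # middle value of the sorted data
mean = statistics.mean(data)                  # sum of observations / number of observations

pop_variance = statistics.pvariance(data)     # divides by n (square of the xσn key)
sample_variance = statistics.variance(data)   # divides by n - 1 (square of the xσn-1 key)
pop_sd = statistics.pstdev(data)              # square root of the population variance
sample_sd = statistics.stdev(data)            # square root of the sample variance

cv_percent = sample_sd / mean * 100           # coefficient of variation, in percent

print(mode, median, mean)
print(round(pop_variance, 3), round(sample_variance, 3))
print(round(pop_sd, 3), round(sample_sd, 3), round(cv_percent, 1))
```

The only choice to make is between the population functions (`pvariance`, `pstdev`) and the sample functions (`variance`, `stdev`), which mirrors the xσn versus xσn-1 choice on the calculator.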

2.2 FINDING OUTLIERS

2.2.1 Z-Score

Z-score describes the distance of a number from the average in terms of standard deviations.

$$z_i = \frac{x_i - \bar{x}}{s}$$

A value is an outlier when $|z_i| > 3$

2.2.2 Box and whisker plot

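Use for irregular/asymmetrical data

Describes the data set in terms of 5 points: min, q1, median, q3, max, with IQR = q3 − q1.

• min = q1 − 1.5(IQR) (lower whisker limit)
• q1 = median of the lower half (split again)
• median = central data point
• q3 = median of the upper half (split again)
• max = q3 + 1.5(IQR) (upper whisker limit)

Values outside the min/max limits are outliers.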
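Both outlier rules can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names and the sample data are hypothetical, and `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method), so q1 and q3 may differ slightly from a hand calculation that splits the data differently.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Return values whose z-score is more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - mean) / s) > threshold]

def iqr_outliers(data):
    """Return values outside q1 - 1.5*IQR and q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical data with one obviously extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]
print(zscore_outliers(sample))
print(iqr_outliers(sample))
```

The box-and-whisker rule tends to be the more sensitive of the two on small data sets, because a single extreme value inflates the standard deviation used by the z-score.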


3 PROBABILITY [1]

3.1 MISCELLANEOUS LAWS

Sum of probabilities = 1 = 100%

$$p' = 1 - p$$

3.1.1 Intersection

Both occur: P(A∩B)

3.1.2 Union

Either A or B or both occurring: P(A∪B)

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

3.1.3 Conditional Probability

Probability of A occurring given that B has already occurred

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Joint probability table:

         B           B'          Total
A        P(A∩B)      P(A∩B')     P(A)
A'       P(A'∩B)     P(A'∩B')    P(A')
Total    P(B)        P(B')       1

4 DEPENDENCE (CHI² TEST)

4.1.1 Observed Data

Insert observed data into a probability table

Observed data    W       W'      Total
Retail           420     280     700
Sale             1140    160     1300
Total            1560    440     2000

4.1.2 Probability from observations

$$\text{Probability of observation} = \frac{\text{Observation}}{\sum\sum}$$

where ΣΣ is the grand total of the table (2000 here).

Probability      P(W)    P(W')   Total
P(Retail)        0.21    0.14    0.35
P(Sale)          0.57    0.08    0.65
Total            0.78    0.22    1

4.1.3 Predicted results if events are independent

$$\text{Predicted result if events are independent} = \frac{\text{Column total} \times \text{Row total}}{\sum\sum}$$

Events as independent    W       W'      Total
Retail                   546     154     700
Sale                     1014    286     1300
Total                    1560    440     2000

4.1.4 Chi² Test

1. Create table: for each cell,

$$\chi^2_{\text{cell}} = \frac{(\text{Observed result} - \text{Predicted result if independent})^2}{\text{Predicted result if independent}}$$

2. Total all cells: total = Chi² value

Compare the Chi² value with the Chi² critical value, found by entering the degrees of freedom, (number of rows − 1)(number of columns − 1), and the alpha value, (1 − certainty required), into the Chi² tables. If Chi² > Chi² critical value, then the events are dependent.
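The procedure can be checked numerically against the Retail/Sale table from these notes. The sketch below is an illustration only: the function name is hypothetical, and the critical value 3.841 (1 degree of freedom, alpha = .05) is the standard Chi² table entry typed in by hand rather than computed.

```python
def chi_square_statistic(observed):
    """Return (chi2, df) for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Predicted count if the events are independent: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Observed data from the notes: rows are Retail and Sale, columns are W and W'.
observed = [[420, 280],
            [1140, 160]]

chi2_value, df = chi_square_statistic(observed)
critical_value = 3.841  # Chi² table entry for df = 1, alpha = .05
print(round(chi2_value, 2), df, chi2_value > critical_value)
```

For this table the statistic is roughly 203.3, far above the critical value, so the two characteristics would be judged dependent.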


5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS

Countable number of possible outcomes

1. Determine the type of distribution
   a. Binomial Distribution
   b. Poisson Distribution
2. What is the question?
   a. Probability of x? Probability of more or less than?
   b. DRAW
3. Get the formula
4. Apply the terms

5.1 BINOMIAL DISTRIBUTION

$$P(x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$$

x = number of successes required
n = number of trials
p = probability of success
q = 1 − p = probability of failure

$$f(x = a) = {}^{n}C_{a} \cdot p^{a} \cdot q^{n-a}$$

f(x) = probability of x successes in n trials
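As a quick check on the binomial formula, here is a minimal Python sketch. The function names and the example numbers (10 trials, p = 0.5) are assumptions made for illustration; `math.comb` supplies the nCx term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n trials), success probability p."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)

def binomial_at_most(x, n, p):
    """P(x or fewer successes): add up the individual probabilities."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Hypothetical question: probability of exactly 3 successes in 10 trials with p = 0.5.
print(binomial_pmf(3, 10, 0.5))
# "More than 3" is the complement of "3 or fewer".
print(1 - binomial_at_most(3, 10, 0.5))
```

"More than" and "less than" questions reduce to sums of individual terms, or to 1 minus such a sum, which is why drawing the question first helps.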

5.2 POISSON DISTRIBUTIONS

$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

λ = mean of Poisson distribution
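The Poisson formula translates directly into code. This is a small sketch with a hypothetical function name and a hypothetical mean of 2 events per interval.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(exactly x events) when the mean number of events per interval is lam."""
    return lam**x * exp(-lam) / factorial(x)

# Hypothetical example: mean of 2 events per interval.
lam = 2
print(poisson_pmf(0, lam))                               # no events
print(sum(poisson_pmf(k, lam) for k in range(3)))        # 2 or fewer events
print(1 - sum(poisson_pmf(k, lam) for k in range(3)))    # more than 2 events
```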


6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS

Working strictly with probabilities (percentages etc.), found as areas under the distribution curve

6.1 UNIFORM DISTRIBUTION

This one looks like a rectangle; you merely need to find the area of the part of the rectangle in question.

6.2 NORMAL DISTRIBUTION

6.2.1 Probability density function of the normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{x-\mu}{\sigma}\right]^2}$$

6.2.2 Standardization (z-scores)

$$z = \frac{x - \mu}{\sigma}$$

Then look up the z-score in the z distribution table (which gives a one-sided area)
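The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch below assumes hypothetical values (μ = 100, σ = 15, x = 130) and hypothetical function names; note that `cdf` returns the area to the left of z, whereas some printed tables give the area between 0 and z.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: number of standard deviations from the mean."""
    return (x - mu) / sigma

def prob_below(x, mu, sigma):
    """P(X <= x) for a normal distribution, via the standard normal CDF."""
    return NormalDist().cdf(z_score(x, mu, sigma))

# Hypothetical example: mu = 100, sigma = 15, x = 130.
z = z_score(130, 100, 15)
print(z)
print(round(prob_below(130, 100, 15), 4))        # area to the left of z
print(round(1 - prob_below(130, 100, 15), 4))    # right-tail area
print(round(prob_below(130, 100, 15) - 0.5, 4))  # area between the mean and z
```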

6.3 EXPONENTIAL DISTRIBUTIONS

6.3.1 Probability density function of the exponential distribution

$$f(x) = \lambda e^{-\lambda x}$$

x and λ must be greater than zero

6.3.2 Probability of the right tail of the exponential distribution

$$P(x \ge x_0) = e^{-\lambda x_0}$$

x0 must be greater than 0
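A short sketch of the right-tail formula follows. The function name and the example rate (λ = 0.5) are hypothetical; the left-tail probability is simply the complement.

```python
from math import exp

def exponential_right_tail(x0, lam):
    """P(x >= x0) = e^(-lam * x0) for an exponential distribution with rate lam."""
    if lam <= 0 or x0 <= 0:
        raise ValueError("x0 and lam must be greater than zero")
    return exp(-lam * x0)

# Hypothetical example: lam = 0.5, probability that x is at least 2.
right = exponential_right_tail(2, 0.5)
print(round(right, 4))      # right tail
print(round(1 - right, 4))  # left tail, P(x < x0)
```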


7 SAMPLING AND SAMPLING DISTRIBUTIONS

7.1 CAN THE SAMPLE BE ASSUMED TO BE NORMAL?

If the sample size is greater than 30 (n > 30): yes

If the population is normal: yes

7.2 STANDARD ERROR OF A SAMPLE MEAN

For infinite population

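$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

For finite population

$$\sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right)$$

N = observations in population
n = observations in sample

7.3 FINITE CORRECTION FACTOR

This is necessary when $\frac{n}{N} > 0.05$

For proportions

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}\,\sqrt{\frac{N-n}{N-1}}$$

$$\hat{p} = \text{proportion} = \frac{x}{n}$$

x = number of items in sample with the requisite characteristic

For quantitative data

$$\sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right)$$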
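The standard error formulas, including the rule for when the finite correction factor applies, can be sketched as follows. The function names and example numbers are hypothetical; the correction is applied only when n/N exceeds 0.05, as stated above.

```python
from math import sqrt

def finite_correction(n, N):
    """Finite correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean; N=None means an infinite population."""
    se = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= finite_correction(n, N)
    return se

def standard_error_proportion(p, n, N=None):
    """Standard error of the sample proportion, with q = 1 - p."""
    se = sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= finite_correction(n, N)
    return se

# Hypothetical example: sigma = 10, sample of 100.
print(standard_error_mean(10, 100))            # infinite population
print(standard_error_mean(10, 100, N=1000))    # n/N = 0.10, correction applied
print(standard_error_mean(10, 100, N=10000))   # n/N = 0.01, correction skipped
```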


8 INTERVAL ESTIMATION

8.1 ESTIMATING THE POPULATION MEAN WITH A LARGE N, USING “Z”

8.1.1 Basic form

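point estimate ± critical value × standard error

If

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

and the sample mean can be greater or less than the population mean, the confidence interval is:

$$\mu = \bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$

8.1.2 Estimating μ

$$\mu = \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

$z_{\alpha/2}$ = z-score that leaves a one-sided area of α/2 outside of the confidence interval

Or

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Usually, $z_{\alpha/2}$ is taken for a confidence of 95%; see below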

8.1.3 Finding $z_{\alpha/2}$

1. Draw
2. Look up the area α/2 in the z-tables to find $z_{\alpha/2}$

8.1.4 Add a finite correction factor

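$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$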
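The z-based interval in section 8.1 can be sketched with the standard library, where `NormalDist().inv_cdf` plays the role of the reverse table lookup. The function names and example numbers are hypothetical, and the optional `N` argument applies the finite correction factor whenever it is supplied.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z value that leaves alpha/2 in each tail, e.g. about 1.96 for 95% confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence=0.95, N=None):
    """Confidence interval for the population mean when sigma is known."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # finite correction factor
    margin = z_critical(confidence) * se
    return xbar - margin, xbar + margin

# Hypothetical example: sample mean 50, sigma 10, n = 100, 95% confidence.
low, high = z_interval(50, 10, 100)
print(round(z_critical(0.95), 2), round(low, 2), round(high, 2))
```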

8.1.5 If n is small (<30), then you can only use the above formulae if the population is normal

8.2 ESTIMATING THE POPULATION MEAN, USING “T-STATISTIC” (σ UNKNOWN)

8.2.1 T distribution

A distribution that describes the standardized sample mean when σ is unknown and population is normal

8.2.2 T value

Tool used to reach conclusions about null hypothesis

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

8.2.3 T distribution table

To read the table we need the degrees of freedom and the tail area

Degrees of freedom = n − 1

Tail area = α/2

8.2.4 Confidence intervals to estimate the population mean using the t-stat

$$\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
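The t interval has the same shape as the z interval, with the critical value read from the t table. In the sketch below the critical value is passed in by hand, since the Python standard library has no t distribution; the function name and sample figures are hypothetical, and 2.131 is the standard table entry for α/2 = .025 with 15 degrees of freedom.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean using a t critical value read from the table."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 50, sample standard deviation 8, n = 16, so df = 15.
t_crit = 2.131  # t table entry for alpha/2 = .025, df = 15
low, high = t_interval(50, 8, 16, t_crit)
print(round(low, 3), round(high, 3))
```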


8.3 ESTIMATING THE POPULATION PROPORTION

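$$z = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}}$$

p̂ = sample proportion
q̂ = 1 − p̂
p = population proportion
n = sample size

8.3.1 Confidence interval to estimate p

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$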

8.4 ESTIMATING POPULATION VARIANCE

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

8.4.1 Chi² formula for variance

NB: Distribution must be normal to use this formula

$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$$

$$df = n - 1$$

8.4.2 Confidence interval to estimate the population variance

$$\frac{(n-1)\,s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2}}$$

$$df = n - 1$$

Find $\chi^2_{\alpha/2,\,df}$ and $\chi^2_{1-\alpha/2,\,df}$ using the χ² tables.


8.5 ESTIMATING SAMPLE SIZE

This is used to find the minimum sample size to fulfill the requirements of a particular confidence level within a certain amount of error.

8.5.1 Sample size when estimating µ

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

$$E = (\bar{x} - \mu) = \text{Error of Estimation}$$

You either need to work out E, or it can be given in the question, e.g. “to be within .03 of the true population proportion”

Always round up, since you can’t have half-people

8.5.2 Sample size when estimating p

$$n = \frac{z^2\,pq}{E^2}$$

Work out the z-stat from the confidence level and the z tables
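Both sample size formulas, including the round-up rule, can be sketched as below. The function names and example figures are hypothetical. Using p = 0.5 as the default is an assumption for the case where no estimate of p is given; it is the most conservative choice because pq is largest there.

```python
from math import ceil
from statistics import NormalDist

def z_for_confidence(confidence):
    """z value leaving alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma, E, confidence=0.95):
    """Minimum n to estimate the mean within E: (z * sigma / E)^2, rounded up."""
    z = z_for_confidence(confidence)
    return ceil((z * sigma / E) ** 2)

def sample_size_proportion(E, confidence=0.95, p=0.5):
    """Minimum n to estimate a proportion within E: z^2 * p * q / E^2, rounded up."""
    z = z_for_confidence(confidence)
    return ceil(z**2 * p * (1 - p) / E**2)

# "To be within .03 of the true population proportion" at 95% confidence.
print(sample_size_proportion(0.03))
# Hypothetical mean example: sigma = 10, to be within 2 units at 95% confidence.
print(sample_size_mean(10, 2))
```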


9 HYPOTHESIS TESTING [1 POPULATION]

9.1 METHODOLOGY

1. Specify the thing of interest
2. Formulate H0 and Ha
   a. Draw
3. Define the level of significance
   a. 1 sided or two sided test?
      i. 1 sided for greater or less
      ii. 2 sided for equals
4. Test
   a. Determine the appropriate statistical test
   b. Establish the decision rule
   c. Gather sample data
   d. Analyze the data
5. Conclude/business application

9.2 REJECTION AND NON-REJECTION REGIONS

Via critical values (inside is non-rejection, outside is rejection region)


9.3 USING Z-STAT

9.3.1 Testing hypothesis about a population mean using the z-stat

Z test for a single mean

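$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

To find a critical z, subtract the tail area from 0.5 or 1 (depending on the table), find that area in the body of the z table, then read z off the row/column (i.e. the reverse of looking up a z score)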

9.3.1.1 EXAMPLE QUESTION

The average net income (Y) of a sole-proprietor CPA is $74914 [statistic from 10 years ago]

Test again with a new sample: n=112, σ=$14530

STEP 1: HYPOTHESISE

H0: µ=$74914

Ha: µ≠$74914

STEP 2: WHICH TEST TO USE?

Sample size is large (n>30), sample mean as stat, therefore z-stat.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

STEP 3: WHAT ARE THE CRITICAL VALUES?

Confidence required: 95%, therefore α=.05

The hypothesis involves an = sign, not a ≤ or ≥ sign, so it is a two-tailed test

α/2=.05/2=.025

Each side therefore has a .475 non-rejection area and a .025 rejection area.

Look up the tail area .025 in the z table to find zα/2 = ±1.96

STEP 4: FIND TEST STATISTIC

Sample mean = $78695, n = 112, µ = $74914, σ = $14530

$$z = \frac{78695 - 74914}{14530/\sqrt{112}} = 2.75$$


STEP 5: COMPARE TO CRITICAL VALUES

Non-rejection range = ±1.96; 2.75 is outside this range, so reject the null hypothesis
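The five steps of the CPA example can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and the critical value comes from `NormalDist().inv_cdf` instead of the printed z table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z, critical z, reject H0?) for a two-tailed z test of a single mean."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return z, z_crit, abs(z) > z_crit

# Figures from the example: H0 is mu = 74914, sample mean 78695, sigma 14530, n = 112.
z, z_crit, reject = z_test_two_tailed(78695, 74914, 14530, 112)
print(round(z, 2), round(z_crit, 2), reject)
```

Because |z| is compared with the critical value, the same function covers sample means that fall on either side of the hypothesised mean.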

9.3.2 Testing the mean with a finite population

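$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$$

9.4 USING T-STAT

9.4.1 T-test for µ

(see p320)

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

$$df = n - 1$$

9.5 HYPOTHESIS ABOUT A PROPORTION

$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$$

9.6 HYPOTHESIS ABOUT A VARIANCE

(see p331)

$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$$

$$df = n - 1$$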

9.7 TYPE 2 ERRORS

Failing to reject the null hypothesis when it is false

See p 334


10 HYPOTHESIS TESTING [2+ POPULATIONS]

p399

10.1 Z FORMULA FOR THE DIFFERENCE IN TWO SAMPLE MEANS AND POPULATION VARIANCES

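$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Under the null hypothesis, $\mu_1 - \mu_2 = 0$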

10.1.1 Confidence intervals in estimate of μ1−μ2

(SEE P360)

10.2 T STAT FOR THE DIFFERENCE IN TWO SAMPLE MEANS (VARIANCES UNKNOWN)

(see p365)

10.2.1 Confidence intervals in estimate of μ1−μ2

(see p369)

10.3 STATISTICAL INFERENCES FOR RELATED POPULATIONS

(see p 373)

10.4 STATISTICAL INFERENCES FOR TWO POPULATION PROPORTIONS

(p383)

10.5 STATISTICAL INFERENCES FOR TWO POPULATION VARIANCES

(p390)

Ratio of two sample variances gives F value


11 ANOVA


12 REGRESSION [1]

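12.1 SINGLE REGRESSION

$$y = (\text{intercept}) + c_1 x_1 + c_2 x_2 + \dots + c_n x_n$$

If a coefficient's “p-value” in the regression output is smaller than .05, reject the null hypothesis (that the coefficient is 0) and use that variable in the formula

R^2 shows “goodness” of model (0=bad, 1=good)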

12.2 MULTIPLE REGRESSION

In multiple regression R^2 is misleading because it never decreases when more variables are added, so we have to use adjusted R^2
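The adjustment is not written out in these notes; the standard formula is adjusted R^2 = 1 − (1 − R^2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables. A minimal sketch, with a hypothetical function name and example values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical example: R^2 = 0.80 from 30 observations.
print(round(adjusted_r2(0.80, 30, 3), 4))   # 3 independent variables
print(round(adjusted_r2(0.80, 30, 10), 4))  # same R^2 with 10 variables is penalised more
```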

12.3 PROBLEMS

Multicollinearity (independent variables are correlated with each other, so their effects overlap)


13 REGRESSION [2] MORE PROBLEMS

Residual is the difference between predicted and actual results

13.1 F-TEST

H0: all of the coefficients = 0

If F-stat > critical F, reject H0

If significance F < alpha, reject H0

To test each coefficient, set one at a time to 0 and see if there is a change
