Descriptive Statistics


Transcript of Descriptive Statistics

Page 1: Descriptive Statistics

1

Intro to Research in Information Studies

Inferential Statistics
Standard Error of the Mean
Significance
Inferential tests you can use

Page 2: Descriptive Statistics

2

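$$ t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\left[\dfrac{\left(\sum X_A^2 - \dfrac{(\sum X_A)^2}{n_1}\right) + \left(\sum X_B^2 - \dfrac{(\sum X_B)^2}{n_2}\right)}{(n_1 - 1) + (n_2 - 1)}\right] \times \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$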

Do you speak the language?

Page 3: Descriptive Statistics

3

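Don’t Panic!

$$ t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\left[\dfrac{\left(\sum X_A^2 - \dfrac{(\sum X_A)^2}{n_1}\right) + \left(\sum X_B^2 - \dfrac{(\sum X_B)^2}{n_2}\right)}{(n_1 - 1) + (n_2 - 1)}\right] \times \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

Difference between means (the numerator)

Compare with SD formula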

Page 4: Descriptive Statistics

4

o Descriptive statistics which summarize the characteristics of a sample of data

o Inferential statistics which attempt to say something about a population on the basis of a sample of data - infer to all on the basis of some

Basic types of statistical treatment

Statistical tests are inferential

Page 5: Descriptive Statistics

5

Two kinds of descriptive statistic:

o Measures of central tendency (or whereabouts on the measurement scale most of the data fall)
  – mean
  – median
  – mode

o Measures of dispersion (variation) (or how spread out they are)
  – range
  – inter-quartile range
  – variance/standard deviation

The different measures have different sensitivity and should be used at the appropriate times…

Page 6: Descriptive Statistics

6

Symbol check

$$ \sum_{i=1}^{n} x_i $$

o Sigma ($\Sigma$): means the ‘sum of’

o Sigma (1 to n) x of i: means add all values of $x_i$, for i from 1 to n, in a data set

o $x_i$ = the ith data point

Page 7: Descriptive Statistics

7

Mean

Sum of all observations divided by the number of observations

In notation:

$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$

Refer to handout on notation

See example on next slide

Mean uses every item of data but is sensitive to extreme ‘outliers’

Page 8: Descriptive Statistics

8

Variance and standard deviation

To overcome problems with range etc. we need a better measure of spread

o A deviation is a measure of how far a score in our data is from the mean

o Sample: 6, 4, 7, 5; mean = 5.5

o Each score can be expressed in terms of its distance from 5.5

o 6, 4, 7, 5 => 0.5, -1.5, 1.5, -0.5 (these are distances from the mean)

o Since these are measures of distance, some are positive (greater than the mean) and some are negative (less than the mean)

o TIP: Sum of these distances ALWAYS = 0
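The deviations for the sample 6, 4, 7, 5 can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

scores = [6, 4, 7, 5]                      # sample from the slide
m = mean(scores)                           # sum of observations / number of observations
deviations = [x - m for x in scores]       # distance of each score from the mean
total = sum(deviations)                    # deviations from the mean cancel out

print("mean:", m)
print("deviations:", deviations)
print("sum of deviations:", total)
```

Because positive and negative distances cancel, the deviations are squared before they are averaged, which leads to the variance and standard deviation.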

Page 9: Descriptive Statistics

9

Symbol check

o $\bar{x}$: called ‘x bar’; refers to the ‘mean’

o $(x - \bar{x})$: called ‘x minus x-bar’; implies subtracting the mean from a data point x. Also known as a deviation from the mean

Page 10: Descriptive Statistics

10

Two ways to get SD

$$ sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} $$

• Sum the squared deviations from the mean
• Divide by the number of observations
• Take the square root of the result

$$ sd = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} $$

• Sum the squared raw scores
• Divide by N
• Subtract the squared mean
• Take the square root of the result

Page 11: Descriptive Statistics

x     x²
2     4
2     4
2     4
2     4
2     4
3     9
3     9
4     16
4     16
5     25

Σx = 29     Σx² = 95

$$ s^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{95}{10} - 2.9^2 = 9.5 - 8.41 = 1.09 $$

$$ s = \sqrt{1.09} = 1.044 $$

If we recalculate the variance with the 60 instead of the 5 in the data…

Page 12: Descriptive Statistics

If we include a large outlier:

x     x²
2     4
2     4
2     4
2     4
2     4
3     9
3     9
4     16
4     16
60    3600

Σx = 84     Σx² = 3670

$$ s^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{3670}{10} - 8.4^2 = 367 - 70.56 = 296.44 $$

$$ s = \sqrt{296.44} = 17.22 $$

Note increase in SD

Like the mean, the standard deviation uses every piece of data and is therefore sensitive to extreme values
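The two worked examples (with and without the outlier of 60) can be reproduced with the raw-score formula. This is a minimal sketch; the function name `sd_raw_scores` is an illustrative assumption, not something from the slides.

```python
from math import sqrt

def sd_raw_scores(data):
    """SD via the raw-score formula: sqrt(sum(x^2)/n - mean^2), dividing by N."""
    n = len(data)
    mean = sum(data) / n
    return sqrt(sum(x * x for x in data) / n - mean ** 2)

original = [2, 2, 2, 2, 2, 3, 3, 4, 4, 5]
with_outlier = original[:-1] + [60]        # same data with the 5 replaced by 60

sd_original = sd_raw_scores(original)
sd_outlier = sd_raw_scores(with_outlier)

print("SD without outlier:", round(sd_original, 3))
print("SD with outlier:   ", round(sd_outlier, 2))
```

A single extreme value moves the SD from roughly 1 to roughly 17, which is why the median and inter-quartile range are often preferred when outliers are present.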

Page 13: Descriptive Statistics

Mean

Two sets of data can have the same mean but different standard deviations.

The bigger the SD, the more s-p-r-e-a-d out are the data.

Page 14: Descriptive Statistics

14

On the use of N or N-1

o When your observations are the complete set of people that could be measured (parameter), divide by N:

$$ sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} $$

o When you are observing only a sample of potential users (statistic), divide by N-1; the use of N-1 increases size of sd slightly:

$$ sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
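Python's standard `statistics` module offers both versions: `pstdev` divides by N and `stdev` divides by N-1. The sketch below applies them to the ten scores from the earlier worked example; the choice of data is only for illustration.

```python
from statistics import pstdev, stdev

data = [2, 2, 2, 2, 2, 3, 3, 4, 4, 5]

sd_population = pstdev(data)   # divides by N (complete set measured)
sd_sample = stdev(data)        # divides by N - 1 (sample of a larger population)

print("SD using N:    ", round(sd_population, 3))
print("SD using N - 1:", round(sd_sample, 3))
```

The N-1 version is always slightly larger, and the gap shrinks as the number of observations grows.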

Page 15: Descriptive Statistics

Summary

Measures of Central Tendency

Mode • Most frequent observation. Use with nominal data

Median • ‘Middle’ of data. Use with ordinal data or when data contain outliers

Mean • ‘Average’. Use with interval and ratio data if no outliers

Measures of Dispersion

Range • Dependent on two extreme values

Interquartile Range • More useful than range. Often used with median

Variance / Standard Deviation • Same conditions as mean. With mean, provides excellent summary of data

Page 16: Descriptive Statistics

16

Deviation units: Z scores

Any data point can be expressed in terms of its distance from the mean in SD units:

$$ z = \frac{x - \bar{x}}{sd} $$

A positive z score implies a value above the mean

A negative z score implies a value below the mean

Andrew Dillon:

Move this to later in the course, after distributions?


Page 17: Descriptive Statistics

17

Interpreting Z scores

o Mean = 70, SD = 6

o Then a score of 82 is 2 sd [(82-70)/6] above the mean, or 82 = Z score of 2

o Similarly, a score of 64 = a Z score of -1

o By using Z scores, we can standardize a set of scores to a scale that is more intuitive

o Many IQ tests and aptitude tests do this, setting a mean of 100 and an SD of 15 etc.

Page 18: Descriptive Statistics

18

Comparing data with Z scores

You score 49 in class A but 58 in class B

How can you compare your performance in both?

           Class A    Class B
Mean       45         55
SD         4          6
Score      49         58
Z          1.0        0.5
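The comparison can be sketched in Python. The helper name `z_score` is an illustrative assumption; the numbers are those given on the slides.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in SD units."""
    return (x - mean) / sd

z_a = z_score(49, mean=45, sd=4)   # class A
z_b = z_score(58, mean=55, sd=6)   # class B

print("Class A z:", z_a)
print("Class B z:", z_b)
print("Better relative performance in class", "A" if z_a > z_b else "B")
```

Although the raw score in class B is higher, the score in class A sits further above its class mean in SD units.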

Page 19: Descriptive Statistics

19

With normal distributions

Mean, SD and Z tables

In combination provide powerful means of estimating what your data indicates

Page 20: Descriptive Statistics

20

Graphing data - the histogram

[Figure: histogram. Vertical axis: number of errors (0 to 100), i.e., the frequency of occurrence for the measure of interest, e.g., errors, time, scores on a test etc. Horizontal axis (1 to 10): the categories of data we are studying, e.g., task or interface, or user group etc.]

Graph gives instant summary of data - check spread, similarity, outliers, etc.

Page 21: Descriptive Statistics

21

Very large data sets tend to have distinct shape:

[Figure: histogram of a very large data set; vertical axis runs from 0 to 80]

Page 22: Descriptive Statistics

22

Normal distribution

o Bell shaped, symmetrical, measures of central tendency converge
  o mean, median, mode are equal in normal distribution
  o Mean lies at the peak of the curve

o Many events in nature follow this curve
  o IQ test scores, height, tosses of a fair coin, user performance in tests

Page 23: Descriptive Statistics

23

The Normal Curve

[Figure: normal curve, frequency (f) on the vertical axis; Mean, Median and Mode coincide at the peak]

50% of scores fall below mean

NB: position of measures of central tendency

Page 24: Descriptive Statistics

24

Positively skewed distribution

[Figure: positively skewed curve, frequency (f) on the vertical axis; from left to right: Mode, Median, Mean]

Note how the various measures of central tendency separate now - note the direction of the change… mode moves left of other two, mean stays highest, indicating frequency of scores less than the mean

Page 25: Descriptive Statistics

25

Negatively skewed distribution

[Figure: negatively skewed curve, frequency (f) on the vertical axis; from left to right: Mean, Median, Mode]

Here the tendency to have higher values more common serves to increase the value of the mode

Page 26: Descriptive Statistics

26

Other distributions

o Bimodal
  o Data shows 2 peaks separated by trough

o Multimodal
  o More than 2 peaks

o The shape of the underlying distribution determines your choice of inferential test

Page 27: Descriptive Statistics

27

Bimodal

[Figure: bimodal curve, frequency (f) on the vertical axis; two peaks each labelled Mode, with Mean and Median also marked]

Will occur in situations where there might be distinct groups being tested e.g., novices and experts

Note how each mode is itself part of a normal distribution (more later)

Page 28: Descriptive Statistics

28

Standard deviations and the normal curve

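[Figure: normal curve, frequency (f) on the vertical axis, centred on the Mean and marked off in steps of 1 sd on either side]

68% of observations fall within ± 1 s.d.

95% of observations fall within ± 2 s.d. (approx)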

Page 29: Descriptive Statistics

29

Z scores and tables

Knowing a Z score allows you to determine where under the normal distribution it occurs

Z score between:

0 and 1 = 34% of observations

-1 and 1 = 68% of observations, etc.

Or 16% of scores are more than 1 Z score above the mean

Check out Z tables in any basic stats book
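In place of a printed Z table, the same proportions can be looked up with `NormalDist` from Python's standard `statistics` module. A minimal sketch, with illustrative variable names:

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, SD 1

between_0_and_1 = std_normal.cdf(1) - std_normal.cdf(0)    # about 34%
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)       # about 68%
above_1_sd = 1 - std_normal.cdf(1)                         # about 16%
within_2_sd = std_normal.cdf(2) - std_normal.cdf(-2)       # about 95%

for label, p in [("0 to 1", between_0_and_1), ("-1 to 1", within_1_sd),
                 ("above 1", above_1_sd), ("-2 to 2", within_2_sd)]:
    print(f"Z {label}: {p:.1%}")
```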

Page 30: Descriptive Statistics

30

Remember:

o A Z score reflects position in a normal distribution

o The Normal Distribution has been plotted out such that we know what proportion of the distribution occurs above or below any point

Page 31: Descriptive Statistics

31

Importance of distribution

o Given the mean, the standard deviation, and some reasonable expectation of normal distribution, we can establish the confidence level of our findings

o With a distribution, we can go beyond descriptive statistics to inferential statistics (tests of significance)

Page 32: Descriptive Statistics

32

So - for your research:

o Always summarize the data by graphing it - look for general pattern of distribution

o Then, determine the mean, median, mode and standard deviation

o From these we know a LOT about what we have observed

Page 33: Descriptive Statistics

33

Inference is built on Probability

o Inferential statistics rely on the laws of probability to determine the ‘significance’ of the data we observe.

o Statistical significance is NOT the same as practical significance

o In statistics, we generally consider ‘significant’ those differences that would occur by chance alone less than 1 time in 20

Page 34: Descriptive Statistics

34

Calculating probability

o Probability refers to the likelihood of any given event occurring out of all possible events, e.g.:
  o Tossing a coin - outcome is either head or tail
  o Therefore probability of head is 1/2
  o Probability of two heads on two tosses is 1/4, since the other possible outcomes are two tails and the two possible sequences of head and tail

o The probability of any event is expressed as a value between 0 (no chance) and 1 (certain)

At this point I ask people to take out a coin and toss it 10 times, noting the exact sequence of outcomes e.g.,

h,h,t,h,t,t,h,t,t,h.

Then I have people compare outcomes….


Page 35: Descriptive Statistics

35

Sampling distribution for 3 coin tosses

[Figure: bar chart of the number of possible sequences for each number of heads]

Number of heads     Number of sequences
0 heads             1
1 head              3
2 heads             3
3 heads             1
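The counts for three tosses can be reproduced by listing every possible sequence. A minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Every possible sequence of 3 tosses: HHH, HHT, ..., TTT
sequences = list(product("HT", repeat=3))

# Count how many sequences contain 0, 1, 2 or 3 heads
heads_counts = Counter(seq.count("H") for seq in sequences)

for heads in sorted(heads_counts):
    ways = heads_counts[heads]
    print(f"{heads} heads: {ways} of {len(sequences)} sequences")
```

Each individual sequence is equally likely, but the middle outcomes can be reached in more ways, which is what gives the distribution its peaked shape.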

Page 36: Descriptive Statistics

36

Probability and normal curves

o Q? When is the probability of getting 10 heads in 10 coin tosses the same as getting 6 heads and 4 tails?
  o HHHHHHHHHH
  o HHTHTHHTHT

o Answer: when you specify the precise order of the 6H/4T sequence:
  o $(1/2)^{10} = 1/1024$ (specific order)
  o But to get 6 heads in any order it is: 210/1024 (or about 1:5)
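The figure of 210 is the number of ways of choosing which 6 of the 10 tosses come up heads. A minimal sketch with the standard library's `math.comb`:

```python
from math import comb

p_specific_sequence = 0.5 ** 10        # any one exact order of 10 tosses
ways_6_heads = comb(10, 6)             # orderings that contain exactly 6 heads
p_6_heads_any_order = ways_6_heads * p_specific_sequence

print("P(one specific sequence):", p_specific_sequence)
print("Orderings with 6 heads:  ", ways_6_heads)
print("P(6 heads, any order):   ", round(p_6_heads_any_order, 3))
```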

Page 37: Descriptive Statistics

37

What use is probability to us?

o It tells us how likely any event is to occur by chance

o This enables us to determine if the behavior of our users in a test is just chance or is being affected by our interfaces

Page 38: Descriptive Statistics

38

Determining probability

o Your statistical test result is plotted against the distribution of all scores on such a test.

o It can be looked up in stats tables or is calculated for you in EXCEL or SPSS etc

o This tells you its probability of occurrence

o The distributions have been determined by statisticians.

Introduce simple stats tables here :


Page 39: Descriptive Statistics

39

What is a significance level?

o In research, we estimate the probability level of finding what we found by chance alone.

o Convention dictates that this level is 1:20 or a probability of .05, usually expressed as : p<.05.

o However, this level is negotiable
  o But the higher it is (e.g., p<.30 etc.) the more likely you are to claim a difference that is really just occurring by chance (known as a Type 1 error)

Page 40: Descriptive Statistics

40

What levels might we choose?

o In research there are two types of errors we can make when considering probability:
  o Claiming a significant difference when there is none (type 1 error)
  o Failing to claim a difference where there is one (type 2 error)

o The p<.05 convention is the ‘balanced’ case but tends to minimize type 1 errors

Page 41: Descriptive Statistics

41

Using other levels

o Type 1 and 2 errors are interwoven: if we lessen the probability of one occurring, we increase the chance of the other.

o If we think that we really want to find any differences that exist, we might accept a probability level of .10 or higher

Page 42: Descriptive Statistics

42

Thinking about p levels

o The p<.x level means we believe results like ours would occur by chance alone (not because of our manipulation) fewer than x times in 100
  o p<.10 => our results would occur by chance fewer than 1 in 10 times
  o p<.20 => our results would occur by chance fewer than 2 in 10 times

o Depending on your context, you can take your chances :)

o In research, the consensus is that 1:20 is high enough…

Page 43: Descriptive Statistics

43

Putting probability to work

o Understanding the probability of gaining the data you have can guide your decisions

o Determine how precise you need to be IN ADVANCE, not after you see the result

o It is like making a bet….you cannot play the odds after the event!

Page 44: Descriptive Statistics

44

Sampling error and the mean

o Usually, our data forms only a small part of all the possible data we could collect
  o Not all possible users participate in a usability test
  o Not every possible respondent answered our questions

o The mean we observe therefore is unlikely to be the exact mean for the whole population
  o The scores of our users in a test are not going to be an exact index of how all users would perform

I find that this is the hardest part of stats for novices to grasp, since it is the bridge between descriptive and inferential stats… needs to be explained slowly!!

Page 45: Descriptive Statistics

45

How can we relate our sample to everyone else?

o Central limit theorem
  o If we repeatedly sample and calculate means from a population, our list of means will itself be (approximately) normally distributed

o Holds true even for samples taken from a skewed population distribution

o This implies that our observed mean follows the same rules as all data under the normal curve
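A small simulation can make the central limit theorem concrete. The sketch below is only an illustration under stated assumptions: the skewed population is an exponential distribution with mean 1 and SD 1, and the sample size of 30, the 2000 repetitions and the fixed seed are arbitrary choices, none of which come from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)      # fixed seed so the run is repeatable
SAMPLE_SIZE = 30            # assumed size of each sample
N_SAMPLES = 2000            # assumed number of repeated samples

def sample_mean(n):
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return mean(rng.expovariate(1.0) for _ in range(n))

means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_of_means = mean(means)              # should sit close to the population mean (1)
sd_of_means = stdev(means)               # spread of the sample means
predicted_sd = 1 / sqrt(SAMPLE_SIZE)     # population SD / sqrt(n)

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:  ", round(sd_of_means, 3))
print("SD / sqrt(n):        ", round(predicted_sd, 3))
```

Even though the individual values are heavily skewed, the sample means cluster symmetrically around the population mean, with a spread close to the population SD divided by the square root of the sample size.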

Page 46: Descriptive Statistics

46

The distribution of the means forms a smaller normal distribution about the true mean:

[Figure: distribution of sample means about the true mean; horizontal axis runs from 2 to 18]

Page 47: Descriptive Statistics

47

True for skewed distributions too

[Figure: skewed distribution, frequency (f) on the vertical axis, with the Mean marked and a plot of means from samples]

Here the tendency to have higher values more common serves to increase the value of the mode

Page 48: Descriptive Statistics

48

How means behave..

o A mean of any sample belongs to a normal distribution of possible means of samples

o Any normal distribution behaves lawfully
  o If we calculate the SD of all these means, we can determine what proportion (%) of means fall within specific distances of the ‘true’ or population mean

Page 49: Descriptive Statistics

49

But...

o We only have a sample, not the population…

o We use an estimate of this SD of means known as the Standard Error of the Mean

$$ SE = \frac{SD}{\sqrt{N}} $$

Page 50: Descriptive Statistics

50

Implications

o Given a sample of data, we can estimate how confident we are in it being a true reflection of the ‘world’ or…

o If we test 10 users on an interface or service, we can estimate how much variability about our mean score we will find within the intended full population of users

Page 51: Descriptive Statistics

51

Example

o We test 20 users on a new interface:
  o Mean error score: 10, sd: 4
  o What can we infer about the broader user population?

o According to the central limit theorem, our observed mean (10 errors) is itself 95% likely to be within 2 s.d. of the distribution of sample means (i.e., 2 standard errors) of the ‘true’ (but unknown to us) mean of the population

Page 52: Descriptive Statistics

52

The Standard Error of the Means

$$ SE = \frac{s.d.(\text{sample})}{\sqrt{N}} = \frac{4}{\sqrt{20}} = \frac{4}{4.47} = 0.89 $$

Page 53: Descriptive Statistics

53

If standard error of mean = 0.89

o Then observed (sample) mean is within a normal distribution about the ‘true’ or population mean:
  o So we can be
    o 68% confident that the true mean = 10 ± 0.89
    o 95% confident our population mean = 10 ± 1.78
    o over 99% confident it is within 10 ± 2.67

o This offers a strong method of interpreting our data

Page 54: Descriptive Statistics

54

Issues to note

o If s.d. is large and/or sample size is small, the estimated deviation of the population means will appear large.
  o e.g., in last example, if n=9, SE mean = 1.33
  o So the 95% confidence interval becomes 10 ± 2.66 (i.e., we are now 95% confident that the true mean is somewhere between 7.34 and 12.66)

o Hence confidence improves as sample increases and variability lessens

o Or in other words: the more users you study, the more sure you can be…!
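The standard-error and confidence-interval arithmetic from the example (mean 10, s.d. 4) can be sketched as follows. The function names are illustrative assumptions, and the interval uses the slides' rule of thumb of 2 SE either side of the mean for 95%. Small differences in the last decimal place from the slide figures come from rounding the SE before doubling it.

```python
from math import sqrt

def standard_error(sd, n):
    """Estimated SD of the sample means: sd / sqrt(n)."""
    return sd / sqrt(n)

def confidence_interval(mean, sd, n, n_se=2):
    """Interval of n_se standard errors either side of the mean."""
    se = standard_error(sd, n)
    return mean - n_se * se, mean + n_se * se

se_20 = standard_error(4, 20)
se_9 = standard_error(4, 9)
ci_20 = confidence_interval(10, 4, 20)   # 95% interval with 20 users
ci_9 = confidence_interval(10, 4, 9)     # 95% interval with 9 users

print("SE, n=20:", round(se_20, 2), "95% CI:", tuple(round(v, 2) for v in ci_20))
print("SE, n=9: ", round(se_9, 2), "95% CI:", tuple(round(v, 2) for v in ci_9))
```

Passing `n_se=1` gives the 68% interval, so the same helper covers each level quoted on the slides.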

Page 55: Descriptive Statistics

55

Exercise:

o If the mean = 10 and the s.d. = 4, what is the 68% confidence interval when we have:
  o 16 users?
  o 9 users?

o If the s.d. = 12, and mean is still 10, what is the 95% confidence interval for those N?

Answers:

9-11

8.67-11.33

4-16

2-18

Page 56: Descriptive Statistics

56

Exercise answers:

o If the mean = 10 and the s.d. = 4, what is the 68% confidence interval when we have:

16 users? = 9-11 (hint: SE = $sd/\sqrt{n}$ = 4/4 = 1)

9 users? = 8.67-11.33

o If the s.d. = 12, and mean is still 10, what is the 95% confidence interval for those N?

16 users: 4-16 (hint: 95% CI implies 2 SE either side of mean)

9 users: 2-18

Page 57: Descriptive Statistics

57

Recap

o Summarizing data effectively informs us of central tendencies

o We can estimate how our data deviates from the population we are trying to estimate

o We can establish confidence intervals to enable us to make reliable ‘bets’ on the effects of our designs on users

Page 58: Descriptive Statistics

58

Comparing 2 means

o The differences between means of samples drawn from the same population are also normally distributed

o Thus, if we compare means from two samples, we can estimate if they belong to the same parent population

This is the beginning of significance testing


Page 59: Descriptive Statistics

59

SE of difference between means

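$$ \sigma_{[\bar{x}_1 - \bar{x}_2]} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} $$

$$ SE_{\text{diff. means}} = \sqrt{SE(\text{sample 1})^2 + SE(\text{sample 2})^2} $$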

This lets us set up confidence limits for the differences between the two means

Page 60: Descriptive Statistics

60

Regardless of population mean:

o The difference between 2 true measures of the mean of a population is 0

o The differences between pairs of sample means from this population are normally distributed about 0

Page 61: Descriptive Statistics

61

Consider two interfaces:

We capture 10 users’ times per task on each.

The results are:

Interface A: mean = 8, sd = 3

Interface B: mean = 10, sd = 3.5

Q? - is Interface A really different?

How do we tackle this question?

Page 62: Descriptive Statistics

62

Calculate the SE difference between the means

$SE_a = 3/\sqrt{10} = 0.95$

$SE_b = 3.5/\sqrt{10} = 1.11$

$SE_{a-b} = \sqrt{0.95^2 + 1.11^2} = \sqrt{0.90 + 1.23} = 1.46$

Observed difference between means = 2.0

95% confidence interval of difference between means is 2 × (1.46) or 2.92 (i.e., we expect to find a difference between 0 and 2.92 by chance alone).

The observed difference (2.0) falls inside this range, which suggests there is no significant difference at the p<.05 level.

Page 63: Descriptive Statistics

63

But what else?

We can calculate the exact probability of finding this difference by chance:

Divide observed difference between the means by the SE (diff between means): 2.0/1.46 = 1.37

This gives us the number of standard deviation units between the two means (a Z score)

Check Z table: 82% of observations are within 1.37 sd, 18% lie beyond it; thus the precise sig level of our findings is p<.18.

Thus - Interface A is different, with rough odds of 5:1
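The whole comparison of the two interfaces can be sketched in Python. The variable names are illustrative, and `NormalDist` from the standard library stands in for the printed Z table. Working without intermediate rounding, the two-tailed probability comes out near 0.17, in line with the approximate 0.18 read from the table.

```python
from math import sqrt
from statistics import NormalDist

n = 10                           # users per interface
mean_a, sd_a = 8, 3              # Interface A
mean_b, sd_b = 10, 3.5           # Interface B

se_a = sd_a / sqrt(n)
se_b = sd_b / sqrt(n)
se_diff = sqrt(se_a ** 2 + se_b ** 2)    # SE of the difference between means

observed_diff = mean_b - mean_a
z = observed_diff / se_diff              # difference expressed in SE units

# Two-tailed: chance of a difference at least this large in either direction
p_two_tailed = 2 * (1 - NormalDist().cdf(z))

print("SE of difference:", round(se_diff, 2))
print("Z:", round(z, 2))
print("two-tailed p:", round(p_two_tailed, 2))
print("significant at .05?", observed_diff > 2 * se_diff)
```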

Page 64: Descriptive Statistics

64

Hold it!

o Didn’t we first conclude there was no significant difference?
  o Yes, no significant difference at p<.05
  o But the probability of getting the differences we observed by chance was approximately 0.18
  o Not good enough for science (must avoid type 1 error), but very useful for making a judgment on design
  o But you MUST specify levels you will accept BEFORE, not after…

o Note - for small samples (n<20) the t distribution is better than the Z distribution when looking up probability

Page 65: Descriptive Statistics

65

Why t?

o Similar to the normal distribution
  o t distribution is flatter than Z for small degrees of freedom (n-1), but virtually identical to Z when N>30

o Exact shape of t-distribution depends on sample size

Page 66: Descriptive Statistics

66

Simple t-test:

o You want all users of a new interface to score at least 70% on an effectiveness test. You test 6 users on a new interface and gain the following scores:

62, 92, 75, 68, 83, 95

Mean = 79.17, SD = 13.17

Page 67: Descriptive Statistics

67

T-test:

$$ t = \frac{79.17 - 70}{13.17 / \sqrt{6}} = \frac{9.17}{5.38} = 1.71 $$

From t-tables, we can see that this value of t exceeds the t value (with 5 d.f.) for the p<.10 level

So we are confident at the 90% level that our new interface leads to improvement
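The one-sample t calculation can be checked with the standard library. This is a minimal sketch: the constant `T_CRIT_ONE_TAIL_10_DF5` is the one-tailed critical value for p = .10 with 5 degrees of freedom as read from a standard t-table (1.476), entered by hand because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

scores = [62, 92, 75, 68, 83, 95]
target = 70                              # required effectiveness score

sample_mean = mean(scores)
sample_sd = stdev(scores)                # N - 1 version, since this is a sample
se_mean = sample_sd / sqrt(len(scores))  # standard error of the mean
t = (sample_mean - target) / se_mean

T_CRIT_ONE_TAIL_10_DF5 = 1.476           # from a t-table: one-tailed, p = .10, 5 d.f.

print("mean:", round(sample_mean, 2), "SD:", round(sample_sd, 2))
print("SE of mean:", round(se_mean, 2))
print("t:", round(t, 2))
print("exceeds critical value at p<.10:", t > T_CRIT_ONE_TAIL_10_DF5)
```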

Page 68: Descriptive Statistics

68

T-test:

$$ t = \frac{79.17 - 70}{13.17 / \sqrt{6}} = \frac{9.17}{5.38} = 1.71 $$

Here 79.17 is the sample mean and 5.38 is the SE of the mean

Thus - we can still talk in confidence intervals, e.g., we are 68% confident the mean of the population = 79.17 ± 5.38

Page 69: Descriptive Statistics

69

Predicting the direction of the difference

o Since you stated that you wanted to see if new Interface was BETTER (>70), not just DIFFERENT (< or > 70%), this is asking for a one-sided test….

o For a two-sided test, I just want to see if there is ANY difference (better or worse) between A and B.

Page 70: Descriptive Statistics

70

One tail (directional) test

o Tester narrows the odds by half by testing for a specific difference

o One sided predictions specify which part of the normal curve the difference observed must reside in (left or right)

o Testing for ANY difference is known as ‘two-tail’ testing

o Testing for a directional difference (A>B) is known as ‘one-tail’ testing

Page 71: Descriptive Statistics

71

So to recap

o If you are interested only in certain differences, you are being ‘directional’ or ‘one-sided’

o Under the normal curve, random or chance differences occur equally on both sides

o You MUST state directional expectations (hypothesis) in advance

Page 72: Descriptive Statistics

72

Why would you predict the direction?

o Theoretical grounds
  o Experience or previous findings suggested the difference

o Practical grounds
  o You redesigned the interface to make it better, so you EXPECT users will perform better…