Hypothesis testing S2 - NKI Basic medical statistics for clinical and experimental research...

Click here to load reader

  • date post

    24-Mar-2018
  • Category

    Documents

  • view

    225
  • download

    2

Embed Size (px)

Transcript of Hypothesis testing S2 - NKI Basic medical statistics for clinical and experimental research...

  • 1/43

    Basic medical statistics for clinical and experimental research

    Hypothesis testingS2

    Katarzyna [email protected]

    2nd November 2015

  • 2/43

    Introduction

    I Point estimation: use a sample statistic to estimate a populationparameter of interest.

    I Interval estimation: use a 95% CI to give a 95% probability that thisinterval contains .

    I Hypothesis testing: assume a value for and make a probabilitystatement about the value of the corresponding statistic.I A statistical hypothesis is an assumption about a population parameter.I Hypothesis testing refers to the formal procedures used to accept or

    reject statistical hypotheses.

  • 3/43

    Introduction

    I Example: The average person sleeps 8 hours per day. But wouldyou say college students tend to sleep also 8 hours?I Take random sample of 100 college students and ask them how long

    they sleep on an average day.I Sample mean x = 6.5 hours.

    Is this difference just due to sampling variation? Or do college studentsreally sleep different hours than the general population?

    I Would you say students do not sleep 8 hours on average if x = 3? And ifx = 8.5?

    I Would your decision depend on something else?

  • 4/43

    Null and alternative hypothesis

    I Null hypothesis (H0): specifies (a) hypothesized true value/s for aparameter.I Always a statement about independence, no effect or equality in

    populations (e.g., no difference between means; no relationshipbetween variables).

    I Very precise statement.I Usually a statement that we wish to reject.I Keep it simple: if e.g. we compare the mean effect of a new treatment

    with a standard treatment, a simple H0 is that the treatments have thesame effect.

    I Alternative hypothesis (H1): specifies (a) true value/s for aparameter, that will be considered if H0 rejected.I Is always a statement about dependence, an effect or differences in

    populations (e.g., there is a difference between means; there is arelationship between variables).

    I Not precise statement.

  • 5/43

    Null and alternative hypothesis

    I Null hypothesis:H0 : Parameter = Parameter value

    I Alternative hypothesis:H1: Parameter 6= Parameter value

    orH1: Parameter > Parameter value

    orH1: Parameter < Parameter value

    I Parameter is the population parameter; Parameter value is thehypothesized value of the parameter

    I Example with number of hours college students sleep:H0 : = 8H1 : < 8

  • 6/43

    Null and alternative hypothesis

    I We test the null hypothesis:Assuming that the null hypothesis is correct, what is the probabilityof obtaining our pattern of results?I We either "reject the null hypothesis" or "fail to reject the null hypothesis"

    I If the probability of obtaining our pattern of results is high, the result isconsidered to be likely so we do not reject the null hypothesis.

  • 7/43

    Test statistic

    I We construct a test statistic:

    Test statistic =Statistic Parameter value

    Standard error of the statistic=

    EffectError

    I whereI Statistic is the observed sample statistic, i.e. point estimate of the

    population parameterI Parameter value is the hypothesized value of the parameter

    I Test statistic measures by how many SEs the statistic differs from itshypothesized value in H0

    I Test statistic has a sampling distributionI Example with number of hours college students sleep where

    H0 : = 8 and x = 6.5, SE=1:Test statistic = 6.581 = 1.5

  • 8/43

    Significance level and critical value

    I Level of significance, , is the probability value of the criterion forrejecting the null hypothesisI Usually = .05, sometimes = .01 or = .1I Level of significance should be chosen before conducting any null

    hypothesisI Rejection region is the set of values for the test statistic that leads

    to rejection of the null hypothesisI Non-rejection region is the set of values not in the rejection region

    that leads to non-rejection of the null hypothesisI Critical value is the value of the test statistic that separate the

    rejection and non-rejection regions

  • 9/43

    Significance level and critical value

    I One-sided test, H1: Parameter > Parameter valueI When Test statistic (1-)th quantile: we reject the null hypothesis

    (1)th

    percentile

    Rejectionregion

    Sampling distributionunder H0

  • 10/43

    Significance level and critical value

    I One-sided test, H1: Parameter < Parameter valueI When Test statistic th quantile: we reject the null hypothesis

    th percentile

    Rejectionregion

    Sampling distributionunder H0

  • 11/43

    Significance level and critical valueI Two-sided test, H1: Parameter 6= Parameter valueI When Test statistic /2th quantile or Test statistic (1-/2)th

    quantile: we reject the null hypothesis

    (1/2)th

    percentile

    /2

    Rejectionregion

    /2th percentile

    /2

    Rejectionregion

    Sampling distributionunder H0

  • 12/43

    Statistical table with critical valuesSTATISTICAL TABLES

    2

    TABLE A.2

    t Distribution: Critical Values of t Significance level

    Degrees of Two-tailed test: 10% 5% 2% 1% 0.2% 0.1% freedom One-tailed test: 5% 2.5% 1% 0.5% 0.1% 0.05%

    1 6.314 12.706 31.821 63.657 318.309 636.619 2 2.920 4.303 6.965 9.925 22.327 31.599 3 2.353 3.182 4.541 5.841 10.215 12.924 4 2.132 2.776 3.747 4.604 7.173 8.610 5 2.015 2.571 3.365 4.032 5.893 6.869

    6 1.943 2.447 3.143 3.707 5.208 5.959 7 1.894 2.365 2.998 3.499 4.785 5.408 8 1.860 2.306 2.896 3.355 4.501 5.041 9 1.833 2.262 2.821 3.250 4.297 4.781 10 1.812 2.228 2.764 3.169 4.144 4.587

    11 1.796 2.201 2.718 3.106 4.025 4.437 12 1.782 2.179 2.681 3.055 3.930 4.318 13 1.771 2.160 2.650 3.012 3.852 4.221 14 1.761 2.145 2.624 2.977 3.787 4.140 15 1.753 2.131 2.602 2.947 3.733 4.073

    16 1.746 2.120 2.583 2.921 3.686 4.015 17 1.740 2.110 2.567 2.898 3.646 3.965 18 1.734 2.101 2.552 2.878 3.610 3.922 19 1.729 2.093 2.539 2.861 3.579 3.883 20 1.725 2.086 2.528 2.845 3.552 3.850

    21 1.721 2.080 2.518 2.831 3.527 3.819 22 1.717 2.074 2.508 2.819 3.505 3.792 23 1.714 2.069 2.500 2.807 3.485 3.768 24 1.711 2.064 2.492 2.797 3.467 3.745 25 1.708 2.060 2.485 2.787 3.450 3.725

    26 1.706 2.056 2.479 2.779 3.435 3.707 27 1.703 2.052 2.473 2.771 3.421 3.690 28 1.701 2.048 2.467 2.763 3.408 3.674 29 1.699 2.045 2.462 2.756 3.396 3.659 30 1.697 2.042 2.457 2.750 3.385 3.646

    32 1.694 2.037 2.449 2.738 3.365 3.622 34 1.691 2.032 2.441 2.728 3.348 3.601 36 1.688 2.028 2.434 2.719 3.333 3.582 38 1.686 2.024 2.429 2.712 3.319 3.566 40 1.684 2.021 2.423 2.704 3.307 3.551

    42 1.682 2.018 2.418 2.698 3.296 3.538 44 1.680 2.015 2.414 2.692 3.286 3.526 46 1.679 2.013 2.410 2.687 3.277 3.515 48 1.677 2.011 2.407 2.682 3.269 3.505 50 1.676 2.009 2.403 2.678 3.261 3.496

    60 1.671 2.000 2.390 2.660 3.232 3.460 70 1.667 1.994 2.381 2.648 3.211 3.435 80 1.664 1.990 2.374 2.639 3.195 3.416 90 1.662 1.987 2.368 2.632 3.183 3.402 100 1.660 1.984 2.364 2.626 3.174 3.390

    120 1.658 1.980 2.358 2.617 3.160 3.373 150 1.655 1.976 2.351 2.609 3.145 3.357 200 1.653 1.972 2.345 2.601 3.131 3.340 300 1.650 1.968 2.339 2.592 3.118 3.323 400 1.649 1.966 2.336 2.588 3.111 3.315

    500 1.648 1.965 2.334 2.586 3.107 3.310 600 1.647 1.964 2.333 2.584 3.104 3.307 1.645 1.960 2.326 2.576 3.090 3.291

    .

  • 13/43

    pvalue

    I Using distribution of a test statistic we find probability of obtainingvalue of our test statistic or more extreme valuesI pvalue is the probability of obtaining a test statistic result at least as

    extreme as the one that is actually observed, assuming that the nullhypothesis is true

    I when pvalue is small the test statistic is significant

  • 14/43

    pvalue

    I One-sided test, H1: Parameter > Parameter valueI When pvalue < : we reject the null hypothesis

    Sam

    ple valueof test statistic

    pvalue

    More extremetest statistic

    values

    Sampling distributionunder H0

  • 15/43

    pvalue

    I One-sided test, H1: Parameter < Parameter valueI When pvalue < : we reject the null hypothesis

    Sam

    ple valueof test statistic

    pvalue

    More extremetest statistic

    values

    Sampling distributionunder H0

  • 16/43

    pvalue

    I Two-sided test, H1: Parameter 6= Parameter valueI When pvalue < : we reject the null hypothesis

    Sam

    ple valueof test statistic

    Sam

    ple valueof test statistic

    pvalue/2pvalue/2

    Moreextremevalues

    Moreextremevalues

    Sampling distributionunder H0

  • 17/43

    pvalue

    I pvalue lower than 0.05 (0.1): results are "significant at the 5%(10%) level" (small enough to reject H0).I Reject H0 at a 5% significance level.I There is evidence to reject H0.

    I pvalue not lower than 0.05 (0.1): results are "not significant at the5% (10%) level".I Fail to reject H0 at a 5% significance level.I There is not enough evidence to reject H0.I But this does NOT mean that we accept that H0 is true.

    I The cut-off level 0.05 (0.1) is the significance level of the test.

  • 18/43

    Statistical table with probabilities

    z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

    -2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

    -2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

    -1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

    -1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

    -0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776

    -0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.42470.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

    Standard Normal Cumulative Probability Table

    Cumulative probabilities for NEGATIVE z-values are shown in the following table:

  • 19/43

    Statistical table with probabilities

    Standard Normal Cumulative Probability Table

    z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

    -2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

    -2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

    -1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

    -1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

    -0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776

    -0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.42470.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

    Standard Normal Cumulative Probability Table

    Cumulative probabilities for NEGATIVE z-values are shown in the following table:

    z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

    -2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

    -2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

    -1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

    -1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

    -0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776

    -0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.42470.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

    Standard Normal Cumulative Probability Table

    Cumulative probabilities for NEGATIVE z-values are shown in the following table:

  • 20/43

    Statistical table with probabilities

    I Example with number of hours college students sleep:H0 : = 8 vs H1 : < 8x = 6.5, SE=1:

    Test statistic =6.5 8

    1= 1.5,p value = 0.0668 > = 0.05

    if we assume that the test statistic has a normal distribution;when we change the alternative hypothesis to H1 : 6= 8, thenpvalue=2*0.0668=0.1336

  • 21/43

    Hypothesis testing step by step

    1. Define the null and alternative hypotheses under study.2. Collect relevant data from a sample of individuals.3. Calculate the value of the test statistic.4. Compare the value of the observed test statistic to a critical value

    or pvalue of the distribution of the test statistic under H0.5. Interpret the results.

  • 22/43

    Type I and Type II errors

    I In a hypothesis test we draw a conclusion about H0 (there isstrong/weak evidence to reject/not reject) based on a sample.We can make mistakes!

    DecisionState of nature Reject H0 Do not reject H0

    H0 true Type I error () Correct decision (1-)H0 false Correct decision (1 ) Type II error ()

    I : significance level of the test.I Probability of rejecting H0 when H0 is true.

    I : Probability of not rejecting H0 when H0 is false.I 1 : power of the test that is probability of rejecting H0 when it is false.

  • 23/43

    Power

    I Essential to know the power of a proposed test.I We want to be confident that our statistical method can correctly reject

    the null hypothesis.I It is accepted that power should be 0.8 or greater.

    I Study with a smaller power might waste our time and resources.I If the power is smaller than 0.8 we might want to replicate our study.

    I Factors relevantly related to power:I sample size power.I variability of observations power.I effects of interest power :

    A hypothesis test has a greater chance of detecting a large true effectthan a small one.

    I significance level power

  • 24/43

    Power

    Samplingdistributionassociated

    with H0

    Samplingdistributionassociated

    with H1

    power

    reject H0

  • 25/43

    Power

    Samplingdistributionassociated

    with H0

    Samplingdistributionassociated

    with H1

    power

    =0 =5

  • 26/43

    Power

    Samplingdistributionassociated

    with H0

    Samplingdistributionassociated

    with H1

    power

    =0 =4

  • 27/43

    Power

    Samplingdistributionassociated

    with H0

    Samplingdistributionassociated

    with H1

    power

    =0 =3

  • 28/43

    Power

    I Statistical significance versus Clinical relevanceI Obtaining significant results does not mean that the effect it measures is

    meaningful or important.I Small underpowered studies fail to detect clinically relevant effects as

    statistically significant.I Clinical trial comparing octreotide and sclerotherapy in patients with variceal

    bleeding (Sung et al., 1993): Sample of 100, 5% power (while reportedcalculation suggested a sample of 1800 was needed).

    I Large sample sizes with large power might show small clinicallyirrelevant effects to be statistically significant.I Reduction of blood pressure by two points between treatment and placebo.I An expensive new psychiatric treatment technique reduces the average

    hospital stay from 60 days to 59 days.

  • 29/43

    Power

    I Power analysis calculation should be used at the planning stage ofour investigation to ensure that the sample size is big enough.

    I For any power calculation we need to know:I Type of a test (e.g., independent t-test, paired t-test, ANOVA,

    regression, etc.)I The significance levelI The expected effect sizeI The sample size

    I Fixing the significance level, the expected effect size and sample size wecan calculate power

    I Fixing the significance level, the expected effect size and power we cancalculate sample size

  • 30/43

    Power

    I How large should the sample be in order to detect a specific effectsize?I One-sample case example:

    I H0 : = m vs H1 : < m or vs H1 : > m n (x

    z1+z1xm

    )2I H0 : = m vs H1 : 6= m n

    (x

    z1/2+z1xm

    )2I Two-sample case example:

    I H0 : 1 = 2 vs H1 : 1 < 2 or vs H1 : 1 > 2 n1

    (x1

    2 + x22/k

    ) ( z1+z1x1x2

    )2I H0 : 1 = 2 vs H1 : 1 6= 2 n1

    (x1

    2 + x22/k

    ) ( z1/2+z1x1x2

    )2x : the observed value of the sample meanx : the standard error of the meank = n1/n2: ratio between the sample sizes of the two groups

  • 31/43

    Power

    I z1, z1 : critical values from the standard normal distributionI = 5% z1 = 1.65, z1/2 = 1.96I = 1% z1 = 2.33, z1/2 = 2.58I power= 80% z1 = 0.84I power= 85% z1 = 1.04I power= 90% z1 = 1.28I power= 95% z1 = 1.65

  • 32/43

    Power

    I ExampleI A family sociologist is interested in the difference in marital satisfaction

    between working and notworking wives. How many women should beasked about their marital satisfaction to be able to say that workingwives are more happy than not working wives?I H0 : working wives are as happy as notworking wives;

    H1 : working wives are more happy than notworking wivesI expected effect size = 7.5I standard deviation = 25I equally sized groups, k = 1

    n1 (x1

    2 + x22/k

    ) ( z1+z1x1x2

    )2= (252 + 252/1)

    ( 1.65+0.847.5

    )2=

    137.78

  • 33/43

    PowerI Example

  • 34/43

    One-tailed vs. two-tailed

    I If e.g. we wish to test parameter :I H0 : = 5 vs. H1 : 6= 5 will require a two-tailed test (two-sided).

    I We test for the possibility of deviation from H0 in both directions.I If = 0.05, then half of it is used to test statistical significance in one direction

    ( > 5) and the other half for the other direction ( < 5).I H0 : = 5 vs. H1 : > 5 will require a one-tailed test (one-sided).

    I We completely disregard the possibility of < 5.I This provides more power to detect an effect in direction > 5 by not testing

    < 5.I Hardly ever appropriate with biomedical data.

  • 35/43

    One-tailed vs. two-tailed

    I When would a one-tailed test be appropriate?I E.g.: new drug that we believe improves other existing drug. Should we

    use a one-tailed test?I "Yes, because it has more power to detect an effect in that direction",

    "Yes, because a two-tailed test did not reject H0 but got very close to rejectingit, so a one-tailed test will do it",

    I THESE ARE NO GOOD ANSWERS...I Why? Because we fail to test that it could be actually (much) worse than the

    existing drug.

  • 36/43

    One-tailed vs. two-tailed

    I E.g.: we use a one-tailed test to test the efficacy of a cholesterollowering drug, because the drug cannot raise cholesterol (soindividuals taking the drug who have higher cholesterol level thancontrols are treated as failing to show a difference).I Imagine individuals taking the drug end up with average higher levels

    than those of the control group. Using a one-tailed test implies we saythis is just due to random variation! But what if there is an underlyingcause?

    I But, there is so much variation in medical data that nothing shouldbe excluded.

  • 37/43

    Multiple hypothesis testing

    I Example: We have three groups of participants and we want tocompare every pair of groups. We carry out three separateindependent tests with = 0.05:I 1 test: probability of a correct decision (given H0 is true) is 1-0.05=0.95,

    overall = 1 0.95 = 0.05I 2 tests: probability of a correct decision (given H0 is true) is

    (1-0.05)(1-0.05)=0.9025, overall = 1 0.9025 = 0.0975I 3 tests: probability of a correct decision (given H0 is true) is

    (1-0.05)(1-0.05)(1-0.05)=0.8573, overall = 1 0.8573 = 0.1426I The probability of making a Type I error (probability of falsely rejecting the null

    hypothesis) increases from 5% to 14.3%!

  • 38/43

    Multiple hypothesis testing

    I Familywise error rate or experimentwise error rate or overall isthe error rate across statistical tests conducted on the same data;i.e., this is the probability that at least one test erroneously rejectsthe null hypothesis:

    familywise error = 1 (0.95)number of tests

    I E.g. We carry out 20 significance tests on a dataset. The probability thatat least one of them erroneously rejects H0 is 1 (1 )20 = 64%! Wecannot identify which one(s), if any, are false positives...

    I We can use some post-hoc adjustment, control the familywise errorrate, to account for how many tests we perform (will be coveredlater this week).I E.g. Bonferroni: divide by the number of comparisons to ensure that

    the overall is below 0.05.I For 10 tests we use 0.005 as our criterion for significance

  • 39/43

    Non-parametric tests

    I Hypothesis test:I Parametric: based on knowledge of probability distribution the data

    follow.I Non-parametric: replace data with their ranks (numbers describing

    position in the ordered dataset) and make no assumption aboutprobability distribution of the data.

    I Example:

    scores: 5, 2, 6, 8, 7, 1, 10ordered scores: 1, 2, 5, 6, 7, 8, 10

    ranks: 1, 2, 3, 4, 5, 6, 7,I Useful when sample size is small and/or data are measured on a

    categorical scale.I However: less power to detect a true effect than the equivalent parametric

    test (given assumptions of this are satisfied).

  • 40/43

    Exact vs. asymptotic tests

    I Asymptotic method: p-values estimated assuming the data, given asufficiently large sample size, conform to a particular parametricdistribution.

    I But if the data is small, sparse, heavily tied, unbalanced or poorlydistributed then asymptotic method may not yield reliable results.I Obtain instead exact p-value based on exact distribution of the test

    statistic.I If dataset is too large for calculating exact p-value then Monte Carlo

    algorithms are applied.I Both methods are implemented in SPSS. We will use them during

    Practicals 3 and 4.

  • 41/43

    Hypothesis tests vs. confidence intervals

    I A CI can be used to make a decision in a similar way to ahypothesis test:I In the sleeping hours example: if the hypothesized mean value 8 lies

    outside the corresponding 95% CI for the mean then hypothesized valueis implausible and enough evidence to reject H0 (no p-value is used):I x = 6.5, SEM=1: CI=[6.5-1.96*1; 6.5+1.96*1]=[4.54; 8.46]

    8 lies inside the 95%CI

    I If a hypothesis test and a CI can do the same, why not use just oneof them?

  • 42/43

    Hypothesis tests vs. confidence intervals

    I Hypothesis testingI Hypothesis testing provides no information about the probability of H0,

    just gives the probability of finding a specific or more extreme resultswhen the null hypothesis is true.

    I Statistical significance is not the same as medical importance orbiological relevance.

    I With large sample sizes we can find statistically significant small differencesthat are not interesting (the sample estimates fall near the parameter valuein H0), while in small samples effects that are clinically relevant might not bestatistically significant.

    I Hypothesis testing is used to calculate required sample size andstatistical power.

  • 43/43

    Hypothesis tests vs. confidence intervals

    I Confidence intervalI CI does not depend on a priori hypotheses.I CI displays the entire set of believable values of the parameter, so we

    can determine whether or not the difference between the true value andthe H0 value has practical importance.