Statistics and Steganalysis

CSM25 Secure Information Hiding

Dr Hans Georg Schaathun

University of Surrey

Spring 2008

Dr Hans Georg Schaathun Statistics and Steganalysis Spring 2008 1 / 42

Learning Outcomes

After this session, everyone should

- understand how statistical methods apply to steganography
- understand how a statistical hypothesis can be used
- be able to implement the basic χ2 test of steganalysis

Suggested Reading

Core Reading

Cox et al. Chapter 13.

Suggested Reading

«Higher-order statistical steganalysis of palette images» by Jessica Fridrich, Miroslav Goljan, David Soukal, in Proc. SPIE Electronic Imaging, Jan 2003, pp. 178-190

Outline

1 General Introduction
    Statistical models
    Histogramme

2 The χ2 test
    Pairs of Values
    I visual approach
    Hypothesis testing
    The error types

3 Postlogue
    Generalised χ2 test
    Summary

General Introduction: Statistical models

The fundamental question

Wendy the Warden intercepts an image. Is the image a stegogramme?

- Is it a probable natural image?
- Is it a probable stegogramme?

The answer depends on a model for natural images:

- Statistical models and probability distributions
- With a perfect model, Alice and Bob could construct a cipher whose ciphertexts are distributed like natural images.
- If Wendy has a better model than Alice and Bob, then she can do effective steganalysis.
- In reality, we do not know what a natural image looks like.

A visual example

- Two different patterns in the LSB plane, with a sharp border between them. Why?
- Is there a corresponding border in the full image?
- No explanation in the full image ⇒ probably stego ...
- ... but not certain

The remit of statistics

Statistics can estimate ‘normal’ behaviour and compare behaviours.

Advantages

- Automated decisions
- Extract detail
- Exact, quantifiable features
- Aggregate measures

General Introduction: Histogramme

A typical image

- Image histogram made by imhist in Matlab
- Gives the number of pixels per colour value

And a stego-image

What happened?

- Histogram of the stego-image: more ragged
- Every other bar sticks out. Why?
- 50.8% of the bits in the binary message are 1s.

What is characteristic? Pairs of values

- Consider colour 2i (i = 0, 1, ..., 127). What happens under LSB embedding?
    - 2i → 2i or 2i + 1
    - Never 2i → 2i − 1.
- Likewise, 2i + 1 → 2i or 2i + 1.
- (2i, 2i + 1) is a Pair of Values.
- A pixel in (2i, 2i + 1) before embedding is a pixel in (2i, 2i + 1) after embedding.
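
The pairs-of-values property can be checked on synthetic data. The following is a minimal Python sketch (the slides themselves use Matlab). The names `lsb_embed` and `pair_sums` are hypothetical helpers introduced here, and the "image" is just a list of random 8-bit values, not a real picture.

```python
import random
from collections import Counter

def lsb_embed(pixels, bits):
    """Replace the least significant bit of each pixel with a message bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def pair_sums(pixels):
    """Count pixels per pair of values (2i, 2i+1), indexed by i."""
    return Counter(p >> 1 for p in pixels)

rng = random.Random(1)                                # fixed seed, illustration only
cover = [rng.randrange(256) for _ in range(10_000)]   # synthetic 8-bit "image"
message = [rng.randrange(2) for _ in cover]           # random message, full capacity
stego = lsb_embed(cover, message)

# Every pixel stays inside its own pair, so the per-pair totals are unchanged.
print(pair_sums(cover) == pair_sums(stego))
```

Only the split inside each pair changes under embedding; the total per pair is fixed by the cover image.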

The χ2 test: Pairs of Values

Pairs of Values: The statistic

- Image X. Random variable $Y_k = \#\{(x, y) \mid X_{xy} = k\}$, the number of pixels with colour value k.
- The $Y_k$ form the histogramme.
- Recall that (2l, 2l + 1) is a pair of values.
- The first 7 bits of a pixel are determined by the image colour, i.e. they select the pair.
- The last bit (LSB) is determined by the message, i.e. it selects which half of the pair.

Pairs of Values: Expected behaviour

- The sum $Y_{2l} + Y_{2l+1}$ is unaffected by embedding.
- For a random message, expect a 50-50 split between 2l and 2l + 1, i.e.

$$E(Y_{2l}) = \tfrac{1}{2}\,(Y_{2l} + Y_{2l+1})$$

- Can we make a statistic out of this?

The χ2 statistic

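The general χ2 statistic is

$$S = \sum_{o \in \Omega} \frac{(F_o - E(F_o))^2}{E(F_o)},$$

which for pairs of values becomes

$$S = \sum_{l=0}^{127} \frac{\left(Y_{2l} - \tfrac{1}{2}(Y_{2l} + Y_{2l+1})\right)^2}{\tfrac{1}{2}(Y_{2l} + Y_{2l+1})}.$$

Definition

$$S_{\mathrm{PoV}} = \sum_{l=0}^{127} \frac{\tfrac{1}{2}\,(Y_{2l} - Y_{2l+1})^2}{Y_{2l} + Y_{2l+1}}.$$

The statistic has #Ω − 1 degrees of freedom.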
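
As an illustration, the definition of the pairs-of-values statistic can be written out in a few lines of Python (the slides use Matlab). This is a sketch under assumptions: `pov_statistic` is a hypothetical name, the histogram is a plain list of 256 counts, and pairs with no pixels are skipped, which reduces the degrees of freedom.

```python
def pov_statistic(hist):
    """Pairs-of-values chi-square statistic of a 256-bin histogram.

    Returns (statistic, degrees_of_freedom). Pairs with no pixels are
    skipped, which avoids division by zero and lowers the degrees of freedom.
    """
    if len(hist) != 256:
        raise ValueError("expected a 256-bin histogram")
    s = 0.0
    pairs_used = 0
    for l in range(128):
        even, odd = hist[2 * l], hist[2 * l + 1]
        total = even + odd
        if total == 0:
            continue                      # pixel values that do not occur
        s += 0.5 * (even - odd) ** 2 / total
        pairs_used += 1
    return s, pairs_used - 1

flat = [10] * 256          # even and odd counts agree in every pair
skewed = [20, 0] * 128     # only even values occur
print(pov_statistic(flat))     # (0.0, 127)
print(pov_statistic(skewed))   # (1280.0, 127)
```

A perfectly balanced histogram gives a statistic of zero, while a histogram in which one half of every pair is empty gives a very large value.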

The χ2 PDF

The Pairs-of-Values χ2 Distribution

- χ2 PDF with 127 degrees of freedom
- Red: 2% probability
- + Green: 5%
- + Blue: 10%
- Cumulative Distribution Function (CDF): the area under the curve

χ2 in Matlab

- Defined in the Statistics toolbox
- Simplified functions available on website:
    - chi2cdf
    - chi2pdf
    - chi2inv
- You may have to exclude pixel values which do not occur; this may give fewer degrees of freedom.
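
For readers without the Statistics toolbox, the three functions can be approximated with the Python standard library. This is a sketch, not the course's simplified Matlab code: the names `chi2_pdf`, `chi2_cdf` and `chi2_inv` only mirror the Matlab names, the CDF uses the usual series and continued-fraction evaluation of the incomplete gamma function, and the inverse is found by plain bisection.

```python
import math

def _gamma_p(a, x):
    """Regularised lower incomplete gamma function P(a, x), for a > 0."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion, which converges quickly for x < a + 1.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if term < total * 1e-16:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper tail 1 - P.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_prefactor) * h

def chi2_cdf(x, df):
    """P(S <= x) for S chi-square distributed with df degrees of freedom."""
    return _gamma_p(df / 2.0, x / 2.0)

def chi2_pdf(x, df):
    """Chi-square density for x > 0 (returns 0 for x <= 0)."""
    if x <= 0.0:
        return 0.0
    k = df / 2.0
    return math.exp((k - 1.0) * math.log(x) - x / 2.0
                    - k * math.log(2.0) - math.lgamma(k))

def chi2_inv(p, df):
    """Smallest x with chi2_cdf(x, df) >= p, found by bisection (0 <= p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(chi2_inv(0.95, 127))   # threshold for alpha = 0.05, 127 degrees of freedom
```

When some pixel values do not occur, pass the reduced number of degrees of freedom as `df` instead of 127.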

The χ2 test: I visual approach

The p-value

- Let S be a stochastic, χ2 distributed variable.
- Let s be the observed χ2 statistic.
- Define the p-value: p = P(S > s)
- I.e. a low p-value ⇒ s is unusually large.
    - Improbable if the image is a stegogramme.
    - Conclusion: probably a natural image.
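
A p-value can also be computed without a statistics library. The sketch below takes the p-value as the upper-tail probability P(S > s), the convention under which "p < α ⇒ reject" agrees with "reject when s exceeds the threshold". It integrates the χ2 density numerically with Simpson's rule. The function names are hypothetical, Python stands in for the Matlab of the slides, and the method assumes a fairly large number of degrees of freedom so that the density is smooth near zero.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density for x > 0, computed in the log domain."""
    if x <= 0.0:
        return 0.0
    k = df / 2.0
    return math.exp((k - 1.0) * math.log(x) - x / 2.0
                    - k * math.log(2.0) - math.lgamma(k))

def p_value(s, df=127, steps=2000):
    """Upper-tail probability P(S > s); the CDF is found by Simpson's rule."""
    if s <= 0.0:
        return 1.0
    h = s / steps                       # steps must be even for Simpson's rule
    area = chi2_pdf(0.0, df) + chi2_pdf(s, df)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        area += weight * chi2_pdf(i * h, df)
    cdf = area * h / 3.0
    return min(1.0, max(0.0, 1.0 - cdf))

print(p_value(60.0))     # small statistic: p close to 1
print(p_value(400.0))    # large statistic: p close to 0
```

A statistic far above 127 gives a p-value near zero, which is improbable for a stegogramme under the χ2 model.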

Plots

[Figures: the χ2 statistic and the p-value, plotted for four cases: no message, 30% of capacity, 60% of capacity, and 100% of capacity.]

The χ2 test: Hypothesis testing

The null hypothesis

Null hypothesis

H0: The image X is a stegogramme.

- We need a statistic with known distribution under H0: S is χ2 distributed with 127 degrees of freedom.
- We decide on a threshold T such that Pr(S > T | H0) is small.
- If the observed s > T, we reject H0.

Level of Significance

- Before testing, choose the desired level of significance α.
- The threshold T is taken such that Pr(S > T | H0) ≤ α.
    - If we observe S > T, we reject H0 at significance level α.
    - If we observe S ≤ T, we cannot reject H0 at significance level α.
- Equivalently, compare the p-value against α: p < α ⇒ reject.

Remark: If H0 is true, the probability that the hypothesis test gives the wrong conclusion is at most α.

Choosing the level of significance

- Say you gather the data first, and then choose the level of significance.
    - How does this influence the test?
    - What is the error probability?
- Tuning α to the observations means you can always reject the null hypothesis.
    - The (a priori) error probability under H0 is then 100%,
    - or bounded by the maximum α you would have accepted.
- The level of significance is only meaningful if chosen in advance.
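
The effect of choosing α after seeing the data can be illustrated with a small simulation. This is a sketch under one stated assumption: when H0 is true and the statistic is continuous, the p-value is uniformly distributed on [0, 1), so it is drawn directly with `random.random()`. The function name and parameters are hypothetical.

```python
import random

def rejection_rates(trials=10_000, alpha=0.05, alpha_max=1.0, seed=0):
    """Compare a fixed alpha with an alpha tuned after seeing the p-value.

    Assumption: under H0 the p-value is uniform on [0, 1).
    Returns (rate with fixed alpha, rate with tuned alpha).
    """
    rng = random.Random(seed)
    fixed = tuned = 0
    for _ in range(trials):
        p = rng.random()           # p-value of one experiment under H0
        fixed += p < alpha         # alpha chosen before the experiment
        # A tuner picks any alpha just above p, as long as alpha <= alpha_max,
        # so H0 is rejected whenever p < alpha_max.
        tuned += p < alpha_max
    return fixed / trials, tuned / trials

print(rejection_rates())                 # tuned rate is 1.0: H0 always rejected
print(rejection_rates(alpha_max=0.2))    # tuned rate is bounded by alpha_max
```

With a fixed α the false rejection rate stays near α. With a tuned α it equals the largest α the analyst would have accepted.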

Common misconceptions

After the test, when we have or have not rejected H0:

- The probability that H0 is correct is not α.
- The probability that H0 is false is not α either.

Remark: There is no simple relation between the level of significance and the probability of any hypothesis being right or wrong.

In Matlab

Consider the relation between the threshold and the level of significance:

Pr(S > T | H0) ≤ α

    α = 1 − chi2cdf(T, 127)
    T = chi2inv(1 − α, 127)

To plot the PDF:

    X = [0:1:300]
    plot(X, chi2pdf(X, 127))

The χ2 test: The error types

Hypothesis tests

Hypothesis testing is a recurring theme in statistics. Typical hypotheses:

- Treatment A makes patients recover more quickly than no treatment.
- The climate in South-East Britain is as warm today as it was 100 years ago.
- The image sent by Alice is a stegogramme.

When the hypothesis has been phrased, experiments can tell us whether it is plausible or not.

Asymmetry of hypothesis testing

Hypothesis: Treatment A makes patients recover more quickly than no treatment.

One error is more serious than the other.

- Type I: accepting the hypothesis when it is wrong.
    - Patients get ineffective (or unhealthy) medicine.
- Type II: rejecting the hypothesis when it is right.
    - More research will be made to optimise the treatment.

Here the null hypothesis H0 is that the treatment has no effect, so accepting the treatment hypothesis means rejecting H0.

                H0 retained      H0 rejected
    H0 true     No error         Error Type I
    H0 false    Error Type II    No error

The weirdness of the steganalysis

H0: The message is a stegogramme.

- We consider it (implicitly) serious to declare the message innocent when it is a stegogramme. Why?
    - It makes a strong surveillance regime.
    - It might be appropriate for the prison scenario.
- The real reason:
    - The probability distribution is known only for stegogrammes.
    - We require a known distribution under H0.

Calculating probability of Type I Errors

DefinitionA Type I Error is the event that

H0 is true; andH0 is rejected.

What is the error rate?We want to calculate the conditional probability

Pr(Reject H0|H0) = Pr(X > t |H0).

Because of H0, distribution of X is known.Hence the error probability can be looked up.

Dr Hans Georg Schaathun Statistics and Steganalysis Spring 2008 36 / 42

The χ2 test The error types

Type II Errors

In theory: Similar to Type I Errors.
In practice: What is the distribution of X when H0 is false?

Do we know this distribution at all?

Remark
Very often, we will not know the error probability.


A problem of the χ2 test

Accusing Alice of sending a stegogramme when she is not is called a false positive.
Suppose a false positive is a serious matter.
How can we limit the risk of false positives?
False positives are Type II Errors.
The distribution when H0 is false is unknown.

Remark
We cannot (theoretically) bound the probability of false positives in the χ2 test.
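
Without a theoretical bound, one option is to estimate the rate empirically. The sketch below assumes that a collection of images known to be clean is available and that the test statistic X has already been computed for each of them. The function name and the sample values are hypothetical and serve only as illustration.

```python
def empirical_false_positive_rate(clean_statistics, t):
    """Fraction of known-clean images whose statistic does not exceed t.

    H0 ("stegogramme") is retained when X <= t, so each such clean image
    would be wrongly accused: a Type II Error in the terminology above.
    """
    if not clean_statistics:
        raise ValueError("need at least one clean sample")
    retained = sum(1 for x in clean_statistics if x <= t)
    return retained / len(clean_statistics)

# Made-up statistic values, for illustration only (not measured data).
sample = [512.3, 48.9, 301.7, 95.2, 1020.4]
print(empirical_false_positive_rate(sample, t=100.0))  # 2 of 5 values are <= 100
```

Such an estimate is only as good as the sample: it describes the clean images that were tested, not natural images in general.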


Postlogue Generalised χ2 test

Randomised location

PoV assumes embedding in consecutive bits
Generalised χ2 proposes a fix
Fridrich et al. (2003) suggest an implementation
No rigorous hypothesis test or statistical theory

works experimentally


Postlogue Summary

Summary

Steganalysis can be cast as a problem of statistics
standard statistical theory applies

The Pairs-of-Values χ2 test is a simple example
The weekly exercise is to implement and test this steganalysis technique.

See website for detailed assignment.
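
As a starting point, the core of the Pairs-of-Values statistic might be sketched as follows. The sketch assumes that the input is a histogram of integer sample values (for example 256 grey-level counts) and that embedding tends to equalise the two counts within each pair (2i, 2i+1). The function name, the handling of empty pairs and the choice of degrees of freedom (non-empty pairs minus one) are assumptions; the assignment may specify them differently.

```python
def pov_chi2_statistic(histogram):
    """Chi-square statistic over the pairs of values (2i, 2i+1) of a histogram.

    Returns (statistic, degrees_of_freedom).
    """
    if len(histogram) % 2 != 0:
        raise ValueError("histogram length must be even")
    statistic = 0.0
    used_pairs = 0
    for i in range(0, len(histogram), 2):
        # Under H0 (stegogramme) both counts in a pair are expected to be equal.
        expected = (histogram[i] + histogram[i + 1]) / 2.0
        if expected == 0:
            continue  # skip empty pairs to avoid dividing by zero
        statistic += (histogram[i] - expected) ** 2 / expected
        used_pairs += 1
    return statistic, max(used_pairs - 1, 0)

# Toy histograms, for illustration only.
print(pov_chi2_statistic([10, 10, 4, 4]))  # equalised pairs give a statistic of 0
print(pov_chi2_statistic([12, 8, 3, 5]))   # unequal pairs give a larger statistic
```

A small statistic is consistent with H0; H0 is rejected when the statistic exceeds the chosen threshold t.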
