Statistics and Steganalysis

CSM25 Secure Information Hiding

Dr Hans Georg Schaathun

University of Surrey

Spring 2008

Dr Hans Georg Schaathun Statistics and Steganalysis Spring 2008 1 / 42

Learning Outcomes

After this session, everyone should
- understand how statistical methods apply to steganography
- understand how a statistical hypothesis can be used
- be able to implement the basic χ2 test of steganalysis

Suggested Reading

Core Reading

Cox et al. Chapter 13.

Suggested Reading

«Higher-order statistical steganalysis of palette images» by Jessica Fridrich, Miroslav Goljan and David Soukal, in Proc. SPIE Electronic Imaging, Jan 2003, pp. 178-190

General Introduction Statistical models

Outline

1 General Introduction
    Statistical models
    Histogramme

2 The χ2 test
    Pairs of Values
    I visual approach
    Hypothesis testing
    The error types

3 Postlogue
    Generalised χ2 test
    Summary

The fundamental question

Wendy the Warden intercepts an image. Is the image a stegogramme?
- Is it a probable natural image?
- Is it a probable stegogramme?

The answer depends on a model for natural images, i.e. on statistical models and probability distributions.
- With a perfect model, Alice and Bob could build a cipher whose ciphertexts are distributed as natural images.
- If Wendy has a better model than Alice and Bob, then she can do effective steganalysis.
- In reality, we do not know what a natural image looks like.

A visual example

Two different patterns in the LSB plane, separated by a sharp border. Why?
- Is there a corresponding border in the full image?
- No explanation in the full image ⇒ probably stego ...
- ... but not certain.

The remit of statistics

Statistics can estimate ‘normal’ behaviour and compare behaviours.

Advantages
- Automated decisions
- Extract detail
- Exact, quantifiable features
- Aggregate measures

General Introduction Histogramme

A typical image

Image histogram made by imhist in Matlab.
Gives the number of pixels per colour value.
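As a rough illustration of what such a histogram contains, the following Python sketch counts the pixels per colour value of an 8-bit greyscale image. It is a stand-in for Matlab's imhist, not a reproduction of it; the function name `histogram` and the tiny example image are made up for illustration.

```python
def histogram(pixels, levels=256):
    """Count how many pixels take each colour value 0 .. levels - 1."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts

# Hypothetical 2x4 greyscale image, flattened row by row.
image = [12, 12, 13, 200, 200, 200, 13, 12]
h = histogram(image)
print(h[12], h[13], h[200])  # 3 2 3
```

The list `h` plays the role of the counts behind the bars in the histogram plot: entry k is the number of pixels with colour value k.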

And a stego-image

What happened?

Histogram of stego-image: more ragged.
- Every other bar sticks out. Why?
- 50.8% of the bits in the binary message are 1s.

What is characteristic? Pairs of values

Consider colour 2i (i = 0, 1, ..., 127). What happens under LSB embedding?
- 2i → 2i or 2i + 1
- Never 2i → 2i − 1.

Likewise 2i + 1 → 2i or 2i + 1.
- (2i, 2i + 1) is a Pair of Values.
- A pixel in (2i, 2i + 1) before embedding is a pixel in (2i, 2i + 1) after embedding.
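The pairs-of-values property can be checked with a short Python sketch. It assumes plain LSB replacement on 8-bit pixel values, where each message bit overwrites the least significant bit; `embed_lsb`, the seed and the random cover are illustrative assumptions, not part of the lecture material.

```python
import random

def embed_lsb(pixels, bits):
    """LSB replacement: overwrite the last bit of each pixel with a message bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

rng = random.Random(2008)                            # arbitrary fixed seed
cover = [rng.randrange(256) for _ in range(1000)]    # hypothetical 8-bit pixels
message = [rng.randrange(2) for _ in cover]          # one random bit per pixel
stego = embed_lsb(cover, message)

# The pair index of a value v is v >> 1, so 2i and 2i + 1 share index i.
same_pair = all(c >> 1 == s >> 1 for c, s in zip(cover, stego))
print(same_pair)  # True: no pixel ever leaves its pair
```

Shifting right by one bit drops the LSB, so the comparison looks only at the seven bits that embedding never touches.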

The χ2 test Pairs of Values

Pairs of Values: the statistic

Image $X$. Random variable $Y_k = \#\{(x, y) \mid X_{xy} = k\}$.
The $Y_k$ form the Histogramme.

Recall that $(2l, 2l + 1)$ is a pair of values.
- The first 7 pixel bits are determined by the image colour, i.e. which pair.
- The last bit (LSB) is determined by the message, i.e. which half of the pair.

Pairs of Values: expected behaviour

The sum $Y_{2l} + Y_{2l+1}$ is unaffected by embedding.
For a random message, expect a 50-50 split between $2l$ and $2l + 1$, i.e.

$$E(Y_{2l}) = \tfrac{1}{2}\,(Y_{2l} + Y_{2l+1}).$$

Can we make a statistic out of this?
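A small simulation, sketched in Python under the assumption of LSB replacement with uniformly random message bits, illustrates both claims: the pair sum stays fixed while the two halves of the pair even out. The cover is deliberately extreme (every pixel has value 100), and all names are illustrative.

```python
import random

rng = random.Random(42)            # arbitrary fixed seed
cover = [100] * 10000              # extreme hypothetical cover: only colour 100
stego = [(p & ~1) | rng.randrange(2) for p in cover]  # LSB replacement

y_even = stego.count(100)          # Y_{2l}   for l = 50
y_odd = stego.count(101)           # Y_{2l+1} for l = 50

print(y_even + y_odd)              # 10000, the pair sum is unchanged
print(y_even, y_odd)               # both should be close to the expected 5000
```

Before embedding the pair (100, 101) is split 10000 to 0; after embedding the split is close to even, which is the regularity the χ2 test looks for.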

The χ2 statistic

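General χ2 statistic:

$$S = \sum_{o \in \Omega} \frac{(F_o - E(F_o))^2}{E(F_o)}$$

For pairs of values:

$$S = \sum_{l=0}^{127} \frac{\left(Y_{2l} - \tfrac{1}{2}(Y_{2l} + Y_{2l+1})\right)^2}{\tfrac{1}{2}(Y_{2l} + Y_{2l+1})}$$

Definition

$$S_{PoV} = \sum_{l=0}^{127} \frac{\tfrac{1}{2}(Y_{2l} - Y_{2l+1})^2}{Y_{2l} + Y_{2l+1}}$$

$\#\Omega - 1$ degrees of freedom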
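The definition follows from the pairs-of-values form because Y_{2l} − ½(Y_{2l} + Y_{2l+1}) = ½(Y_{2l} − Y_{2l+1}). A minimal Python sketch of the resulting statistic is given below. The function name `pov_statistic` is hypothetical, and skipping empty pairs (which would otherwise divide by zero) and reducing the degrees of freedom accordingly is an assumption of this sketch.

```python
def pov_statistic(hist):
    """Return (S_PoV, degrees of freedom) for a histogram of colour counts."""
    s = 0.0
    used_pairs = 0
    for l in range(len(hist) // 2):
        even, odd = hist[2 * l], hist[2 * l + 1]
        total = even + odd
        if total == 0:
            continue                      # empty pair: no information, skip it
        s += 0.5 * (even - odd) ** 2 / total
        used_pairs += 1
    return s, max(used_pairs - 1, 0)

# Hypothetical histogram with only two non-empty pairs.
hist = [0] * 256
hist[0], hist[1] = 20, 0     # contributes 0.5 * 20**2 / 20 = 10
hist[2], hist[3] = 30, 10    # contributes 0.5 * 20**2 / 40 = 5
s, dof = pov_statistic(hist)
print(s, dof)                # 15.0 1
```

A histogram in which every pair is split evenly gives S_PoV = 0, which is the most stego-like outcome this statistic can report.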

The χ2 PDF

The Pairs-of-Values χ2 Distribution

[Figure: χ2 PDF with 127 degrees of freedom. Shaded areas: red 2% probability; red + green 5%; red + green + blue 10%.]

The Cumulative Distribution Function (CDF) is the area under the PDF curve.

χ2 in Matlab

Defined in the Statistics toolbox.
Simplified functions available on website:
- chi2cdf
- chi2pdf
- chi2inv

You may have to exclude pixel values which do not occur; this may give fewer degrees of freedom.
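The simplified Matlab functions from the course website are not reproduced here. As a stand-in, the following Python sketch computes the χ2 CDF and its inverse with the standard library only. The names `chi2_sf`, `chi2_cdf` and `chi2_inv` are hypothetical, the degrees of freedom are assumed to be a positive integer, and the upper tail is built from the recurrence Q(a + 1, x) = Q(a, x) + x^a e^(−x) / Γ(a + 1) for the regularised incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper tail Pr(S > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, a0, steps = math.exp(-half), 1.0, df // 2 - 1              # Q(1, x/2)
    else:
        q, a0, steps = math.erfc(math.sqrt(half)), 0.5, (df - 1) // 2  # Q(1/2, x/2)
    for k in range(steps):
        a = a0 + k
        # Add x^a e^(-x) / Gamma(a + 1), evaluated in logs to avoid overflow.
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
    return min(q, 1.0)

def chi2_cdf(x, df):
    """Lower tail Pr(S <= x)."""
    return 1.0 - chi2_sf(x, df)

def chi2_inv(prob, df):
    """Approximate x with chi2_cdf(x, df) = prob, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return hi

print(chi2_inv(0.95, 127))   # about 154.3
```

If empty pairs were excluded when computing the statistic, the same reduced number of degrees of freedom should be passed as `df` here.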

The χ2 test I visual approach

The p-value

Let S be a stochastic χ2 distributed variable.
Let s be the observed χ2 statistic.
Define the p-value: p = P(S > s)

I.e. low p-value ⇒ s is unusually large.
- Improbable if the image is a stegogramme.
- Conclusion: probably a natural image.

Plots

[Figures: χ2 statistic and p-value, plotted for four cases: no message, 30% of capacity, 60% of capacity, 100% of capacity]

The χ2 test Hypothesis testing

The null hypothesis

Null hypothesis

H0: The image X is a stegogramme.

A statistic with known distribution under H0: S is χ2 distributed with 127 degrees of freedom.

We decide on a threshold T such that Pr(S > T | H0) is small.

If the observed statistic s > T, we reject H0.

Level of Significance

Before testing, choose the desired level of significance α.

The threshold T is taken such that Pr(S > T | H0) < α.
- If we observe s > T, we reject H0 at significance level α.
- If we observe s < T, we cannot reject H0 at significance level α.

Equivalently, compare the p-value p = Pr(S > s | H0) against α:
- p < α ⇒ Reject

Remark
If H0 is true, the probability that the hypothesis test gives the wrong conclusion is at most α.
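The decision rule can be sketched in Python as follows. The sketch takes the p-value as the upper-tail probability Pr(S > s | H0), which matches the threshold rule Pr(S > T | H0) above. The helper `chi2_sf` (standard library only, integer degrees of freedom), the function `decide` and the two example statistics are illustrative assumptions.

```python
import math

def chi2_sf(x, df):
    """Upper tail Pr(S > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, a0, steps = math.exp(-half), 1.0, df // 2 - 1
    else:
        q, a0, steps = math.erfc(math.sqrt(half)), 0.5, (df - 1) // 2
    for k in range(steps):
        a = a0 + k
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
    return min(q, 1.0)

def decide(s, df, alpha):
    """Return (p-value, reject?) for H0: the image is a stegogramme."""
    p = chi2_sf(s, df)
    return p, p < alpha

ALPHA = 0.05                       # fixed before looking at any image
for s in (120.0, 300.0):           # hypothetical observed statistics
    p, reject = decide(s, 127, ALPHA)
    print(s, "reject H0" if reject else "cannot reject H0")
# 120.0 cannot reject H0
# 300.0 reject H0
```

Rejecting H0 here means concluding that the image is probably natural; failing to reject leaves the stegogramme hypothesis standing.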

Choosing the level of significance

Say you gather the data first, and then choose the level of significance.

How does this influence the test? The error probability?

Tuning α to the observations means you can always reject the null hypothesis.
The (a priori) error probability under H0 is then 100%,
or bounded by the maximum α you would have accepted.

The level of significance is only meaningful if chosen in advance.
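A small simulation can make this concrete. The sketch below assumes a continuous test statistic, so that the p-value is uniformly distributed under H0; the three analyst strategies and the list `LEVELS` are hypothetical and only serve the illustration.

```python
import random

def rejection_rate(choose_alpha, n_trials=100_000, seed=0):
    """Fraction of experiments with H0 true in which H0 is nevertheless rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        p = rng.random()            # under H0 the p-value is uniform on [0, 1)
        if p < choose_alpha(p):     # the analyst may (wrongly) look at p first
            rejections += 1
    return rejections / n_trials

LEVELS = (0.01, 0.05, 0.10, 0.20)   # hypothetical levels the analyst would quote

def fixed_in_advance(p):
    return 0.05                     # ignores the data, as it should

def tuned_with_cap(p):
    # Smallest quoted level that still rejects; otherwise the largest level.
    return next((a for a in LEVELS if p < a), LEVELS[-1])

def tuned_without_cap(p):
    return (p + 1.0) / 2.0          # always strictly above p

print("fixed in advance :", rejection_rate(fixed_in_advance))
print("tuned, capped    :", rejection_rate(tuned_with_cap))
print("tuned, no cap    :", rejection_rate(tuned_without_cap))
```

The first rate should come out close to 0.05, the second close to the cap 0.20, and the third is exactly 1.0: the quoted level says nothing about the error rate once it is picked after the fact.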

The χ2 test Hypothesis testing

Common misconceptions

After the test, when we have or have not rejected H0:
The probability that H0 is correct is not α.
The probability that H0 is false is not α either.

Remark
No simple relation between level of significance and the probability of any hypothesis being right or wrong.
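To see why, one can apply Bayes' rule under assumed numbers. The slides assign neither a prior probability to H0 nor a power (the probability of rejecting H0 when it is false); both are hypothetical inputs in this sketch, as is the function name.

```python
def prob_h0_given_reject(prior_h0, alpha, power):
    """Pr(H0 | H0 rejected) by Bayes' rule.

    prior_h0: assumed Pr(H0) before the test (not given in the slides)
    alpha:    Pr(reject | H0), the level of significance
    power:    assumed Pr(reject | H0 false)
    """
    reject_and_h0 = prior_h0 * alpha
    reject_and_not_h0 = (1.0 - prior_h0) * power
    return reject_and_h0 / (reject_and_h0 + reject_and_not_h0)

# Same alpha, different priors: the posterior probability of H0 varies widely.
for prior in (0.1, 0.5, 0.9):
    posterior = prob_h0_given_reject(prior, alpha=0.05, power=0.8)
    print(f"Pr(H0) = {prior:.1f}  ->  Pr(H0 | reject) = {posterior:.3f}")
```

With α fixed at 0.05, the probability that H0 is true after a rejection depends entirely on the prior and the power, neither of which the test itself provides.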

The χ2 test Hypothesis testing

In Matlab

Consider the relation Threshold — Level of Significance

Pr(X > T | H0) ≤ α

alpha = 1 - chi2cdf(T, 127)     % significance level for a given threshold T
T = chi2inv(1 - alpha, 127)     % threshold for a given significance level

To plot the PDF
X = 0:1:300;
plot(X, chi2pdf(X, 127))
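For readers without the Matlab statistics functions, the same two directions can be sketched in standard-library Python. The names `chi2_sf` and `chi2_threshold` are made up for this example; `chi2_sf(T, 127)` plays the role of 1 - chi2cdf(T, 127), and the bisection search stands in for chi2inv(1 - alpha, 127). It is a sketch under these assumptions, not a replacement for a tested statistics library.

```python
import math

def chi2_sf(x, k):
    """Return Pr(X > x) for X chi-square distributed with k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y)
    # Step a up to k/2 with Q(a+1, y) = Q(a, y) + y^a e^(-y) / Gamma(a+1).
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def chi2_threshold(alpha, k=127, tol=1e-9):
    """Find T with Pr(X > T | H0) = alpha by bisection, for 0 < alpha < 1."""
    lo, hi = 0.0, float(k)
    while chi2_sf(hi, k) > alpha:       # widen the bracket until the tail is below alpha
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, k) > alpha:
            lo = mid                    # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

T = chi2_threshold(0.05)
print(f"T = {T:.2f}, alpha recovered = {chi2_sf(T, 127):.4f}")
```

Bisection works here because the tail probability is strictly decreasing in T, so there is exactly one threshold for each α.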

The χ2 test The error types

Hypothesis tests

Hypothesis testing is a recurring theme in statistics.
Typical hypotheses

Treatment A makes patients recover more quickly than no treatment.
The climate in South-East Britain is as warm today as it was 100 years ago.
The image sent by Alice is a stegogramme.

When the hypothesis has been phrased, experiments can tell us whether it is plausible or not.

The χ2 test The error types

Asymmetry of hypothesis testing

Treatment A makes patients recover more quickly than no treatment.

One error is more serious than another.
Type I: Accepting the hypothesis when it is wrong
  Patients get ineffective (or unhealthy) medicine.
Type II: Rejecting the hypothesis when it is right
  More research will be done to optimise the treatment.

In the table, H0 is the negation of the stated hypothesis (the treatment makes no difference), so accepting the hypothesis means rejecting H0.

            H0 retained      H0 rejected
H0 true     No error         Error Type I
H0 false    Error Type II    No error
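The table translates directly into a small lookup. The function name `outcome` and the list of trial results are invented for this sketch.

```python
from collections import Counter

def outcome(h0_true, h0_rejected):
    """Classify one test result according to the error table."""
    if h0_true and h0_rejected:
        return "Type I error"       # H0 true, but rejected
    if not h0_true and not h0_rejected:
        return "Type II error"      # H0 false, but retained
    return "No error"

# Made-up (h0_true, h0_rejected) pairs, just to exercise every cell of the table.
trials = [(True, False), (True, True), (False, True), (False, False), (True, False)]
counts = Counter(outcome(truth, rejected) for truth, rejected in trials)
print(dict(counts))
```

In a real experiment the first component (whether H0 is true) is known only for test data that one has prepared oneself.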

The χ2 test The error types

The weirdness of steganalysis

H0: The message is a stegogramme.

We consider it (implicitly) serious to declare the message innocent when it is a stegogramme.
Why?

It makes for a strong surveillance regime.
Might be appropriate for the prison scenario.

Real reason
The probability distribution is known only for stegogrammes.
We require a known distribution under H0.

The χ2 test The error types

Calculating probability of Type I Errors

Definition
A Type I Error is the event that
  H0 is true; and
  H0 is rejected.

What is the error rate?
We want to calculate the conditional probability

Pr(Reject H0 | H0) = Pr(X > T | H0).

Because of H0, the distribution of X is known.
Hence the error probability can be looked up.
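The looked-up probability can be checked by simulation. To keep the sketch short it assumes a χ2 distribution with 2 degrees of freedom rather than the 127 used elsewhere in the slides, because then Pr(X > T) = exp(-T/2) and the threshold has the closed form T = -2 ln α. The function name and parameters are illustrative only.

```python
import math
import random

def simulate_type_i_rate(alpha, n_trials=100_000, seed=1):
    """Estimate Pr(X > T | H0) when X is chi-square with 2 degrees of freedom."""
    rng = random.Random(seed)
    threshold = -2.0 * math.log(alpha)      # solves exp(-T/2) = alpha
    rejections = 0
    for _ in range(n_trials):
        # Under H0, X is the sum of two squared standard normal variables.
        x = rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
        if x > threshold:
            rejections += 1
    return rejections / n_trials

print(simulate_type_i_rate(0.05))   # should be close to the chosen alpha
```

The simulation is only possible because the distribution under H0 is fully specified; that is exactly what the slide relies on.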

The χ2 test The error types

Type II Errors

In theory: Similar to Type I Errors.
In practice: What is the distribution of X when H0 is false?
  Do we know this distribution at all?

Remark
Very often, we will not know the error probability.
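A toy calculation shows how strongly the Type II error rate depends on the unknown alternative. The sketch assumes, purely for illustration, that under H0 the statistic is χ2 with 2 degrees of freedom and that when H0 is false it is the same variable stretched by an unknown factor `scale`; neither assumption comes from the slides.

```python
import math

def threshold_2dof(alpha):
    """Threshold T with Pr(X > T | H0) = alpha for chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)

def type_ii_rate(alpha, scale):
    """Pr(X <= T) when X = scale * Y and Y is chi-square with 2 degrees of freedom."""
    t = threshold_2dof(alpha)
    return 1.0 - math.exp(-t / (2.0 * scale))

# The same test (alpha = 0.05) against several hypothetical alternatives.
for scale in (1.5, 2.0, 5.0, 20.0):
    print(f"scale = {scale:5.1f}  ->  Type II rate = {type_ii_rate(0.05, scale):.3f}")
```

Without knowing `scale` (or, in general, the whole distribution when H0 is false), the Type II error rate cannot be stated.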

The χ2 test The error types

A problem of the χ2 test

Accusing Alice of sending a stegogramme when she is not is called a false positive.
Suppose false positives are a serious matter.
How can we limit the risk of false positives?
False positives are Type II Errors.
The distribution when H0 is false is unknown.

Remark
We cannot (theoretically) bound the probability of false positives in the χ2 test.

Postlogue Generalised χ2 test

Randomised location

PoV assumes embedding in consecutive bits
Generalised χ2 proposes a fix
Fridrich et al. (2003) suggest an implementation
No rigorous hypothesis test or statistical theory
  works experimentally

Postlogue Summary

Summary

Steganalysis can be cast as a problem of statistics
  standard statistical theory applies

The Pairs-of-Values χ2 test is a simple example
The weekly exercise is to implement and test this steganalysis technique.
  See website for detailed assignment.
