Transcript of "Workshop on belief functions: Statistical Inference"

Page 1:

Workshop on belief functions: Statistical Inference

Thierry Denœux

Université de Technologie de Compiègne, France
HEUDIASYC (UMR CNRS 7253)

https://www.hds.utc.fr/˜tdenoeux

Chiang Mai University, July-August 2017

Thierry Denœux (UTC/HEUDIASYC) Workshop on belief functions CMU, July-August 2017 1 / 106

Page 2:

Estimation vs. prediction

Consider an urn with an unknown proportion θ of black balls. Assume that we have drawn n balls with replacement from the urn, y of which were black.

Problems:
1 What can we say about θ? (estimation)
2 What can we say about the color Z of the next ball to be drawn from the urn? (prediction)

Classical approaches:
- Frequentist: gives an answer that is correct most of the time (over infinitely many replications of the random experiment)
- Bayesian: assumes prior knowledge on θ and computes the posterior and predictive probabilities f(θ|y) and P(black|y)

Page 3:

Criticism of the frequentist approach

The frequentist approach makes a statement that is correct, say, for 95% of the samples. The confidence level is often interpreted as a measure of confidence in the statement for a particular sample. However, this interpretation poses some logical problems.

Page 4:

Example

Suppose X1 and X2 are iid with probability mass function

P_θ(X_i = θ − 1) = P_θ(X_i = θ + 1) = 1/2, i = 1, 2,   (1)

where θ ∈ ℝ is an unknown parameter. Consider the following confidence set for θ:

C(X1, X2) = (X1 + X2)/2 if X1 ≠ X2, and C(X1, X2) = X1 − 1 otherwise.   (2)

The corresponding confidence level is P_θ(θ ∈ C(X1, X2)) = 0.75. Now, let (x1, x2) be a given realization of the random sample (X1, X2).
- If x1 ≠ x2, we know for sure that θ = (x1 + x2)/2.
- If x1 = x2, we know for sure that θ is either x1 − 1 or x1 + 1, but we have no reason to favor either of these two hypotheses.

This problem is known as the problem of relevant subsets: there are recognizable situations in which the coverage probability is different from the stated one.
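The stated 0.75 confidence level can be checked by simulation. A minimal sketch in Python (the value of θ and the number of replications are arbitrary choices for illustration):

```python
import random

random.seed(0)

def draw_sample(theta):
    # each observation equals theta - 1 or theta + 1 with probability 1/2
    return (theta + random.choice([-1, 1]), theta + random.choice([-1, 1]))

def confidence_set(x1, x2):
    # C(X1, X2) = (X1 + X2)/2 if X1 != X2, else X1 - 1
    return (x1 + x2) / 2 if x1 != x2 else x1 - 1

theta = 3.0
N = 100_000
hits = sum(confidence_set(*draw_sample(theta)) == theta for _ in range(N))
print(hits / N)  # close to 0.75
```

Conditioning on the recognizable event x1 ≠ x2 gives coverage 1, and on x1 = x2 coverage 1/2, which is the relevant-subsets phenomenon described above.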

Page 5:

The relevant subset problem

This phenomenon happens in the usual problem of interval estimation of the mean of a normal sample: CIs that are "wide" in some sense have larger coverage probability than the stated confidence level, and vice versa for "short" intervals. Specifically, let X1, ..., Xn be an iid sample from N(µ, σ²) with both parameters unknown, and

C = {(x1, ..., xn) ∈ ℝⁿ : s/|x̄| > k}

for some k. The standard CI for µ is x̄ ± t_{n−1;1−α/2} s/√n. It can be shown that, for some ε > 0,

P(µ ∈ CI | C) > (1 − α) + ε

for all µ and σ. "The existence of certain relevant subsets is an embarrassment to confidence theory" (Lehmann, 1986).

Page 6:

Criticism of the Bayesian approach

In the Bayesian approach, y, z and θ are seen as random variables.
Estimation: compute the posterior pdf of θ given y,

f(θ|y) ∝ p(y|θ) f(θ),

where f(θ) is the prior pdf on θ.
Prediction: compute the posterior predictive distribution

p(z|y) = ∫ p(z|θ) f(θ|y) dθ

We need the prior f(θ)! We have seen that the uniform prior depends on the parametrization; consequently, it is not truly noninformative (wine and water paradox). Another solution: the Jeffreys prior.

Page 7:

The Jeffreys prior

The Jeffreys prior is defined objectively as being proportional to the square root of the determinant of the Fisher information matrix:

π(θ) ∝ √det I(θ),

where component (i, j) of the information matrix I(θ) is

I(θ)_ij = E_θ[ (∂ log f_θ(x)/∂θ_i) (∂ log f_θ(x)/∂θ_j) ]

The motivation for this definition is that the Jeffreys prior is invariant under reparameterization: if ϕ is a one-to-one transformation and ν = ϕ(θ), then the Jeffreys prior on ν is proportional to √det I(ν).

Page 8:

Problems with the Jeffreys prior

However, there are still some issues with this approach:
- First, the Jeffreys prior is sometimes improper.
- Second, and maybe more importantly, the Jeffreys prior can hardly be considered truly noninformative.

For instance, consider an iid sample X1, ..., Xn from a Bernoulli distribution B(θ). The Jeffreys prior on θ is the beta distribution B(0.5, 0.5), whose pdf is displayed below. We can see that extreme values of θ are considered a priori more probable than central values, which does represent non-vacuous knowledge about θ.

[Figure: pdf f(θ) of the Beta(0.5, 0.5) distribution over θ ∈ [0, 1]]
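The U-shape of the B(0.5, 0.5) prior can be checked numerically; a small pure-Python sketch, using the fact that B(1/2, 1/2) = π (the evaluation points 0.05 and 0.5 are arbitrary):

```python
import math

def jeffreys_bernoulli_pdf(t):
    # Beta(0.5, 0.5) density: t^(-1/2) (1-t)^(-1/2) / B(0.5, 0.5), with B(0.5, 0.5) = pi
    return 1.0 / (math.pi * math.sqrt(t * (1.0 - t)))

print(round(jeffreys_bernoulli_pdf(0.5), 3))   # 0.637: the minimum, at the center
print(round(jeffreys_bernoulli_pdf(0.05), 3))  # larger near the boundary
```

The density is minimal at θ = 0.5 and diverges at 0 and 1, confirming that the prior favors extreme values.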

Page 9:

Main ideas

- None of the classical approaches to statistical inference (frequentist and Bayesian) is fully satisfactory from a conceptual point of view
- Proposal of a new approach based on belief functions
- The new approach boils down to Bayesian inference when a probabilistic prior is available, but it does not require the user to provide such a prior
- You will apply this approach to different econometric models
- Before applying belief functions to statistical inference, we need to define belief functions on infinite spaces

Page 10:

Belief functions on infinite spaces

Outline

1 Belief functions on infinite spaces
  - Definition
  - Practical models
  - Combination and propagation
2 Estimation
  - Justification
  - Likelihood-based belief function
  - Examples
  - Consistency
3 Prediction
  - Predictive belief function
  - Examples

Page 11:

Belief functions on infinite spaces: Definition

Outline

1 Belief functions on infinite spaces
  - Definition
  - Practical models
  - Combination and propagation
2 Estimation
  - Justification
  - Likelihood-based belief function
  - Examples
  - Consistency
3 Prediction
  - Predictive belief function
  - Examples

Page 12:

Belief functions on infinite spaces: Definition

Belief function: general definition

Let Ω be a set (finite or not) and B an algebra of subsets of Ω.

A belief function (BF) on B is a mapping Bel : B → [0, 1] verifying Bel(∅) = 0, Bel(Ω) = 1, and the complete monotonicity property: for any k ≥ 2 and any collection B1, ..., Bk of elements of B,

Bel(⋃_{i=1}^{k} B_i) ≥ Σ_{∅≠I⊆{1,...,k}} (−1)^{|I|+1} Bel(⋂_{i∈I} B_i)

A function Pl : B → [0, 1] is a plausibility function iff the mapping B ↦ 1 − Pl(B̄), where B̄ denotes the complement of B, is a belief function.

Page 13:

Belief functions on infinite spaces: Definition

Source

[Diagram: a multivalued mapping Γ from (S, A, P) to (Ω, B), sending each s to a subset Γ(s) of Ω]

- Let S be a state space, A an algebra of subsets of S, and P a finitely additive probability on (S, A)
- Let Ω be a set and B an algebra of subsets of Ω
- Let Γ be a multivalued mapping from S to 2^Ω
- The four-tuple (S, A, P, Γ) is called a source
- Under some conditions, it induces a belief function on (Ω, B)

Page 14:

Belief functions on infinite spaces: Definition

Strong measurability

[Diagram: a set B ∈ B with its lower inverse B_* and upper inverse B* in S]

Lower and upper inverses: for all B ∈ B,

Γ_*(B) = B_* = {s ∈ S | Γ(s) ≠ ∅, Γ(s) ⊆ B}
Γ*(B) = B* = {s ∈ S | Γ(s) ∩ B ≠ ∅}

Γ is strongly measurable wrt A and B if, for all B ∈ B, B* ∈ A. Equivalently,

(∀B ∈ B, B* ∈ A) ⇔ (∀B ∈ B, B_* ∈ A)

A strongly measurable multivalued mapping Γ is called a random set.

Page 15:

Belief functions on infinite spaces: Definition

Belief function induced by a source: Lower and upper probabilities

[Diagram: a set B ∈ B with its lower inverse B_* and upper inverse B* in S]

Lower and upper probabilities:

∀B ∈ B, P_*(B) = P(B_*)/P(Ω*), P*(B) = P(B*)/P(Ω*) = 1 − P_*(B̄)

P_* is a BF, and P* is the dual plausibility function. Conversely, for any belief function, there is a source that induces it (Shafer's thesis, 1973).

Page 16:

Belief functions on infinite spaces: Definition

Interpretation

[Diagram: a multivalued mapping Γ from (S, A, P) to (Ω, B), sending each s to Γ(s)]

Typically, Ω is the domain of an unknown quantity ω, and S is a set of interpretations of a given piece of evidence about ω. If s ∈ S holds, then the evidence tells us that ω ∈ Γ(s), and nothing more. Then:
- Bel(B) is the probability that the evidence supports B
- Pl(B) is the probability that the evidence is consistent with B

Page 17:

Belief functions on infinite spaces: Practical models

Outline

1 Belief functions on infinite spaces
  - Definition
  - Practical models
  - Combination and propagation
2 Estimation
  - Justification
  - Likelihood-based belief function
  - Examples
  - Consistency
3 Prediction
  - Predictive belief function
  - Examples

Page 18:

Belief functions on infinite spaces: Practical models

Consonant belief function: Source

[Figure: a possibility distribution π(ω), with the level set Γ(s) = {ω : π(ω) ≥ s} for a level s ∈ [0, 1]]

- Let π be a mapping from Ω = ℝ^p to S = [0, 1] such that sup π = 1
- Let Γ be the multivalued mapping from S to 2^Ω defined by

∀s ∈ [0, 1], Γ(s) = {ω ∈ Ω | π(ω) ≥ s}

- Let B([0, 1]) be the Borel σ-field on [0, 1], and P the uniform probability measure on [0, 1]
- We consider the source ([0, 1], B([0, 1]), P, Γ)

Page 19:

Belief functions on infinite spaces: Practical models

Consonant belief function: Properties

Let Bel and Pl be the belief and plausibility functions induced by ([0, 1], B([0, 1]), P, Γ).

The focal sets Γ(s) are nested, i.e., for any s and s′,

s ≥ s′ ⇒ Γ(s) ⊆ Γ(s′)

The belief function is then said to be consonant. The corresponding contour function pl is equal to π. The corresponding plausibility function is a possibility measure: for any B ⊆ Ω,

Pl(B) = sup_{ω∈B} pl(ω)
Bel(B) = inf_{ω∉B} (1 − pl(ω))
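For a concrete example, take the triangular contour π(ω) = max(0, 1 − |ω|) on ℝ, so that Γ(s) = [−(1−s), 1−s]. A Monte Carlo estimate of Pl(B) can then be compared with sup_{ω∈B} π(ω); a minimal sketch (the hypothesis B = [0.5, 2] is an arbitrary illustration):

```python
import random

random.seed(1)

def pi_contour(w):
    # triangular possibility distribution with mode 0
    return max(0.0, 1.0 - abs(w))

N = 100_000
lo, hi = 0.5, 2.0   # hypothesis B = [0.5, 2]
hits = 0
for _ in range(N):
    s = random.random()            # s ~ U([0, 1])
    a, b = -(1.0 - s), 1.0 - s     # focal set Γ(s) = {ω : π(ω) >= s}
    if b >= lo and a <= hi:        # Γ(s) ∩ B ≠ ∅
        hits += 1

pl_estimate = hits / N
print(pl_estimate)  # close to sup over B of π, i.e. π(0.5) = 0.5
```

The estimate agrees with the possibility-measure formula Pl(B) = sup_{ω∈B} pl(ω).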

Page 20:

Belief functions on infinite spaces: Practical models

Random closed interval

[Diagram: a random closed interval [U(s), V(s)] induced by a mapping s ↦ (U(s), V(s))]

Let (U, V) be a two-dimensional random vector from a probability space (S, A, P) to ℝ² such that U ≤ V a.s.
Multivalued mapping:

Γ : s ↦ Γ(s) = [U(s), V(s)]

The source (S, A, P, Γ) is a random closed interval. It defines a BF on (ℝ, B(ℝ)).

Page 21:

Belief functions on infinite spaces: Practical models

Random closed interval: Properties

Lower/upper cdfs:

Bel((−∞, x]) = P([U, V] ⊆ (−∞, x]) = P(V ≤ x) = F_V(x)
Pl((−∞, x]) = P([U, V] ∩ (−∞, x] ≠ ∅) = P(U ≤ x) = F_U(x)

Lower/upper expectations:

E_*(Γ) = E(U), E*(Γ) = E(V)

Lower/upper quantiles:

q_*(α) = F_U^{−1}(α), q*(α) = F_V^{−1}(α)
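These cdf identities are easy to verify by simulation. A sketch for the illustrative random interval [U, U + 1] with U ~ U([0, 1]), so that F_U(0.5) = 0.5 and F_V(0.5) = 0:

```python
import random

random.seed(2)

N = 100_000
x = 0.5
bel = pl = 0
for _ in range(N):
    u = random.random()   # U ~ U([0, 1])
    v = u + 1.0           # V = U + 1, so U <= V a.s.
    if v <= x:            # [U, V] ⊆ (−∞, x]
        bel += 1
    if u <= x:            # [U, V] ∩ (−∞, x] ≠ ∅
        pl += 1

print(bel / N, pl / N)  # estimates of F_V(0.5) = 0 and F_U(0.5) = 0.5
```

The lower cdf (Bel) is driven by the upper endpoint V, and the upper cdf (Pl) by the lower endpoint U, as in the formulas above.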

Page 22:

Belief functions on infinite spaces: Combination and propagation

Outline

1 Belief functions on infinite spaces
  - Definition
  - Practical models
  - Combination and propagation
2 Estimation
  - Justification
  - Likelihood-based belief function
  - Examples
  - Consistency
3 Prediction
  - Predictive belief function
  - Examples

Page 23:

Belief functions on infinite spaces: Combination and propagation

Dempster's rule: Definition

[Diagram: two sources (S1, A1, P1, Γ1) and (S2, A2, P2, Γ2) mapping into the same space (Ω, B)]

Let (S_i, A_i, P_i, Γ_i), i = 1, 2, be two sources representing independent items of evidence, inducing BFs Bel1 and Bel2. The combined BF Bel = Bel1 ⊕ Bel2 is induced by the source (S1 × S2, A1 ⊗ A2, P1 ⊗ P2, Γ∩) with

Γ∩(s1, s2) = Γ1(s1) ∩ Γ2(s2)

Page 24:

Belief functions on infinite spaces: Combination and propagation

Dempster's rule: Definition

For each B ∈ B, Bel(B) is the conditional probability that Γ∩(s1, s2) ⊆ B, given that Γ∩(s1, s2) ≠ ∅:

Bel(B) = P({(s1, s2) ∈ S1 × S2 | Γ∩(s1, s2) ≠ ∅, Γ∩(s1, s2) ⊆ B}) / P({(s1, s2) ∈ S1 × S2 | Γ∩(s1, s2) ≠ ∅})

It is well defined iff the denominator is nonzero. As in the finite case, the degree of conflict between the belief functions can be defined as one minus the denominator in the above equation.

Page 25:

Belief functions on infinite spaces: Combination and propagation

Approximate computation: Monte Carlo simulation

Require: desired number of focal sets N
i ← 0
while i < N do
    draw s1 in S1 from P1
    draw s2 in S2 from P2
    Γ∩(s1, s2) ← Γ1(s1) ∩ Γ2(s2)
    if Γ∩(s1, s2) ≠ ∅ then
        i ← i + 1
        B_i ← Γ∩(s1, s2)
    end if
end while
Bel(B) ← (1/N) #{i ∈ {1, ..., N} | B_i ⊆ B}
Pl(B) ← (1/N) #{i ∈ {1, ..., N} | B_i ∩ B ≠ ∅}
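The algorithm above can be sketched in Python for two random closed intervals. The particular mappings Γ1 and Γ2 below are illustrative choices, not taken from the slides:

```python
import random

random.seed(3)

def gamma1(s):
    # focal interval of source 1, shrinking toward 0 as s grows
    return (-(1.0 - s), 1.0 - s)

def gamma2(s):
    # focal interval of source 2, shrinking toward 0.5 as s grows
    return (s - 0.5, 1.5 - s)

N = 10_000
focal_sets = []
while len(focal_sets) < N:
    a1, b1 = gamma1(random.random())
    a2, b2 = gamma2(random.random())
    a, b = max(a1, a2), min(b1, b2)   # Γ∩(s1, s2) = Γ1(s1) ∩ Γ2(s2)
    if a <= b:                        # keep only nonempty intersections
        focal_sets.append((a, b))

def bel(lo, hi):
    # fraction of retained focal intervals included in [lo, hi]
    return sum(lo <= a and b <= hi for a, b in focal_sets) / N

def pl(lo, hi):
    # fraction of retained focal intervals intersecting [lo, hi]
    return sum(b >= lo and a <= hi for a, b in focal_sets) / N

print(bel(0.0, 1.0), pl(0.0, 1.0))
```

Discarding empty intersections implements the conditioning (normalization) step of Dempster's rule, and Bel(B) ≤ Pl(B) holds for every B by construction.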

Page 26:

Belief functions on infinite spaces: Combination and propagation

Combination of dependent evidence

[Diagram: two multivalued mappings Γ1 and Γ2 defined on the same probability space (S, A, P)]

The case of complete dependence between two pieces of evidence can be modeled by two sources formed by different multivalued mappings Γ1 and Γ2 from the same probability space. The combined BF is induced by the source (S, A, P, Γ∩).

This combination rule preserves consonance: the combination of two consonant BFs is still consonant. This is the rule used in possibility theory.

Page 27:

Belief functions on infinite spaces: Combination and propagation

Propagation of belief functions

Assume that a quantity Z is defined as a function of two other quantities X and Y:

Z = ϕ(X, Y)

Given BFs Bel_X and Bel_Y on X and Y, what is the BF Bel_Z on Z? Solution:

Bel_Z = (Bel_X↑XYZ ⊕ Bel_Y↑XYZ ⊕ Bel_ϕ)↓Z

For any A ⊆ Ω_X and B ⊆ Ω_Y,

(A↑Ω_XYZ) ∩ (B↑Ω_XYZ) ∩ R_ϕ = ϕ(A, B)

Consequently, if Bel_X and Bel_Y are induced by random sets Γ(U) and Λ(V), where U and V are independent random variables, then Bel_Z is induced by the random set

ϕ(Γ(U), Λ(V))
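For ϕ(X, Y) = X + Y and independent random closed intervals, the random set for Z is obtained by interval addition. A minimal sketch; the two triangular contour functions (centered at 1 and 2) are illustrative assumptions:

```python
import random

random.seed(4)

def gamma_x(u):
    # focal interval of Bel_X: level set at u of a triangular contour centered at 1
    return (1.0 - (1.0 - u), 1.0 + (1.0 - u))

def gamma_y(v):
    # focal interval of Bel_Y: level set at v of a triangular contour centered at 2
    return (2.0 - (1.0 - v), 2.0 + (1.0 - v))

N = 10_000
z_sets = []
for _ in range(N):
    ax, bx = gamma_x(random.random())   # U and V drawn independently
    ay, by = gamma_y(random.random())
    z_sets.append((ax + ay, bx + by))   # ϕ(Γ(U), Λ(V)) for ϕ = addition

pl_z = sum(b >= 2.9 and a <= 3.1 for a, b in z_sets) / N
print(pl_z)  # 1.0: every focal interval for Z contains 3 = 1 + 2
```

Since every focal interval of Z contains the sum of the two modes, the plausibility of any hypothesis containing 3 equals 1, as expected for a consonant contour peaking at 3.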

Page 28:

Estimation

Outline

1 Belief functions on infinite spaces
  - Definition
  - Practical models
  - Combination and propagation
2 Estimation
  - Justification
  - Likelihood-based belief function
  - Examples
  - Consistency
3 Prediction
  - Predictive belief function
  - Examples

Page 29:

Estimation

Parameter estimation

- Let y ∈ Y denote the observed data and f_θ(y) the probability mass or density function describing the data-generating mechanism, where θ ∈ Θ is an unknown parameter
- Having observed y, how do we quantify the uncertainty about θ without specifying a prior probability distribution?
- Different approaches have been proposed by Dempster (1968), Shafer (1976) and, more recently, Martin and Liu (2016)
- Here, I will emphasize Shafer's likelihood-based solution (Shafer, 1976; Wasserman, 1990; Denœux, 2014), which is (much) simpler to implement and connects nicely with the "likelihoodist" approach to statistical inference

Page 30:

Estimation: Justification

Outline

1 Belief functions on infinite spaces
  - Definition
  - Practical models
  - Combination and propagation
2 Estimation
  - Justification
  - Likelihood-based belief function
  - Examples
  - Consistency
3 Prediction
  - Predictive belief function
  - Examples

Page 31:

Estimation: Justification

The likelihood principle: Definition (Birnbaum, 1962)

Let E denote a statistical model representing an experimental situation. Typically, E is composed of the parameter space Θ, the sample space X, and a probability mass or density function f(x; θ) for each θ ∈ Θ. Let us denote by Ev(E, x) the evidential meaning of the specified instance (E, x) of statistical evidence. The likelihood principle (L) can be stated as follows:

If E and E′ are any two experiments with the same parameter space Θ, represented by probability mass or density functions f_θ(x) and g_θ(y), and if x and y are any two respective outcomes which determine likelihood functions satisfying f_θ(x) = c g_θ(y) for some positive constant c = c(x, y) and all θ ∈ Θ, then Ev(E, x) = Ev(E′, y).

Page 32:

Estimation: Justification

Frequentist methods violate (L)

For instance, consider an urn with a proportion θ of black balls, and the following two experiments:
- Experiment 1: a fixed number n of balls are drawn with replacement from the urn and the number X of black balls is observed; X has a binomial distribution B(n, θ).
- Experiment 2: balls are drawn with replacement from the urn until a fixed number x of black balls have been drawn; we observe the number N of draws, which has a negative binomial distribution.

Confidence intervals computed in these two cases are different, although the likelihood functions for the two experiments are identical. This is because confidence intervals (and significance tests) depend not only on the likelihood, but also on the sample space.

Page 33:

Estimation: Justification

Justification of (L) (Birnbaum, 1962)

Birnbaum (1962) showed that (L) can be derived from the principles of sufficiency (S) and conditionality (C), which can be stated as follows:

- The principle of sufficiency (S): Let E be an experiment with sample space X, and let t(x) be any sufficient statistic (i.e., any statistic such that the conditional distribution of x given t does not depend on θ). Let E′ be an experiment, derived from E, having the same parameter space, such that when any outcome x of E is observed, the corresponding outcome t = t(x) of E′ is observed. Then, for each x, Ev(E, x) = Ev(E′, t), where t = t(x).
- The principle of conditionality (C): If E is mathematically equivalent to a mixture of component experiments E_h, with possible outcomes (E_h, x_h), then Ev(E, (E_h, x_h)) = Ev(E_h, x_h).

Page 34:

Estimation: Justification

Meaning of (C)

(C) means that component experiments that might have been carried out, but in fact were not, are irrelevant once we know that E_h has been carried out. For instance, assume that two measuring instruments provide measurements x1 and x2 of an unknown quantity θ. An instrument is picked at random (experiment E). Assume we know that the first instrument (h = 1) was selected and we observe x1. Then the fact that the second instrument could have been selected is irrelevant, and the overall structure of the original experiment E can be ignored.

Page 35:

Estimation: Likelihood-based belief function

Outline

1 Belief functions on infinite spaces
  - Definition
  - Practical models
  - Combination and propagation
2 Estimation
  - Justification
  - Likelihood-based belief function
  - Examples
  - Consistency
3 Prediction
  - Predictive belief function
  - Examples

Page 36:

Estimation: Likelihood-based belief function

Likelihood-based belief function: Requirements

Let Bel_y^Θ be a belief function representing our knowledge about θ after observing y. We impose the following requirements:

1 Likelihood principle: Bel_y^Θ should be based only on the likelihood function

θ ↦ L_y(θ) = f_θ(y)

2 Compatibility with Bayesian inference: when a Bayesian prior P0 is available, combining it with Bel_y^Θ using Dempster's rule should yield the Bayesian posterior:

Bel_y^Θ ⊕ P0 = P(·|y)

3 Principle of minimal commitment: among all the belief functions satisfying the previous two requirements, Bel_y^Θ should be the least committed (least informative)

Page 37:

Estimation: Likelihood-based belief function

Likelihood-based belief function: Solution (Denœux, 2014)

Bel_y^Θ is the consonant belief function induced by the relative likelihood function

pl_y(θ) = L_y(θ) / L_y(θ̂),

where θ̂ is a MLE of θ, and it is assumed that L_y(θ̂) < +∞. The corresponding plausibility function is

Pl_y^Θ(H) = sup_{θ∈H} pl_y(θ), ∀H ⊆ Θ

[Figure: contour function pl_y(θ), with the plausibility Pl_y(H) of a hypothesis H obtained as the supremum of pl_y over H]

Page 38:

Estimation: Likelihood-based belief function

Source

Corresponding random set:

Γ_y(s) = {θ ∈ Θ | L_y(θ)/L_y(θ̂) ≥ s},

with s uniformly distributed in [0, 1].

[Figure: the focal set Γ_y(s) as the level set of pl_y at level s]

If Θ ⊆ ℝ and L_y(θ) is unimodal and upper-semicontinuous, then Bel_y^Θ corresponds to a random closed interval.

Page 39:

Estimation: Likelihood-based belief function

Binomial example

In the urn model, Y ∼ B(n, θ) and

pl_y(θ) = θ^y (1−θ)^{n−y} / (θ̂^y (1−θ̂)^{n−y}) = (θ/θ̂)^{nθ̂} ((1−θ)/(1−θ̂))^{n(1−θ̂)}

for all θ ∈ Θ = [0, 1], where θ̂ = y/n is the MLE of θ.

[Figure: contour functions pl_x(θ) for n = 10, 20, 100]
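The binomial contour function, and the plausibility of a composite hypothesis H computed as a supremum over a grid, can be sketched as follows. The values n = 20, y = 8 and the hypothesis H = [0.5, 1] are arbitrary illustrations:

```python
def binomial_pl(theta, n, y):
    # relative likelihood: theta^y (1-theta)^(n-y) / (theta_hat^y (1-theta_hat)^(n-y))
    theta_hat = y / n
    if theta in (0.0, 1.0):
        return 0.0 if 0 < y < n else 1.0
    return (theta ** y * (1 - theta) ** (n - y)) / (
        theta_hat ** y * (1 - theta_hat) ** (n - y))

def pl_hypothesis(lo, hi, n, y, grid=10_000):
    # Pl(H) = sup over theta in H of pl_y(theta), approximated on a grid
    return max(binomial_pl(lo + (hi - lo) * k / grid, n, y)
               for k in range(grid + 1))

n, y = 20, 8
print(binomial_pl(y / n, n, y))            # 1.0 at the MLE theta_hat = 0.4
print(pl_hypothesis(0.5, 1.0, n, y))       # Pl(theta >= 0.5)
```

Since pl_y is unimodal with mode at θ̂ = 0.4, the supremum over [0.5, 1] is attained at the boundary point 0.5, i.e. Pl(θ ≥ 0.5) = pl_y(0.5).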

Page 40:

Estimation: Likelihood-based belief function

Uniform example

Let y = (y1, ..., yn) be a realization of an iid random sample from U([0, θ]). The likelihood function is

L_y(θ) = θ^{−n} 1_{[y(n),+∞)}(θ)

The likelihood-based BF is induced by the random closed interval [y(n), y(n) S^{−1/n}], with S ∼ U([0, 1]).

[Figure: contour functions pl_y(θ) for n = 5, 10, 50]
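The random interval [y(n), y(n) S^{−1/n}] can be simulated directly. The check below uses the identity Bel((−∞, x]) = P(y(n) S^{−1/n} ≤ x) = 1 − (y(n)/x)^n for x ≥ y(n); the sample itself is simulated for illustration:

```python
import random

random.seed(5)

n = 10
theta_true = 1.0
y = [random.uniform(0.0, theta_true) for _ in range(n)]
y_max = max(y)   # y_(n), the sample maximum

# simulate the random closed interval [y_(n), y_(n) S^(-1/n)]
N = 100_000
x = 1.2
bel = 0
for _ in range(N):
    s = 1.0 - random.random()      # S ~ U((0, 1]), avoiding s = 0
    v = y_max * s ** (-1.0 / n)    # upper endpoint of the focal interval
    if v <= x:                     # focal interval ⊆ (−∞, x]
        bel += 1

exact = 1.0 - (y_max / x) ** n     # closed-form Bel((−∞, x]) = 1 − pl_y(x)
print(round(bel / N, 3), round(exact, 3))
```

The Monte Carlo estimate matches the closed form, illustrating that for this unimodal likelihood the BF is indeed a random closed interval.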

Page 41:

Estimation: Likelihood-based belief function

Profile likelihood

Assume that θ = (ξ, ν) ∈ Ω_ξ × Ω_ν, where ξ is a parameter of interest and ν is a nuisance parameter. Then the marginal contour function for ξ is

pl_y(ξ) = Pl({ξ} × Ω_ν) = sup_{ν∈Ω_ν} pl_y(ξ, ν),

which is the profile relative likelihood function. The profiling method for eliminating nuisance parameters thus has a natural justification in our approach. When the quantities pl_y(ξ) cannot be derived analytically, they have to be computed numerically using an iterative optimization algorithm.
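As a sketch of numerical profiling, take a normal sample with mean µ of interest and σ as the nuisance parameter; here σ is maximized out on a crude grid (an iterative optimizer would be used in practice), and the data are invented for illustration:

```python
import math

def norm_loglik(y, mu, sigma):
    # log-likelihood of an iid N(mu, sigma^2) sample, up to an additive constant
    n = len(y)
    return -n * math.log(sigma) - sum((v - mu) ** 2 for v in y) / (2 * sigma ** 2)

def profile_pl(y, mu, sigmas):
    # marginal contour for mu: maximize out the nuisance parameter sigma
    lmax_mu = max(norm_loglik(y, mu, s) for s in sigmas)
    mu_hat = sum(y) / len(y)
    lmax = max(norm_loglik(y, mu_hat, s) for s in sigmas)
    return math.exp(lmax_mu - lmax)

y = [4.2, 5.1, 4.8, 5.5, 4.9]
sigmas = [0.01 * k for k in range(1, 500)]   # crude grid over the nuisance parameter
print(profile_pl(y, sum(y) / len(y), sigmas))  # 1.0 at the MLE of mu
print(profile_pl(y, 6.0, sigmas))              # much smaller away from the MLE
```

For this model the profile has the closed form pl_y(µ) = (σ̂²/σ̂²_µ)^{n/2}, so the grid result can be checked analytically.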

Page 42:

Estimation: Likelihood-based belief function

Relation with likelihood-based inference

- The approach to statistical inference outlined here is very close to the "likelihoodist" approach advocated by Birnbaum (1962), Barnard (1962), and Edwards (1992), among others
- The main difference resides in the interpretation of the likelihood function as defining a belief function
- This interpretation allows us to quantify the uncertainty in statements of the form θ ∈ H, where H may contain multiple values. This is in contrast with the classical likelihood approach, in which only the likelihood of single hypotheses is defined
- The belief function interpretation provides an easy and natural way to combine statistical information with other information, such as expert judgments

Page 43: Workshop on belief functionstdenoeux/dokuwiki/_media/... · values of are considered a priori more probable that central values, which does represent non vacuous knowledge about .

Estimation Likelihood-based belief function

Relation with the likelihood-ratio test statistics

We can also notice that Pl_y^Θ(H) is identical to the likelihood ratio statistic for H.
From Wilks' theorem, we have asymptotically (under regularity conditions), when H holds,

−2 ln Pl_y(H) ∼ χ²_r

where r is the number of restrictions imposed by H.
Consequently, rejecting hypothesis H if its plausibility is smaller than exp(−χ²_{r;1−α}/2) is a testing procedure with significance level approximately equal to α.
The sets Γ(exp(−χ²_{r;1−α}/2)) are approximate 1 − α confidence regions.
However, these properties are coincidental, as the approach outlined here is not based on frequentist inference.
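The decision rule above can be sketched in a few lines; this is an illustrative helper (the name `reject` is hypothetical), with the χ²_{1;0.95} quantile hard-coded to keep the sketch dependency-free:

```python
import math

# 0.95-quantile of the chi-square distribution with r = 1 degree of freedom,
# hard-coded here (illustrative; use a stats library for general r and alpha)
CHI2_1_095 = 3.841

def reject(pl_H, chi2_quantile=CHI2_1_095):
    # Reject H at level alpha when Pl(H) < exp(-chi2_{r;1-alpha} / 2)
    return pl_H < math.exp(-chi2_quantile / 2.0)
```

For r = 1 and α = 0.05 the threshold is exp(−3.841/2) ≈ 0.147: hypotheses whose plausibility falls below roughly 0.15 are rejected.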


Combination with a Bayesian prior

The likelihood-based method described here does not require any prior knowledge of θ.
However, by construction, this approach boils down to Bayesian inference if a prior probability g(θ) is provided and combined with Bel_y^Θ by Dempster's rule.
As it will usually not be possible to compute an analytical expression of the resulting posterior distribution, it can be approximated by Monte Carlo simulation (see next slide).
We can see that this is just the rejection sampling algorithm with the prior g(θ) as proposal distribution.
The rejection sampling algorithm can thus be seen, in this case, as a Monte Carlo approximation to Dempster's rule of combination.


Combination with a Bayesian prior (continued)

Monte Carlo algorithm for combining the likelihood-based belief function with a Bayesian prior by Dempster's rule:

Require: Desired number of focal sets N
i ← 0
while i < N do
    Draw s in [0,1] from the uniform probability measure λ on [0,1]
    Draw θ from the prior probability distribution g(θ)
    if pl_y(θ) ≥ s then
        i ← i + 1
        θ_i ← θ
    end if
end while
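A minimal Python sketch of this rejection-sampling scheme, assuming (for illustration only) a binomial likelihood with y successes out of n draws and a uniform prior on θ:

```python
import math
import random

def pl(theta, y, n):
    # Relative binomial likelihood L(theta)/L(y/n) (the contour function)
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    t_hat = y / n
    return math.exp(y * math.log(theta / t_hat)
                    + (n - y) * math.log((1.0 - theta) / (1.0 - t_hat)))

def combine_with_prior(y, n, N):
    # Rejection sampling with the prior as proposal: a prior draw theta is
    # kept with probability pl(theta); uniform prior g(theta) = U(0,1) assumed
    out = []
    while len(out) < N:
        s = random.random()
        theta = random.random()      # draw from the prior
        if pl(theta, y, n) >= s:
            out.append(theta)
    return out

random.seed(0)
samples = combine_with_prior(y=6, n=10, N=500)
```

With a uniform prior, the accepted draws are distributed as the Bayesian posterior Beta(y + 1, n − y + 1), consistently with the remark that the algorithm approximates Dempster's rule in this case.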


Outline

1 Belief functions on infinite spaces
    Definition
    Practical models
    Combination and propagation

2 Estimation
    Justification
    Likelihood-based belief function
    Examples
    Consistency

3 Prediction
    Predictive belief function
    Examples


Behrens-Fisher problem

Let the observed data y be composed of two independent normal samples y1 = (y11, …, y1n1) and y2 = (y21, …, y2n2) from N(µ1, σ1²) and N(µ2, σ2²), respectively.
We wish to compare the means µ1 and µ2.
Using the frequentist approach, this is done by computing a p-value for the hypothesis H0 : µ1 = µ2 of equality of means, or a confidence interval on µ1 − µ2. This problem, known as the Behrens-Fisher problem, only has approximate solutions.
Using our approach, the means are compared by computing the plausibility of H0 or, more generally, of Hδ : µ1 − µ2 = δ.


Belief function solution

The marginal contour function for (µ1, µ2) is

pl_y(µ1, µ2) = sup_{σ1,σ2} pl_y(θ) = [∏_{k=1}^2 L_{y_k}(µ_k, σ̂_k(µ_k))] / [∏_{k=1}^2 L_{y_k}(ȳ_k, s_k)],

where

σ̂_k(µ_k)² = (1/n_k) ∑_{i=1}^{n_k} (y_{ki} − µ_k)².

The plausibility of Hδ = {(µ1, µ2) ∈ R² | µ1 − µ2 = δ} can then be computed by maximizing pl_y(µ1, µ2) under the constraint µ1 − µ2 = δ, i.e.,

Pl_y(Hδ) = max_{µ1} pl_y(µ1, µ1 − δ)


Example (Lehmann, 1975)
We consider the following driving times from a person's house to work, measured for two different routes: y1 = (6.5, 6.8, 7.1, 7.3, 10.2) and y2 = (5.8, 5.8, 5.9, 6.0, 6.0, 6.0, 6.3, 6.3, 6.4, 6.5, 6.5). Are the mean traveling times equal?

[Figure: contour plot of pl_y(µ1, µ2), contour levels 0.1 to 0.9; µ1 ∈ [6, 9], µ2 ∈ [5.9, 6.4]]
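The constrained maximization defining Pl(Hδ) can be sketched numerically: the σ's are profiled out in closed form and µ1 is optimized by a grid search (an approximation, not the slides' implementation), applied here to the driving-time data above; the grid bounds are illustrative:

```python
import numpy as np

def profiled_log_lik(y, mu):
    # Normal log-likelihood with sigma^2 profiled out: sigma^2 -> mean((y - mu)^2)
    n = len(y)
    s2 = np.mean((y - mu) ** 2)
    return -0.5 * n * (np.log(2.0 * np.pi * s2) + 1.0)

def pl_H(delta, y1, y2, grid):
    # Pl(H_delta) = max_mu1 pl_y(mu1, mu1 - delta), by grid search over mu1
    log_den = profiled_log_lik(y1, np.mean(y1)) + profiled_log_lik(y2, np.mean(y2))
    log_num = max(profiled_log_lik(y1, mu) + profiled_log_lik(y2, mu - delta)
                  for mu in grid)
    return float(np.exp(log_num - log_den))

y1 = np.array([6.5, 6.8, 7.1, 7.3, 10.2])
y2 = np.array([5.8, 5.8, 5.9, 6.0, 6.0, 6.0, 6.3, 6.3, 6.4, 6.5, 6.5])
grid = np.linspace(5.0, 11.0, 4001)
pl_equal_means = pl_H(0.0, y1, y2, grid)            # plausibility of mu1 = mu2
delta_hat = float(np.mean(y1) - np.mean(y2))
```

By construction Pl(Hδ) = 1 at δ = ȳ1 − ȳ2 and decreases as δ moves away from the observed mean difference.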


Linear regression
Model

We consider the standard regression model

y = Xβ + ε

where
    y = (y1, …, yn)′ is the vector of n observations of the dependent variable
    X is the fixed design matrix of size n × (p + 1)
    ε = (ε1, …, εn)′ ∼ N(0, σ²In) is the vector of errors
The parameter vector is θ = (β′, σ)′


Likelihood-based belief function

The likelihood function for this model is

L_y(θ) = (2πσ²)^{−n/2} exp[−(1/(2σ²)) (y − Xβ)′(y − Xβ)]

The contour function can thus be readily calculated as

pl_y(θ) = L_y(θ) / L_y(θ̂)

with θ̂ = (β̂′, σ̂)′, where
    β̂ = (X′X)^{−1}X′y is the ordinary least squares estimate of β
    σ̂ is the standard deviation of the residuals


Plausibility of linear hypotheses

We consider assertions (hypotheses) H of the form Aβ = q, where A is an r × (p + 1) constant matrix and q is a constant vector of length r, for some r ≤ p + 1.
Special cases: βj = 0; βj = 0 for all j ∈ {1, …, p}; βj = βk; etc.
The plausibility of H is

Pl_y^Θ(H) = sup_{Aβ=q} pl_y(θ) = L_y(θ̂*) / L_y(θ̂)

where θ̂* = (β̂*′, σ̂*)′ (restricted least-squares estimates) with

β̂* = β̂ − (X′X)^{−1}A′[A(X′X)^{−1}A′]^{−1}(Aβ̂ − q)
σ̂* = √[(y − Xβ̂*)′(y − Xβ̂*)/n]
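Since L_y(θ̂) = (2πσ̂²)^{−n/2} e^{−n/2}, the ratio above reduces to (σ̂²/σ̂*²)^{n/2}. A minimal numerical sketch on simulated data (the design, coefficients and noise level are illustrative assumptions):

```python
import numpy as np

def plausibility_linear_hypothesis(X, y, A, q):
    # Pl(A beta = q) = L_y(theta*_hat) / L_y(theta_hat) = (s2 / s2_r)^(n/2)
    n = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                      # OLS estimate of beta
    s2 = np.mean((y - X @ beta) ** 2)             # sigma_hat^2 (MLE)
    M = A @ XtX_inv @ A.T
    beta_r = beta - XtX_inv @ A.T @ np.linalg.solve(M, A @ beta - q)
    s2_r = np.mean((y - X @ beta_r) ** 2)         # sigma*_hat^2 (restricted)
    return float((s2 / s2_r) ** (n / 2))

# Illustrative simulated design: intercept + 2 covariates, true beta_2 = 0
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + 0.5 * rng.normal(size=n)
pl_b2 = plausibility_linear_hypothesis(X, y, np.array([[0.0, 0.0, 1.0]]), np.array([0.0]))
pl_b1 = plausibility_linear_hypothesis(X, y, np.array([[0.0, 1.0, 0.0]]), np.array([0.0]))
```

The hypothesis β2 = 0 (true here) gets a much larger plausibility than the false hypothesis β1 = 0, mirroring the Pl(βj = 0) column of the table below.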


Example: movie Box office data

Dataset about 62 movies released in 2009 (from Greene, 2012)
Dependent variable: logarithm of Box Office receipts
11 covariates:
    three dummy variables (G, PG, PG13) encoding the MPAA (Motion Picture Association of America) rating,
    logarithm of budget (LOGBUDGET),
    star power (STARPOWR),
    a dummy variable indicating whether the movie is a sequel (SEQUEL),
    four dummy variables describing the genre (ACTION, COMEDY, ANIMATED, HORROR),
    one variable representing internet buzz (BUZZ)


Some marginal contour functions

[Figure: marginal contour functions Pl(βj) for LOGBUDGET and BUZZ]


Regression coefficients

             Estimate  Std. Error  t-value  p-value  Pl(βj = 0)
(Intercept)    15.400       0.643   23.960  < 2e-16     1.0e-34
G               0.384       0.553    0.695     0.49        0.74
PG              0.534       0.300    1.780    0.081        0.15
PG13            0.215       0.219    0.983     0.33        0.55
LOGBUDGET       0.261       0.185    1.408     0.17        0.30
STARPOWR      4.32e-3      0.0128    0.337     0.74        0.93
SEQUEL          0.275       0.273    1.007     0.32        0.54
ACTION         -0.869       0.293   -2.964   4.7e-3      6.6e-3
COMEDY        -0.0162       0.256   -0.063     0.95        0.99
ANIMATED       -0.833       0.430   -1.937    0.058        0.11
HORROR          0.375       0.371    1.009     0.32        0.54
BUZZ            0.429      0.0784    5.473  1.4e-06     4.8e-07


Adaptation of flood defense structures to sea level rise

Commonly, flood defenses in coastal areas are designed to withstand at least 100-year return period events.
However, due to climate change, they will be subject during their lifetime to higher loads than the design estimations.
The main impact is related to the increase of the mean sea level, which affects the frequency and intensity of surges.
For adaptation purposes, we need to combine
    statistics of extreme sea levels derived from historical data
    expert judgement about the future sea level rise (SLR)


Model

The annual maximum sea level Z at a given location is often assumed to have a Gumbel distribution

P(Z ≤ z) = exp[−exp(−(z − µ)/σ)]

with mode µ and scale parameter σ.
Current design procedures are based on the return level z_T associated to a return period T, defined as the quantile at level 1 − 1/T. Here,

z_T = µ − σ log[−log(1 − 1/T)]

Because of climate change, it is assumed that the distribution of the annual maximum sea level at the end of the century will be shifted to the right, with shift equal to the SLR:

z′_T = z_T + SLR
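The return-level formula can be checked directly against the Gumbel cdf; a small sketch with illustrative (not fitted) parameter values:

```python
import math

def gumbel_cdf(z, mu, sigma):
    # P(Z <= z) = exp(-exp(-(z - mu)/sigma))
    return math.exp(-math.exp(-(z - mu) / sigma))

def return_level(T, mu, sigma):
    # z_T = mu - sigma * log(-log(1 - 1/T)), the quantile at level 1 - 1/T
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / T))

mu, sigma = 8.5, 0.15    # illustrative mode and scale, not fitted values
z100 = return_level(100, mu, sigma)
z10 = return_level(10, mu, sigma)
```

By construction, the Gumbel cdf evaluated at z100 equals exactly 1 − 1/100 = 0.99, and longer return periods give higher return levels.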


Approach

1 Represent the evidence on z_T by a likelihood-based belief function using past sea level measurements
2 Represent the evidence on SLR by a belief function describing expert opinions
3 Combine these two items of evidence to get a belief function on z′_T = z_T + SLR


Sea level data at Le Havre, France (15 years)

[Figure: empirical CDF of the annual maximum sea levels z at Le Havre]


Contour functions

[Figure: contour plots of pl(z100, µ) (left) and pl(z100) (right)]


Representation of expert opinions about the SLR

From a review of the literature (in 2007):
    The interval [0.5, 0.79] = [0.18, 0.79] ∩ [0.5, 1.4] seems to be fully supported by the available evidence
    Values outside the interval [0, 2] are considered as practically impossible

Three representations:
    Consonant random intervals with core [0.5, 0.79], support [0, 2] and different contour functions π;
    p-boxes with the same cumulative belief and plausibility functions as above;
    Random sets [U, V] with independent U and V and the same cumulative belief and plausibility functions as above.


Representation of expert opinions about the SLR

[Figure: contour functions π(SLR) (left) and cumulative Bel and Pl functions of the SLR (right)]


Combination
Principle

Let [U_zT, V_zT] and [U_SLR, V_SLR] be the independent random intervals representing the evidence on z_T and SLR, respectively.
The random interval for z′_T = z_T + SLR is

[U_zT, V_zT] + [U_SLR, V_SLR] = [U_zT + U_SLR, V_zT + V_SLR]

The corresponding belief and plausibility functions are

Bel(A) = P([U_zT + U_SLR, V_zT + V_SLR] ⊆ A)
Pl(A) = P([U_zT + U_SLR, V_zT + V_SLR] ∩ A ≠ ∅)

for all A ∈ B(R).
Bel(A) and Pl(A) can be estimated by Monte Carlo simulation.
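The interval-sum Monte Carlo can be sketched as follows; both sampling functions below are illustrative stand-ins (not the fitted z_T interval or the elicited SLR representations):

```python
import random

def simulate_sum_intervals(N, draw_zT_interval, draw_slr_interval):
    # Independent random intervals: [U1, V1] + [U2, V2] = [U1 + U2, V1 + V2]
    out = []
    for _ in range(N):
        u1, v1 = draw_zT_interval()
        u2, v2 = draw_slr_interval()
        out.append((u1 + u2, v1 + v2))
    return out

def bel_pl_leq(intervals, z):
    # Bel(z'_T <= z): interval contained in (-inf, z]; Pl: interval intersects it
    N = len(intervals)
    bel = sum(1 for u, v in intervals if v <= z) / N
    pl = sum(1 for u, v in intervals if u <= z) / N
    return bel, pl

random.seed(1)

def draw_zT():
    # Illustrative: an interval of half-width 0.1 around a random centre
    c = random.uniform(9.0, 9.4)
    return (c - 0.1, c + 0.1)

def draw_slr():
    # Illustrative consonant interval: core [0.5, 0.8], support [0, 2]
    s = random.random()
    return (0.5 * s, 2.0 - 1.2 * s)

intervals = simulate_sum_intervals(10000, draw_zT, draw_slr)
bel95, pl95 = bel_pl_leq(intervals, 9.5)
bel108, pl108 = bel_pl_leq(intervals, 10.8)
```

The estimated Bel(z′_T ≤ z) and Pl(z′_T ≤ z) are the lower and upper cdfs plotted on the result slide.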


Result

[Figure: pl(z′_T) (left) and Bel(z′_T ≤ z), Pl(z′_T ≤ z) (right) for the linear, convex, concave and constant contour functions]


Consistency of the likelihood-based belief function

Assume that the observed data y = (y1, …, yn) is a realization of an iid sample Y = (Y1, …, Yn) from Y ∼ f_θ(y).
From Fraser (1968):

Theorem
If E_{θ0}[log f_θ(Y)] exists, is finite for all θ, and has a unique maximum at θ0, then, for any θ ≠ θ0, pl_n(θ) → 0 almost surely under the law determined by θ0.


Consistency of the likelihood-based belief function (continued)

The property pl_n(θ0) → 1 a.s. does not hold in general (under regularity assumptions, −2 log pl_n(θ0) converges in distribution to χ²_p).
But we have the following theorem:

Theorem
Under some assumptions (Fraser, 1968), for any neighborhood N of θ0, Bel_n^Θ(N) → 1 and Pl_n^Θ(N) → 1 almost surely under the law determined by θ0.


Prediction problem

Observed (past) data: y from Y ∼ f_θ(y)
Future data: Z | y ∼ F_{θ,y}(z) (real random variable)
Problem: quantify the uncertainty of Z using a predictive belief function


Outline of the approach (1/2)

Let us come back to the urn example.
Let Z ∼ B(θ) be defined as Z = 1 if the next ball is black, and Z = 0 otherwise.
We can write Z as a function of θ and a pivotal variable W ∼ U([0,1]):

Z = 1 if W ≤ θ, 0 otherwise
  = ϕ(θ, W)


Outline of the approach (2/2)

The equality Z = ϕ(θ, W) allows us to separate the two sources of uncertainty on Z:
    1 uncertainty on W (random/aleatory uncertainty)
    2 uncertainty on θ (estimation/epistemic uncertainty)

Two-step method:
    1 Represent the uncertainty on θ using a likelihood-based belief function Bel_y^Θ constructed from the observed data y (estimation problem)
    2 Combine Bel_y^Θ with the probability distribution of W to obtain a predictive belief function Bel_y^Z


ϕ-equation

[Figure: Z = F_θ^{−1}(W)]

We can always write Z as a function of θ and W as

Z = F_{θ,y}^{−1}(W) = ϕ_y(θ, W)

where W ∼ U([0,1]) and F_{θ,y}^{−1} is the generalized inverse of F_{θ,y}:

F_{θ,y}^{−1}(W) = inf{z | F_{θ,y}(z) ≥ W}
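The generalized inverse also covers discrete variables; a tiny sketch (the helper names are hypothetical) for a Bernoulli(θ) cdf, which recovers the urn construction:

```python
def bernoulli_cdf(z, theta):
    # F(z) for Z ~ Bernoulli(theta): 0 below 0, 1 - theta on [0, 1), 1 above
    if z < 0:
        return 0.0
    if z < 1:
        return 1.0 - theta
    return 1.0

def generalized_inverse(w, theta, support=(0, 1)):
    # inf{z | F(z) >= w}, taken over the support of Z
    return min(z for z in support if bernoulli_cdf(z, theta) >= w)
```

Here Z = F^{−1}(W) equals 1 exactly when W > 1 − θ, so P(Z = 1) = θ; the urn slide's ϕ(θ, W) = 1{W ≤ θ} is the same construction with W replaced by 1 − W.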


Main result

[Figure: Z = ϕ_y(θ, W), with the likelihood-based belief function Bel_y^Θ on θ and the uniform distribution λ on W]

After combination by Dempster's rule and marginalization on Z, we obtain the predictive belief function on Z induced by the multi-valued mapping

(s, w) → ϕ_y(Γ_y(s), w)

with (s, w) uniformly distributed in [0,1]²


Graphical representation

[Figure: the focal set ϕ_y(Γ_y(s), w) obtained from the level set Γ_y(s) of the contour function pl_y(θ) and a draw w ∈ [0,1]]


Practical computation

Analytical expression when possible (simple cases), or Monte Carlo simulation:
    1 Draw N pairs (s_i, w_i) independently from a uniform distribution
    2 Compute (or approximate) the focal sets ϕ_y(Γ_y(s_i), w_i)

The predictive belief and plausibility of any subset A ⊆ Z are then estimated by

Bel_y^Z(A) = (1/N) #{i ∈ {1, …, N} | ϕ_y(Γ_y(s_i), w_i) ⊆ A}
Pl_y^Z(A) = (1/N) #{i ∈ {1, …, N} | ϕ_y(Γ_y(s_i), w_i) ∩ A ≠ ∅}
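For the urn model, this two-step Monte Carlo procedure can be sketched end to end; the level sets Γ_y(s) of the binomial contour function are approximated by a grid search over θ (an approximation, not the slides' implementation):

```python
import math
import random

def contour(theta, y, n):
    # Relative binomial likelihood pl_y(theta) = L(theta) / L(y/n)
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    t = y / n
    return math.exp(y * math.log(theta / t)
                    + (n - y) * math.log((1.0 - theta) / (1.0 - t)))

def predictive_bel_pl(y, n, N=5000, seed=0):
    # Draw (s_i, w_i), build the focal set phi(Gamma(s_i), w_i) and count:
    #   {1}  when w_i <= lower bound of Gamma(s_i)  -> contributes to Bel({1})
    #   {1} or {0,1} when w_i <= upper bound        -> contributes to Pl({1})
    rng = random.Random(seed)
    grid = [i / 1000 for i in range(1, 1000)]
    pls = [contour(t, y, n) for t in grid]
    bel1 = pl1 = 0
    for _ in range(N):
        s, w = rng.random(), rng.random()
        inside = [t for t, p in zip(grid, pls) if p >= s]
        lo, hi = (inside[0], inside[-1]) if inside else (y / n, y / n)
        if w <= lo:
            bel1 += 1
        if w <= hi:
            pl1 += 1
    return bel1 / N, pl1 / N

bel_one, pl_one = predictive_bel_pl(y=6, n=10)
```

The estimates bracket the MLE y/n = 0.6 from below and above, and their difference approximates the mass m({0, 1}) = ∫ pl_y(θ) dθ discussed a few slides further on.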


Example: the urn model

Here, Y ∼ B(n, θ). The likelihood-based belief function is induced by a random interval

Γ(s) = {θ : pl_y(θ) ≥ s} = [θ−(s), θ+(s)]

We have

Z = ϕ(θ, W) = 1 if W ≤ θ, 0 otherwise

Consequently,

ϕ(Γ(s), W) = ϕ([θ−(s), θ+(s)], W) = {1} if W ≤ θ−(s)
                                    {0} if θ+(s) < W
                                    {0, 1} otherwise


Example: the urn model
Properties

We have

Bel_y({1}) = E[θ−(S)] = θ̂ − ∫_0^θ̂ pl_y(θ) dθ
Pl_y({1}) = E[θ+(S)] = θ̂ + ∫_θ̂^1 pl_y(θ) dθ

So

m({0, 1}) = ∫_0^1 pl_y(θ) dθ

As n → ∞, m({0, 1}) → 0 in probability, i.e., m({1}) + m({0}) → 1.

Example: the urn model
Geometric representation

[Figure]

Example: the urn model
Belief/plausibility intervals

[Figure: belief/plausibility intervals for Z = 1 as a function of the observed proportion (left; n = 5, 10, 100, ∞) and of the sample size (right)]


Uniform example

Assume that Y1, …, Yn, Z are iid from U([0, θ]).
Then F_θ(z) = z/θ for all 0 ≤ z ≤ θ, and we can write Z = θW with W ∼ U([0,1]).
We have seen that the belief function Bel_y^Θ after observing Y = y is induced by the random interval [y_(n), y_(n)S^{−1/n}], where y_(n) is the sample maximum and S ∼ U([0,1]).
Each focal set of Bel_y^Z is an interval

ϕ(Γ_y(s), w) = [y_(n)w, y_(n)s^{−1/n}w]

The predictive belief function Bel_y^Z is induced by the random interval

[Z_*, Z^*] = [y_(n)W, y_(n)S^{−1/n}W]
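These random intervals are easy to simulate; a sketch estimating the lower and upper cdfs of Z at a point z (the values of y_(n), n and z are illustrative):

```python
import random

def predictive_intervals(y_max, n, N, seed=42):
    # Draw (s, w) and form [y_(n) w, y_(n) s^(-1/n) w]; s is drawn in (0, 1]
    # to avoid s = 0 (random.random() returns values in [0, 1))
    rng = random.Random(seed)
    out = []
    for _ in range(N):
        s = 1.0 - rng.random()
        w = rng.random()
        out.append((y_max * w, y_max * s ** (-1.0 / n) * w))
    return out

def lower_upper_cdf(intervals, z):
    # Bel(Z <= z): interval entirely in (-inf, z]; Pl(Z <= z): interval hits it
    N = len(intervals)
    bel = sum(1 for lo, hi in intervals if hi <= z) / N
    pl = sum(1 for lo, hi in intervals if lo <= z) / N
    return bel, pl

intervals = predictive_intervals(y_max=0.95, n=20, N=10000)
bel_half, pl_half = lower_upper_cdf(intervals, 0.5)
```

The two estimated curves are the lower and upper cdfs shown on the next slide; their gap narrows as n grows, consistently with S^{−1/n} → 1.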


Uniform example
Lower and upper cdfs

[Figure: lower and upper cdfs of Z]


Uniform example
Consistency

From the consistency of the MLE, Y_(n) converges in probability to θ0, so

Z_* = Y_(n)W →d θ0W = Z

We have E(S^{−1/n}) = n/(n − 1) and

Var(S^{−1/n}) = n / [(n − 2)(n − 1)²]

Consequently, E(S^{−1/n}) → 1 and Var(S^{−1/n}) → 0, so S^{−1/n} →P 1.
Hence,

Z^* = Y_(n)S^{−1/n}W →d θ0W = Z


Consistency (general case)

Assume that
    the observed data y = (y1, …, yn) is a realization of an iid sample Y = (Y1, …, Yn);
    the likelihood function L_n(θ) is unimodal and upper-semicontinuous, so that its level sets Γ_n(s) are closed and connected; and
    the function ϕ(θ, w) is continuous.
Under these conditions, the random set ϕ(Γ_n(S), W) is a closed random interval [Z_*n, Z^*_n].
Then:

Theorem
Assume that the conditions of the previous theorem hold, and that the predictive belief function Bel_n^Z is induced by a random closed interval [Z_*n, Z^*_n]. Then Z_*n and Z^*_n both converge in distribution to Z as n tends to infinity.


Linear regression

Let z be a not-yet-observed value of the dependent variable for a vector x0 of covariates:

z = x0′β + ε0,  with ε0 ∼ N(0, σ²)

We can write, equivalently,

z = x0′β + σΦ^{−1}(w) = ϕ_{x0,y}(θ, w),

where w has a standard uniform distribution.
The predictive belief function on z can then be approximated using Monte Carlo simulation.


Movie example
BO success of an action sequel film rated PG13 by the MPAA, with LOGBUDGET = 5.30, STARPOWR = 23.62 and BUZZ = 2.81?

[Figure: lower and upper cdfs of Z (left) and Pl(z − δ ≤ Z ≤ z + δ) for δ = 0, 0.5, 1, 1.5 (right)]


Ex ante forecasting
Problem and classical approach

Consider the situation where some explanatory variables are unknown at the time of the forecast and have to be estimated or predicted.
Classical approach: assume that x0 has been estimated with some variance, which has to be taken into account in the calculation of the forecast variance.
According to Greene (Econometric Analysis, 7th edition, 2012):
    "This vastly complicates the computation. Many authors view it as simply intractable"
    "analytical results for the correct forecast variance remain to be derived except for simple special cases"


Ex ante forecasting
Belief function approach

In contrast, this problem can be handled very naturally in our approach by modeling partial knowledge of x0 by a belief function Bel^X on the sample space X of x0.
We then have

Bel_y^Z = (Bel_y^Θ ⊕ Bel_y^{Z×Θ} ⊕ Bel^X)^{↓Z}

Assume that the belief function Bel^X is induced by a source (Ω, A, P_Ω, Λ), where Λ is a multi-valued mapping from Ω to 2^X.
The predictive belief function Bel_y^Z is then induced by the multi-valued mapping

(ω, s, w) → ϕ_y(Λ(ω), Γ_y(s), w)

Bel_y^Z can be approximated by Monte Carlo simulation.


Monte Carlo algorithm

Require: desired number of focal sets N
for i = 1 to N do
    Draw (si, wi) uniformly in [0,1]^2
    Draw ωi from PΩ
    Search for z̲i = min ϕy(x0, θ, wi) subject to ply(θ) ≥ si and x0 ∈ Λ(ωi)
    Search for z̄i = max ϕy(x0, θ, wi) subject to ply(θ) ≥ si and x0 ∈ Λ(ωi)
    Bi ← [z̲i, z̄i]
end for
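The algorithm above can be sketched in Python. Everything below (the toy relative likelihood ply, the forecast function phi_y, the multi-valued mapping Lambda, and the grid search over θ) is a hypothetical stand-in for the slide's abstract ingredients, not the authors' implementation:

```python
import math
import random
from statistics import NormalDist

THETA_HAT = 2.0  # hypothetical maximum-likelihood estimate

def ply(theta):
    # toy relative likelihood, maximal (= 1) at THETA_HAT
    return math.exp(-0.5 * (theta - THETA_HAT) ** 2)

def phi_y(x0, theta, w):
    # toy forecast function: z = theta * x0 + standard-normal quantile of w
    return theta * x0 + NormalDist().inv_cdf(w)

def Lambda(omega):
    # set-valued knowledge of x0: an interval depending on omega
    return (1.0 - 0.2 * omega, 1.0 + 0.2 * omega)

def focal_sets(n_sets, seed=0):
    """Draw n_sets focal intervals B_i = [z_min, z_max] for Z."""
    rng = random.Random(seed)
    # grid over theta for the min/max searches
    thetas = [THETA_HAT - 3.0 + 6.0 * k / 200 for k in range(201)]
    sets_ = []
    for _ in range(n_sets):
        s = rng.random()
        w = min(max(rng.random(), 1e-9), 1.0 - 1e-9)  # keep w inside (0,1)
        omega = rng.random()          # omega drawn from P_Omega (uniform here)
        x_lo, x_hi = Lambda(omega)
        cut = [t for t in thetas if ply(t) >= s]  # s-cut {theta : ply(theta) >= s}
        # phi_y is linear in x0 here, so extrema over x0 lie at the interval ends
        zs = [phi_y(x, t, w) for t in cut for x in (x_lo, x_hi)]
        sets_.append((min(zs), max(zs)))
    return sets_
```

In practice the min/max searches would be carried out by a constrained optimizer rather than a grid, but the grid keeps the sketch self-contained.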


Movie example
Lower and upper cdfs

BO success of an action sequel film rated PG13 by the MPAA, with LOGBUDGET = 5.30, STARPOWER = 23.62 and BUZZ = (0, 2.81, 5) (triangular possibility distribution)?

[Figure: lower and upper cdfs, i.e., cumulative belief/plausibility as a function of z]


Movie example
Pl-plots

[Figure: Pl(z − δ ≤ Z ≤ z + δ) as a function of z, for δ = 0, 0.5, 1, 1.5; left panel: certain inputs, right panel: uncertain inputs]


Innovation diffusion

Forecasting the diffusion of an innovation has been a topic of considerable interest in marketing research
Typically, when a new product is launched, sales forecasts have to be based on little data, and uncertainty has to be quantified to avoid making wrong business decisions based on unreliable forecasts
Our approach uses the Bass model (Bass, 1969) for innovation diffusion together with past sales data to quantify the uncertainty on future sales using the formalism of belief functions


Bass model

Fundamental assumption (Bass, 1969): for eventual adopters, the probability f(t) of purchase at time t, given that no purchase has yet been made, is an affine function of the number of previous buyers:

    f(t) / (1 − F(t)) = p + qF(t)

where p is a coefficient of innovation, q is a coefficient of imitation, and F(t) = ∫₀ᵗ f(u) du.
Solving this differential equation, the probability that an individual taken at random from the population will buy the product before time t is

    Φθ(t) = cF(t) = c(1 − exp[−(p + q)t]) / (1 + (q/p) exp[−(p + q)t])

where c is the probability of eventually adopting the product and θ = (p, q, c)
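The adoption probability Φθ(t) above is a one-line computation. The sketch below (function name and default are my own, not from the slides) evaluates it directly:

```python
import math

def bass_cdf(t, p, q, c=1.0):
    """Phi_theta(t) = c * F(t): probability that a randomly chosen individual
    has adopted by time t, with innovation coefficient p, imitation
    coefficient q and eventual-adoption probability c (Bass, 1969)."""
    e = math.exp(-(p + q) * t)
    return c * (1.0 - e) / (1.0 + (q / p) * e)
```

As a sanity check, bass_cdf(0, p, q, c) is 0 and the value increases toward c as t grows, matching the interpretation of c as the probability of eventually adopting.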


Parameter estimation

Data: y1, ..., y_{T−1}, where yi = observed number of adopters in the time interval [t_{i−1}, t_i)
The number of individuals in the sample of size M who did not adopt the product at time t_{T−1} is y_T = M − Σ_{i=1}^{T−1} yi
The probability of adopting the innovation between times t_{i−1} and t_i is p_i = Φθ(t_i) − Φθ(t_{i−1}) for 1 ≤ i ≤ T − 1, and the probability of not adopting the innovation before t_{T−1} is p_T = 1 − Φθ(t_{T−1})
Consequently, y = (y1, ..., yT) is a realization of Y ∼ M(M, p1, ..., pT) and the likelihood function is

    Ly(θ) ∝ ∏_{i=1}^{T} p_i^{yi} = (∏_{i=1}^{T−1} [Φθ(t_i) − Φθ(t_{i−1})]^{yi}) [1 − Φθ(t_{T−1})]^{yT}

The belief function on θ is defined by ply(θ) = Ly(θ) / Ly(θ̂)
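The multinomial likelihood and the relative likelihood ply(θ) can be sketched as follows. This is an illustrative reading of the formulas above, assuming t0 = 0 and that a maximizer θ̂ is supplied externally (function names are mine):

```python
import math

def bass_cdf(t, p, q, c):
    """Phi_theta(t) for the Bass model."""
    e = math.exp(-(p + q) * t)
    return c * (1.0 - e) / (1.0 + (q / p) * e)

def log_lik(theta, y, times):
    """Multinomial log-likelihood of counts y = (y_1, ..., y_T); `times`
    holds t_1, ..., t_{T-1} (with t_0 = 0 assumed) and the last cell of y
    collects the non-adopters."""
    p, q, c = theta
    F = [0.0] + [bass_cdf(t, p, q, c) for t in times]
    cell_probs = [F[i] - F[i - 1] for i in range(1, len(F))]  # p_1 .. p_{T-1}
    cell_probs.append(1.0 - F[-1])                            # p_T
    return sum(yi * math.log(pi) for yi, pi in zip(y, cell_probs))

def rel_lik(theta, theta_hat, y, times):
    """Relative likelihood ply(theta) = L_y(theta) / L_y(theta_hat)."""
    return math.exp(log_lik(theta, y, times) - log_lik(theta_hat, y, times))
```

Working on the log scale before exponentiating avoids underflow when the counts are large.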


Results

[Figure: contour plots of the relative likelihood ply(θ), shown pairwise in the (p, q), (p, c) and (q, c) planes, with contour levels from 0.1 to 0.9]


Sales forecasting

Let us assume we are at time t_{T−1} and we wish to forecast the number Z of sales between times τ1 and τ2, with t_{T−1} ≤ τ1 < τ2
Z has a binomial distribution B(Q, πθ), where
    Q is the number of potential adopters at time t_{T−1}
    πθ is the probability of purchase for an individual in [τ1, τ2], given that no purchase has been made before t_{T−1}:

    πθ = (Φθ(τ2) − Φθ(τ1)) / (1 − Φθ(t_{T−1}))

Z can be written as Z = ϕ(θ, W) = Σ_{i=1}^{Q} 1_{[0,πθ]}(Wi), where

    1_{[0,πθ]}(Wi) = 1 if Wi ≤ πθ, 0 otherwise

and W = (W1, ..., WQ) has a uniform distribution in [0,1]^Q.
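The representation of Z as a sum of uniform-based indicators translates directly into a simulation. The sketch below assumes the Bass Φθ from the earlier slide; function names and the seeding convention are mine:

```python
import math
import random

def bass_cdf(t, p, q, c):
    """Phi_theta(t) for the Bass model."""
    e = math.exp(-(p + q) * t)
    return c * (1.0 - e) / (1.0 + (q / p) * e)

def purchase_prob(theta, t_last, tau1, tau2):
    """pi_theta: probability of purchase in [tau1, tau2] given no purchase
    before t_last = t_{T-1}."""
    p, q, c = theta
    return (bass_cdf(tau2, p, q, c) - bass_cdf(tau1, p, q, c)) / (
        1.0 - bass_cdf(t_last, p, q, c))

def simulate_sales(theta, Q, t_last, tau1, tau2, seed=0):
    """One draw of Z = sum_{i=1}^{Q} 1[W_i <= pi_theta], W_i ~ U[0,1]."""
    pi = purchase_prob(theta, t_last, tau1, tau2)
    rng = random.Random(seed)
    return sum(1 for _ in range(Q) if rng.random() <= pi)
```

The point of writing Z through the uniforms W_i, rather than sampling B(Q, πθ) directly, is that the same W can later be reused while θ varies over an s-cut, which is exactly what the predictive belief function construction on the next slides does.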


Predictive belief function
Multi-valued mapping

The predictive belief function on Z is induced by the multi-valued mapping (s, w) → ϕ(Γy(s), w) with

    Γy(s) = {θ ∈ Θ : ply(θ) ≥ s}

When θ varies in Γy(s), the range of πθ is [π̲θ(s), π̄θ(s)], with

    π̲θ(s) = min{πθ : ply(θ) ≥ s},  π̄θ(s) = max{πθ : ply(θ) ≥ s}

We have

    ϕ(Γy(s), w) = [Z̲(s, w), Z̄(s, w)],

where Z̲(s, w) and Z̄(s, w) are, respectively, the numbers of wi's that are less than π̲θ(s) and π̄θ(s)
For fixed s, Z̲(s, W) ∼ B(Q, π̲θ(s)) and Z̄(s, W) ∼ B(Q, π̄θ(s))


Predictive belief function
Calculation

The belief and plausibility that Z will be less than or equal to z are

    BelZy([0, z]) = ∫₀¹ F_{Q, π̄θ(s)}(z) ds
    PlZy([0, z]) = ∫₀¹ F_{Q, π̲θ(s)}(z) ds

where F_{Q,p} denotes the cdf of the binomial distribution B(Q, p)
The contour function of Z is

    ply(z) = ∫₀¹ (F_{Q, π̲θ(s)}(z) − F_{Q, π̄θ(s)}(z − 1)) ds

These integrals can be approximated by Monte Carlo simulation
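A minimal Monte Carlo sketch of these integrals, assuming the s-cut bounds of πθ are available as functions of s (pi_lower, pi_upper and the other names are hypothetical):

```python
import math
import random

def binom_cdf(z, Q, p):
    """cdf F_{Q,p}(z) of the binomial distribution B(Q, p)."""
    return sum(math.comb(Q, k) * p**k * (1 - p)**(Q - k) for k in range(z + 1))

def bel_pl_leq(z, Q, pi_lower, pi_upper, n_draws=2000, seed=0):
    """Monte Carlo estimates of Bel([0, z]) and Pl([0, z]); pi_lower(s) and
    pi_upper(s) are the end points of the s-cut range of pi_theta."""
    rng = random.Random(seed)
    bel = pl = 0.0
    for _ in range(n_draws):
        s = rng.random()
        bel += binom_cdf(z, Q, pi_upper(s))  # Bel integrates F at the upper pi
        pl += binom_cdf(z, Q, pi_lower(s))   # Pl integrates F at the lower pi
    return bel / n_draws, pl / n_draws
```

Since the binomial cdf decreases in p, using the upper πθ for Bel and the lower one for Pl guarantees Bel([0, z]) ≤ Pl([0, z]), as required.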


Ultrasound data

Data collected from 209 hospitals throughout the U.S.A. (Schmittlein and Mahajan, 1982) about the adoption of ultrasound equipment

[Figure: observed and fitted numbers of adopters per year, 1966–1978]


Forecasting

Predictions made in 1970 for the number of adopters in the period 1971–1978, with their lower and upper expectations

[Figure: numbers of adopters per year, 1966–1978: observed values, forecasts, and lower/upper expectations]


Cumulative belief and plausibility functions

Lower and upper cumulative distribution functions for the number of adopters in 1971, forecasted in 1970

[Figure: cumulative belief/plausibility as a function of the number of adopters: Bel([0, z]), Pl([0, z]) and the lower/upper expectations]


Pl-plot

Plausibilities PlZy([z − r, z + r]) as functions of z, from r = 0 (lower curve) to r = 5 (upper curve), for the number of adopters in 1971, forecasted in 1970:

[Figure: plausibility as a function of the number of adopters, curves for r = 0 to r = 5]


Conclusions

Uncertainty quantification is an important component of any forecasting methodology. The approach introduced in this lecture allows us to represent forecast uncertainty in the belief function framework, based on past data and a statistical model
The proposed method is conceptually simple and computationally tractable
The belief function formalism makes it possible to combine information from several sources (such as expert opinions and statistical data)
The Bayesian predictive probability distribution is recovered when a prior on θ is available
The consistency of the method has been established under some conditions


References

References I

cf. https://www.hds.utc.fr/˜tdenoeux

T. Denœux. Likelihood-based belief function: justification and some extensions to low-quality data. International Journal of Approximate Reasoning, 55(7):1535–1547, 2014.

O. Kanjanatarakul, S. Sriboonchitta and T. Denœux. Forecasting using belief functions: an application to marketing econometrics. International Journal of Approximate Reasoning, 55(5):1113–1128, 2014.

O. Kanjanatarakul, T. Denœux and S. Sriboonchitta. Prediction of future observations using belief functions: a likelihood-based approach. International Journal of Approximate Reasoning, 72:71–94, 2016.
