Empirical Bayes Quantile-Prediction - Statistics

26
1 Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss; Lawrence D. Brown Wharton School, Univ. of Pennsylvania BFF4, May 2, 2017 Joint work with Gourab Mukherjee and Paat Rusmevichientong Marshall School, U. S. C. Preprint available on ArXiv. NOTE: In several places the symbol ν should read ν 2 . This correction should be clear from the context.

Transcript of Empirical Bayes Quantile-Prediction - Statistics

Page 1: Empirical Bayes Quantile-Prediction - Statistics

1

EmpiricalBayesQuantile-Prediction

akaE-BPredictionunderCheck-loss;

LawrenceD.BrownWhartonSchool,Univ.ofPennsylvania

BFF4,May2,2017Jointworkwith

GourabMukherjeeandPaatRusmevichientong MarshallSchool,U.S.C.

PreprintavailableonArXiv.NOTE:Inseveralplacesthesymbolν shouldreadν 2 .Thiscorrectionshouldbeclearfromthecontext.

Page 2: Empirical Bayes Quantile-Prediction - Statistics

2

MultipleIndependentGaussianProblems• nindependentproblems,indexedbyi =1,..,n

• Observe Xi ∼N θi ,vPi2( )

[θi unknown;vP , i known][Inapplications,Xi mightbe Xi = Xiiwith

Xik ∼ iid N θi ,σ i2( ),k =1,..,Ki ,vP ,i

2 =σ i2 Ki .]

Page 3: Empirical Bayes Quantile-Prediction - Statistics

3

QuantilePrediction• Foreachi ∃potentialfutureobservation Yi ∼N θi ,νF ,i( ).[Againthismightbeaverageofseveraliidvar’s.]

• Notethatforeachi,themeanθi issameforbothPastandFuture.

• Fixbi ∈ 0,1( ),i =1,..,n.Goalistopredictthebi -thquantileofdist.ofYi foreachindex,i.Thisquantileis

q0,i =θi + vF ,iΦ−1 bi( ).

• Thenaïvecoordinate-wisepredictoris q0,i = Xi + vF ,iΦ

−1 bi( ).NOTE:ThisisformalBayesforuniformprior

Page 4: Empirical Bayes Quantile-Prediction - Statistics

4

ShrinkageEstimation• Generallybeneficialinmulti-meanproblems(andmanyothermultivariatesettings).

• Buttheshrinkageneedstobeproperlyimplemented.• Forhomoscedasticproblemsthisiswellunderstood.(Steinestimation)

• Butthecurrentproblemisquiteheteroskedastic:

vP ,i , vF ,i , bi canalldependoni.• ForsuchproblemsminimaxshrinkagemaydramaticallydifferfromEmpiricalBayesshrinkage;

• And,well-implementedE-Bshrinkageisusuallypreferable

SeeEfronandMorris(1973,1975)……andXie,Kou,Brown(2012,2015)

Page 5: Empirical Bayes Quantile-Prediction - Statistics

5

ParametricEmpiricalBayes• Wefollowthehierarchicalconjugate-priorparadigm

EfronandMorris(op.cit.),Stein(1962),Lindley(1962),LindleyandSmith(1972)….

• θ1 ,..,θn ∼ iid N η ,τ 2( )• η ,τ 2areunknownhyper-parameterstobeestimatedfromthedata.

• Letη ,τ 2denotetheestimates.

Forsimplicityandnotationalsimplicity:• Consider(forthetalk)thecasewhereη =0isknown.(Paperallowsunknownη andestimatesitalongwithτ .)

Page 6: Empirical Bayes Quantile-Prediction - Statistics

6

QuantilePredictionforKnownHyper-parameters

• Ifτ isknownthenposteriordistributionof !θ known(&

Gaussian)• Yieldsknown,GaussianpredictivedistributionforeachfutureYi : Letα i = τ

2 τ 2 + vP ,i( ),then FYi y τ

2 ,Xi( ) =N α i Xi ,vF ,i +α ivP ,i( ).Soquantilepredictionis q Xi ;τ 2( ) =α i Xi + vF ,i +α ivP ,i( )Φ−1 bi( ).

• TheroleoftheE-Bpriorstructureisonlytomotivatethisfamilyofquantilepredictors.

Page 7: Empirical Bayes Quantile-Prediction - Statistics

7

“Direct”E-Bquantileprediction• Couldusedata+anyplausibleestimateofτ 2,thehyper-parameter(s),andplugin.

• i.e.,couldusemarginalMLEτ 2andgetprediction q Xi ;τ 2( ).• OrcouldsimilarlyuseMofMestimateinsteadofMLE.• These(andother)plausibleestimatesofτ 2aredifferent&givedifferentpredictors.

• They’renot“bad”.• BUTnoneoftheseneedstobethebestchoice.

SeeXie,Kou,Brown(2012)aboutthemorefamiliarestimationofmeansproblem.

Page 8: Empirical Bayes Quantile-Prediction - Statistics

8

FormulationwithLossFunction

Toinvestigateoptimalchoiceof ⌢τ 2in q Xi ;⌢τ 2( )imposea

predictivelossfunctionunderwhichqτ = q Xi ;τ 2( )isthenaturalpredictorofqwhenτ 2isknown.

Thenfind‘best’(or,just‘good’) ⌢τ 2 = ⌢τ 2 "X( )underthis

loss.Use ⌢q = q Xi ;

⌢τ 2( )foreachi.

• Check-Loss(aka,pinballlossorquantileloss).Letb,h>0;normalizebyb+h=1(w.l.o.g).i-thcomponentofpredictivelossis ℓ i Yi ,q( ) = bi q−Yi( )+ +hi Yi −q( )+

Page 9: Empirical Bayes Quantile-Prediction - Statistics

9

Notethatbi ,hi areknownandallowedtodependoni.

PlotofCheckLoss

b=0.7

Yqaxis

Page 10: Empirical Bayes Quantile-Prediction - Statistics

10

ThislosshasbeenintroducedhereasadevicetoevaluatepotentialQuantile-predictors.Butithasmotivationinvariousapplications.Forexample:

Page 11: Empirical Bayes Quantile-Prediction - Statistics

11

Page 12: Empirical Bayes Quantile-Prediction - Statistics

12

“Improved”E-BquantilepredictionConceptualIdea

• Defineaveragepredictiveriskas R

!θ , !q( ) = n−1 Eθi

ℓ i Yi , qi!X( )( )( )∑ .

• Whenusingaparticularhyper-parameterestimator,

!τ2 = !τ 2

"X( ),thisis

R!θ ; "τ 2( ) = n−1 Eθi

ℓ i Yi , qi Xi ; "τ 2!X( )( )( )( )∑ .

• Goalistocreateagood/bestestimator !τ .

Page 13: Empirical Bayes Quantile-Prediction - Statistics

13

• KEYistofindagoodand(nearly)unbiasedestimatorof n

−1 Eθiℓ i Yi , qi Xi ;τ 2( )( )( )∑ [Notewhathappenedtoτ 2 .]

• Callit RE! "

X ;τ 2( ).[ RE! isafnct.of !X ,butnotof

!Y .]

• Thiswillhave(1)

Eθi

RE!"X ;τ 2( )( )≈n−1 Eθi

ℓ i Yi , qi Xi ;τ 2( )( )( )∑ .

• Thenchoose

⌢τ RE2 "X( ) = argminτ 2

RE#"X ;τ 2( ){ }.

Page 14: Empirical Bayes Quantile-Prediction - Statistics

14

Whydoesthisyieldthebest ⌢τ 2(asn→∞)?Becausethereisabest ⌢τ 2(asn→∞).

Oracle“Predictor”

• Forevery !θ let

τOR2 = τOR

2 !X ;!θ( ) = argminτ 2

R θ ;τ 2( ).• Thenshow[asn→∞ forevery(L1b’nded)seq

!θ ]

(2) ⌢τ RE2 "X( )→τ 2OR

"X ;"θ( )inprob,&

(3) R!θ ; ⌢τ RE2( )→R

!θ ;τOR2( ).

• Thisconceptualschemeinvolvestwomajorsteps:

Page 15: Empirical Bayes Quantile-Prediction - Statistics

15

TwoMajorSteps• Step1:Createthe(asymptotic)riskestimator RE

! τ 2( )toyieldasuitableapproximation(1).

• Step2:Proveithasthedesiredconvergenceandriskproperties(2)and(3).

Thetwostepsareinterrelated:• RE! τ 2( )needstoproduceasatisfactoryapproximationin(1),andbecomputationallyfeasible.

• ANDitalsoneedstoenableStep2.

Page 16: Empirical Bayes Quantile-Prediction - Statistics

16

Step1:AsymptoticRiskEstimator• Needtosatisfy

(1) Eθi

RE!"X ;τ 2( )( )≈n−1 Eθi

ℓ i Yi , qi Xi ;τ 2( )( )( )∑ ,

where ℓ i ischeck-loss.• If ℓ i weresquarederrorthencanuseSURE.• But ℓ i isnotdifferentiable.SosomethinglikeSURE

seemsabigstretch!

Page 17: Empirical Bayes Quantile-Prediction - Statistics

17

• HOWEVER,

ℓ i* θi ,q( )" Eθi

Yi ℓ i Yi ,q( )( ) = vF ,i Gq−θivF ,i

,bi⎛

⎝⎜⎜

⎠⎟⎟∍

G w , b( ) =ϕ w( )+wΦ w( )−bw .• Gisasmooth,C∞ ,function.And ℓ i

* θi ,q( )≥0playstheroleofaconventionallossfunction.

• Here’sapictureofGforb=0.7:

Page 18: Empirical Bayes Quantile-Prediction - Statistics

18

PlotofthefunctionG

Approxima)ontheorybasednon-linearapproxima)on

Oftheloss

LinearTailapproxima)on

LinearTailapproxima)onb=0.7

0

b=0.7

Page 19: Empirical Bayes Quantile-Prediction - Statistics

19

CreationoftheAsymptoticRiskEstimator ARE! ForvP ,i =1(forsimplicity):(a) ApproximateG w ,bi( )byTaylorexpansion(“TE”)to

K(i)terms.(b) Substituteqi Xi ;τ( )−θi =w .(c) Theresultingexpectationis,say,Eθi

TE Xi ,θi( )( ).Itincludespowersofθi timesfunctionsofXi .

(d) IntegratebypartstocreateanequivalentexpectationthatcontainsonlyfunctionsofXi &notermswithθi .

(e) Step(d)couldbeunderstoodasaversionofSUREgeneralizedtohigherpowersofθi .

Page 20: Empirical Bayes Quantile-Prediction - Statistics

20

(f) ActualcomputationisgreatlyfacilitatedbyuseofHermitepolynomials.

(g) TheresultingexpectandisafunctionofXi thatisanunbiasedestimateoftheexpectedTaylorapproximation.That’sthecoreofour ARE! .

(h) Therearesomeadditional(clever)stepsneededtohandlelargevaluesofW,forwhichTaylorexpansionisnotgood.

(i) Justtoimpressyou,here’sthecoreof ARE! withouttheadditionalstepsin(h):(InthefollowingUi τ( )isaminortruncation/modificationofXi anddi τ( )isasimplerationalfunctionofτ ,vP ,i ,bi .Thisiscopiedfromthepaper,whichusesthenotation“τ ”inplaceof“τ 2”.)

Page 21: Empirical Bayes Quantile-Prediction - Statistics

21

(j) ChooseKn(i),thenumberoftermsintheTaylor

expansion.Thisvarieswithi,butisO logn( ).(k) Thenaddoveritoget ARE! .

(l) Finally,findthedesired argminτ ARE!( )viaadiscrete

gridsearch.

Page 22: Empirical Bayes Quantile-Prediction - Statistics

22

Step2:Provethedesiredasymptoticproperties

TherearesomecluesinXie,KouandBrown(2012).Butpartsoftheproofheredifferinkeyrespectsfromwhat’sthere,andthedetailsherearemuchmorecomplex.

Page 23: Empirical Bayes Quantile-Prediction - Statistics

23

Simulation#1Homoscedastic(σ 2 =1).Twotypesofθi ,bi( )pairs: 0.58,0.51( )withprob0.9 5.1,0.9( )withprob0.1

Comparisonofpred.risksofthreepred.methods#coord’s→ n=20 n=20 n=50 n=50Method↓ Efficiency Ave.ofτ 2 Efficiency% Ave.ofτ 2ARE 86% 1.21 99% 0.344ML/MM 68% 0.037 69% 0.037Oracle 100% 0.296 0.296Note:Becausemodelishomoscedastic,MaxLikandMethMomarethesame

Page 24: Empirical Bayes Quantile-Prediction - Statistics

24

Simulations#3 Thiswasasetof6simulationsinvolvingdifferent

scenarios,allwithheteroscedasticdataandrandomchoicesforparametersandbi .e.g.,Inthefirstsetup(thesimplest)

νP2 ∼U 0.1,0.33( ) ,νF2 =1, θ ∼U 0,1( ), b∼U 0.5,0.99( ),allindep. Inthese6scenariostheefficiencyofourAREpredictor

relativetotheoraclewasForn=20: 88%<Eff.<98% Forn=100: 91%<Eff.<99%.

[Note:Thelastscenarioinvolveduniformlydistributedobservations,ratherthannormallydistributedones.Butitisn’treportedinthemanuscriptwhethertheoracleknewandusedthatfact.I’llneedtoaskGourabandPaat.

Page 25: Empirical Bayes Quantile-Prediction - Statistics

25

Recap• Probleminvolvesmultivariatenormalmeans• Goalisquantilepredictionsoffutureobservations• Ashrinkagepredictorisproduced

o Shrinkageisproducedbyahierarchicalconjugatesetup.o Qualityofquantilepredictionmeasuredthroughpredictivecheck-loss,whichisthenaturallossforquantileprediction.

• MethodologyinvolvescreationanduseofacomplicatedbutcomputableAsymptoticRiskEstimate( ARE! )involvingHermitepolynomials.

• Methodisasymptoticallyjustified.And• Appearstoworkwellinsimulationsformoderaten.

Page 26: Empirical Bayes Quantile-Prediction - Statistics

26

ConcludingRemarksforBFF4

1. QuantilepredictormethodologyismotivatedfromaBayesianhierarchicalstructure.

2. Theformofthepredictorthendrivesthemethodology

3. TheAREmethodisaCONDITIONALFREQUENTISTnotion.

4. BUT:TheproposedAREpredictorisNOTBAYESIAN–Itdoesn’tresultfromanypriorIcanthinkof.notevenasymptotically

5. Wellknown,butoftenoverlooked–Predictionisdifferentfromparameterestimation.Anestimateofthesamplequantiledoesn’tdirectlyleadtoquantileprediction.