Indexing and Preprocessing - Cornell Universityadith/CfactSIGIR2016/Evaluation2.pdf · • Offline...

Outline

• Offline Evaluation of Online Metrics

• Counterfactual Estimation

• Advanced Estimators

1. Self-Normalized Estimator

2. Doubly Robust Estimator

3. Slates Estimator

• Case Studies & Demo

• Summary

IPS: Issues

𝑈ips 𝜋 =1

𝜋(𝑦𝑖|𝑥𝑖)

𝜋0(𝑦𝑖|𝑥𝑖)𝛿𝑖 Suppose δ ≡ 1

𝐸[𝐂𝐨𝐧𝐬𝐭𝐚𝐧𝐭] ≠ 𝐂𝐨𝐧𝐬𝐭𝐚𝐧𝐭

Two variables in IPS

Τπ(𝑦|𝑥) π0(𝑦|𝑥)

Fix: Control Variates

Use correlated quantities to control Τπ(𝑦|𝑥) π0(𝑦|𝑥) variability

𝐸 𝑠𝑖 = θ known

Multiplicative Additive

𝐸[𝑠]𝐸[

π(𝑦|𝑥)

π0(𝑦|𝑥)δ] 𝐸

𝜋 𝑦 𝑥π0 𝑦 𝑥

δ − 𝑠 + θ

e.g. Self-Normalization e.g. Doubly Robust Estimator

Self-Normalized Estimator

Use expected sample size as multiplicative control variate

Ƹ𝑠 π =1

π(𝑦𝑖|𝑥𝑖)

π0(𝑦𝑖|𝑥𝑖)𝐸 Ƹ𝑠 π = 1

𝑈SNips π = ൚

π(𝑦𝑖|𝑥𝑖)π0(𝑦𝑖|𝑥𝑖)

δ𝑖

π(𝑦𝑖|𝑥𝑖)π0(𝑦𝑖|𝑥𝑖)

[Trotter & Tukey, 1952] [Hesterberg, 1983] [Swaminathan & Joachims, 2016]

Self-Normalization: Properties

𝐸[𝐂𝐨𝐧𝐬𝐭𝐚𝐧𝐭] = 𝐂𝐨𝐧𝐬𝐭𝐚𝐧𝐭

Equivariant

Asymptotically consistent Pr lim𝑛→∞

𝑈SNips 𝜋 = 𝑈 𝜋 = 1

Small bias which decays 𝑂(1

𝑛) while variance decays 𝑂(

News Recommender: Results

snIPS often achieves a better bias-variance trade-off

Doubly Robust Estimator

Reward Prediction IPS

𝑈rp π =1

𝑖𝐸𝑦~π|𝑥𝑖[

መ𝛿 𝑥𝑖 , 𝑦 ] 𝑈ips π =1

π(𝑦𝑖|𝑥𝑖)

π0(𝑦𝑖|𝑥𝑖)δ𝑖

Low variance, High bias High variance, No bias

𝑈dr π =1

𝜋 𝑦𝑖 𝑥𝑖π0 𝑦𝑖 𝑥𝑖

δ𝑖 − መ𝛿 𝑥𝑖 , 𝑦𝑖 + 𝐸𝑦~π|𝑥𝑖[መ𝛿 𝑥𝑖 , 𝑦 ]

[Langford, Li, Dudik, 2011] [Jiang & Li, 2016]

Doubly Robust: Properties

Useful when using estimated propensities ො𝑝𝑖 ≈ π0(𝑦𝑖|𝑥𝑖)

Unbiased if, eitherመ𝛿(𝑥, 𝑦) = δ(𝑥, 𝑦)

ො𝑝𝑖 = π0(𝑦𝑖|𝑥𝑖)

Default in Vowpal Wabbit http://hunch.net/~vw/

DR dominates IPS even with a noisy መ𝛿(𝑥, 𝑦)

Evaluating rankings (slates)

𝑦𝑖 π(𝑥𝑖)

Exact match of composite actions in logs unlikely

Idea: Count per-slot matches

Slates Estimator

If π0 samples 𝑙 documents from a multinomial μ(𝑑|𝑥),

with replacement

𝑈slates π =1

1 − 𝑙 +

𝑗=1

𝑙𝕀 𝑦𝑖 𝑗 = 𝜋 𝑥𝑖 𝑗

μ 𝑦𝑖 𝑗 𝑥𝑖)𝛿𝑖

For general 𝜋0, need to record π0(𝑦 𝑗 = 𝑑, 𝑦 𝑘 = 𝑑′|𝑥)

Slates Estimator: General 𝜋0Define: Γπ0 𝑥 𝑑, 𝑗 ; 𝑑′, 𝑘 = π0 𝑦 𝑗 = 𝑑, 𝑦 𝑘 = 𝑑′ 𝑥

𝟙𝑦 𝑑, 𝑗 = 𝕀 𝑦 𝑗 = 𝑑

𝑈slates π =1

𝐸𝑦~π|𝑥𝑖 𝟙𝑦𝑇 Γπ0(𝑥𝑖)

† 𝟙𝑦𝑖δ𝑖

Can also develop self-normalized/doubly robust variants

[Swaminathan et al, 2016]

Slates Estimator: Properties

• Typically, exponentially better sample complexity than IPS

• Unbiased if reward decomposes per-slot

∃𝛷𝑥 s. t. 𝛿 𝑥, 𝑦 ≔ = 𝛷𝑥 +

𝛷𝑥 +𝛷𝑥 +𝛷𝑥 …

Can capture higher-order interactions with suitable Γπ𝟎(𝒙)

(sn)Slates better than IPS but can have asymptotic bias

Outline

1. Yahoo Front Page [Li,Chu,Langford,Wang; 2011]

2. Bing Speller [Li,Chen,Kleban,Gupta; 2014]

3. Search Ranking [Swaminathan et al; 2016]

• Summary

Yahoo Front Page

𝑦~𝜋(𝑥) Pick STORY ∈ F1,…F20

to highlight for different users

Metric 𝛿: CTR

Logging 𝜋0: Uniform random

Setup: Deploy policies, check online ↔ IPS correlation

Yahoo Front Page: Case Study

IPS is quite accurate for several (spatio-temporal) policies

Yahoo Front Page: Case Study

IPS indeed gives unbiased CTR estimates for different articles

Bing Speller

Pick (possibly many) reformulation candidates

Logging 𝜋0: Independent Bernoulli per rank

Pr Pick 𝑑𝑗 = Τ1 1 + α exp β score 𝑑𝑗 − score(𝑑1)

Setup: Deploy policies, check online ↔ IPS correlation

Bing Speller: Case Study

IPS reliably estimates CTR, etc. despite non-uniform logging

Search Ranking

Re-rank 5 out of 8 candidates

Metric 𝛿: Time-to-success

Utility rate

Logging 𝜋0: Bootstrap from

Uniform/Plackett-Luce

Setup: Report RMSE vs. bootstrap sample size

Plackett-Luce for Slates

SoftMax/Multinomial without replacement

(renormalize probabilities after each draw)

0.400.200.200.060.060.060.02

Search Ranking: Case Study

Slates estimator dominates IPS

Slates RP

Outline

• Summary

Evaluation: Key Questions

What system 𝜋0 What should we

should we deploy? record in our logs?

How can we estimate 𝑈(π) using logged data from 𝜋0?

𝑦~π0(𝑥)

What system 𝜋0 to deploy?

To enable reliable counterfactual estimation

If possible, stochastic with logged randomization

If not, log enough to estimate propensities ො𝑝𝑖

How to explore?

Uniform? typically, bad

“Around current system”? “less risk” & “better targeted”

[Also see “Online contextual Bandit Algorithms”]

What should we record?

Log EVERYTHING! 𝑥𝑖 , 𝑦𝑖 , δ𝑖 , 𝑝𝑖

To reliably “replay” 𝜋 on logged data,

Candidate set of actions 𝑌𝑖 Features for each candidate 𝑓(𝑥𝑖 , 𝑦)

Action at the point of randomization 𝑦𝑖

[Bottou et al, 2013] [Li,Chen,Kleban,Gupta, 2015]

How can we evaluate 𝑈(π)?

– Related: Test collections for offline metrics

• “Model the world” Reward Prediction

– Related: Click models; Collaborative filtering

• “Model the bias in data” Off-policy estimator

– Randomization is essential

[Carterette et al, 2010] [Aslam et al, 2009] [Schnabel et al, 2016b]…

[Chuklin et al, 2015] [Schnabel et al, 2016a]

Summary: Off-policy Estimators

• IPS Estimator

Simple, effective fix for non-uniform (biased) data

• Self-Normalized Estimator

• Doubly Robust Estimator

• Slates Estimator

Demo: Code Samples

Visit http://www.cs.cornell.edu/~adith/CfactSIGIR2016/

Download Code_Data.zip

(Recommend) Install Anaconda-Python3; joblib

• Run experiment:

python EvalExperiment.py

• Plot results:

python plot_sigir.py --mode [estimate/error] --path ../../Logs/ssynth…

QUESTIONS?

Indexing and Preprocessing - Cornell Universityadith/CfactSIGIR2016/Evaluation2.pdf · • Offline...

Documents

Transcript of Indexing and Preprocessing - Cornell Universityadith/CfactSIGIR2016/Evaluation2.pdf · • Offline...

S.K.Donaldson February15,2011 - arxiv.org · S.K.Donaldson February15,2011 ... gular metrics of this kind have been considered before by Jeﬀres [6 ... This degree of regularity

Einstein Metrics, Harmonic Forms, & Symplectic Four ...parton/NTDG2014/lucidi/lebrun.pdfFour Dimensions is Exceptional When n=4, Einstein metrics are genuinely non-trivial: not typically

Offline Discussion M. Moulson 22 October 2004 Datarec status Reprocessing plans MC status MC development plans Linux Operational issues Priorities AFS/disk.

Offline LED driver with primary-sensing and high power ... · 4.8 Burst-mode operation at no load or very light load ... The pin voltage is also sampled-and-held right at the end

Discovering Metrics and Scale Space - Preliminary Examinationcs.tulane.edu/~brittany/research/docs/fasy_prelim_slides.pdf · Discovering Metrics and Scale Space Preliminary Examination

Offline Discussion

Counterfactual Model Interactive System Schematic for ......–Long turnaround time 𝑈 Evaluating Online Metrics Offline •Online: On-policy A/B Test •Offline: Off-policy Counterfactual

ansa meta for nvh brochurecoupling in NASTRAN - Optionally generated PANELS for panel participation analysis in NASTRAN The preprocessing of the cavity for exterior Acoustic analyses

Scalability and performance metrics Matrix …cseweb.ucsd.edu/classes/sp03/cse160/Lectures/Lec06/Lec06.pdfScalability and performance metrics Communicators Matrix multiplication 4/17/2003

Offline LED driver with primary-sensing and high power ... › files › ds › stm › hvled807pf.pdf · This is information on a product in full production. November 2012 Doc ID

Performance Metrics and Decorator Patterns

Metrics of Poincaré type with constant scalar curvature: a ...auvray/TopObstr.pdf · Metrics of Poincaré type with constant scalar curvature: a topological constraint This is still

Offline LED driver with primary-sensing and high power ...

Distance metrics for Sequential Data

Finding Skyline Nodes in Large Networks. Evaluation Metrics: Distance from the query node. (John) Coverage of the Query Topics. (Big Data, Cloud Computing,

Student Investment Advisory Services | Simon Fraser University · 2017-08-16 · 7 Student Investment Advisory Services | Simon Fraser University Risk Metrics Risk Summary 2010:Q2

Montek Singh COMP790-084 Oct 11, 2011. Today’s topics: ◦ more on error metrics ◦ more applications ◦ architectures and design tools ◦ challenges and.

Distortion Metrics

Indexing and Preprocessing - Cornell Universityadith/CfactSIGIR2016/Learning2.pdf · • Methods for large-scale propensity estimation? – Not a typical ML prediction problem •

z ;z ;:::;z - Annals of Mathematicsannals.math.princeton.edu/wp-content/uploads/Jeffres_Mazzeo... · K ahler{Einstein metrics with edge singularities By Thalia Jeffres, Rafe Mazzeo,