Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Transcript

Page 1: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Query Expansion with Locally-Trained Word Embeddings

Fernando Diaz, Bhaskar Mitra, Nick Craswell

Microsoft

July 21, 2016

Page 2: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

word embedding: discriminatively trained vector representation

Page 3: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

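$$
L = \sum_{t=1}^{T} \underbrace{\omega_{x_t}}_{\text{term weight}} \Bigg[ \sum_{y \in V_t^{c}} \underbrace{\log \sigma\big(\phi(x_t) \cdot \phi(y)\big)}_{\text{observed context}} + \sum_{y \in V_t^{n}} \underbrace{\log \sigma\big(-\phi(x_t) \cdot \phi(y)\big)}_{\text{negative context}} \Bigg]
$$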
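The objective on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`log_sigmoid`, `dot`, `weighted_sgns_loss`), the data layout (one tuple per position t holding the term, its observed context terms, and its sampled negative terms), and the use of a single embedding table ϕ for both terms and contexts are choices made here for readability, not details given in the slides.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigma(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def weighted_sgns_loss(positions, phi, omega):
    """Weighted skip-gram objective with negative sampling.

    positions: list of (x_t, observed_context_terms, negative_terms) tuples
               (hypothetical layout chosen for this sketch)
    phi:       dict mapping term -> embedding vector
    omega:     dict mapping term -> term weight
    Returns L as a log-likelihood (larger is better).
    """
    total = 0.0
    for x, observed, negatives in positions:
        pos = sum(log_sigmoid(dot(phi[x], phi[y])) for y in observed)
        neg = sum(log_sigmoid(-dot(phi[x], phi[y])) for y in negatives)
        total += omega[x] * (pos + neg)  # the term weight scales both sums
    return total
```

Setting every weight to the same constant recovers the unweighted objective up to scale, so the weights only matter through how they redistribute emphasis across terms.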

Page 4: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

$\omega_{x_t}$ needs to reflect the importance of the term at evaluation time.

Page 5: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

$$
\sum_{t=1}^{T} \omega_{x_t = w} \propto p(w \mid C)
$$

Page 6: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

what terms are important at query time?

Page 7: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

p(w|R): probability of the term in the relevant documents.

Page 8: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

how different is p(w|R) from p(w|C)?

Page 9: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

$$
\mathrm{KL}(R, C)_w = p(w \mid R) \log \frac{p(w \mid R)}{p(w \mid C)}
$$
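As a rough illustration of the per-term quantity on this slide, the sketch below computes each term's contribution and sorts terms by it. The function name `kl_contributions` and the toy distributions are invented for the example; it also assumes p(w|C) is positive for every term that appears in p(w|R).

```python
import math

def kl_contributions(p_rel, p_col):
    """Per-term contribution p(w|R) * log(p(w|R) / p(w|C)).

    Terms with p(w|R) = 0 contribute 0 by convention.
    Assumes p_col[w] > 0 for every w in p_rel.
    """
    out = {}
    for w, pr in p_rel.items():
        out[w] = pr * math.log(pr / p_col[w]) if pr > 0 else 0.0
    return out

# Toy distributions (made up for illustration only).
p_rel = {"gasoline": 0.5, "tax": 0.4, "the": 0.1}    # p(w|R)
p_col = {"gasoline": 0.05, "tax": 0.15, "the": 0.8}  # p(w|C)

contrib = kl_contributions(p_rel, p_col)
ranked = sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)
```

Terms that are much more frequent in the relevant documents than in the collection get large positive values, while terms that are common everywhere get values near or below zero.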

Page 10: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

[Figure: KL (y-axis, ticks from 0.00 to 0.35) plotted against rank (x-axis)]

Page 11: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

how much better can we do if we train with $\sum_{t=1}^{T} \omega_{x_t} \propto p(w \mid R)$?

Page 12: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Language Model Scoring

$$
\mathrm{score}(d, q) = \mathrm{KL}(\theta_q, \theta_d)
$$

$\theta_q$: maximum likelihood query language model
$\theta_d$: document language model
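A minimal sketch of this scoring function follows. The slides do not say how the document language model is smoothed, so the linear interpolation with a collection model used here (and its `lam` parameter) is an assumption, as are the function names. Every query term is assumed to occur somewhere in the collection so that the smoothed document probability is never zero.

```python
import math
from collections import Counter

def max_likelihood_lm(tokens):
    """Maximum likelihood unigram model: relative term frequencies."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: c / n for w, c in counts.items()}

def smoothed_doc_lm(doc_tokens, collection_lm, lam=0.5):
    """Document model interpolated with a collection model (assumed smoothing)."""
    ml = max_likelihood_lm(doc_tokens)
    return {w: (1 - lam) * ml.get(w, 0.0) + lam * p
            for w, p in collection_lm.items()}

def kl_divergence(theta_q, theta_d):
    """KL(theta_q, theta_d) = sum_w theta_q(w) * log(theta_q(w) / theta_d(w))."""
    return sum(p * math.log(p / theta_d[w])
               for w, p in theta_q.items() if p > 0)

# Toy collection (made up for illustration only).
docs = {"d1": ["a", "b"], "d2": ["c", "c"]}
collection_lm = max_likelihood_lm([w for toks in docs.values() for w in toks])
theta_q = max_likelihood_lm(["a"])
scores = {d: kl_divergence(theta_q, smoothed_doc_lm(toks, collection_lm))
          for d, toks in docs.items()}
```

Since the deck later weights documents by exp(−KL), a smaller divergence is treated here as a better match, so documents would be ranked by ascending score.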

Page 13: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Query Expansion with Word Embeddings

$$
\tilde{\theta}_q = U U^{\mathsf{T}} \theta_q
$$

$U$: $|V| \times k$ term embedding matrix
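The expansion step amounts to two matrix-vector products: project the query model into the k-dimensional embedding space, then score every vocabulary term against that projection. The sketch below does this with plain lists; the name `expand_query` and the toy matrix are illustrative assumptions. The slide does not say how the resulting scores are truncated, normalized, or combined with the original query, so none of that is shown.

```python
def expand_query(U, theta_q):
    """Compute U U^T theta_q.

    U:       list of |V| rows, each a length-k embedding
    theta_q: length-|V| list of query term probabilities
    Returns a length-|V| list of expansion scores (not normalized).
    """
    n_terms, k = len(U), len(U[0])
    # q_emb = U^T theta_q: the query represented in embedding space
    q_emb = [sum(U[i][j] * theta_q[i] for i in range(n_terms)) for j in range(k)]
    # U q_emb: inner product of every term embedding with the query embedding
    return [sum(U[i][j] * q_emb[j] for j in range(k)) for i in range(n_terms)]

# Toy vocabulary of three terms with 2-dimensional embeddings (made up).
U = [[1.0, 0.0],   # term 0 (the only query term below)
     [0.8, 0.6],   # term 1, close to term 0
     [0.0, 1.0]]   # term 2, orthogonal to term 0
scores = expand_query(U, [1.0, 0.0, 0.0])
```

In this toy case the term whose embedding is close to the query term receives a high score and the orthogonal term receives none, which is the behavior the expansion relies on.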

Page 14: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Query Expansion with Word Embeddings

$U_{\text{global}}$: embedding trained with p(w|C)
$U_{\text{local}}$: embedding trained with p(w|R)

Page 15: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

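Getting p(w|R)

$$
p(d) = \frac{\exp(-\mathrm{KL}(\theta_q, \theta_d))}{\sum_{d'} \exp(-\mathrm{KL}(\theta_q, \theta_{d'}))}
$$

$$
\tilde{p}(w \mid R) = \sum_{d} p(w \mid \theta_d) \, p(d)
$$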
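The two formulas on this slide can be sketched as a softmax over negated KL scores followed by a mixture of document language models. The function names and the dictionary-based data layout below are assumptions made for the example; the slide does not say whether the sum runs over the whole collection or only over top-ranked documents, so the sketch simply uses whatever documents it is given.

```python
import math

def doc_posterior(kl_scores):
    """p(d) = exp(-KL_d) / sum over d' of exp(-KL_d')."""
    m = min(kl_scores.values())  # shift scores for numerical stability
    unnorm = {d: math.exp(-(s - m)) for d, s in kl_scores.items()}
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}

def relevance_model(doc_lms, p_d):
    """p~(w|R) = sum over d of p(w|theta_d) * p(d)."""
    p_w = {}
    for d, lm in doc_lms.items():
        for w, p in lm.items():
            p_w[w] = p_w.get(w, 0.0) + p * p_d[d]
    return p_w

# Toy inputs (made up for illustration only).
kl_scores = {"d1": 0.0, "d2": math.log(3.0)}
doc_lms = {"d1": {"a": 1.0}, "d2": {"b": 1.0}}
p_d = doc_posterior(kl_scores)
p_w_given_r = relevance_model(doc_lms, p_d)
```

Subtracting the minimum score before exponentiating leaves p(d) unchanged but avoids underflow when the divergences are large.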

Page 17: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Experiments

Page 18: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Data

         docs         words        queries
trec12   469,949      438,338      150
robust   528,155      665,128      250
web      50,220,423   90,411,624   200
giga     9,875,524    2,645,367    -
wiki     3,225,743    4,726,862    -

Page 19: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Embeddings

• global
  • public embeddings (GloVe, word2vec)
  • word2vec on target corpus
• local: word2vec with documents sampled by p(d)
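The sampling step for the local embedding might look like the sketch below, which draws document ids in proportion to p(d). Sampling with replacement, the sample size, the fixed seed, and the name `sample_documents` are all assumptions made here; the slide only states that documents are sampled by p(d). Training word2vec on the sampled documents is not shown.

```python
import random

def sample_documents(p_d, n_samples, seed=0):
    """Draw document ids with probability proportional to p(d).

    p_d:       dict mapping document id -> p(d)
    n_samples: number of draws (with replacement; an assumption of this sketch)
    """
    rng = random.Random(seed)  # local generator so results are repeatable
    doc_ids = list(p_d)
    weights = [p_d[d] for d in doc_ids]
    return rng.choices(doc_ids, weights=weights, k=n_samples)

# Toy distribution (made up for illustration only).
training_sample = sample_documents({"d1": 0.7, "d2": 0.2, "d3": 0.1}, n_samples=1000)
```

Documents with higher p(d) appear more often in the sample, so their terms and co-occurrences carry more weight when the embedding is trained on it.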

Page 20: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

• ten-fold cross-validation
• metric: NDCG@10
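For readers unfamiliar with the metric, here is one common formulation of NDCG@k as a sketch. The slides do not say which gain and discount variant was used, so the linear gain and log2 discount below are assumptions, as are the function names and argument layout.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions (log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, judged_gains, k=10):
    """NDCG@k for one query.

    ranked_gains: relevance grades of the returned documents, in rank order
    judged_gains: relevance grades of all judged documents for the query,
                  used to build the ideal ranking
    """
    ideal = dcg_at_k(sorted(judged_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

A ranking that returns the judged documents in descending order of relevance scores 1.0, and any other ordering scores lower.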

Page 21: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Results

                  global                                                local
         QL       wiki+giga                           gnews    target   target   giga     wiki
dim               50       100      200      300      300      400      400      400      400
trec12   0.514    0.518    0.518    0.530    0.531    0.530    0.545    0.535    0.563*   0.523
robust   0.467    0.470    0.463    0.469    0.468    0.472    0.465    0.475    0.517*   0.476
web      0.216    0.227    0.229    0.230    0.232    0.218    0.216    0.234    0.236    0.258*

Page 22: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

[Figure: two panels, global and local. Caption: top words by p̃(w|R) (blue: query; red: top words by p(w|R))]

Page 23: Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)

Summary

• local embedding provides a stronger representation than global embedding
• potential impact for other topic-specific natural language processing tasks
• future work
  • effectiveness improvements
  • efficiency improvements
