2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Post on 11-May-2015


description

This session will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system is built using Mahout to do off-line analysis and Solr to provide real-time recommendations. The presentation will also include enough theory to provide useful working intuitions for those desiring to adapt this design. The entire system including a data generator, off-line analysis scripts, Solr configurations and sample web pages will be made available on github for attendees to modify as they like.

Transcript of 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

SYSTEM TEARDOWN: SOLR AS A PRACTICAL RECOMMENDATION ENGINE

Michael Hausenblas

Chief Data Engineer EMEA, MapR Technologies

Twitter: @mhausenblas

What does Machine Learning look like?

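$$
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
$$

$$
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
$$

$$
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
$$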
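The block product r1 = [A1ᵀA1  A1ᵀA2] [h1; h2] can be tried out on toy data. The following is a minimal plain-Python sketch, assuming A1 and A2 are binary user-by-item history matrices for two kinds of behavior and h1, h2 are one user's history vectors for those behaviors. The data and helper names are made up for illustration and are not taken from the talk.

```python
def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Hypothetical toy data: 3 users.
# A1: behavior 1 (say, purchases) over 2 items.
# A2: behavior 2 (say, views) over 3 items.
A1 = [[1, 0],
      [1, 1],
      [0, 1]]
A2 = [[1, 0, 1],
      [0, 1, 1],
      [0, 1, 0]]

A1t = transpose(A1)
same = matmul(A1t, A1)    # A1^T A1: co-occurrence within behavior 1
cross = matmul(A1t, A2)   # A1^T A2: cross-occurrence between behaviors

# One user's history for each behavior.
h1 = [1, 0]
h2 = [0, 0, 1]

# r1 = (A1^T A1) h1 + (A1^T A2) h2
r1 = [x + y for x, y in zip(matvec(same, h1), matvec(cross, h2))]
print(r1)
```

The second term is what makes this a cross recommendation: history from the second behavior contributes to scores for items of the first.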

What does Machine Learning look like?

O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality

O(κ d log k) or O(d log κ log k) for larger k, looser quality

•  Input data for the recommender model: observed interactions between users (who take actions) and items

•  Goal: suggest additional appropriate or desirable interactions

•  Example applications:
–  similar movie, music, books (topic, style, etc.)
–  map-based restaurant choices
–  suggesting sale items for e-stores or cash-register receipts

Recommendations as Machine Learning

Recommendations

Recap: Behavior of a crowd helps us understand what individuals will do

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Recommendations

Alice got an apple and a puppy

Bob got an apple

Charles got a bicycle

Recommendations

What else would Bob like?

Recommendations

A puppy, of course!

You get the idea of how recommenders work …

Recommendations

What if everybody gets a pony?

What else would you recommend for Amelia?

Recommendations

If everybody gets a pony, it's not a very good indicator of what else to predict ...

•  Very popular items co-occur with everything
–  Examples: welcome document; elevator music

•  Very widespread occurrence is not interesting as a way to generate indicators
–  Unless you want to offer an item that is constantly desired, such as razor blades

•  What we want is anomalous co-occurrence
–  This is the source of interesting indicators of preference on which to base recommendation

Problems with Raw Co-occurrence

1.  Use log files to build history matrix of users x items
–  Remember: this history of interactions will be sparse compared to all potential combinations

2.  Transform to a co-occurrence matrix of items x items

3.  Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
–  Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
–  RowSimilarityJob in Apache Mahout uses LLR

Get Useful Indicators from Behaviors
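Steps 1 and 2 can be sketched in plain Python. This is only an illustration on a toy log that loosely follows the Alice/Bob/Charles story (the exact events, including the pony for everyone, are an assumption); at scale this is the part the talk hands to Mahout. Step 3, the LLR test, is not covered in this sketch.

```python
from collections import defaultdict
from itertools import combinations

# Toy log of (user, item) events; contents are assumed for illustration.
log = [
    ("Alice", "apple"), ("Alice", "puppy"), ("Alice", "pony"),
    ("Bob", "apple"), ("Bob", "pony"),
    ("Charles", "bicycle"), ("Charles", "pony"),
    ("Alice", "apple"),  # repeated events collapse into one interaction
]

# Step 1: sparse history "matrix", stored as user -> set of items.
history = defaultdict(set)
for user, item in log:
    history[user].add(item)

# Step 2: items x items co-occurrence counts
# (number of users who interacted with both items).
cooccurrence = defaultdict(int)
for items in history.values():
    for a, b in combinations(sorted(items), 2):
        cooccurrence[(a, b)] += 1
        cooccurrence[(b, a)] += 1

for pair in sorted(cooccurrence):
    print(pair, cooccurrence[pair])
```

In this toy data the pony co-occurs with every other item, which is exactly why raw counts alone make poor indicators.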

Log Files

[Figure: log file entries, one per user action, in the order Alice, Bob, Charles, Alice, Bob, Charles, Alice]

Log Files

[Figure: the same log with user IDs (u1, u3, u2, u1, u3, u2, u1) and thing IDs (t1, t4, t3, t2, t3, t3, t1)]

Log Files and Dimensions

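[Figure: the log with user IDs (u1, u3, u2, u1, u3, u2, u1) and thing IDs (t1, t4, t3, t2, t3, t3, t1), next to its two dimensions. Things: t1, t2, t3, t4. Users: u1 (Alice), u3 (Bob), u2 (Charles).]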

History Matrix: Users by Items

[Figure: history matrix with one row per user (Alice, Bob, Charles) and one column per item; a ✔ marks each observed interaction]

Co-occurrence Matrix: Items by Items

[Figure: items-by-items co-occurrence matrix of counts]

How do you tell which co-occurrences are useful?

Use LLR test to turn co-occurrence into indicators…

Co-occurrence Binary Matrix

[Figure: 2 x 2 binary co-occurrence table with rows and columns labelled 1 and not 1]

Spot the Anomaly

            A        not A
B           13       1000
not B       1000     100,000

            A        not A
B           1        0
not B       0        2

            A        not A
B           1        0
not B       0        10,000

            A        not A
B           10       0
not B       0        100,000

What conclusion do you draw from each situation?

•  Root LLR is roughly like standard deviations

•  In Apache Mahout, RowSimilarityJob uses LLR

Spot the Anomaly

            A        not A
B           13       1000
not B       1000     100,000
Root LLR: 0.90

            A        not A
B           1        0
not B       0        2
Root LLR: 1.95

            A        not A
B           1        0
not B       0        10,000
Root LLR: 4.52

            A        not A
B           10       0
not B       0        100,000
Root LLR: 14.3
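The root-LLR scores for the four 2 x 2 tables can be reproduced with a few lines of Python. This is a sketch of the standard entropy-based formulation of the log-likelihood ratio for a contingency table; the function names are chosen here for illustration and are not Mahout's API.

```python
import math

def xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2 x 2 table.

    k11: A and B together, k12: B without A,
    k21: A without B,      k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def root_llr(k11, k12, k21, k22):
    """Square root of LLR, roughly comparable to standard deviations."""
    return math.sqrt(llr(k11, k12, k21, k22))

tables = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 2),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
]
for t in tables:
    print(t, round(root_llr(*t), 2))
```

The first table has the largest raw co-occurrence count (13) but the lowest score, because A and B are both common enough that 13 joint occurrences are close to what chance predicts.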

Indicator Matrix: Anomalous Co-occurrence

[Figure: indicator matrix; a ✔ marks each significant co-occurrence]

Result: The marked row will be added to the indicator field in the item document …

Significant co-occurrences → indicators

Indicator Matrix

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don't need to create a separate index for the indicators.
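As a rough sketch of how this could look from client code, the snippet below builds the puppy document as JSON and a search URL that matches a user's recent item ids against the indicators field. The host, port and collection name (`items`) are hypothetical, nothing is actually sent to a server, and the multi-valued field layout is an assumption about the schema rather than the configuration used in the demo.

```python
import json
from urllib.parse import urlencode

# Item document from the slide; the indicator ids are just another field.
doc = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],
    "indicators": ["t1"],  # assumed to be a multi-valued field
}

# JSON body that could be posted to Solr's update handler.
payload = json.dumps([doc])

def recommendation_query(history, rows=10):
    """Build query parameters that match recent item ids
    against the indicators field."""
    terms = " ".join(history)
    return urlencode({"q": f"indicators:({terms})", "rows": rows, "wt": "json"})

# Hypothetical Solr location and collection name.
base = "http://localhost:8983/solr/items/select?"
url = base + recommendation_query(["t1"])
print(url)
```

A user whose history contains t1 (the item that co-occurs anomalously with the puppy) would get the puppy document back as a search hit, which is the recommendation.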

Demo time!

Internals of the Recommender Engine


What to recommend if a new user listened to 2122: Fats Domino and 303: Beatles? The recommendation is “1710: Chuck Berry”.
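What the search engine does with such a history can be imitated in a few lines: score each item by how many of its indicators appear in the user's history. This is only a count-based stand-in for Solr's relevance ranking. The indicator lists and the extra ids below are invented for illustration; only the artist ids 2122, 303 and 1710 come from the slide.

```python
# Hypothetical indicator fields, keyed by item id.
indicators = {
    "1710": {"2122", "303", "55"},
    "42": {"303"},
    "7": {"99"},
}

def recommend(history, indicators, n=1):
    """Rank items by the number of indicators found in the user's history."""
    seen = set(history)
    scores = {
        item: len(ind & seen)
        for item, ind in indicators.items()
        if item not in seen  # do not recommend what the user already has
    }
    ranked = sorted(
        (item for item, score in scores.items() if score > 0),
        key=lambda item: (-scores[item], item),
    )
    return ranked[:n]

print(recommend(["2122", "303"], indicators))
```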

Looking Inside LucidWorks


[Figure: architecture diagram with numbered components]

(1) User behavior generator

(2) Presentation tier

(3) Session collector

(4) Search engine

(5) Metrics and logs

(6) History collector

(7) Cooccurrence analysis

(8) Post to search engine

(9) Diagnostic browsing

http://bita.ly/18vbbaT    

Example: search based recommendation

•  Sample Query (recommendation query)
–  Current location
–  Recent merchant descriptions
–  Recent merchant id's
–  Recent SIC codes
–  Recent accepted offers
–  Local Top40

•  Sample Document, original data and meta-data
–  Merchant Id
–  Field for text description
–  Phone
–  Address
–  Location

•  Sample Document, derived from co-occurrence analysis
–  Indicator merchant id's
–  Indicator industry (SIC) id's
–  Indicator offers
–  Indicator text
–  Local Top40

Search-based recommendation

[Figure: offline analysis pipeline with components: complete history, co-occurrence (Mahout), item meta-data, Solr indexing, Solr indexers, index shards]

Analyze with MapReduce

[Figure: online deployment with components: user history, web tier, Solr search, item meta-data, Solr indexers, index shards]

Deploy with Conventional Search System

•  Kudos to Ted Dunning, Grant Ingersoll and LucidWorks, for the idea & the demo!

•  Get in touch: Twitter—@mhausenblas, @MapR

•  Ah, and, btw: we’re hiring ;)

Outro