2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine


description

This session will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system is built using Mahout to do off-line analysis and Solr to provide real-time recommendations. The presentation will also include enough theory to provide useful working intuitions for those desiring to adapt this design. The entire system, including a data generator, off-line analysis scripts, Solr configurations and sample web pages, will be made available on GitHub for attendees to modify as they like.

Transcript of 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Page 1: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
Page 2: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

SYSTEM TEARDOWN: SOLR AS A PRACTICAL RECOMMENDATION ENGINE

Michael Hausenblas
Chief Data Engineer EMEA, MapR Technologies
Twitter: @mhausenblas

Page 3: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

What does Machine Learning look like?

Page 4: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

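\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]

\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]

\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]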

What does Machine Learning look like?

O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality

O(κ d log k) or O(d log κ log k) for larger k, looser quality

Page 5: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

•  Input data for the recommender model: observed interactions between users (who take actions) and items

•  Goal: suggest additional appropriate or desirable interactions

•  Example applications:
   –  similar movies, music, books (topic, style, etc.)
   –  map-based restaurant choices
   –  suggesting sale items for e-stores or cash-register receipts

Recommendations as Machine Learning

Page 6: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Recommendations

Recap: Behavior of a crowd helps us understand what individuals will do

Page 7: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Page 8: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Recommendations

Alice got an apple and a puppy

Bob got an apple

Charles got a bicycle

Page 9: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Recommendations

What else would Bob like?

[Figure: Alice, Bob, Charles]

Page 10: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Recommendations

A puppy, of course!

[Figure: Alice, Bob, Charles]

Page 11: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

You get the idea of how recommenders work …

Page 12: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Recommendations

What if everybody gets a pony?

[Figure: Alice, Bob, Charles, Amelia]

What else would you recommend for Amelia?

Page 13: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Recommendations

[Figure: Alice, Bob, Charles, Amelia]

If everybody gets a pony, it’s not a very good indicator of what else to predict ...

Page 14: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

•  Very popular items co-occur with everything
   –  Examples: welcome document; elevator music

•  Very widespread occurrence is not interesting as a way to generate indicators
   –  Unless you want to offer an item that is constantly desired, such as razor blades

•  What we want is anomalous co-occurrence
   –  This is the source of interesting indicators of preference on which to base recommendation

Problems with Raw Co-occurrence

Page 15: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

1.  Use log files to build a history matrix of users x items
    –  Remember: this history of interactions will be sparse compared to all potential combinations

2.  Transform to a co-occurrence matrix of items x items

3.  Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
    –  Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
    –  RowSimilarityJob in Apache Mahout uses LLR

Get Useful Indicators from Behaviors
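Steps 1 and 2 can be sketched in a few lines of plain Python. The log entries reuse the Alice, Bob and Charles story from the earlier slides; the variable names and the in-memory dictionaries are assumptions for illustration (at real scale this analysis runs off-line, for example in Mahout). Step 3, the LLR test, is left out of this sketch.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log: one (user, item) pair per observed interaction.
log = [
    ("Alice", "apple"),
    ("Alice", "puppy"),
    ("Bob", "apple"),
    ("Charles", "bicycle"),
]

# Step 1: history "matrix" of users x items, stored sparsely as user -> set of items.
history = defaultdict(set)
for user, item in log:
    history[user].add(item)

# Step 2: co-occurrence "matrix" of items x items, stored as (item, item) -> count.
# Two items co-occur once for every user who interacted with both.
cooccurrence = defaultdict(int)
for items in history.values():
    for a, b in combinations(sorted(items), 2):
        cooccurrence[(a, b)] += 1
        cooccurrence[(b, a)] += 1  # keep the matrix symmetric

print(dict(cooccurrence))
```

Storing only the non-zero cells mirrors the remark in step 1 that the history is sparse compared to all potential combinations.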

Page 16: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Log Files

[Figure: log entries, in order, for Alice, Bob, Charles, Alice, Bob, Charles, Alice]

Page 17: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Log Files

[Figure: log file with a user column and a thing column]

User column: u1, u3, u2, u1, u3, u2, u1

Thing column: t1, t4, t3, t2, t3, t3, t1

Page 18: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Log Files and Dimensions

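[Figure: the same log file, with its two dimensions called out]

User column: u1, u3, u2, u1, u3, u2, u1

Thing column: t1, t4, t3, t2, t3, t3, t1

Things: t1, t2, t3, t4

Users: Alice, Bob, Charles (u1, u3, u2)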

Page 19: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

History Matrix: Users by Items

[Figure: history matrix with rows Alice, Bob, Charles; check marks show which items each user interacted with]

Page 20: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Co-occurrence Matrix: Items by Items

[Figure: items-by-items co-occurrence matrix holding co-occurrence counts (0, 1 or 2)]

How do you tell which co-occurrences are useful?

Use LLR test to turn co-occurrence into indicators…

Page 21: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Co-occurrence Binary Matrix

[Figure: binary co-occurrence matrix; cells are marked either "1" or "not"]

Page 22: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Spot the Anomaly

Table 1
          A       not A
B         13      1000
not B     1000    100,000

Table 2
          A       not A
B         1       0
not B     0       2

Table 3
          A       not A
B         1       0
not B     0       10,000

Table 4
          A       not A
B         10      0
not B     0       100,000

What conclusion do you draw from each situation?

Page 23: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

•  Root LLR is roughly like standard deviations

•  In Apache Mahout, RowSimilarityJob uses LLR

Spot the Anomaly

Table 1 (root LLR: 0.90)
          A       not A
B         13      1000
not B     1000    100,000

Table 2 (root LLR: 1.95)
          A       not A
B         1       0
not B     0       2

Table 3 (root LLR: 4.52)
          A       not A
B         1       0
not B     0       10,000

Table 4 (root LLR: 14.3)
          A       not A
B         10      0
not B     0       100,000
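The four scores on this slide can be reproduced with a short calculation. The sketch below uses the entropy-based form of the log-likelihood ratio for a 2x2 contingency table; the function names are chosen here for illustration, and the code should be read as a sketch of the test rather than as Mahout's implementation.

```python
import math


def xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for the table [[k11, k12], [k21, k22]]."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    """Square root of the LLR; roughly comparable to standard deviations."""
    return math.sqrt(llr(k11, k12, k21, k22))


# The four tables from the slide: (A and B, B only, A only, neither).
tables = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 2),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
]
scores = [root_llr(*t) for t in tables]
for table, score in zip(tables, scores):
    print(table, round(score, 2))
```

The first table has the largest raw co-occurrence count (13) yet the lowest score, because with 1000 occurrences of each item alone, about 10 co-occurrences are expected by chance. The last table has only 10 co-occurrences, but none would be expected by chance, so it scores highest.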

Page 24: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Indicator Matrix: Anomalous Co-occurrence

[Figure: indicator matrix with the significant cells checked]

Result: The marked row will be added to the indicator field in the item document …

Significant co-occurrences → indicators

Page 25: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Indicator Matrix

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine

Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don’t need to create a separate index for the indicators.
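As a sketch, the item document with its indicator field could be serialized to JSON like this. The field names and values are the ones shown on the slide; treating `keywords` and `indicators` as multi-valued lists, and sending the document to Solr as a JSON array, are assumptions about a setup the slides do not show.

```python
import json

# The item document from the slide, with the indicator row attached
# as just another field next to the ordinary metadata.
item_doc = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],  # assumed multi-valued
    "indicators": ["t1"],                 # assumed multi-valued; from the indicator matrix
}

# A JSON array of documents is one format a Solr update request can carry.
payload = json.dumps([item_doc], indent=2)
print(payload)
```

Because the indicators live in the same document as the title and description, one index serves both ordinary search and recommendation.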

Page 26: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Demo time!

Page 27: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Internals of the Recommender Engine


Page 28: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

What to recommend if a new user listened to 2122: Fats Domino and 303: Beatles? The recommendation is “1710: Chuck Berry”

Looking Inside LucidWorks
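Seen as a search problem, the recommendation is a query that matches the user's recent history against the indicator field of every item. The sketch below only builds the query string; the field name `indicators` follows the item document shown earlier in the deck, while the function name, the `fl` field list and the `rows` value are assumptions, and how the demo actually issues its query is not shown in the slides.

```python
from urllib.parse import urlencode, parse_qs


def recommendation_query(history_ids, field="indicators", rows=10):
    """Build Solr query parameters that search `field` for the user's history."""
    terms = " ".join(str(item_id) for item_id in history_ids)
    params = {
        "q": f"{field}:({terms})",   # match any of the history ids
        "fl": "id,title,score",      # hypothetical fields to return
        "rows": rows,
    }
    return urlencode(params)


# The new user from the slide listened to artists 2122 and 303.
query_string = recommendation_query([2122, 303])
print(query_string)
```

Items whose indicator field contains more of the history ids, or rarer ones, rank higher under the search engine's ordinary relevance scoring, which is what turns a text search into a recommendation.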


Page 29: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
Page 30: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

[Figure: architecture of the demo system, with numbered components]

(1) User behavior generator

(2) Presentation tier

(3) Session collector

(4) Search engine

(5) Metrics and logs

(6) History collector

(7) Cooccurrence analysis

(8) Post to search engine

(9) Diagnostic browsing

http://bita.ly/18vbbaT    

Page 31: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

Example: search based recommendation

Page 32: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

•  Sample Query (recommendation query)
   –  Current location
   –  Recent merchant descriptions
   –  Recent merchant id’s
   –  Recent SIC codes
   –  Recent accepted offers
   –  Local Top40

•  Sample Document: original data and meta-data
   –  Merchant Id
   –  Field for text description
   –  Phone
   –  Address
   –  Location

•  Sample Document: derived from co-occurrence analysis
   –  Indicator merchant id’s
   –  Indicator industry (SIC) id’s
   –  Indicator offers
   –  Indicator text
   –  Local Top40

Search-based recommendation

Page 33: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

[Figure: components shown: complete history, co-occurrence (Mahout), item meta-data, Solr indexing (several Solr indexers), index shards]

Analyze with MapReduce

Page 34: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

[Figure: components shown: user history, web tier, Solr search, item meta-data, Solr indexers, index shards]

Deploy with Conventional Search System

Page 35: 2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

•  Kudos to Ted Dunning, Grant Ingersoll and LucidWorks, for the idea & the demo!

•  Get in touch: Twitter—@mhausenblas, @MapR

•  Ah, and, btw: we’re hiring ;)

Outro