How taxonomies and facets bring end-users closer to big data

Anna Divoli @annadivoli

Boston, Oct 2012 · 2012-10-04

Taxonomies

• τάξις/τάξη + νομία (arrangement/class + method/rule/law)
• hierarchical classification
• formal nomenclature
• varied dimensions
• evaluation/measures/metrics
• types: manually constructed, social, auto-generated
• purposes: auto-indexing, search facilitation, navigation, knowledge management, organization…
• it is OK to change the classification systems to adjust to new knowledge, not just to add new concepts
• the data have become “big” and available but not accessible
• many “end users”


User Studies Types

Specialized domain studies:

1. Facets (HCIR): Biomedical Scientists

2. Expert needs (media group)

UI preferred features studies:

3. Existing popular systems (EuroHCIR)

4. Mock-ups of specific features (survey)

Anna Divoli and Alyona Medelyan, Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA

Matthew Pike, Max L. Wilson, Anna Divoli and Alyona Medelyan, CUES: Cognitive Usability Evaluation System, EuroHCIR 2012, Nijmegen, Netherlands

Our studies

1. Facets (HCIR): Biomedical Scientists

Anna Divoli and Alyona Medelyan, Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA

Facets: favorite feature for search systems

Anna Divoli and Alyona Medelyan, Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA, USA

Facets (in search systems)

Query: animal models Huntington disease

Bio-Facets (ordered from most liked to least liked)

Query: animal models Huntington disease

Facets as search features for biomedical scientists: Findings

• Faceted search is the most important stand-alone feature in a search interface for bioscientists.
• Few, query-oriented facets presented as checkboxes work best.
• Overly simple aesthetics, although not desirable, do not hurt the overall UI score.
• Complex aesthetics turn users away from the systems.
• Bioscientists prefer tools that help them narrow their search, not expand it.
• For generic search: doc-based facets. For domain-specific search: query-based facets.
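As a rough illustration of the "narrow, not expand" finding, the following Python sketch shows checkbox-style faceted refinement: each ticked facet value filters the result set further, and facet counts can be recomputed on what remains. The sample records, the facet names, and the helpers `facet_counts` and `refine` are hypothetical and are not taken from any of the systems evaluated.

```python
from collections import Counter

# Hypothetical result set: each hit carries facet values (illustrative data only).
RESULTS = [
    {"title": "Mouse model of HD", "organism": "mouse", "type": "article"},
    {"title": "HD models reviewed", "organism": "mouse", "type": "review"},
    {"title": "Fly model of HD", "organism": "fly", "type": "article"},
]

def facet_counts(results, facet):
    """Count how many results carry each value of one facet."""
    return Counter(r[facet] for r in results)

def refine(results, selected):
    """Keep results matching every ticked facet.

    `selected` maps a facet name to the set of ticked values; values within
    one facet are OR-ed, different facets are AND-ed, so ticking boxes in
    more facets can only narrow the result set.
    """
    return [
        r for r in results
        if all(r[facet] in values for facet, values in selected.items() if values)
    ]

narrowed = refine(RESULTS, {"organism": {"mouse"}, "type": {"review"}})
print(facet_counts(RESULTS, "organism"))
print([r["title"] for r in narrowed])
```

Keeping the number of facets small, as the findings suggest, would correspond here to passing only a few facet names to `facet_counts`.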


Facets as search feature: likes & dislikes

Legend: + positive comments; - negative comments; ★ not many systems tested, so no low rank ratings. Within each feature, systems are listed from ranked last to ranked first. Ratings (positive, neutral, negative) are broken down by task type: br: browsing, ff: fact finding, ig: information gathering.

Autocomplete
• Semedico: - unhelpful symbols, - too complex, - too much info, - hard to read
• NextBio: - no diversity, - too complex, + highlighting, - hard to read
• GoPubMed: - too general, + font color & size, - unclear presentation, - noisy, many symbols
• PubMed: + relevant suggestions, + good coverage, - blue font color, - small font size
• Bing: + diff types of info, + simple, + highlighting
• Google: + “review” suggestion, +/- simple, + highlighting, + used to

Search expansions ★
• Semedico: - too general, - not useful, + overall look, - color
• PubMed: + good functionality, - specialized, - unclear functionality, - too complex
• Pingar: + useful, - redundancies, + simple & clean, + less options

Facetted refinement
• Semedico: + useful categories, - slow functionality, - too complex/busy, - too many colors
• PubMed: + “reviews” category, - limited functionality, - poor design
• Solr: + quick paper access, + simple, + vertical list, - nothing special
• GoPubMed: + “top terms” useful, - too many symbols, - too busy, - colors
• Pingar DB: + useful categories, + simple, + vertical list, - not special, colorless
• Pingar QB: + useful categories, + simple, + vertical list, - not special, colorless

Related searches
• Bing: - not scientific, + colors, - too small, - too busy
• PubMed: + relevant, - poor context, - no variety
• Pingar: - limited options, + clickable, + font size, + few options
• Google: + good suggestions, - redundancies, + font color / blue links, - too busy

Results preview ★
• Solr: + specific keywords, - snippets lack context, + color, + font color
• Pingar: + helpful keywords, + mouseover, - pale, - font size & style

Facetted refinement, likes:
• Useful categories
• Simple
• Vertical list

Facetted refinement, dislikes:
• Too complex/busy
• Too many colors
• Poor design
• Limited functionality
• Too many symbols
• Not special / colorless

Our studies

2. Expert needs (media group)

Case Study: Media Group

They have a system/“taxonomy” in place that nobody maintains or uses…

~10,000 articles / week, ~5 million in their archives
~21 years, 10,000 authors
Handful of top categories

Main reasons/uses:
- Advertisement
- Packaging up stories and selling them
- Readers finding stories & related stories
- Journalists finding related stories

Expert content needs - Case Study: Media Group

→ Ideally update the taxonomy daily/weekly
→ Must be dynamic & handle new cases/concepts
→ Deep nesting is OK
→ If multiple inheritance, need to disambiguate where a particular article belongs
→ Be able to edit (be able to verify, in case of anomalies based on automation, & move nodes around)
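The requirements above (new concepts, deep nesting, multiple inheritance, moving nodes) can be sketched with a very small data structure. The Python below is a minimal sketch under stated assumptions, not the system discussed in the case study: the `Taxonomy` class, its methods, and the example category names are all hypothetical, and the code does no cycle checking.

```python
class Taxonomy:
    """Toy taxonomy allowing multiple parents per node (hypothetical structure)."""

    def __init__(self):
        self.parents = {}  # node -> set of parent nodes

    def add(self, node, parent=None):
        """Add a node, or attach an existing node to one more parent."""
        self.parents.setdefault(node, set())
        if parent is not None:
            self.parents.setdefault(parent, set())
            self.parents[node].add(parent)

    def move(self, node, old_parent, new_parent):
        """Manual edit: detach a node from one parent and attach it to another."""
        self.parents[node].discard(old_parent)
        self.add(node, new_parent)

    def paths(self, node):
        """All root-to-node paths; more than one path means an ambiguous placement."""
        if not self.parents[node]:
            return [[node]]
        return [
            path + [node]
            for parent in sorted(self.parents[node])
            for path in self.paths(parent)
        ]

tax = Taxonomy()
tax.add("Sport")
tax.add("Business")
tax.add("Football", "Sport")
tax.add("Club finances", "Football")
tax.add("Club finances", "Business")  # multiple inheritance: two parents

before = tax.paths("Club finances")   # two paths, so an editor should review it
tax.move("Club finances", "Business", "Football")
after = tax.paths("Club finances")    # one path after the manual edit
print(before)
print(after)
```

Storing parents rather than children keeps the "where does this article's concept belong?" check cheap: a node with more than one path is a candidate for manual verification.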


Our studies

3. Existing popular systems (EuroHCIR)

Matthew Pike, Max L. Wilson, Anna Divoli and Alyona Medelyan, CUES: Cognitive Usability Evaluation System, EuroHCIR 2012, Nijmegen, Netherlands

Exploring UI features - Systems Tested: Yippy, Carrot, MeSH, ESD

(Screenshots of the tested systems; the last slide in this group labels panels A to F.)

Exploring UI features (Yippy, Carrot, MeSH, ESD): likes & dislikes

Likes:
• Menu highlighting
• Hierarchical folder layout
• Expand hierarchy with “+” and “-”
• Dual view (tree on left, results on right)
• Ability to change visualisations of taxonomy
• Search function is important
• Familiar interface with folders

Dislikes:
• Too simple or too much writing; would be nice to have color
• Lots of scrolling
• Dots in Carrot circle: confusing
• Double click on foam tree is unintuitive
• Too broad taxonomies

Our studies

4. Mock-ups of specific features (survey)

Taxonomy UI preferences (ongoing survey): The (51) participants

Age:
• 27.3% 25 or younger
• 60.0% 26-40
• 12.7% 41-60
• 0% 61 or older

Highest level of education:
• 3.6% High School
• 52.7% College/University
• 43.6% Graduate School

Do you have experience using taxonomies?
• 30.9% No
• 47.3% Yes, but very little
• 21.8% Yes

How comfortable are you with computers?
• 5.5% Somewhat
• 47.3% Very
• 47.3% Second nature

bit.ly/pingar_taxonomies

Concept sorting

• 44.2% popularity (A)
• 42.3% alphabetically (B)
• 13.5% no preference
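Since the two sort orders were almost equally popular, an interface might reasonably offer both. The Python sketch below shows one way to produce each ordering from the same data; the concept names and counts are invented for illustration and do not come from the survey mock-ups.

```python
# Hypothetical concept counts (number of documents tagged with each concept).
concepts = {"Genetics": 42, "Animal models": 17, "Neurology": 42, "Therapy": 5}

# Option A: by popularity (highest count first), ties broken alphabetically.
by_popularity = sorted(concepts, key=lambda name: (-concepts[name], name.lower()))

# Option B: alphabetically, ignoring case.
alphabetical = sorted(concepts, key=str.lower)

# Show each concept with its count, as a taxonomy sidebar might.
for name in by_popularity:
    print(f"{name} ({concepts[name]})")
```

Breaking popularity ties alphabetically keeps the list stable between page loads, which matters when counts change often.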

Displaying Counts

• 42.3% A
• 51.9% B
• 5.8% no preference

Using Labels

• 72.5% in frames (A)
• 23.5% with labels (B)
• 3.9% no preference

Plus/minus signs or arrows

• 47.1% A
• 37.3% B
• 15.7% no preference

Search Results Display

• 13.7% A
• 11.8% B
• 70.6% C
• 3.9% no preference

Search Functionality

• 74.5% partial
• 64.7% hidden
• 2.0% no preference

Where we stand

Our team works on automatically generated taxonomies, but we realized the need for customization for specific needs.

Taxonomy

“Taxonomy is described sometimes as a science and sometimes as an art, but really it’s a battleground.”
Bill Bryson, A Short History of Nearly Everything

Art · Science

Technology
Art
aXiomatic
philOsophy
desigN
lOgic
huManities
linguIstics
Ethnonology
Science

Summary

• There is a place for manually, socially and automatically generated taxonomies (as well as hybrids).
• Text is “big” and in many fields dynamic.
• “End-users” (not Information Management experts) need access to “big text”.
• Auto-generated taxonomies with manual editing facilities are now possible & make sense.
• Domain-specific background knowledge is vital for the quality and detail required per solution.
• User-friendly systems are very important for end users.

Acknowledgements

Alyona Medelyan (Pingar)
Max L. Wilson (Swansea/Nottingham)
Matthew Pike (Swansea/Pingar)
Pingar Brains
All 65+ anonymous study participants!

pingar.com