Social Role Discovery in Human...



Transcript of Social Role Discovery in Human...

  • Figure labels (wedding roles): bride, priest, bridesmaid, groomsman

    Results: Role Clusters

    Social Role Model

    Results: Clustering Accuracy

    Social Role Discovery in Human Events
    Vignesh Ramanathan, Bangpeng Yao and Li Fei-Fei
    {vigneshr, bangpeng, feifeili}@cs.stanford.edu
    Computer Science Department, Stanford University

    Introduction

    [Figure: graphical model of the social role model; node labels m, s_i, α, β, Σ_α, Σ_β, Ψ_u^i, Ψ_p; indices i, v]

    [Figure: role clusters. Role labels shown: bride, groom, priest, grooms men, brides maid; b'day person, parent, friends, guest; presenter, recipient, host, distributor; instructor, presenter]

    • Color of cross represents ground-truth role for wrong assignments

    α - Unary feature weight
    β - Interaction feature weight
    s_i - Social role assignment
    m - Reference role assignment

    These are jointly inferred by variational inference.

    Method                   Birthday   Wedding   Award Function   Physical Training
    Prior                    29.32%     20.17%    62.97%           65.93%
    K-means                  33.88%     29.43%    31.97%           57.67%
    Only unary               38.25%     39.22%    69.31%           76.69%
    Interaction as context   41.53%     38.83%    77.75%           77.91%
    No spatial               43.72%     36.41%    79.54%           82.82%
    No proxemic              43.72%     39.32%    79.80%           77.91%
    Full Model               44.81%     42.72%    83.12%           82.82%
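As a quick reading aid for the accuracy table, the short sketch below computes how many percentage points the Full Model gains over the "Only unary" variant for each event. The numbers are copied from the table; the variable names are only illustrative.

```python
# Accuracies (in percent) copied from the clustering accuracy table.
events = ["Birthday", "Wedding", "Award Function", "Physical Training"]
only_unary = [38.25, 39.22, 69.31, 76.69]
full_model = [44.81, 42.72, 83.12, 82.82]

# Gain of the full model over the unary-only variant, in percentage points.
gain = {
    event: round(full - unary, 2)
    for event, unary, full in zip(events, only_unary, full_model)
}

for event, points in gain.items():
    print(f"{event}: +{points:.2f} points")
```

The gains come out to 6.56 (Birthday), 3.50 (Wedding), 13.81 (Award Function) and 6.13 (Physical Training) percentage points, so interaction features help on every event in the table.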

    • New YouTube dataset: ~40 videos with 160-240 people per event
    • Human tracks and ground-truth roles annotated

    Unary features Ψ_u^i
    - HOG3D and trajectory features to capture action
    - Gender and color histogram features
    - Object interaction features

    Pairwise interaction features Ψ_p
    - Spatio-temporal trajectory features
    - Proxemic [2] interaction features

    • Interaction restricted to reference role for tractable inference
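To illustrate why restricting interactions to the reference role keeps inference tractable, here is a minimal sketch, not the authors' implementation. It assumes a linear score per person: the unary weights applied to that person's unary features, plus the interaction weights applied to that person's pairwise features with the single person holding the reference role. The function name `assign_roles`, the array shapes, and the linear form are all assumptions; the poster's actual model is inferred with variational inference, which this sketch does not reproduce.

```python
import numpy as np

def assign_roles(unary, pairwise, alpha, beta, ref):
    """Score every (person, role) pair given a fixed reference person.

    unary:    (N, Du) unary features, one row per person
    pairwise: (N, N, Dp) interaction features; pairwise[i, j] describes pair (i, j)
    alpha:    (K, Du) unary feature weights, one row per role (assumed linear)
    beta:     (K, Dp) interaction feature weights, one row per role (assumed linear)
    ref:      index of the person assumed to hold the reference role
    """
    unary_scores = unary @ alpha.T               # (N, K)
    # Only interactions with the reference person are used, so the score
    # decomposes per person instead of coupling all pairs of people.
    inter_scores = pairwise[:, ref, :] @ beta.T  # (N, K)
    scores = unary_scores + inter_scores
    return scores.argmax(axis=1), scores

# Toy example: 3 people, 2 roles, 2-D unary and 1-D interaction features.
unary = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
pairwise = np.zeros((3, 3, 1))
pairwise[2, 0, 0] = 5.0                          # person 2 interacts strongly with person 0
alpha = np.array([[1.0, 0.0], [0.0, 1.0]])
beta = np.array([[0.0], [1.0]])

roles, scores = assign_roles(unary, pairwise, alpha, beta, ref=0)
print(roles)
```

In this toy setup persons 0 and 2 have identical unary features, yet person 2 ends up in a different role because of the interaction with the reference person. With a fixed reference, only N × K scores are needed rather than a search over joint assignments of all pairs. For simplicity the sketch does not force the reference person into the reference role.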

    Our Approach

    1. Input: videos with human tracks
    2. Extract unary and interaction features
    3. Output: Cluster people into social roles

    - Does not require role annotations
    - Clusters people into roles based on interactions as well as person-specific features

    • Only unary – No interaction feature

    • Interaction as context – Average interaction as unary

    • No spatial – Only proxemic interaction

    • No proxemic – Only spatial interaction
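The "Interaction as context" variant above can be pictured with the following sketch. It assumes that "average interaction as unary" means averaging each person's interaction features over all other people and appending the result to that person's unary feature vector. The poster does not spell out these details, so the function name `interaction_as_context` and the concatenation step are assumptions.

```python
import numpy as np

def interaction_as_context(unary, pairwise):
    """Append each person's mean interaction feature to their unary feature.

    unary:    (N, Du) unary features
    pairwise: (N, N, Dp) interaction features; pairwise[i, j] describes pair (i, j)
    returns:  (N, Du + Dp) augmented per-person features (requires N >= 2)
    """
    n = unary.shape[0]
    not_self = ~np.eye(n, dtype=bool)            # mask out each person's self-pair
    context = np.stack(
        [pairwise[i, not_self[i]].mean(axis=0) for i in range(n)]
    )                                            # (N, Dp)
    return np.concatenate([unary, context], axis=1)

# Toy example: 3 people, 2-D unary features, 1-D interaction features.
unary = np.zeros((3, 2))
pairwise = np.zeros((3, 3, 1))
pairwise[:, :, 0] = np.array([0.0, 1.0, 2.0])    # pairwise[i, j, 0] = j

features = interaction_as_context(unary, pairwise)
print(features)
```

The augmented features could then be fed to any clustering method that only uses per-person features, which is what distinguishes this variant from a model with role-specific pairwise terms.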

    [Figure labels: groom, bride, gr. man, br. maids]

    • Social roles describe humans in an event
    • Social roles of humans depend on
      - their actions in a social setting
      - their interactions with other roles
    • Obtaining role annotations for training is expensive
    • Goal: Discover role clusters in a social event based on role-specific interactions


    [1] V. Ramanathan, B. Yao, and L. Fei-Fei. Social Role Discovery in Human Events. In CVPR, 2013.
    [2] Y. Yang, S. Baker, A. Kannan, and D. Ramanan. Recognizing proxemics in personal photos. In CVPR, 2012.

    This work was supported in part by DARPA Mind's Eye, NSF, Intel, Microsoft Research, Google Research and the Intelligence Advanced Research Projects Activity* (IARPA) via Department of Interior National Business Center contract number D11PC20069.

    * The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright thereon.

    Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.

    • Spatial relations in wedding. Cross-arrow is the position of the reference role (groom)