
Social Role Discovery in Human Events
Vignesh Ramanathan, Bangpeng Yao and Li Fei-Fei
Computer Science Department, Stanford University
{vigneshr, bangpeng, feifeili}@cs.stanford.edu
ai.stanford.edu/~vigneshr/cvpr_13/cvpr13_social_roles.pdf

Introduction

[Figure: wedding scene with people labeled by social role: bride, priest, bridesmaid, groomsman]

Social Role Model

[Figure: graphical model with the social role assignment $\mathbf{s}_i$, the reference role assignment $m$, unary features $\Psi_u^i$, pairwise features $\Psi_p$, weights $\alpha$ and $\beta$, and $\Sigma_\alpha$, $\Sigma_\beta$; plates indexed by $i$ and $v$]

$\alpha$ - Unary feature weight
$\beta$ - Interaction feature weight
$\mathbf{s}_i$ - Social role assignment
$m$ - Reference role assignment

• Joint inference by variational inference

Results: Role Clusters

[Figure: example role clusters per event. Role labels shown: bride, groom, priest, groomsmen, bridesmaids (wedding); birthday person, parent, friends, guest (birthday); presenter, recipient, host, distributor, instructor (award function and physical training)]

• The color of a cross shows the ground-truth role of a wrongly assigned person

Results: Clustering Accuracy

Method                  Birthday  Wedding  Award Function  Physical Training
Prior                   29.32%    20.17%   62.97%          65.93%
K-means                 33.88%    29.43%   31.97%          57.67%
Only unary              38.25%    39.22%   69.31%          76.69%
Interaction as context  41.53%    38.83%   77.75%          77.91%
No spatial              43.72%    36.41%   79.54%          82.82%
No proxemic             43.72%    39.32%   79.80%          77.91%
Full Model              44.81%    42.72%   83.12%          82.82%
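The poster does not say how clustering accuracy is scored. A common convention, assumed here, is to map each cluster to at most one role so that agreement with the ground truth is maximized, and then report the fraction of people whose mapped role is correct. The following is a minimal brute-force sketch of that convention; the function name `clustering_accuracy` is hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def clustering_accuracy(clusters: Sequence[Hashable], roles: Sequence[Hashable]) -> float:
    """Hypothetical scorer: fraction of people whose cluster maps to their
    ground-truth role under the best one-to-one cluster-to-role mapping."""
    if not roles or len(clusters) != len(roles):
        raise ValueError("need one cluster id and one role per person")
    cluster_ids = list(set(clusters))
    role_ids = list(set(roles))
    # With more clusters than roles, pad with None so every cluster can be
    # mapped; people in a cluster mapped to None all count as errors.
    padded = role_ids + [None] * max(0, len(cluster_ids) - len(role_ids))
    counts = Counter(zip(clusters, roles))  # (cluster, role) -> number of people
    # Try every injective cluster -> role mapping and keep the best hit count.
    best = max(
        sum(counts[(c, r)] for c, r in zip(cluster_ids, perm))
        for perm in permutations(padded, len(cluster_ids))
    )
    return best / len(roles)
```

Brute force enumerates every mapping, so it only suits a handful of clusters, as with the few roles per event here. For larger problems the same maximization is an assignment problem that `scipy.optimize.linear_sum_assignment` solves directly.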

• New YouTube dataset: ~40 videos with 160-240 people per event
• Human tracks and ground-truth roles annotated

Unary features $\Psi_u^i$
- HOG3D and trajectory features to capture action
- Gender and color histogram features
- Object interaction features
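The unary features are named but not specified. To illustrate only the color-histogram entry, the sketch below assumes RGB pixels sampled from one person's track and a hypothetical quantization of 4 levels per channel. The function name and bin layout are illustrative choices, not taken from the poster.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Iterable[Pixel], levels: int = 4) -> List[float]:
    """Hypothetical unary color feature: L1-normalized joint RGB histogram.

    Each channel is quantized into `levels` equal-width bins, giving
    levels**3 bins in total. Returns all zeros if no pixels are given.
    """
    hist = [0.0] * (levels ** 3)
    step = 256 / levels  # width of one bin along a channel
    total = 0
    for r, g, b in pixels:
        # Flatten the (red, green, blue) bin indices into one histogram index.
        idx = (int(r // step) * levels + int(g // step)) * levels + int(b // step)
        hist[idx] += 1
        total += 1
    return [h / total for h in hist] if total else hist
```

Normalizing by the pixel count makes people with larger or longer tracks comparable. Such a vector would presumably be concatenated with the action, gender, and object features.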

Pairwise interaction features $\Psi_p$
- Spatio-temporal trajectory features
- Proxemic [2] interaction features
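The poster gives no formula for the spatial interaction features beyond the spatial-relations figure. One plausible way to encode where a person stands relative to another is a normalized histogram of their per-frame offsets. The sketch below assumes per-frame track centers in pixels and a hypothetical grid of 50-pixel cells; it is spatial only and does not model the temporal part.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) track center in pixels


def relative_position_histogram(
    track_i: Dict[int, Point],
    track_ref: Dict[int, Point],
    cell: float = 50.0,  # hypothetical cell size in pixels
    radius: int = 2,     # grid spans -radius..radius cells on each axis
) -> List[float]:
    """Hypothetical pairwise feature: where person i stands relative to another.

    Tracks map frame index -> (x, y). Only frames where both people are
    visible contribute. Each offset is rounded to the nearest grid cell and
    clipped to the border of a (2*radius+1) x (2*radius+1) grid.
    """
    side = 2 * radius + 1
    hist = [0.0] * (side * side)
    shared = sorted(set(track_i) & set(track_ref))
    for t in shared:
        dx = track_i[t][0] - track_ref[t][0]
        dy = track_i[t][1] - track_ref[t][1]
        col = max(-radius, min(radius, round(dx / cell))) + radius
        row = max(-radius, min(radius, round(dy / cell))) + radius
        hist[row * side + col] += 1
    n = len(shared)
    return [h / n for h in hist] if n else hist
```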

• Interaction restricted to reference role for tractable inference
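Restricting interactions to the reference role is what keeps inference cheap. Once the reference person is fixed, every other person's role can be chosen independently. The sketch below illustrates only that point. It assumes a hypothetical linear score with fixed weights and one role index reserved for the reference person, and it finds the best assignment by enumerating candidate reference persons. It is not the variational inference used in the model, which also infers the weights.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(a * b for a, b in zip(w, x))


def map_roles(
    unary: List[Vector],           # unary[i]: Psi_u for person i
    pairwise: List[List[Vector]],  # pairwise[i][m]: Psi_p between person i and m
    alpha: List[Vector],           # alpha[r]: unary weights for role r
    beta: List[Vector],            # beta[r]: interaction weights for role r
    ref_role: int = 0,             # hypothetical: role index reserved for the reference
) -> Tuple[int, List[int], float]:
    """Hypothetical exact MAP under a simplified linear score (fixed weights).

    The reference person m takes `ref_role` and contributes its unary term;
    every other person i takes the non-reference role r maximizing
    alpha[r].Psi_u^i + beta[r].Psi_p^{i,m}. Returns (m, roles, score).
    """
    n, k = len(unary), len(alpha)
    if k < 2:
        raise ValueError("need the reference role plus at least one other role")
    best: Tuple[int, List[int], float] = (-1, [], float("-inf"))
    for m in range(n):  # enumerate candidate reference persons
        roles = [ref_role] * n
        score = dot(alpha[ref_role], unary[m])
        for i in range(n):
            if i == m:
                continue
            # Given m, person i's choice does not depend on anyone else's.
            r_best = max(
                (r for r in range(k) if r != ref_role),
                key=lambda r: dot(alpha[r], unary[i]) + dot(beta[r], pairwise[i][m]),
            )
            roles[i] = r_best
            score += dot(alpha[r_best], unary[i]) + dot(beta[r_best], pairwise[i][m])
        if score > best[2]:
            best = (m, roles, score)
    return best
```

With N people and K roles this costs on the order of N²K dot products, instead of enumerating all K^N joint assignments.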

Our Approach

- Does not require role annotations
- Clusters people into roles based on interactions as well as person-specific features

1. Input: videos with human tracks
2. Extract unary and interaction features
3. Output: cluster people into social roles
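Step 3 is the clustering step. For comparison, the K-means baseline in the results table clusters people from feature vectors alone. Below is a minimal sketch of Lloyd's algorithm for that baseline, not of the social role model. It assumes each person is already summarized by a fixed-length feature vector; the seed and iteration cap are arbitrary choices.

```python
import random
from typing import List, Optional, Sequence


def kmeans(points: List[Sequence[float]], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means on per-person feature vectors; returns a cluster id per person."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct data points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else []
```

This baseline ignores the interaction structure entirely, which is what the social role model adds.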

Variants compared in the clustering-accuracy table:

• Only unary – no interaction features

• Interaction as context – average interaction used as a unary feature

• No spatial – only proxemic interaction features

• No proxemic – only spatial interaction features
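The "Interaction as context" variant is described only as using the average interaction as a unary feature. A minimal sketch of that reading follows. It assumes a hypothetical layout where `pairwise[i][j]` holds the interaction feature vector between persons i and j: each person's interaction vectors are averaged over all other people and appended to that person's unary vector.

```python
from typing import List, Sequence


def interaction_as_context(
    unary: List[Sequence[float]],
    pairwise: List[List[Sequence[float]]],  # hypothetical: pairwise[i][j] for every pair
) -> List[List[float]]:
    """Append each person's mean interaction vector (over all others) to their unary vector."""
    n = len(unary)
    out: List[List[float]] = []
    for i in range(n):
        others = [pairwise[i][j] for j in range(n) if j != i]
        if others:
            dim = len(others[0])
            mean = [sum(v[d] for v in others) / len(others) for d in range(dim)]
        else:
            mean = []  # a lone person has no interactions to average
        out.append(list(unary[i]) + mean)
    return out
```

Averaging discards which partner each interaction came from, the information the full model keeps by tying interactions to the reference role.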

• Social roles describe humans in an event

• Social roles of humans depend on
- their actions in a social setting
- their interactions with other roles

• Obtaining role annotations for training is expensive

• Goal: discover role clusters in a social event based on role-specific interactions


[1] V. Ramanathan, B. Yao, and L. Fei-Fei. Social Role Discovery in Human Events. In CVPR, 2013.
[2] Y. Yang, S. Baker, A. Kannan, and D. Ramanan. Recognizing Proxemics in Personal Photos. In CVPR, 2012.

This work was supported in part by DARPA Minds Eye, NSF, Intel, Microsoft Research, Google Research and the Intelligence Advanced Research Projects Activity* (IARPA) via Department of Interior National Business Center contract number D11PC20069.

* The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.

[Figure: spatial relations in a wedding, with roles groom, bride, groomsman and bridesmaids. The cross-arrow marks the position of the reference role (groom).]