Monte Carlo Tree Search for Scheduling Activity Recognition

• Selection: select state $S_{0:t}$.
• Simulation: update the expected utility of the simulations, given $r(\Pi_{\text{sim}})$.
• Backpropagation: propagate the expected utility back to the root node by updating all the nodes on the path using:

$Q'(S_{0:t}, a_t) \leftarrow Q'(S_{0:t}, a_t) + C \, r(\Pi_{\text{sim}}), \quad \forall S_{0:t} \in \Pi_{\text{sim}}, \ t = 1, 2, \ldots, B$
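Below is a minimal Python sketch of this backpropagation step, assuming a simple tree node with a parent pointer and an accumulated utility; the Node class and its field names are illustrative, not the authors' implementation.

```python
# Minimal MCTS backpropagation sketch (illustrative, not the authors' code).

class Node:
    """One node of the search tree, holding a state S_{0:t} and the
    action a_t that led to it."""
    def __init__(self, state, action=None, parent=None):
        self.state = state
        self.action = action
        self.parent = parent
        self.children = []
        self.q_value = 0.0   # running estimate of Q'(S_{0:t}, a_t)

def backpropagate(leaf, reward, C=1.0):
    """Propagate the simulation reward r(Pi_sim) from the leaf reached
    by the simulation back to the root, applying Q' <- Q' + C * r to
    every node on the path."""
    node = leaf
    while node is not None:
        node.q_value += C * reward
        node = node.parent
```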
Results
Acknowledgments: NSF IIS 1018490, ONR MURI N00014-10-1-0933, DARPA MSEE FA 8650-11-1-7149.
Monte Carlo Tree Search for Scheduling Activity Recognition
Mohamed R. Amer, Sinisa Todorovic, Alan Fern (Oregon State University, Corvallis, OR, USA)
Song-Chun Zhu (University of California, Los Angeles, CA, USA)
Problem Statement
Given: a large, high-resolution video with multiple activities happening at the same time, and a query.
Goal: perform cost-sensitive parsing to answer the query.
Approach
Activity Parsing as a Search Problem
Motivation
Querying surveillance videos requires running many detectors of:
• group activities,
• individual actions,
• objects.
Running all the detectors is inefficient, as many of them may provide little to no information for answering the query.
Parsing spatiotemporal And-Or graphs can be defined in terms of four processes (α, β, γ, ω):
• α: detecting objects, individual actions, and group activities directly from video features.
• β: predicting an activity from its detected parts by bottom-up composition.
• γ: top-down prediction of an activity using its context.
• ω: predicting an activity at a given time based on tracking across the video.
The log-posterior of a parse-graph node at level $l$ and time $\tau$ combines all four processes:

$$\log p(\mathbf{pg}_l^\tau) \propto w_\alpha^T \phi(x_l^\tau, y, h) + w_\gamma^T \phi(x_{l-1}^\tau, y, h) + \sum_{i,j} w_\beta^T \phi(x_i^{l+1,\tau}, x_j^{l+1,\tau}, y, h) + w_\omega^T \phi(x_l^\tau, x_l^{\tau-1}, y, h)$$
where the $\alpha$ term is the detector of the activity, the $\beta$ term captures relations between parts of the activity, the $\gamma$ term is the context of the activity, and the $\omega$ term is the tracking of the activity.
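As a rough illustration, this score is a weighted sum of four linear terms and could be computed as sketched below; the weight vectors (w_alpha, ...) and precomputed feature vectors (phi_*) are placeholder names for this sketch, not the paper's actual feature extractors.

```python
# Hedged sketch: score a parse-graph node as the sum of the four terms above.
import numpy as np

def log_score(w_alpha, w_beta, w_gamma, w_omega,
              phi_alpha, phi_gamma, phi_omega, phi_beta_pairs):
    """w_*: learned weight vectors; phi_*: feature vectors for the detector
    (alpha), context (gamma), and tracking (omega) terms; phi_beta_pairs:
    one feature vector per (i, j) pair of parts at the level below."""
    score = w_alpha @ phi_alpha                            # alpha: direct detection
    score += w_gamma @ phi_gamma                           # gamma: top-down context
    score += sum(w_beta @ phi for phi in phi_beta_pairs)   # beta: part relations
    score += w_omega @ phi_omega                           # omega: tracking across time
    return float(score)
```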
[Figure: the three MCTS stages: Selection, Simulation, Backpropagation]
Learning

References
[4] M. R. Amer, D. Xie, M. Zhao, S. Todorovic, S.-C. Zhu. "Cost-sensitive top-down/bottom-up inference for multiscale activity recognition." ECCV 2012.
[6] W. Choi, S. Savarese. "A unified framework for multi-target tracking and collective activity recognition." ECCV 2012.
[7] W. Choi, K. Shahid, S. Savarese. "What are they doing?: Collective activity classification using spatio-temporal relationship among people." ICCV Workshops 2009.
[9] S. Khamis, V. Morariu, L. S. Davis. "Combining per-frame and per-track cues for multi-person action recognition." ECCV 2012.
[Plots: classification accuracy on the Collective Activity, New Collective Activity, and UCLA Courtyard datasets, under a limited budget and an infinite budget. Legend: (V1) parse graph with tracking all (new, from [4]); (V2) parse graph with tracking the query, compared against [9]; (V3) parse graph with no tracking, compared against (QL) and [4]; tracking detections from [6].]
Action: $a_t \in \{\text{process}, \text{detector}, \text{spatiotemporal volume}\}$,
State: $S_{0:t} = (s_0, a_0, a_1, \ldots, a_{t-1})$,
Policy: $\Pi = (s_0, a_{0:t})$, $t = 1, 2, \ldots, B$, where $B$ is the inference budget,
Reward: $r(\Pi) = \mathbb{I}[\log p(\mathbf{pg}(\Pi)) > \theta]$, where $\theta$ is an input threshold.
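One way to encode these ingredients in Python is sketched below; the concrete Action fields and types are assumptions made for illustration, not the authors' data structures.

```python
# Illustrative encoding of the search-problem ingredients above.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class Action:
    process: str             # which process to run: "alpha", "beta", "gamma", or "omega"
    detector: str            # which detector to apply
    volume: Tuple[int, ...]  # spatiotemporal volume to apply it to

@dataclass
class State:
    s0: object                                            # initial state s_0
    actions: List[Action] = field(default_factory=list)   # a_0, ..., a_{t-1}

def reward(log_p_pg: float, theta: float) -> int:
    """r(Pi) = I[log p(pg(Pi)) > theta]: 1 if the parse-graph score
    clears the input threshold, else 0."""
    return int(log_p_pg > theta)
```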
The cost-sensitive inference for a policy $\Pi$ is conducted by taking the set of actions $a_{0:B}$ that provides the maximum utility $Q$ for the states $S_{0:B}$:

$\Pi = \arg\max_a Q(S_{0:t}, a_t), \quad \forall t = 1, 2, \ldots, B,$

and the result is accepted if $r(\Pi) = 1$ and rejected if $r(\Pi) = 0$.
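A minimal sketch of this budgeted inference loop follows, assuming hypothetical helpers Q (utility estimate), candidate_actions, and apply_action; the greedy argmax over Q mirrors the equation above.

```python
# Hedged sketch of budgeted, cost-sensitive inference (helper callables assumed).

def infer(state, Q, candidate_actions, apply_action, B, reward_fn):
    """At each step t pick argmax_a Q(S_{0:t}, a), run it, and after the
    budget B is exhausted accept or reject based on the reward r(Pi)."""
    policy = []
    for t in range(B):
        a_t = max(candidate_actions(state), key=lambda a: Q(state, a))
        policy.append(a_t)
        state = apply_action(state, a_t)
    decision = "accept" if reward_fn(policy) == 1 else "reject"
    return decision, policy
```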