Odds & Ends


Administrivia

•Reminder: Q3 Nov 10

•CS outreach:

•UNM SOE holding open house for HS seniors

•Want CS dept participation

•We want to show off the coolest things in CS

•Come demo your P1 and P2 code!

•Contact me or Lynne Jacobson

The bird of time...

•Last time:

•Eligibility traces

•The SARSA(λ) algorithm

•Design exercise

•This time:

•Tip o’ the day

•Notes on exploration

•Design exercise, cont’d.

Tip o’ the day

•Micro-experiments

•Often, often, often when hacking:

•“How the heck does that function work?”

•“The docs don’t say what happens when you hand null to the constructor...”

•“Uhhh... Will this work if I do it this way?”

•“WTF does that mean?”

•Could spend a bunch of time in the docs

•Or...

•Could just go and try it

Tip o’ the day

•Answer: micro-experiments

•Write a very small (<50 line) test program to make sure you understand what the thing does (see the sketch below)

•Think: homework assignment from CS152

•Quick to write

•Answers question better than docs can

•Builds your intuition about what the machine is doing

•Using the debugger to watch is also good
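
As a concrete example, here is roughly what such a micro-experiment can look like, written against the Java standard library. The specific questions are just illustrations; the point is the size and throwaway nature of the program:

import java.util.Arrays;
import java.util.HashMap;

// Throwaway probe: compile it, run it, read the answers, delete it.
public class MicroProbe {
    public static void main(String[] args) {
        // Does String.split() keep trailing empty fields? (prints [a, b], so no)
        System.out.println(Arrays.toString("a,b,,".split(",")));

        // What does HashMap.get() return for a missing key? (prints null)
        HashMap<String, Integer> m = new HashMap<>();
        System.out.println(m.get("missing"));

        // Does Integer.parseInt() tolerate surrounding whitespace? (no: it throws)
        try {
            System.out.println(Integer.parseInt(" 42 "));
        } catch (NumberFormatException e) {
            System.out.println("parseInt rejected \" 42 \": " + e.getMessage());
        }
    }
}

The whole thing compiles and runs in under a minute, which is usually faster than trying to settle the question from the docs alone.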

Action selection in RL

Q learning in code...

public class MyAgent implements Agent {
    public void updateModel(SARSTuple s) {
        State2d start = s.getInitState();
        State2d end = s.getNextState();
        Action act = s.getAction();
        double r = s.getReward();
        // Q learning is off-policy: the backup uses the greedy action in the next state
        Action nextAct = _policy.argmaxAct(end);
        double Qnow = _policy.get(start, act);
        double Qnext = _policy.get(end, nextAct);
        // One-step TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        double Qrevised = Qnow + getAlpha() * (r + getGamma() * Qnext - Qnow);
        _policy.set(start, act, Qrevised);
    }
}

The SARSA(λ) code

public class SARSAlAgent implements Agent {
    public void updateModel(SARSTuple s) {
        State2d start = s.getInitState();
        State2d end = s.getNextState();
        Action act = s.getAction();
        double r = s.getReward();
        // SARSA is on-policy: the backup uses the action the agent would actually take
        Action nextAct = pickAction(end);
        double Qnow = _policy.get(start, act);
        double Qnext = _policy.get(end, nextAct);
        double delta = r + _gamma * Qnext - Qnow;
        // Bump the eligibility of the pair we just visited
        setElig(start, act, getElig(start, act) + 1.0);
        for (SAPair p : getEligiblePairs()) {
            // Update every eligible pair in proportion to its trace...
            double currQ = _policy.get(p.getS(), p.getA());
            _policy.set(p.getS(), p.getA(),
                        currQ + getElig(p.getS(), p.getA()) * _alpha * delta);
            // ...then decay the trace by gamma*lambda
            setElig(p.getS(), p.getA(),
                    getElig(p.getS(), p.getA()) * _gamma * _lambda);
        }
    }
}

Q & SARSA(λ): Key diffs

•Use of eligibility traces

•Q updates single step of history

•SARSA(λ) keeps a record of visited state/action pairs: e(s,a) (see the sketch below)

•Updates each Q(s,a) value in proportion to e(s,a)

•Decays e(s,a) by γλ each step
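
The slides don't show how e(s,a) is stored behind getElig/setElig/getEligiblePairs. One minimal way to do the bookkeeping is a hash map from state/action pairs to trace values; the class name, generic key, and pruning threshold below are my own assumptions, not part of the course code:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical trace store; K would be the slides' SAPair type,
// assumed to implement equals()/hashCode() over its state and action.
public class EligibilityTraces<K> {
    private final Map<K, Double> _traces = new HashMap<>();

    public double get(K pair) {
        return _traces.getOrDefault(pair, 0.0);   // unvisited pairs have zero trace
    }

    public void set(K pair, double value) {
        if (value < 1e-8) {
            _traces.remove(pair);   // prune negligible traces so the update loop stays short
        } else {
            _traces.put(pair, value);
        }
    }

    public Set<K> eligiblePairs() {
        // Return a copy so callers can decay or prune traces while iterating
        return new HashSet<>(_traces.keySet());
    }
}

Returning a copy from eligiblePairs() matters because the SARSA(λ) update loop decays (and may prune) traces while it iterates over them.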

Q & SARSA(λ): Key diffs

•How the “next state” action is picked

•Q: nextAct=_policy.argmaxAct(end)

•Picks the “best” action in the next state

•SARSA: nextAct=RLAgent.pickAction(end)

•Picks the action the agent itself would take in the next state

•Huh? What’s the difference?

Exploration vs. exploitation

•Sometimes, the agent wants to do something other than the “best currently known action”

•Why?

•If agent never tries anything new, it may never discover that there’s a better answer out there...

•Called the “exploration vs. exploitation” tradeoff

•Is it better to “explore” to find new stuff, or to “exploit” what you already know?

ε-Greedy exploration

•Answer:

•“Most of the time” do the best known thing

•act = argmax_a Q(s,a)

•“Rarely” try something random

•act=pickAtRandom(allActionSet)

•ε-greedy exploration policies:

•“rarely”==prob ε

•“most of the time”==prob 1-ε

ε-Greedy in code

public class eGreedyAgent implements RLAgent {
    // implements the e-greedy exploration policy
    public Action pickAction(State2d s) {
        final double rVal = _rand.nextDouble();
        if (rVal < _epsilon) {
            // explore: with probability epsilon, pick a uniformly random action
            return randPick(_ASet);
        }
        // exploit: otherwise pick the current best-known action
        return _policy.argmaxAct(s);
    }

    private final Set<Action> _ASet;
    private final double _epsilon;
}

Design Exercise: Experimental Rig

Design exercise

•For M4/Rollout, need to be able to:

•Train agent for many trials/steps per trial

•Generate learning curves for agent's learning

•Run some trials w/ learning turned on

•Freeze learning

•Run some trials w/ learning turned off

•Average steps-to-goal over those trials

•Save average as one point in curve

•Design: objects/methods to support this learning framework (see the sketch below)

•Support: diff learning algs, diff environments, diff params, variable # of trials/steps, etc.
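
Since this is the design exercise there is no single right answer, but here is one possible shape for the rig, sketched under the assumption that the agent/environment pair can be hidden behind a tiny interface. Every name below (ExperimentRig, Trainable, runOneTrial, and so on) is made up for illustration and is not part of the course code:

import java.util.ArrayList;
import java.util.List;

public class ExperimentRig {

    // The rig only needs two abilities from the thing it measures:
    // toggle learning on/off and run a single trial to the goal.
    public interface Trainable {
        void setLearningEnabled(boolean on);
        int runOneTrial(int maxSteps);   // returns steps taken to reach the goal
    }

    private final Trainable _subject;
    private final int _maxStepsPerTrial;

    public ExperimentRig(Trainable subject, int maxStepsPerTrial) {
        _subject = subject;
        _maxStepsPerTrial = maxStepsPerTrial;
    }

    // Alternates training phases with frozen evaluation phases; each evaluation
    // phase contributes one averaged steps-to-goal point to the learning curve.
    public List<Double> learningCurve(int numPoints,
                                      int trainTrialsPerPoint,
                                      int evalTrialsPerPoint) {
        List<Double> curve = new ArrayList<>();
        for (int point = 0; point < numPoints; point++) {
            _subject.setLearningEnabled(true);
            for (int t = 0; t < trainTrialsPerPoint; t++) {
                _subject.runOneTrial(_maxStepsPerTrial);
            }
            _subject.setLearningEnabled(false);
            double totalSteps = 0.0;
            for (int t = 0; t < evalTrialsPerPoint; t++) {
                totalSteps += _subject.runOneTrial(_maxStepsPerTrial);
            }
            curve.add(totalSteps / evalTrialsPerPoint);
        }
        return curve;
    }
}

Keeping the learning algorithm, the environment, and the trial parameters behind the Trainable interface is what lets the same rig drive different algorithms and environments with different settings.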