Learning, Volatility and the ACC Tim Behrens FMRIB + Psychology, University of Oxford FIL - UCL.


[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); control animals (CON)]
Kennerley et al., Nature Neuroscience, 2006

[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); control (CON) and ACC sulcus lesion (ACCs) groups]
Kennerley et al., Nature Neuroscience, 2006

Monkeys will sacrifice food opportunities to look at other monkeys

ACCg
Rudebeck et al., Science, 2006

Interest in other individuals is reduced after ACC gyrus lesion

ACCg
Rudebeck et al., Science, 2006

Anatomy - Differences in connections between ACCs and ACCg.

• Connections unique to the sulcus are mainly with motor regions:

• Primary motor cortex

• Premotor cortex

• Parietal motor areas

• Spinal Cord

• ACCs has information about our own actions

Anatomy - Differences in connections between ACCs and ACCg.

• Connections unique to the gyrus are mainly with regions that process emotional and biological stimuli:

• Periaqueductal grey

• Hypothalamus

• STS/STG

• Insula/Temporal pole connections are stronger to the gyrus

• ACCg has access to information about other agents.

Anatomy - shared connections between ACCs and ACCg.

• Some shared connections:

• Orbitofrontal cortex

• Amygdala

• Ventral striatum

• ACCg and ACCs are strongly interconnected

• Both regions have access to and influence over reward and value processing.

ACC Sulcus and learning about your actions.

[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); control (CON) and ACC sulcus lesion (ACCs) groups]
Kennerley et al., Nature Neuroscience, 2006

Kennerley et al., Nature Neuroscience, 2006; Sugrue et al., Science, 2005

[Figure: reward history weight (β) against trials into the past (i-1 to i-8); control (CON)]

What determines the integration length?

Kennerley et al., Nature Neuroscience, 2006; Sugrue et al., Science, 2005

VOLATILE: reward probabilities change approximately every 25 trials

STABLE: reward probabilities change only after hundreds of trials

[Figure: reward history weight (β) against trials into the past (i-1 to i-8); control (CON)]

Reinforcement learning

• We need to continually re-appraise the value of an action based on each new experience.

[Diagram: prediction ($V_t$) → outcome → prediction error ($\delta$) → $\alpha \times \delta$ → new prediction ($V_{t+1}$)]

Updating beliefs on the basis of new information

$V_{t+1} = V_t + \alpha \times \delta$

The learning rate ($\alpha$) is the weight given to the current information.

The prediction error ($\delta$) is the information available from this event.
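The update rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming outcomes are coded 1 (rewarded) or 0 (unrewarded) and a fixed learning rate; the function name `delta_rule_update` and the starting value of 0.5 are hypothetical choices, not taken from the talk.

```python
def delta_rule_update(v: float, outcome: float, alpha: float) -> tuple[float, float]:
    """One delta-rule step: returns (new value estimate, prediction error)."""
    delta = outcome - v              # prediction error: what happened minus what was expected
    return v + alpha * delta, delta  # move the estimate a fraction alpha towards the outcome


v = 0.5  # assumed initial estimate of the reward probability
for outcome in [1, 1, 0, 1]:
    v, delta = delta_rule_update(v, outcome, alpha=0.1)
    print(f"outcome={outcome} delta={delta:+.3f} V={v:.3f}")
```

Positive prediction errors push the estimate up and negative ones push it down, always by a fraction $\alpha$ of the surprise.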

The learning rate and the value of information.

$V_{t+1} = V_t + \alpha \times \delta$

The learning rate should represent the value of the current information for guiding future beliefs.

[Figure panels: $\alpha = 0.01$, $\alpha = 0.1$, $\alpha = 0.4$]
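A small simulation illustrates why the best learning rate depends on how quickly the world changes. This is a sketch under stated assumptions: a fixed-$\alpha$ delta rule (not the Bayesian learner used by Behrens et al., 2007), a stable schedule with a reward probability fixed at 0.75, and a volatile schedule that flips between 0.8 and 0.2 every 25 trials; the helper `simulate_error` and all numbers are hypothetical.

```python
import random


def simulate_error(alpha: float, probs: list[float], seed: int = 0) -> float:
    """Mean squared error between a delta-rule estimate and the true reward probability."""
    rng = random.Random(seed)  # same seed for every alpha, so all learners see the same outcomes
    v, total = 0.5, 0.0
    for p in probs:
        total += (v - p) ** 2                      # error of the prediction made before the outcome
        outcome = 1.0 if rng.random() < p else 0.0
        v += alpha * (outcome - v)                 # delta-rule update
    return total / len(probs)


N_TRIALS = 1000
# Assumed schedules loosely mirroring the STABLE / VOLATILE conditions described earlier.
stable = [0.75] * N_TRIALS
volatile = [0.8 if (t // 25) % 2 == 0 else 0.2 for t in range(N_TRIALS)]

alphas = [0.01, 0.05, 0.1, 0.2, 0.4]
best_stable = min(alphas, key=lambda a: simulate_error(a, stable))
best_volatile = min(alphas, key=lambda a: simulate_error(a, volatile))
print("best alpha (stable):", best_stable)
print("best alpha (volatile):", best_volatile)
```

Under these assumptions the volatile schedule favours a larger $\alpha$ than the stable one: when probabilities change often, old outcomes quickly stop being informative, so the current outcome is worth more.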

Relationship with integration length

[Figure: task schedule, stable phase; labels 37 and 63]

Behrens et al., Nature Neuroscience, 2007
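Unrolling the update rule shows why the learning rate sets the integration length. Since $V_{t+1} = (1 - \alpha) V_t + \alpha \times \text{outcome}_t$, the outcome $k$ trials in the past (i-k in the reward-history plots) receives weight $\alpha (1 - \alpha)^{k-1}$, so weights decay exponentially and a larger $\alpha$ means a shorter memory. The sketch below computes these weights; the helper names are hypothetical, and the weights are only qualitatively comparable to the fitted regression coefficients (β) shown earlier.

```python
import math


def history_weights(alpha: float, n_back: int = 8) -> list[float]:
    """Weight a delta-rule learner places on the outcome k trials ago, for k = 1..n_back."""
    return [alpha * (1 - alpha) ** (k - 1) for k in range(1, n_back + 1)]


def half_life(alpha: float) -> float:
    """Number of trials after which an outcome's weight has fallen to half its initial value."""
    return math.log(0.5) / math.log(1 - alpha)


for alpha in (0.01, 0.1, 0.4):
    w = history_weights(alpha)
    print(f"alpha={alpha}: i-1={w[0]:.3f}, i-8={w[-1]:.3f}, half-life={half_life(alpha):.1f} trials")
```

With $\alpha = 0.4$ the weights fall off within a few trials, whereas with $\alpha = 0.01$ the learner integrates over many tens of trials.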

Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007

$V_{t+1} = V_t + \alpha \times \delta$

Changes in reward estimates occur throughout the task…

Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007

…as do changes in volatility estimates

[Figure panels: Decide; Monitor; Monitor × Volatility]

Behrens et al., Nature Neuroscience, 2007

ACC effect size predicts learning rate across subjects

Behrens, Woolrich, Walton & Rushworth, Nature Neuroscience, 2007

ACC Gyrus and learning about your social partners.

Interest in other individuals is reduced after ACC gyrus lesion

ACCg
Rudebeck et al., Science, 2006

Rudebeck et al., Science, 2006


Learning about other agents

[Figure: labels 37 and 63]

Behrens, Hunt, Woolrich & Rushworth, Nature, 2008

Sources of information

Probability that confederate advice is good
Probability that the correct colour is blue

Value of action information
Value of social information

Behrens, Hunt, Woolrich & Rushworth, Nature, 2008

Social information is integrated over time - behaviour

Reward Prediction Error

$\delta$ = Reward − Expectation

$V_{t+1} = V_t + \alpha \times \delta$

[Figure: effect size over time, aligned to the outcome]

Behrens, Hunt, Woolrich & Rushworth, Nature, 2008

Prediction error on a social partner.

$\delta$ = Lie event − Lie prediction

$V_{t+1} = V_t + \alpha \times \delta$

[Figure: effect size over time, aligned to the outcome]

Behrens, Hunt, Woolrich & Rushworth, Nature, 2008
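The same update applies to a social partner: the outcome is whether the advice turned out to be a lie, and the prediction is the current estimate that the confederate lies. A minimal sketch, assuming lie events are coded 1 (advice was wrong) or 0 (advice was right) and a fixed learning rate; the function name and the example sequence are hypothetical.

```python
def track_lie_probability(lies: list[int], alpha: float, p0: float = 0.5) -> list[float]:
    """Delta-rule estimate of how likely the confederate is to lie, updated after each trial.

    lies[t] is 1 if the advice on trial t turned out to be wrong (a 'lie event'), else 0.
    """
    p = p0
    history = []
    for lie in lies:
        delta = lie - p          # social prediction error: lie event minus lie prediction
        p += alpha * delta       # same update rule as for reward
        history.append(p)
    return history


# Hypothetical sequence: an advisor who is honest for 10 trials, then lies for 10 trials.
lies = [0] * 10 + [1] * 10
estimates = track_lie_probability(lies, alpha=0.3)
print([round(x, 2) for x in estimates])
```

The estimate drops towards 0 while the advisor is honest and climbs back towards 1 once the lies begin, at a speed set by $\alpha$.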

The value of information and the ACC

Value of reward information
Value of social information

$V_{t+1} = V_t + \alpha \times \delta$

Combining Information to drive behaviour

$V_{t+1} = V_t + \alpha \times \delta$
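Once each source has its own learned estimate, the two can be merged into a single choice probability. The sketch below assumes a weighted average of the two beliefs about the blue option followed by a logistic choice rule; the function `combine`, the parameters `weight` and `beta`, and their values are illustrative assumptions, not necessarily the model fitted by Behrens et al. (2008).

```python
import math


def combine(p_blue_reward: float, p_advice_good: float, advice_says_blue: bool,
            weight: float, beta: float) -> float:
    """Probability of choosing blue from two separately learned sources (hypothetical model).

    p_blue_reward: learned probability that blue is the rewarded colour (own experience).
    p_advice_good: learned probability that the confederate's advice is correct.
    weight:        assumed relative weight on own experience (0..1).
    beta:          assumed inverse temperature of the logistic choice rule.
    """
    # Translate advice reliability into a belief about blue.
    p_blue_social = p_advice_good if advice_says_blue else 1 - p_advice_good
    # Weighted combination of the two beliefs.
    p_blue = weight * p_blue_reward + (1 - weight) * p_blue_social
    # Logistic choice rule centred on indifference (0.5).
    return 1 / (1 + math.exp(-beta * (p_blue - 0.5)))


print(combine(0.7, 0.8, advice_says_blue=False, weight=0.5, beta=10.0))
```

Here own experience favours blue but reliable advice points away from it, so the combined belief tips slightly against blue.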


Conclusions

• ACC codes a learning signal when information is observed.

• This signal predicts the speed of learning.

• Learning from our own actions and learning from others' actions are processed in parallel in ACCs and ACCg, respectively.

• The outputs of these parallel learning processes are combined in the reward system.


Acknowledgments

• Matthew Rushworth

• Mark Woolrich

• Laurence Hunt

• Mark Walton
