
Neural Encoding Models

Maneesh Sahani
maneesh@gatsby.ucl.ac.uk

Gatsby Computational Neuroscience Unit
University College London

Term 1, Autumn 2011

Studying sensory systems

x(t) → [neural system] → y(t)

Decoding: x(t) = G[y(t)] (reconstruction)
Encoding: y(t) = F[x(t)] (systems identification)

General approach

Goal: Estimate p(spike|x,H) [or λ(t|x[0, t), H(t))] from data.

• Naive approach: measure p(spike|x, H) directly for every setting of x.

– too hard: too little data and too many potential inputs.

• Estimate some functional f (p) instead (e.g. mutual information)

• Select stimuli efficiently

• Fit models with smaller numbers of parameters

Spikes, or rate?

Most neurons communicate using action potentials — statistically described by a point process:

P(spike ∈ [t, t + dt)) = λ(t | H(t), stimulus, network activity) dt

To fully model the response we need to identify λ. In general this depends on spike history H(t) and network activity. Three options:

• Ignore the history dependence, take network activity as source of “noise” (i.e. assume firing is inhomogeneous Poisson or Cox process, conditioned on the stimulus).

• Average multiple trials to estimate

  λ(t, stimulus) = lim_{N→∞} (1/N) Σ_n λ(t | H_n(t), stimulus, network_n),

  the mean intensity (or PSTH), and try to fit this.

• Attempt to capture history and network effects in simple models.
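A minimal sketch of the trial-averaging in the second option, assuming spike times are stored in seconds, one array per trial:

```python
import numpy as np

def estimate_psth(spike_times_per_trial, t_max, bin_size=0.01):
    """Estimate the mean intensity lambda(t) (the PSTH), in spikes/s,
    by binning each trial's spike times and averaging across trials."""
    edges = np.arange(0.0, t_max + bin_size, bin_size)
    counts = np.array([np.histogram(trial, bins=edges)[0]
                       for trial in spike_times_per_trial])
    rate = counts.mean(axis=0) / bin_size      # trial-averaged rate in each bin
    centres = edges[:-1] + bin_size / 2
    return centres, rate
```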

Spike-triggered average

Decoding: mean of P(x | y = 1)

Encoding: predictive filter

Linear regression

y(t) = ∫₀ᵀ x(t − τ) w(τ) dτ

W(ω) = X(ω)* Y(ω) / |X(ω)|²

In discrete time, successive windows of the stimulus stream x_1, x_2, x_3, …, x_T, x_{T+1}, … form the rows of a design matrix:

[ x_1 x_2 … x_T ; x_2 x_3 … x_{T+1} ; … ] × [w_T, …, w_2, w_1]ᵀ = [y_T, y_{T+1}, …]ᵀ

i.e. X W = Y, so

W = (XᵀX)⁻¹ (XᵀY),

where XᵀX = Σ_SS is the stimulus autocovariance and XᵀY is the (unnormalised) spike-triggered average (STA).

Linear models

So the (whitened) spike-triggered average gives the minimum-squared-error linear model.
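A minimal sketch of this least-squares fit, assuming a one-dimensional stimulus vector x and a binned spike-count vector y (a small ridge term is added purely for numerical stability):

```python
import numpy as np

def fit_linear_filter(x, y, n_lags, ridge=1e-6):
    """Fit w minimising ||X w - y||^2, i.e. the whitened STA.
    Row t of X holds the stimulus history x[t - n_lags + 1 .. t]."""
    T = len(x)
    X = np.zeros((T, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = x[:T - lag]        # column `lag` = stimulus delayed by `lag` bins
    sta = X.T @ y                          # unnormalised spike-triggered average
    sigma_ss = X.T @ X                     # stimulus autocovariance Sigma_SS
    w = np.linalg.solve(sigma_ss + ridge * np.eye(n_lags), sta)
    return w
```

The returned w is the whitened STA; with white (uncorrelated) stimuli, Σ_SS is proportional to the identity and w reduces to the ordinary STA.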

Issues:

• overfitting and regularisation

– standard methods for regression

• negative predicted rates

– can model deviations from background

• real neurons aren’t linear

– models are still used extensively
– interpretable suggestions of underlying sensitivity
– may provide unbiased estimates of cascade filters (see later)

How good are linear predictions?

We would like an absolute measure of model performance.

Measured responses can never be predicted perfectly:

• The measurements themselves are noisy.

Models may fail to predict because:

• They are the wrong model.

• Their parameters are mis-estimated due to noise.

Estimating predictable power

response r^(n) = signal + noise

P(r^(n)) = P_signal + P_noise          (power of a single trial)

P(r̄) = P_signal + (1/N) P_noise        (power of the N-trial average r̄)

⇒ P_signal = (N · P(r̄) − ⟨P(r^(n))⟩) / (N − 1)

   P_noise = ⟨P(r^(n))⟩ − P_signal

Signal power in A1 responses

[Figure: signal power vs. noise power (both in spikes²/bin) for A1 responses, across ~150 recordings.]

Testing a model

For a perfect prediction,

⟨P(trial) − P(residual)⟩ = P(signal)

Thus, we can judge the performance of a model by the normalized predictive power

(P(trial) − P(residual)) / P(signal)

Similar to the coefficient of determination (r²), but the denominator is the predictable variance.
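A minimal sketch of these estimates, assuming `responses` is an N × T array of binned spike counts (N trials, T time bins), `prediction` is a length-T model output, and power P(·) is taken as variance across time bins:

```python
import numpy as np

def predictive_power(responses, prediction):
    """Signal/noise power and normalised predictive power.
    responses: (N_trials, T) array of binned spike counts.
    prediction: (T,) model prediction of the rate per bin."""
    N = responses.shape[0]
    p_trial = responses.var(axis=1).mean()              # <P(single trial)>
    p_mean = responses.mean(axis=0).var()                # P(trial-averaged response)
    p_signal = (N * p_mean - p_trial) / (N - 1)          # predictable (signal) power
    p_noise = p_trial - p_signal
    p_resid = (responses - prediction).var(axis=1).mean()  # <P(residual)>
    norm_pred_power = (p_trial - p_resid) / p_signal
    return p_signal, p_noise, norm_pred_power
```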

Predictive performance

[Figure: normalised Bayes predictive power vs. normalised STA predictive power; training error (left) and cross-validation error (right).]

Extrapolating the model performance

[Figure: jackknifed estimates and extrapolated linearity; normalized linearly predictive power vs. normalized noise power.]

[extrapolated range: (0.19,0.39); mean Jackknife estimate: 0.29]

Simulated (almost) linear data

[Figure: normalized linearly predictive power vs. normalized noise power for simulated (almost) linear data.]

[extrapolated range: (0.95,0.97); mean Jackknife estimate: 0.97]

Linear fits to non-linear functions


Approximations are stimulus dependent

(Stimulus dependence does not always signal response adaptation)

Consequences

Local fitting can have counterintuitive consequences on the interpretation of a “receptive field”.

“Independently distributed” stimuli

Knowing stimulus power at any set of points in analysis space provides no information about stimulus power at any other point.

[Spectrograms: DRC and ripple stimuli in spectrotemporal space.]

Independence is a property of stimulus and analysis space.

Christianson, Sahani, and Linden (2008)

Nonlinearity & non-independence distort RF estimates

Stimulus may have higher-order correlations in other analysis spaces— interaction with nonlinearities can produce misleading “receptive fields.”

Christianson, Sahani, and Linden (2008)

What about natural sounds?

[Figure: multiplicative RF and finch song spectrograms; frequency (1–7 kHz) vs. time (−30 to −5 ms).]

Usually not independent in any space — so STRFs may not be conservative estimates of receptive fields.

Christianson, Sahani, and Linden (2008)

Beyond linearity


Linear models often fail to predict well. Alternatives?

• Wiener/Volterra functional expansions

– M-series
– Linearised estimation
– Kernel formulations

• LN (Wiener) cascades

– Spike-triggered covariance (STC) methods
– “Maximally informative” dimensions (MID) ⇔ ML nonparametric LNP models
– ML parametric GLM models

• NL (Hammerstein) cascades

– Multilinear formulations

Non-linear models

The LNP (Wiener) cascade

[Diagram: linear filter k → static nonlinearity → Poisson spike generation.]

Rectification addresses negative firing rates. Possible biophysical justification.

LNP estimation – the Spike-triggered ensemble

Single linear filter:

• STA.
• Non-linearity.
• STA unbiased for spherical (elliptical) data (Bussgang).
• Non-spherical inputs: biases.

Multiple filters

Distribution changes along relevant directions (and, usually, along all linear combinations of relevant directions).

Proxies for distribution:

• mean: STA (can only reveal a single direction)

• variance: STC

• binned (or kernel) KL: MID “maximally informative directions” (equivalent to ML in LNP model with binned nonlinearity)

STC

Project out the STA:

X̃ = X − (X k_sta) k_staᵀ;   C_prior = X̃ᵀX̃ / N;   C_spike = X̃ᵀ diag(Y) X̃ / N_spike

Choose directions with greatest change in variance:

k ← argmax_{‖v‖=1} vᵀ (C_prior − C_spike) v

⇒ find eigenvectors of (C_prior − C_spike) with large (absolute) eigenvalues.
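A minimal sketch of the STC computation above, assuming X is a T × D matrix of stimulus vectors and y a length-T vector of spike counts:

```python
import numpy as np

def stc_directions(X, y, n_dirs=2):
    """Spike-triggered covariance: project out the STA, compare prior and
    spike-conditioned covariances, return directions of largest variance change."""
    sta = X.T @ y / y.sum()
    k_sta = sta / np.linalg.norm(sta)
    Xp = X - np.outer(X @ k_sta, k_sta)           # project the STA direction out of every stimulus
    c_prior = Xp.T @ Xp / X.shape[0]
    c_spike = Xp.T @ (y[:, None] * Xp) / y.sum()
    evals, evecs = np.linalg.eigh(c_prior - c_spike)
    order = np.argsort(-np.abs(evals))             # largest |eigenvalue| first
    return evecs[:, order[:n_dirs]], evals[order]
```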

STC

Reconstruct nonlinearity (may assume separability)

Biases

STC (obviously) requires that the nonlinearity alter variance.

If so, subspace is unbiased if distribution

• radially (elliptically) symmetric

• AND independent

⇒ Gaussian.

May be possible to correct by transformation, subsampling or weighting (latter two at cost of variance).

More LNP methods

• Non-parametric non-linearities: “Maximally informative dimensions” (MID) ⇔ “non-parametric” maximum likelihood.

– Intuitively, extends the variance difference idea to arbitrary differences between marginal and spike-conditioned stimulus distributions.

k_MID = argmax_k KL[P(k · x) ‖ P(k · x | spike)]

– Measuring KL requires binning or smoothing—turns out to be equivalent to fitting a non-parametric nonlinearity by binning or smoothing (see the sketch after this list).

– Difficult to use for high-dimensional LNP models.

• Parametric non-linearities: the “generalised linear model” (GLM).
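A minimal sketch of the binned MID objective above, evaluated for a single candidate direction k (assuming X is a T × D stimulus matrix and y gives the spike count per row); maximising over k would still require a separate optimiser:

```python
import numpy as np

def mid_objective(k, X, y, n_bins=25):
    """Binned estimate of KL[P(k.x) || P(k.x | spike)] for one direction k."""
    proj = X @ k
    edges = np.linspace(proj.min(), proj.max(), n_bins + 1)
    p_all, _ = np.histogram(proj, bins=edges)              # marginal projection counts
    p_spk, _ = np.histogram(proj, bins=edges, weights=y)   # spike-weighted counts
    p_all = (p_all + 1e-12) / p_all.sum()
    p_spk = (p_spk + 1e-12) / p_spk.sum()
    return np.sum(p_all * np.log(p_all / p_spk))           # KL divergence as on the slide
```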

Generalised linear models

LN models with specified nonlinearities and exponential family noise.

In general (for monotonic g):

y ∼ ExpFamily[µ(x)]; g(µ) = βx

For our purposes easier to write

y ∼ ExpFamily[f (βx)]

(Continuous time) point process likelihood with GLM-like dependence of λ on covariates is approached in the limit of bins → 0 by either Poisson or Bernoulli GLM.

Mark Berman and T. Rolf Turner (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41(1):31–38.

Generalised linear models

Poisson distribution ⇒ f = exp(·) is canonical (natural params = βx).

Canonical link functions give concave likelihoods ⇒ unique maxima.

Generalises (for Poisson) to any f which is convex and log-concave:

log-likelihood = c − f(βx) + y log f(βx)

Includes:

• threshold-linear

• threshold-polynomial

• “soft-threshold” f(z) = α⁻¹ log(1 + e^{αz}).

Generalised linear models

ML parameters found by

• gradient ascent

• IRLS

Regularisation by L2 (quadratic) or L1 (absolute value – sparse) penalties (MAP with Gaussian/Laplacian priors) preserves concavity.
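A minimal sketch of MAP fitting by gradient ascent, assuming the canonical exponential nonlinearity, a design matrix X (columns: stimulus and history covariates per bin), spike counts y, and an L2 penalty:

```python
import numpy as np

def fit_poisson_glm(X, y, l2=1.0, lr=1e-3, n_iter=5000):
    """MAP fit of a Poisson GLM, rate = exp(X beta), by gradient ascent on the
    penalised log-likelihood (concave, so a unique maximum). Step size lr may
    need tuning for a given dataset."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        rate = np.exp(X @ beta)
        # gradient of sum(y log rate - rate) - (l2/2) ||beta||^2
        grad = X.T @ (y - rate) - l2 * beta
        beta += lr * grad
    return beta
```

In practice IRLS or an off-the-shelf optimiser converges much faster; the explicit loop just shows the penalised log-likelihood gradient.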

Linear-Nonlinear-Poisson (GLM)

[Diagram: stimulus → stimulus filter k(t) → point nonlinearity → Poisson spiking.]

GLM with history-dependence

• rate is a product of stim- and spike-history dependent terms

• output no longer a Poisson process

• also known as “soft-threshold” Integrate-and-Fire model

[Diagram: stimulus → stimulus filter k; spike train → post-spike filter h(t); summed and passed through an exponential nonlinearity to give the conditional intensity (spike rate), which drives Poisson spiking (Truccolo et al. 2004). Inset: spike rate vs. filter output for a traditional IF neuron (“hard threshold”) and the “soft-threshold” IF.]

GLM with history-dependence

[Diagram as above: stimulus filter k, post-spike filter h(t), exponential nonlinearity, Poisson spiking.]

• “soft-threshold” approximation to Integrate-and-Fire model

GLM dynamic behaviors

[Figure: stimulus x(t), post-spike waveform, and the stimulus-induced and spike-history-induced contributions producing regular spiking.]

GLM dynamic behaviors

[Figure: as above, with a post-spike waveform producing irregular spiking.]

GLM dynamic behaviors

[Figure: post-spike waveforms producing bursting and adaptation.]
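A minimal sketch of simulating these behaviours in discrete time, assuming the stimulus filter has already been applied (`stim_drive[t]` = k · x(t)) and `h` is a post-spike filter added to the log-rate after each spike:

```python
import numpy as np

def simulate_glm(stim_drive, h, dt=0.001, seed=0):
    """Simulate a GLM with spike-history dependence.
    stim_drive: stimulus-filter output per bin (k . x(t)).
    h: post-spike filter, added to the log-rate after each spike."""
    rng = np.random.default_rng(seed)
    T, H = len(stim_drive), len(h)
    hist = np.zeros(T + H)                         # running spike-history contribution
    spikes = np.zeros(T, dtype=int)
    for t in range(T):
        rate = np.exp(stim_drive[t] + hist[t])     # conditional intensity (spikes/s)
        spikes[t] = rng.poisson(rate * dt)
        if spikes[t]:
            hist[t + 1:t + 1 + H] += spikes[t] * h  # feed post-spike filter back
    return spikes
```

For example, a strongly negative h just after a spike enforces refractoriness (regular spiking), a delayed positive lobe promotes bursting, and a long, slow negative tail produces adaptation.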

Generalized Linear Model (GLM)

[Diagram: stimulus → stimulus filter; spike train → post-spike filter; summed → exponential nonlinearity → probabilistic spiking.]

multi-neuron GLM

[Diagram: two neurons, each with its own stimulus filter and post-spike filter, exponential nonlinearity and probabilistic spiking.]

multi-neuron GLM

[Diagram: as above, with coupling filters between neurons; the conditional intensity (spike rate) of each neuron at time t depends on the stimulus, its own spike history and the other neurons’ spikes. GLM equivalent diagram.]

Multilinear models

Input nonlinearities (Hammerstein cascades) can be identified in a multilinear (Cartesian tensor) framework.

Input nonlinearities

The basic linear model (for sounds):

r(i) = Σ_{jk} w^{tf}_{jk} s(i − j, k),

where r(i) is the predicted rate, w^{tf}_{jk} are the STRF weights and s(i − j, k) is the stimulus power.

How to measure s? (pressure, intensity, dB, thresholded, . . . )

We can learn an optimal representation g(·):

r(i) = Σ_{jk} w^{tf}_{jk} g(s(i − j, k)).

Define basis functions {g_l} such that g(s) = Σ_l w^l_l g_l(s), and the stimulus array M_{ijkl} = g_l(s(i − j, k)). Now the model is

r(i) = Σ_{jkl} w^{tf}_{jk} w^l_l M_{ijkl}   or   r = (w^{tf} ⊗ w^l) • M.

Multilinear models

Multilinear forms are straightforward to optimise by alternating least squares.

Cost function:

E = ‖r − (w^{tf} ⊗ w^l) • M‖²

Minimise iteratively, defining matrices

B = w^l • M   and   A = w^{tf} • M

and updating

w^{tf} = (BᵀB)⁻¹Bᵀr   and   w^l = (AᵀA)⁻¹Aᵀr.

Each linear regression step can be regularised by evidence optimisation (suboptimal), with uncertainty propagated approximately using variational methods.
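A minimal sketch of this alternating least-squares loop for the two-factor model r ≈ (w^{tf} ⊗ w^l) • M, assuming the stimulus tensor M has been precomputed with shape (samples, time-frequency lags, levels); a small ridge term stands in for the regularisation discussed above:

```python
import numpy as np

def fit_bilinear_als(M, r, n_iter=50, ridge=1e-6):
    """Alternating least squares for r ~= (w_tf (x) w_l) . M.
    M: (n_samples, n_tf, n_l) stimulus tensor; r: (n_samples,) responses."""
    n_tf, n_l = M.shape[1], M.shape[2]
    w_l = np.ones(n_l)
    for _ in range(n_iter):
        B = M @ w_l                                  # contract level index: (n_samples, n_tf)
        w_tf = np.linalg.solve(B.T @ B + ridge * np.eye(n_tf), B.T @ r)
        A = np.einsum('ijl,j->il', M, w_tf)          # contract time-frequency index: (n_samples, n_l)
        w_l = np.linalg.solve(A.T @ A + ridge * np.eye(n_l), A.T @ r)
    return w_tf, w_l

# prediction for fitted weights: r_hat = np.einsum('ijl,j,l->i', M, w_tf, w_l)
```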

Some input non-linearities

[Figure: learned input-nonlinearity weights w^l as a function of sound level l (25–70 dB SPL).]

Parameter grouping

Separable STRF: (time) ⊗ (frequency). The input nonlinearity model is separable in another sense: (time, frequency) ⊗ (sound level).

[Diagram: (time, frequency) weights alongside the (intensity) weight function.]

Other separations:

• (time, sound level) ⊗ (frequency): r = (w^{tl} ⊗ w^f) • M,

• (frequency, sound level) ⊗ (time): r = (w^{fl} ⊗ w^t) • M,

• (time) ⊗ (frequency) ⊗ (sound level): r = (w^t ⊗ w^f ⊗ w^l) • M.

(time, frequency) ⊗ (sound level)

[Figure: (time, frequency) weights, t (0–180 ms) vs. f (2–32 kHz), and sound-level weights w^l vs. l (25–70 dB SPL).]

(time, sound level) ⊗ (frequency)

[Figure: (time, sound level) weights, t (0–180 ms) vs. l (25–70 dB SPL), and frequency weights w^f vs. f (kHz).]

(frequency, sound level) ⊗ (time)

[Figure: (frequency, sound level) weights, f (2–32 kHz) vs. l (25–70 dB SPL), and time weights w^t vs. t (0–180 ms).]

Contextual influences

[Diagram: stimulus spectrogram (time × frequency) passed through the context field w^{τφ} and the principal field w^{tf} to predict the PSTH.]

rate(i) = c + Σ_{jkl} w^t_j w^f_k w^l_l g_l(sound_{i−j,k}) · ( c₂ + Σ_{mnp} w^τ_m w^φ_n w^λ_p g_p(sound_{i−j−m, k+n}) )

Contextual influences

Introduce extra dimensions:

• τ: time difference between contextual and primary tone,

• φ: frequency difference between contextual and primary tone,

• λ: sound level of the contextual tone.

Model the effective sound level of a primary tone by

Level(i, j) → Level(i, j) · (const + Context(i, j)),

and the context by

Context(i, j) = Σ_{m,n,p} w^τ_m w^φ_n w^λ_p h_p(s(i − m, j + n)).

This leads to a multilinear model

r = (w^t ⊗ w^f ⊗ w^l ⊗ w^τ ⊗ w^φ ⊗ w^λ) • M.

Inseparable contexts

We can also allow inseparable contexts (and principal fields), dropping the level-nonlinearity to reduce parameters.

r(i) = c + Σ_{jk} w^{tf}_{jk} sound_{i−j,k} ( 1 + Σ_{mn} w^{τφ}_{mn} sound_{i−j−m, k+n} )

[Diagram: stimulus spectrogram passed through the context field w^{τφ} and the principal field w^{tf} to predict the PSTH.]

Performance

[Figure: extrapolated normalised predictive power (0.1–0.9) for the STRF, separable-context and inseparable-context models, on training and test sets.]