-
Neural Encoding Models
Maneesh Sahani
Gatsby Computational Neuroscience Unit
University College London
Term 1, Autumn 2011
-
Studying sensory systems
[Diagram: stimulus x(t) → sensory system → response y(t).]

Decoding: x̂(t) = G[y(t)] (reconstruction)
Encoding: ŷ(t) = F[x(t)] (systems identification)
-
General approach
Goal: Estimate p(spike|x,H) [or λ(t|x[0, t), H(t))] from data.
• Naive approach: measure p(spike, H|x) directly for every setting of x.
  – too hard: too little data and too many potential inputs.
• Estimate some functional f (p) instead (e.g. mutual information)
• Select stimuli efficiently
• Fit models with smaller numbers of parameters
-
Spikes, or rate?
Most neurons communicate using action potentials — statistically described by a point process:

P(spike ∈ [t, t + dt)) = λ(t | H(t), stimulus, network activity) dt

To fully model the response we need to identify λ. In general this depends on spike history H(t) and network activity. Three options:

• Ignore the history dependence, take network activity as a source of “noise” (i.e. assume firing is an inhomogeneous Poisson or Cox process, conditioned on the stimulus).
• Average multiple trials to estimate
λ(t, stimulus) = lim_{N→∞} (1/N) Σ_n λ(t | H_n(t), stimulus, network_n),
the mean intensity (or PSTH), and try to fit this.
• Attempt to capture history and network effects in simple models.
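A minimal Python sketch of this trial-averaging estimate (the function name, array layout, and optional Gaussian smoothing are illustrative choices, not from the slides):

import numpy as np

def psth(spike_counts, dt, sigma=0.005):
    """Trial-averaged firing-rate estimate (PSTH).

    spike_counts: (n_trials, n_bins) array of per-bin spike counts.
    dt: bin width in seconds; sigma: width of an optional Gaussian smoother.
    """
    rate = spike_counts.mean(axis=0) / dt            # average over trials, spikes/s
    w = max(1, int(np.ceil(3 * sigma / dt)))         # kernel half-width in bins
    t = np.arange(-w, w + 1) * dt
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()                           # normalise to unit mass
    return np.convolve(rate, kernel, mode="same")    # smoothed mean intensity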
-
Spike-triggered average
Decoding: mean of P(x | y = 1)
Encoding: predictive filter
-
Linear regression
y(t) = ∫₀ᵀ x(t − τ) w(τ) dτ

W(ω) = X(ω)* Y(ω) / |X(ω)|²

In discrete time, stack lagged copies of the stimulus into a design matrix:

X = [ x_1 x_2 x_3 … x_T ; x_2 x_3 x_4 … x_{T+1} ; ⋮ ],  W = (w_T, …, w_3, w_2, w_1)ᵀ,  Y = (y_T, y_{T+1}, …)ᵀ

XW = Y

W = (XᵀX)⁻¹ (XᵀY)

where XᵀX = Σ_SS is the stimulus autocovariance and XᵀY is the (unnormalised) STA.
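As a sketch in Python (variable names illustrative; the ridge term is an optional regulariser in the spirit of the next slide, not part of the formula above):

import numpy as np

def fit_linear_filter(x, y, n_lags, ridge=0.0):
    """Whitened STA = least-squares filter, via the normal equations."""
    T = len(x)
    # row t holds the stimulus history x(t - n_lags + 1), ..., x(t)
    X = np.stack([x[i:i + n_lags] for i in range(T - n_lags + 1)])
    Y = y[n_lags - 1:]
    Sss = X.T @ X + ridge * np.eye(n_lags)   # stimulus autocovariance Σ_SS (+ ridge)
    sta = X.T @ Y                            # unnormalised spike-triggered average
    return np.linalg.solve(Sss, sta)         # W = Σ_SS⁻¹ · STA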
-
Linear models
So the (whitened) spike-triggered average gives the minimum-squared-error linear model.
Issues:
• overfitting and regularisation
  – standard methods for regression
• negative predicted rates
  – can model deviations from background
• real neurons aren’t linear
  – models are still used extensively
  – interpretable suggestions of underlying sensitivity
  – may provide unbiased estimates of cascade filters (see later)
-
How good are linear predictions?
We would like an absolute measure of model performance.
Measured responses can never be predicted perfectly:
• The measurements themselves are noisy.
Models may fail to predict because:
• They are the wrong model.
• Their parameters are mis-estimated due to noise.
-
Estimating predictable power
response r^(n) = signal + noise

⟨P(r^(n))⟩ = P_signal + P_noise        (mean single-trial power)

P(r̄) = P_signal + (1/N) P_noise        (power of the N-trial average r̄)

⇒ P̂_signal = (1/(N − 1)) ( N P(r̄) − ⟨P(r^(n))⟩ )

P̂_noise = ⟨P(r^(n))⟩ − P̂_signal
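A Python sketch of these estimators, taking “power” to be variance across time bins (one reasonable convention; the slides do not fix one):

import numpy as np

def signal_noise_power(R):
    """Estimate signal and noise power from R: (N_trials, T) binned responses."""
    N = R.shape[0]
    p_trial = R.var(axis=1).mean()       # mean single-trial power ⟨P(r(n))⟩
    p_mean = R.mean(axis=0).var()        # power of the N-trial average response
    p_signal = (N * p_mean - p_trial) / (N - 1)
    p_noise = p_trial - p_signal
    return p_signal, p_noise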
-
Signal power in A1 responses
[Figure: signal power vs noise power, both in spikes²/bin, across A1 recordings; histogram of number of recordings.]
-
Testing a model
For a perfect prediction

⟨P(trial) − P(residual)⟩ = P(signal)

Thus, we can judge the performance of a model by the normalized predictive power

( P(trial) − P(residual) ) / P̂(signal)

Similar to the coefficient of determination (r²), but the denominator is the predictable variance.
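Continuing the sketch above with the same variance-as-power convention (p_signal would come from signal_noise_power):

def normalized_predictive_power(R, prediction, p_signal):
    """(P(trial) - P(residual)) / estimated signal power.

    R: (N_trials, T) responses; prediction: (T,) model prediction.
    """
    p_trial = R.var(axis=1).mean()                  # P(trial)
    p_resid = (R - prediction).var(axis=1).mean()   # P(residual)
    return (p_trial - p_resid) / p_signal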
-
Predictive performance
[Figure: normalised Bayes predictive power vs normalised STA predictive power; panels show training error and cross-validation error.]
-
Extrapolating the model performance
-
Jackknifed estimates
[Figure: jackknifed estimates of normalized linearly predictive power vs normalized noise power.]
-
Extrapolated linearity
[Figure: normalized linearly predictive power vs normalized noise power, with extrapolation.]
[extrapolated range: (0.19,0.39); mean Jackknife estimate: 0.29]
-
Simulated (almost) linear data
[Figure: normalized linearly predictive power vs normalized noise power for simulated data.]
[extrapolated range: (0.95,0.97); mean Jackknife estimate: 0.97]
-
Linear fits to non-linear functions
-
Linear fits to non-linear functions
(Stimulus dependence does not always signal response adaptation)
-
Approximations are stimulus dependent
(Stimulus dependence does not always signal response adaptation)
-
Consequences
Local fitting can have counterintuitive consequences on the interpretation of a “receptive field”.
-
“Independently distributed” stimuli
Knowing stimulus power at any set of points in analysis space provides no information about stimulus power at any other point.
[Figure: DRC and ripple stimuli in spectrotemporal space.]
Independence is a property of stimulus and analysis space.
Christianson, Sahani, and Linden (2008)
-
Nonlinearity & non-independence distort RF estimates
Stimulus may have higher-order correlations in other analysis spaces — interaction with nonlinearities can produce misleading “receptive fields.”
Christianson, Sahani, and Linden (2008)
-
What about natural sounds?
[Figure: multiplicative RF and finch song spectrograms; frequency (kHz) vs time (ms) panels.]
Usually not independent in any space — so STRFs may not be conservative estimates of receptive fields.
Christianson, Sahani, and Linden (2008)
-
Beyond linearity
-
Beyond linearity
Linear models often fail to predict well. Alternatives?
• Wiener/Volterra functional expansions
  – M-series
  – Linearised estimation
  – Kernel formulations
• LN (Wiener) cascades
  – Spike-triggered covariance (STC) methods
  – “Maximally informative” dimensions (MID) ⇔ ML non-parametric LNP models
  – ML parametric GLM models
• NL (Hammerstein) cascades
  – Multilinear formulations
-
Non-linear models
The LNP (Wiener) cascade
[Diagram: stimulus → linear filter k → static nonlinearity n → Poisson spiking.]

Rectification addresses negative firing rates. Possible biophysical justification.
-
LNP estimation – the Spike-triggered ensemble
-
Single linear filter
• STA
• Non-linearity
• STA unbiased for spherical (elliptical) data
• Bussgang
• Non-spherical inputs
• Biases
-
Multiple filters
Distribution changes along relevant directions (and, usually, along all linear combinations of relevant directions).
Proxies for distribution:
• mean: STA (can only reveal a single direction)
• variance: STC
• binned (or kernel) KL: MID “maximally informative directions” (equivalent to ML in LNP model with binned nonlinearity)
-
STC
Project out the STA:

X̃ = X − (X k_sta) k_staᵀ;  C_prior = X̃ᵀX̃ / N;  C_spike = X̃ᵀ diag(Y) X̃ / N_spike

Choose directions with the greatest change in variance:

k = argmax_{‖v‖=1} vᵀ (C_prior − C_spike) v

⇒ find eigenvectors of (C_prior − C_spike) with large (absolute) eigenvalues.
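A compact Python sketch of this recipe (assumes zero-mean, ideally whitened Gaussian stimuli; names illustrative):

import numpy as np

def stc_filters(X, y, n_filters=2):
    """STC subspace from X: (N, d) stimuli and y: (N,) spike counts."""
    sta = X.T @ y / y.sum()
    k = sta / np.linalg.norm(sta)
    Xt = X - np.outer(X @ k, k)                      # project out the STA direction
    C_prior = Xt.T @ Xt / X.shape[0]
    C_spike = (Xt * y[:, None]).T @ Xt / y.sum()     # spike-weighted covariance
    evals, evecs = np.linalg.eigh(C_prior - C_spike)
    order = np.argsort(-np.abs(evals))               # largest |eigenvalue| first
    return evecs[:, order[:n_filters]], evals[order]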
-
STC
Reconstruct nonlinearity (may assume separability)
-
Biases
STC (obviously) requires that the nonlinearity alter variance.
If so, the subspace is unbiased if the distribution is

• radially (elliptically) symmetric
• AND independent

⇒ Gaussian.
May be possible to correct by transformation, subsampling or weighting (latter two at cost of variance).
-
More LNP methods
• Non-parametric non-linearities: “Maximally informative dimensions” (MID) ⇔ “non-parametric” maximum likelihood.
  – Intuitively, extends the variance-difference idea to arbitrary differences between marginal and spike-conditioned stimulus distributions:

    k_MID = argmax_k KL[ P(k · x) ‖ P(k · x | spike) ]

  – Measuring KL requires binning or smoothing — turns out to be equivalent to fitting a non-parametric nonlinearity by binning or smoothing (a sketch follows this list).
  – Difficult to use for high-dimensional LNP models.
• Parametric non-linearities: the “generalised linear model” (GLM).
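A sketch of the binned MID objective mentioned above, following the slide’s KL direction (evaluation only; a real MID fit would smooth the histograms and optimise over k with many restarts):

import numpy as np

def mid_objective(k, X, y, n_bins=20):
    """Binned estimate of KL[P(k·x) || P(k·x | spike)]."""
    z = X @ k
    edges = np.histogram_bin_edges(z, bins=n_bins)
    p_all, _ = np.histogram(z, bins=edges)             # marginal projection
    p_spk, _ = np.histogram(z, bins=edges, weights=y)  # spike-conditioned
    p_all = p_all / p_all.sum()
    p_spk = p_spk / p_spk.sum()
    keep = (p_all > 0) & (p_spk > 0)                   # avoid log(0) in empty bins
    return float(np.sum(p_all[keep] * np.log(p_all[keep] / p_spk[keep])))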
-
Generalised linear models
LN models with specified nonlinearities and exponential family noise.
In general (for monotonic g):
y ∼ ExpFamily[µ(x)]; g(µ) = βx
For our purposes easier to write
y ∼ ExpFamily[f (βx)]
(Continuous time) point process likelihood with GLM-like dependence of λ on covariates is approached in the limit of bins → 0 by either the Poisson or Bernoulli GLM.
Mark Berman and T. Rolf Turner (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41(1):31–38.
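To see why (a standard argument, not spelled out on the slide): with bin width Δ and per-bin spike counts y_b, the Poisson GLM log-likelihood is

Σ_b [ y_b log(λ_b Δ) − λ_b Δ ] + const,

which approaches the continuous point-process log-likelihood Σ_i log λ(t_i) − ∫₀ᵀ λ(t) dt as Δ → 0, since each bin then contains at most one spike.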
-
Generalised linear models
Poisson distribution ⇒ f = exp(·) is canonical (natural params = βx).
Canonical link functions give concave likelihoods ⇒ unique maxima.
Generalises (for Poisson) to any f which is convex and log-concave:
log-likelihood = c − f(βx) + y log f(βx)
Includes:
• threshold-linear
• threshold-polynomial
• “soft-threshold” f(z) = α⁻¹ log(1 + e^{αz}).
-
Generalised linear models
ML parameters found by
• gradient ascent
• IRLS

Regularisation by L2 (quadratic) or L1 (absolute value – sparse) penalties (MAP with Gaussian/Laplacian priors) preserves concavity.
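For the canonical Poisson case the ML fit is a few lines of Python (a sketch; the fixed step size is illustrative, and IRLS or Newton steps converge far faster in practice):

import numpy as np

def fit_poisson_glm(X, y, lr=1e-3, n_iter=5000, l2=0.0):
    """Gradient ascent on the concave Poisson log-likelihood
    sum_t [ y_t (Xw)_t - exp((Xw)_t) ] - (l2/2) ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        rate = np.exp(X @ w)                 # predicted rate per bin
        grad = X.T @ (y - rate) - l2 * w     # gradient of penalised log-likelihood
        w += lr * grad
    return w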
-
Linear-Nonlinear-Poisson (GLM)
[Diagram: stimulus → stimulus filter k(t) → point nonlinearity → Poisson spiking.]
-
GLM with history-dependence
• rate is a product of stim- and spike-history dependent terms
• output no longer a Poisson process
• also known as “soft-threshold” Integrate-and-Fire model
[Diagram: stimulus → stimulus filter k; spike train → post-spike filter h(t); the summed filter outputs pass through an exponential nonlinearity to give the conditional intensity (spike rate), which drives Poisson spiking. (Truccolo et al 04)]
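A discrete-time simulation sketch (assumes the stimulus has already been passed through the filter k; bin width dt and names are illustrative):

import numpy as np

def simulate_glm(stim_drive, h, dt=0.001, rng=None):
    """Simulate a GLM with post-spike filter h.

    stim_drive: (T,) filtered-stimulus input; h: (L,) post-spike filter.
    Conditional intensity: lambda_t = exp(stim_drive[t] + history drive).
    """
    rng = np.random.default_rng() if rng is None else rng
    T, L = len(stim_drive), len(h)
    spikes = np.zeros(T, dtype=int)
    hist = np.zeros(T + L)                    # running spike-history drive
    for t in range(T):
        lam = np.exp(stim_drive[t] + hist[t])
        spikes[t] = rng.poisson(lam * dt)     # Poisson count in this bin
        if spikes[t]:
            hist[t + 1:t + 1 + L] += spikes[t] * h
    return spikes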
-
GLM with history-dependence

• “soft-threshold” approximation to Integrate-and-Fire model

[Figure: spike rate vs filter output for the traditional IF (“hard threshold”) and the “soft-threshold” IF. Diagram: stimulus filter k and post-spike filter h(t), exponential nonlinearity, Poisson spiking.]
-
GLM dynamic behaviors
[Figure: stimulus x(t) and post-spike waveform, with stim-induced and spike-history-induced rate components; regular spiking.]
-
GLM dynamic behaviors
[Figure: stimulus x(t) and post-spike waveform, with stim-filter and spike-history filter outputs; regular vs irregular spiking.]
-
GLM dynamic behaviors
[Figure: stimulus x(t) and post-spike waveforms producing bursting and adaptation.]
-
Generalized Linear Model (GLM)
[Diagram: stimulus → stimulus filter; post-spike filter; summed → exponential nonlinearity → probabilistic spiking.]
-
multi-neuron GLM
[Diagram: two neurons (neuron 1, neuron 2), each with a stimulus filter and post-spike filter, summed → exponential nonlinearity → probabilistic spiking.]
-
multi-neuron GLM
[Diagram: as above, with coupling filters between neuron 1 and neuron 2.]
-
GLM equivalent diagram:

[Diagram: conditional intensity (spike rate) unrolled over time t.]
-
Multilinear models
Input nonlinearities (Hammerstein cascades) can be identified in a multilinear (cartesian tensor) framework.
-
Input nonlinearities
The basic linear model (for sounds):
r̂(i) = Σ_jk w^tf_jk s(i − j, k),

where r̂(i) is the predicted rate, w^tf_jk the STRF weights, and s(i − j, k) the stimulus power.

How to measure s? (pressure, intensity, dB, thresholded, . . . )

We can learn an optimal representation g(·):

r̂(i) = Σ_jk w^tf_jk g(s(i − j, k)).

Define basis functions {g_l} such that g(s) = Σ_l w^l_l g_l(s), and the stimulus array M_ijkl = g_l(s(i − j, k)). Now the model is

r̂(i) = Σ_jkl w^tf_jk w^l_l M_ijkl   or   r̂ = (w^tf ⊗ w^l) • M.
-
Multilinear models
Multilinear forms are straightforward to optimise by alternating least squares.
Cost function:

E = ‖ r − (w^tf ⊗ w^l) • M ‖²

Minimise iteratively, defining matrices

B = w^l • M   and   A = w^tf • M

and updating

w^tf = (BᵀB)⁻¹ Bᵀ r   and   w^l = (AᵀA)⁻¹ Aᵀ r.

Each linear regression step can be regularised by evidence optimisation (suboptimal), with uncertainty propagated approximately using variational methods.
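A sketch of the alternating updates for the bilinear case (unregularised for brevity; the tensor layout and names are illustrative):

import numpy as np

def als_bilinear(M, r, n_iter=50):
    """Fit r ≈ (w_tf ⊗ w_l) • M by alternating least squares.

    M: (n_samples, n_tf, n_l) stimulus tensor; r: (n_samples,) rates.
    """
    n, n_tf, n_l = M.shape
    w_tf, w_l = np.ones(n_tf), np.ones(n_l)
    for _ in range(n_iter):
        B = np.einsum('itl,l->it', M, w_l)          # B = w_l • M
        w_tf = np.linalg.lstsq(B, r, rcond=None)[0]
        A = np.einsum('itl,t->il', M, w_tf)         # A = w_tf • M
        w_l = np.linalg.lstsq(A, r, rcond=None)[0]
    return w_tf, w_l

Each update is an ordinary linear regression, so it can be regularised in the usual ways.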
-
Some input non-linearities
[Figure: learned input nonlinearity w^l as a function of sound level l (dB SPL).]
-
Parameter grouping
Separable STRF: (time) ⊗ (frequency). The input nonlinearity model is separable in another sense: (time, frequency) ⊗ (sound level).
[Figure: (time, frequency) weights and sound-level (intensity) weight profiles.]
Other separations:

• (time, sound level) ⊗ (frequency): r̂ = (w^tl ⊗ w^f) • M,
• (frequency, sound level) ⊗ (time): r̂ = (w^fl ⊗ w^t) • M,
• (time) ⊗ (frequency) ⊗ (sound level): r̂ = (w^t ⊗ w^f ⊗ w^l) • M.
-
(time, frequency) ⊗ (sound level)
[Figure: fitted (time, frequency) kernel, f (kHz) vs t (ms), with sound-level kernel w^l vs l (dB SPL).]
-
(time, sound level) ⊗ (frequency)
[Figure: fitted (time, sound level) kernel, l (dB SPL) vs t (ms), with frequency kernel w^f vs f (kHz).]
-
(frequency, sound level) ⊗ (time)
[Figure: fitted (frequency, sound level) kernel, l (dB SPL) vs f (kHz), with time kernel w^t vs t (ms).]
-
Contextual influences
[Diagram: stimulus spectrogram (time × frequency), filtered by the context kernel w^τφ and the principal kernel w^tf, predicting the PSTH.]

rate(i) = c + Σ_jkl w^t_j w^f_k w^l_l g_l(sound_{i−j,k}) · ( c₂ + Σ_mnp w^τ_m w^φ_n w^λ_p g_p(sound_{i−j−m,k+n}) )
-
Contextual influences
Introduce extra dimensions:
• τ: time difference between contextual and primary tone,
• φ: frequency difference between contextual and primary tone,
• λ: sound level of the contextual tone.

Model the effective sound level of a primary tone by

Level(i, j) → Level(i, j) · (const + Context(i, j)),

and the context by

Context(i, j) = Σ_{m,n,p} w^τ_m w^φ_n w^λ_p h_p(s(i − m, j + n)).

This leads to a multilinear model

r̂ = (w^t ⊗ w^f ⊗ w^l ⊗ w^τ ⊗ w^φ ⊗ w^λ) • M.
-
Inseparable contexts
We can also allow inseparable contexts (and principal fields), dropping the level-nonlinearity to reduce parameters.

r(i) = c + Σ_jk w^tf_jk sound_{i−j,k} · ( 1 + Σ_mn w^τφ_mn sound_{i−j−m,k+n} )

[Diagram: stimulus spectrogram filtered by context kernel w^τφ and principal kernel w^tf to predict the PSTH.]
-
Performance
[Figure: extrapolated normalised predictive power (training and test sets) for STRF, separable-context, and inseparable-context models.]
training set
Studying sensory systemsGeneral approachSpikes, or rate?Spike-triggered averageLinear regressionLinear modelsHow good are linear predictions?Estimating predictable powerSignal power in A1 responsesTesting a modelPredictive performanceExtrapolating the model performanceJackknifed estimatesExtrapolated linearitySimulated (almost) linear dataLinear fits to non-linear functionsLinear fits to non-linear functionsApproximations are stimulus dependentConsequences``Independently distributed'' stimuliNonlinearity & non-independence distort RF estimatesWhat about natural sounds?Beyond linearityBeyond linearityNon-linear modelsLNP estimation – the Spike-triggered ensembleSingle linear filterMultiple filtersSTCBiasesMore LNP methodsGeneralised linear modelsMultilinear modelsInput nonlinearitiesMultilinear modelsSome input non-linearitiesParameter grouping(time, frequency) (sound level)(time, sound level) (frequency)(frequency, sound level) (time)Contextual influencesContextual influencesInseparable contextsPerformance