
Machine Learning and Imaging (BME 590L)
Roarke Horstmeyer (2019)

Lecture 5: A gentle introduction to optimization

ML + Imaging pipeline: real world → measurement device (γ → e⁻) → digitization → machine learning. Last class covered the measurement and digitization steps; this class covers the machine learning step.

Review: discrete convolution and discrete 2D convolution
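As a quick refresher (not from the slides), here is a minimal MATLAB sketch of a discrete 2D convolution using the built-in conv2; the image and kernel are toy choices.

```matlab
% Discrete 2D convolution refresher: slide a small kernel over an image.
img = magic(5);                    % toy 5 x 5 "image"
k   = [0 1 0; 1 -4 1; 0 1 0];      % discrete Laplacian kernel
out_full = conv2(img, k);          % full 7 x 7 output
out_same = conv2(img, k, 'same');  % output cropped to the 5 x 5 input size
```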


Convolutions as a big matrix multiplication: stack the signal into a vector u = [u1, u2, …, un]; the convolution then becomes multiplication of u by a matrix built from the kernel.
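A small sketch of that idea (the kernel, signal, and loop are illustrative choices, not from the slides): build the convolution matrix one column at a time and check that multiplying it by u reproduces MATLAB's conv.

```matlab
% 1D convolution written as a matrix multiplication.
h = [1 2 3];                         % kernel
n = 5;                               % signal length
u = (1:n)';                          % toy signal u = [u1 ... un]'
H = zeros(n + numel(h) - 1, n);      % convolution matrix
for j = 1:n
    H(j:j+numel(h)-1, j) = h(:);     % each column is a shifted copy of h
end
isequal(H*u, conv(h, u')')           % true: matrix product equals convolution
```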


Last thing – matrix and vector derivatives

• When confused, write out one entry, solve the derivative, and generalize

• Use dimensionality to help (if x has N elements and y has M elements, then dy/dx must be N x M)

• Take advantage of The Matrix Cookbook:

• https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf


Mathematical Optimization: "Selection of a best element (with regard to some criterion) from a set of available alternatives"

3 elements:

1) Your desired output (a better image, a clean signal, a classification of "cat" or "dog", etc.)

2) A model of what you are looking for - how you form the desired output from your measured data

3) A cost function, to measure how close you're getting to the answer (the cost function minimum)

Generalized optimization pipeline

Input: measurements (an N x N pixel image), vectorized with vec[·] → input space dimension N²
→ Model → Output → output space dimension M
→ Cost function → performance measure: 1 number (e.g. 0.47). How well did we do?

Machine learning: update the model to decrease the error

Same pipeline as above: input (dimension N²) → model → output (dimension M) → cost function → performance measure (1 number, e.g. 0.47: how well did we do?). Machine learning adjusts the model so that this number goes down.

De-noising: "What is the closest image to what I detected, except without so many fluctuations?"

Input dimension: N x N image
Output dimension: N x N image

Cost function: "Don't let nearby pixels vary around too much."

(Figure: cost-function value plotted as a surface over pixel value 1 and pixel value 2 – only showing 2 of the N² dimensions.)


Start with a guess: the noisy image itself (its pixel values 1, 2, … give the starting point on the cost-function surface).

"Descend" to minimize the cost function (tweak the values of each pixel). Note: this part computers are really good at!

Descending the cost surface leads to the desired output (x1, x2, …): the de-noised pixel values.

Optimization pipeline for denoising

Input: measurements (an N x N pixel image), vectorized with vec[·] → input space dimension N²
→ Model → Output → output space dimension N²
→ Cost function: mean-squared error plus "neighboring pixels shouldn't vary too rapidly" → performance measure (e.g. 0.47)
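A minimal sketch of this pipeline (the noise level, weight C, step size, and iteration count are all hand-picked for illustration): the cost is a data-fidelity term plus a quadratic "neighboring pixels shouldn't vary too rapidly" penalty, minimized by gradient descent on the pixel values.

```matlab
% Toy de-noising by gradient descent on the pixel values.
x = rand(32) + 0.1*randn(32);          % noisy N x N measurement
y = x;                                  % start with a guess: the noisy image
C = 5;  step = 0.02;                    % smoothness weight and step size
for it = 1:300
    dh = diff(y, 1, 2);  dv = diff(y, 1, 1);            % neighbor differences
    gh = zeros(size(y)); gh(:,1:end-1) = -dh; gh(:,2:end) = gh(:,2:end) + dh;
    gv = zeros(size(y)); gv(1:end-1,:) = -dv; gv(2:end,:) = gv(2:end,:) + dv;
    grad = 2*(y - x) + 2*C*(gh + gv);   % gradient of sum((y-x).^2) + C*smoothness
    y = y - step*grad;                  % "descend" to reduce the cost
end
```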

Image classification: "Is the image of a dog or a cat?"

(Figure: a "cat" or "dog" cost function plotted against the model output for an input image.)

Start with a guess: 50% dog, 50% cat.

Desired output: 70% "cat" and 30% "dog" (so Google will guess it is likely a cat).

Optimization pipeline for classification

Input: measurements (an N x N pixel image), vectorized with vec[·] → input space dimension N²
→ Model → Output → output space dimension 2 ("cat", "dog")
→ Cost function: mean-squared error between the predicted and desired class scores → performance measure (e.g. 0.47)
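For concreteness, a tiny sketch of that single performance number for the cat/dog example, using a mean-squared-error cost between the guessed and desired class scores (the numbers are the ones from the slides):

```matlab
% One cost-function evaluation for the cat/dog example.
predicted = [0.5, 0.5];                 % initial guess: 50% cat, 50% dog
desired   = [0.7, 0.3];                 % desired output: 70% cat, 30% dog
cost = sum((predicted - desired).^2)    % = 0.08; updating the model lowers this
```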

A simple example: spectral unmixing

(For whatever reason, whenever I get confused about optimization, I think about this example... it's a good one.)

The setup:

- Measure the color (spectral) response of a sample (e.g., how much red, green and blue there is, or several hundred measurements of its different colors).

- You know that the sample can only contain 9 different fluorophores.

- What % of each fluorophore is in your sample?

(Figure: spectrometer measuring a sample; intensity vs. wavelength.)

3 elements of optimization for spectral unmixing:

1) Desired output: what % of each of the 9 fluorophores

2) The model: a "dictionary" of the 9 different spectra

3) The cost function: minimum mean squared error (to start)

Optimization pipeline for spectral unmixing

Input: the measured spectrum (intensity vs. wavelength, an N x 1 vector) → input space dimension N
→ Model: "mix spectra of known fluorophores to simulate my measurement" → Output → output space dimension 9
→ Cost function: error between the measurement and the modeled mixture → performance measure (e.g. 0.47)

Mathematical model for spectral unmixing

a) First make the "dictionary" matrix A. Measure the spectrum (intensity vs. wavelength) of each fluorophore with the spectrometer; put the "Pacific Blue" spectral intensities in the first column of matrix A.

Put the "GFP" spectral intensities in the second column, and so on until A holds all 9 possible spectra.

b) Model the unknown sample %'s. The unknown sample is some mixture of the 9 possible spectra in the dictionary A; the mixture itself is the desired output.

The mixture is described by a vector y of 9 weights, y(1) through y(9). Each weight in y is a percentage: the sample is y(1) of the first spectrum + y(2) of the second + y(3) of the third + … (these weights are the desired output).

The new measurement x is the spectrum (intensity vs. wavelength) detected from the mixed sample.

Matrix equation: x = Ay

Multiplying the dictionary A by the 9 weights in y produces the measured spectrum x. Goal: given A and x, find y. This is your model!
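A toy forward-model sketch of x = Ay (the Gaussian-shaped columns are stand-ins for real fluorophore spectra such as Pacific Blue or GFP; all numbers are made up for illustration):

```matlab
% Build a toy dictionary A of 9 spectra and simulate a measurement x = A*y + n.
lambda = linspace(400, 700, 200)';                      % N wavelength samples
A = zeros(numel(lambda), 9);
for j = 1:9
    A(:,j) = exp(-((lambda - (420 + 30*j)) / 25).^2);   % toy Gaussian "spectrum"
end
y_true = [0.3; 0.2; 0; 0; 0.5; 0; 0; 0; 0];             % unknown sample %'s
x = A*y_true + 0.01*randn(numel(lambda), 1);            % noisy measured spectrum
```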

The data in A can be thought of, in some sense, as "training data".

Cost function for spectral unmixing

x = Ay won't always be true, due to noise (actually, x = Ay + n).

A common cost function is the minimum mean-squared error, summed over the spectral measurements:

f(y) = Σ (x – Ay)²

Optimization pipeline for spectral unmixing, forward pass: the measured spectrum x (intensity vs. wavelength) enters the model x = Ay; the current estimate y is scored by the cost function f(y) = Σ (x – Ay)², which returns a single performance number (e.g. 0.47).

Find the mixture y of known spectra A that is as close as possible to the measurement x:

y* = argmin f(y)

(Figure: f(y) plotted as a surface over y(1) and y(2); the minimizer is y*.)

f(y) = Σ (x – Ay)² is a convex function with one minimizer y*, so finding y* is easy via its gradient: down the hill to the valley!

Working out the gradient (write out one entry, solve, and generalize):

d/dy f(y) = d/dy Σ (x – Ay)²

df/dy = Σ d/dy (x – Ay)²

df/dy[j] = –Σ 2 a(:,j) (x – Ay)   (a(:,j) is the j-th column of A; the minus sign comes from the chain rule on –Ay)

Stacking the entries: df/dy = –2 Aᵀ(x – Ay)

Method 1: Gradient descent – follow the gradient downhill to the solution y*.
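A minimal gradient-descent sketch for this cost (reusing the toy A and x from the forward-model sketch above; the step size and iteration count are illustrative):

```matlab
% Method 1: gradient descent on f(y) = sum((x - A*y).^2).
y    = zeros(9, 1);                     % initial guess
step = 1 / (2*norm(A)^2);               % small enough to be stable
for it = 1:2000
    grad = -2 * A' * (x - A*y);         % gradient derived above
    y    = y - step*grad;               % follow it downhill
end
y_gd = y                                % approaches the least-squares solution
```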

Method 2: Direct solution – set the derivative to 0 to find y* directly (y* is where the gradient of f(y) is zero):

df/dy = –2 Aᵀ(x – Ay*) = 0  →  Aᵀx = AᵀA y*  →  y* = (AᵀA)⁻¹ Aᵀx

(AᵀA)⁻¹ Aᵀ is the "Moore-Penrose pseudo-inverse".

(Note: setting the gradient to 0 and solving is hard to do for non-linear problems…)

Example unmixing with the pseudo-inverse

Moore-Penrose pseudo-inverse: y* = (AᵀA)⁻¹ Aᵀx

(Figure: an example dictionary A of 9 spectra; an example y and its spectrum Ay; an example detected spectrum x; the pseudo-inverse fit x* = Ay* is the red curve. Good fit!)

PROBLEM: y* = [0.2, -1.1, -1.6, …] – the solution has negative weights! Not physically possible…

The pseudo-inverse is one line of code:
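For example, with the toy A and x from the earlier sketch, that one line could look like:

```matlab
% Moore-Penrose pseudo-inverse solution in one line.
y_star = (A'*A) \ (A'*x);      % equivalent to pinv(A)*x when A'*A is invertible
```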

Spectral un-mixing with a positivity constraint

Option 1: Add a constraint

Minimize f(y) = Σ (x – Ay)²   (convex cost function)
Subject to y >= 0             (convex constraint)

*When you have constraints, you can use CVX, a convex optimization toolbox for Matlab: http://cvxr.com/cvx/
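A sketch of Option 1 with CVX (assuming CVX is installed and on the MATLAB path; MATLAB's built-in lsqnonneg solves the same non-negative least-squares problem without CVX):

```matlab
% Non-negative spectral unmixing with CVX, reusing the toy A and x from above.
cvx_begin quiet
    variable y_pos(9)
    minimize( sum_square(x - A*y_pos) )   % convex cost
    subject to
        y_pos >= 0                        % convex (positivity) constraint
cvx_end
% Alternative without CVX: y_pos = lsqnonneg(A, x);
```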


With CVX, the constrained solution is non-negative:

y* = [0.3612, 0.2238, 0.0006, 0.0000, 0.7336, 0.0000, 0.0000, 0.0000, 0.0000]

For comparison, the unconstrained pseudo-inverse gives:

y* = [0.6683, -1.5880, 5.7848, -11.7459, 17.9304, -20.1231, 18.3572, -10.7984, 2.8557]

Spectral un-mixing with a positivity constraint

Option 2: Modify the cost function

Minimize f(z) = Σ (x – Az²)²

where y = z² is a dummy variable (squared element-wise, so y is automatically non-negative); this changes the cost function and its gradient.

*When you don't have constraints but can find the gradient, use minFunc: https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html
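A sketch of Option 2 (assuming minFunc is on the MATLAB path, and reusing the toy A and x from above): the objective returns both f(z) and its chain-rule gradient, since minFunc expects a function handle that supplies both.

```matlab
% Minimize f(z) = sum((x - A*z.^2).^2) over z, then recover y = z.^2 >= 0.
obj = @(z) deal( sum((x - A*(z.^2)).^2), ...          % f(z)
                 -4 * z .* (A' * (x - A*(z.^2))) );   % df/dz via the chain rule
z0 = 0.3*ones(9, 1);                                  % starting point
z  = minFunc(obj, z0, struct('Display', 'off'));
y_star = z.^2;                                        % non-negative by construction
```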


With the dummy variable and minFunc, the recovered weights are non-negative but the fit is poor:

y* = [0.5013, 0.3345, 0.1811, 0.0367, 0.0132, 0.0705, 0.2626, 0.5080, 0.7539]

(Figure: pseudo-inverse fit vs. dummy-variable/minFunc fit.) Not working too well – the gradient could be wrong?
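One way to check that suspicion (a debugging sketch, not from the slides): compare the analytic gradient with a finite-difference estimate before trusting the optimizer with it.

```matlab
% Finite-difference check of the gradient returned by obj (from the sketch above).
z = rand(9, 1);
[f0, g] = obj(z);                         % analytic value and gradient
g_fd = zeros(size(z));  eps_fd = 1e-6;
for j = 1:numel(z)
    zp = z;  zp(j) = zp(j) + eps_fd;
    g_fd(j) = (obj(zp) - f0) / eps_fd;    % forward difference in coordinate j
end
max(abs(g - g_fd))                        % should be tiny if the gradient is right
```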

Other bells and whistles

1) Sometimes you'll see solutions where the y values get really big. Fix this with a "regularizer" ("don't let y vary too much"), choosing the constant C appropriately:

Minimize f(y) = Σ (x – Ay)² + C·Σ y²

2) If you think your signal is "sparse", then it probably has mostly zeros. You can include this in your model with an "L1" penalty on the weights:

Minimize f(y) = Σ (x – Ay)² + C·Σ |y|

- An extremely simple modification with pretty strong implications.
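Two corresponding sketches (C is a hand-picked constant, and the L1 version assumes CVX as before): the L2 regularizer has a closed-form solution, while the L1-penalized problem can be handed straight to CVX.

```matlab
% 1) Ridge (L2) regularization: closed form for sum((x-A*y).^2) + C*sum(y.^2).
C = 0.1;
y_ridge = (A'*A + C*eye(9)) \ (A'*x);

% 2) Sparsity-promoting (L1) penalty, solved with CVX.
cvx_begin quiet
    variable y_l1(9)
    minimize( sum_square(x - A*y_l1) + C*norm(y_l1, 1) )
cvx_end
```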