  • Least squares and Regression Techniques

    Goodness of fits (and tests)
    Non-linear least squares techniques

    Glover, D. M., W. J. Jenkins, S. C. Doney: Modeling Methods for Marine Science, Cambridge University Press, Chapter 3

  • I. Basic Statistics used for Regression 1. The Chi-Squared χ²

    How can we judge the goodness of a fit (aside from “eyeball”)?

    The “BEST” fit wants to reduce the “distance” between the collected data and the model.

    If the distribution has a Gaussian nature, then the “chi-squared” χ² provides a standard measure of this distance:

    χ² = Σ_i (y_i − ŷ_i)² / σ_i²

    where ŷ_i is the estimation from our model, y_i is the “real” (collected) data, i is the sample index (e.g. a time index), and σ_i is the uncertainty in the individual measurement y_i.
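    As a minimal numerical sketch (the function and variable names below are illustrative, not from the slides), the chi-squared of a set of model predictions against data can be computed as:

```python
import numpy as np

def chi_squared(y_data, y_model, sigma):
    """Chi-squared distance between data and model predictions.

    y_data  : measured values y_i
    y_model : model estimates at the same sample points
    sigma   : uncertainty sigma_i of each individual measurement
    """
    y_data, y_model, sigma = map(np.asarray, (y_data, y_model, sigma))
    return np.sum(((y_data - y_model) / sigma) ** 2)

# Example: data scattered around a known straight line
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)                 # measurement uncertainty
y = 2.0 + 1.5 * x + np.random.normal(0.0, 0.5, x.size)
print(chi_squared(y, 2.0 + 1.5 * x, sigma))  # should be of order N
```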

  • I. Basic Statistics used for Regression 1. The Chi-Squared χ²

    σ_i could:

    ● be the size of your smallest graduation on your measuring stick
    ● be related to some fundamental physical limitation of your measurement technique
    ● depend on some internal statistics associated with the measurement (e.g. you may take as your measurement the “actual time average” over some given time)

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    The Root Mean Square (RMS) deviation normalized to the measurement error should tend to one if things are working “correctly”. Here we define such a measure, the reduced Chi-Squared:

    χ²_ν = χ² / ν,   with ν = N − n

    where ν is the number of degrees of freedom, N is the number of collected samples, and n is the number of parameters used by “your regression fit” (your “model”).
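    A correspondingly small sketch (again with illustrative names, not the slides' own), showing the role of the degrees of freedom:

```python
import numpy as np

def reduced_chi_squared(y_data, y_model, sigma, n_params):
    """Chi-squared divided by the degrees of freedom nu = N - n."""
    y_data, y_model, sigma = map(np.asarray, (y_data, y_model, sigma))
    chi2 = np.sum(((y_data - y_model) / sigma) ** 2)
    nu = y_data.size - n_params        # N - n degrees of freedom
    return chi2 / nu
```

    For a well-chosen model and honest σ_i, this number should come out close to 1.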

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    Example 1:

    You have N measurements

    Your “best estimate” is the mean: ȳ = (1/N) Σ_i y_i

    Your “model” therefore has ONE parameter: n = 1

    And you have N − 1 degrees of freedom (or N − 1 independent values: knowing the mean and N − 1 of the values, you can deduce the Nth one).

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    Example 2:

    You have N measurements

    Your “best estimate” is a linear fit (regression)

    Your “model” has therefore TWO parameters: n=2

    And you have N-2 degrees of freedom

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    If you are “doing a bad job” at collecting your measurements, or if your “model” is inappropriate, then your reduced chi-squared will have large values (much larger than 1).

    If you have been too pessimistic about your measurement errors, then your reduced chi-squared value will be very small (much smaller than 1).

  • I. Basic Statistics used for Regression 3. Look at the residuals

    A good chi-squared may not mean that you have a good fit (or a good “model”)

    Always look at the “shape” of the residuals
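    A small illustrative sketch (synthetic data and names are ours, not from the slides): the reduced chi-squared can look acceptable because the error bars are pessimistic, while the shape of the residuals betrays the wrong (straight-line) model:

```python
import numpy as np

# Synthetic data with mild curvature; the quoted error bars are pessimistic
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 0.8 * x + 0.05 * x**2 + rng.normal(0.0, 0.3, x.size)
sigma = np.full_like(x, 1.0)          # overestimated measurement uncertainty

# Straight-line fit (model has n = 2 parameters)
a2, a1 = np.polyfit(x, y, 1)          # slope, intercept
residuals = y - (a1 + a2 * x)

chi2_nu = np.sum((residuals / sigma) ** 2) / (x.size - 2)
print(f"reduced chi-squared = {chi2_nu:.2f}")        # looks fine (< 1)...

# ...but the residuals have a systematic shape: + at the ends, - in the middle
for third in np.array_split(residuals, 3):
    print(f"mean residual = {third.mean():+.2f}")
```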

  • Minimizing the chi-squared is the foundation of all the least squares regression techniques!

  • II. Least squares fitting a straight line Introduction

    The most common data regression model (aside from the mean and standard deviation) is the fit to a straight line.

    We therefore define the following “model”: y_i = a1 + a2·x_i

    This model is based on TWO parameters: a1 and a2

    y_i is the dependent variable

    x_i is the independent variable

  • II. Least squares fitting a straight line Introduction

    We want to find the “BEST” estimates for the two parameters a1 and a2

    TYPE I regression techniques: no uncertainty on the independent variable x: σ_x ~ 0

    The “BEST” estimates for the two parameters are the ones that minimize the chi-squared, i.e. the VERTICAL distance between the estimated y values and the measured y values.

  • II. Least squares fitting a straight line Introduction

  • II. Least squares fitting a straight line 1. The normal equations

  • II. Least squares fitting a straight line 1. The normal equations

    “normal” equations

  • Cramer's rule:
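    The equations themselves did not survive the transcript. In the weighted-sum notation that the later slides rely on (Sxx and Δ, the determinant of A), the standard reconstruction is: define

    S = Σ 1/σ_i²,   Sx = Σ x_i/σ_i²,   Sy = Σ y_i/σ_i²,   Sxx = Σ x_i²/σ_i²,   Sxy = Σ x_i·y_i/σ_i²   (sums over i).

    Setting ∂χ²/∂a1 = 0 and ∂χ²/∂a2 = 0 for the straight-line model gives the “normal” equations

    a1·S  + a2·Sx  = Sy
    a1·Sx + a2·Sxx = Sxy

    i.e. A·(a1, a2)ᵀ = (Sy, Sxy)ᵀ with A = [[S, Sx], [Sx, Sxx]]. Cramer's rule then gives

    Δ = S·Sxx − Sx²,   a1 = (Sxx·Sy − Sx·Sxy)/Δ,   a2 = (S·Sxy − Sx·Sy)/Δ.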

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Error on measurements

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    If there is no systematic error (uncorrelated noise) between TWO distinct measurements (taken at a different time or at a different location, i ≠ j), then the cross-terms cancel, and only the sample mean of the squared error (the “dispersion”) of each measurement, σ_i², remains.
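    The propagation formula itself is not in the transcript; the standard result consistent with the statement above is that for any fitted coefficient a_k, treated as a function of the measurements y_i,

    σ²_{a_k} = Σ_i σ_i² (∂a_k/∂y_i)²

    once the cross-terms ⟨ε_i ε_j⟩ (i ≠ j) have cancelled.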

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Back to the Type I regression fitting to a straight line:

    The amplitude of the error does NOT depend on the collected data (the y_i values).

    The amplitude of the error depends:
    - on where (or at which time) you made the measurements (the x_i values)
    - on the uncertainties in the measurements (the σ_i values)

    Maximizing Δ (the determinant of A) is a good thing!
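    Putting the normal equations, Cramer's rule and the coefficient uncertainties together, a minimal sketch (function names are ours; σ²_a1 = Sxx/Δ and σ²_a2 = S/Δ are the standard expressions implied by the discussion of Δ above):

```python
import numpy as np

def fit_line_type1(x, y, sigma):
    """Weighted (Type I) least-squares fit of y = a1 + a2*x.

    Returns (a1, a2, sigma_a1, sigma_a2). sigma are the uncertainties on
    the individual y measurements; x is assumed to be error-free.
    """
    x, y, sigma = map(np.asarray, (x, y, sigma))
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()

    delta = S * Sxx - Sx**2              # determinant of the normal matrix
    a1 = (Sxx * Sy - Sx * Sxy) / delta   # intercept
    a2 = (S * Sxy - Sx * Sy) / delta     # slope
    sigma_a1 = np.sqrt(Sxx / delta)      # does not depend on the y_i
    sigma_a2 = np.sqrt(S / delta)
    return a1, a2, sigma_a1, sigma_a2

# Quick usage example with synthetic data
x = np.linspace(0.0, 10.0, 25)
sigma = np.full_like(x, 0.4)
y = 1.0 + 0.7 * x + np.random.normal(0.0, 0.4, x.size)
print(fit_line_type1(x, y, sigma))
```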

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Maximizing Δ is equivalent to maximizing the spreading of the measurements in time or space (increasing the range of the x-values): the larger you spread the data, the lower the uncertainty on the intercept and the slope. We want to spread the cloud of data around the centroid.

    Increasing the number of measurements? It makes Sxx (which is always positive) grow. Therefore the determinant Δ also grows.

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Widely spread data reduces the uncertainty on the slope

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Poorly spread data far away from the x = 0 axis leads to a large uncertainty on the intercept

  • II. Least squares fitting a straight line 3. Uncertainties in the estimated y-values
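    The slide's formula is missing from the transcript; the standard propagation for the straight-line case (using the same sums as above, and including the covariance of the two coefficients, cov(a1, a2) = −Sx/Δ) would be

    σ²_ŷ(x) = σ²_a1 + 2·x·cov(a1, a2) + x²·σ²_a2 = (Sxx − 2·x·Sx + x²·S)/Δ.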

  • II. Least squares fitting a straight line 4. Type 2 regression (for 2 independent variables)

    We have assumed so far that we know x infinitely well.

    What if we also have uncertainties on both the y_i values and the x_i values?

    Should we perform a fit of y against x, or of x against y? If you perform both on scattered data, you will get significant differences in the predicted slopes.

    Minimizing the vertical distances between the y-data and the fit is now INCORRECT.

    You should consider the “TRUE” distance, and minimize the perpendicular distance.

  • II. Least squares fitting a straight line 4. Type 2 regression (for 2 independent variables)

    For a straight line, it becomes:
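    The slide's expression is not in the transcript. For a straight line y = a1 + a2·x, the perpendicular distance of a point (x_i, y_i) to the line is (y_i − a1 − a2·x_i)/√(1 + a2²), so the quantity to minimize becomes Σ_i (y_i − a1 − a2·x_i)² / (1 + a2²), with appropriate weights when σ_x and σ_y differ. A minimal numerical sketch, assuming equal and uniform uncertainties on x and y (names are ours; this is the major-axis / orthogonal-distance variant, not necessarily the exact formulation on the slide):

```python
import numpy as np

def fit_line_orthogonal(x, y):
    """Straight-line fit minimizing perpendicular (orthogonal) distances.

    Assumes equal, uniform uncertainties on x and y. Returns (a1, a2) for
    y = a1 + a2*x, via the principal axis of the centered data cloud.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x.mean(), y.mean()
    cov = np.cov(x, y)                       # 2x2 covariance of the cloud
    eigvals, eigvecs = np.linalg.eigh(cov)
    vx, vy = eigvecs[:, np.argmax(eigvals)]  # direction of largest spread
    a2 = vy / vx                             # slope along the major axis
    a1 = ym - a2 * xm                        # line passes through the centroid
    return a1, a2

# Compare with the two Type I fits (y on x, and x on y) on scattered data
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40) + rng.normal(0, 1.0, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 40)
print(fit_line_orthogonal(x, y))
```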

  • III. General Least squares techniques

    You can derive the normal equations for any set of basis functions. Basis functions can be thought of as building blocks for describing your data.

    The more complicated the functions, the more difficult it is to write the normal equations, and the greater the risk that the solution to the normal equations becomes numerically ill-behaved.

    Example: polynomial sharpness

  • III. General Least squares techniques The design matrix approach

    Example:

    Linear in the parameter space: can be fit with linear least squares.

    Problem: A is not a square matrix. Where are the weighting factors, the σ_i?

  • Where are the weighting factors, the σ_i?
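    The answer on the original slide is presumably the usual one: fold the uncertainties into the design matrix and data vector, A_ij = X_j(x_i)/σ_i and b_i = y_i/σ_i, so that ordinary least squares on (A, b) minimizes the χ² defined earlier. A sketch for a polynomial basis (names are illustrative, not from the slides):

```python
import numpy as np

def weighted_design_matrix(x, sigma, degree):
    """Design matrix A_ij = X_j(x_i) / sigma_i for the polynomial basis X_j(x) = x**j."""
    x, sigma = np.asarray(x, float), np.asarray(sigma, float)
    A = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    return A / sigma[:, None]

x = np.linspace(0, 5, 30)
sigma = np.full_like(x, 0.2)
y = 0.5 + 1.2 * x - 0.3 * x**2 + np.random.normal(0, 0.2, x.size)

A = weighted_design_matrix(x, sigma, degree=2)
b = y / sigma                                        # weighted data vector
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)       # least-squares solution
print(coeffs)                                        # ~ [0.5, 1.2, -0.3]
```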

  • III. General Least squares techniques Solving the design matrix approach with SVD

    The problem amounts to minimizing the sum of the squared residuals:

    This is exactly what singular value decomposition does!


  • Singular Value Decomposition (SVD)

    For any matrix A (N rows × M columns), there exists a triple product A = U·S·Vᵀ, where:

    U: column-orthonormal matrix (i.e. any column vector is orthogonal to the others and the sum of the squares of its elements is ONE) of size N×M

    V: orthonormal square matrix of size M×M

    S: diagonal matrix of size M×M. The diagonal elements are called the singular values. These values may be zero if the matrix is rank deficient (i.e. the rank is less than the shortest dimension of the matrix A).

  • W is a diagonal matrix defined from diagonal matrix S

    with εw a small threshold value

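    The solution formula itself is missing from the transcript. With W defined from S by W_jj = 1/S_jj where S_jj exceeds the threshold εw and W_jj = 0 otherwise, the standard SVD recipe gives the least-squares solution a = V·W·Uᵀ·b (a hedged reconstruction, since only the W definition survives here). A sketch in NumPy:

```python
import numpy as np

def svd_least_squares(A, b, eps_w=1e-10):
    """Solve the least-squares problem min ||A a - b||^2 via SVD.

    Singular values below eps_w (relative to the largest one) are zeroed in W,
    which tames rank-deficient or ill-conditioned design matrices.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U * diag(s) * Vt
    w = np.where(s > eps_w * s.max(), 1.0 / s, 0.0)    # W_jj = 1/S_jj or 0
    return Vt.T @ (w * (U.T @ b))                      # a = V W U^T b

# Same weighted polynomial problem as before, now solved with SVD
x = np.linspace(0, 5, 30)
sigma = np.full_like(x, 0.2)
y = 0.5 + 1.2 * x - 0.3 * x**2 + np.random.normal(0, 0.2, x.size)
A = np.vander(x, 3, increasing=True) / sigma[:, None]
print(svd_least_squares(A, y / sigma))
```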

  • How do you compute the uncertainties ?

  • Covariance matrix of the uncertainties
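    The slide's expression is not in the transcript; with the weighted design matrix above, the standard SVD result for the covariance of the fitted parameters is

    Cov(a_j, a_k) = Σ_m V_jm·V_km / S_mm²   (summing only over the singular values that were kept),

    i.e. Cov(a) = V·W²·Vᵀ.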

  • USEFUL MATRIX ALGEBRA:

    Identity 1:

    Identity 2:

    Demo:

    k=1,...,N

  • Looking for an extremum, and anticipating that this extremum is a minimum...

    To have a solution, this matrix must be invertible. If not, use SVD methods.

    Generalization of the linear regression

    Residual Sum of Squares (RSS):
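    The slide's equations are missing from the transcript; in the usual matrix form of this generalization (with A the N×K design matrix, a the K parameters and ε the errors), consistent with the AᵀA and N − K notation of the following slides:

    y = A·a + ε,   RSS(a) = (y − A·a)ᵀ(y − A·a),

    and setting the gradient of the RSS to zero gives the normal equations AᵀA·â = Aᵀy, hence â = (AᵀA)⁻¹Aᵀy when AᵀA is invertible.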

  • 2 methods: AᵀA is invertible, or AᵀA is not invertible

    Covariance matrix of the estimated parameters

  • Covariance matrix of the estimated parameters

  • Estimator of σ_ε

  • Student distribution with n = N − K degrees of freedom
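    The expressions themselves are missing from the transcript; the standard results matching these two slides are the unbiased estimator

    σ̂_ε² = RSS / (N − K),

    and the fact that each standardized coefficient (â_k − a_k) / σ̂_{â_k} follows a Student t distribution with N − K degrees of freedom, which is what provides confidence intervals on the fitted parameters.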
