  • Least squares and Regression Techniques

    Goodness of fits (and tests)
    Non-linear least squares techniques

    Glover, D. M., W. J. Jenkins, S. C. Doney: Modeling Methods for Marine Science, Cambridge University Press, Chapter 3

  • I. Basic Statistics used for Regression 1. The Chi-Squared χ²

    How can we judge the goodness of a fit (aside from “eyeball”)?

    The “BEST” fit wants to reduce the “distance” between the collected data and the model.

    If the distribution has a Gaussian nature, then the “chi-squared” χ² provides a standard measure of this distance:

    χ² = Σ_i (y_i − ŷ_i)² / σ_i²

    where ŷ_i is the estimation from our model, y_i is the “real” (collected) data, i is the sample index (e.g. a time index), and σ_i is the uncertainty in the individual measurement y_i.
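    As a minimal numerical sketch (the function and variable names below are illustrative, not from the slides), the chi-squared of a set of model predictions against data can be computed as:

```python
import numpy as np

def chi_squared(y_data, y_model, sigma):
    """Chi-squared distance between data and model predictions.

    y_data  : measured values y_i
    y_model : model estimates at the same sample points
    sigma   : uncertainty sigma_i of each individual measurement
    """
    y_data, y_model, sigma = map(np.asarray, (y_data, y_model, sigma))
    return np.sum(((y_data - y_model) / sigma) ** 2)

# Example: data scattered around a known straight line
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)                 # measurement uncertainty
y = 2.0 + 1.5 * x + np.random.normal(0.0, 0.5, x.size)
print(chi_squared(y, 2.0 + 1.5 * x, sigma))  # should be of order N
```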

  • I. Basic Statistics used for Regression 1. The Chi-Squared χ²

    σ_i could:

    ● be the size of your smallest graduation on your measuring stick
    ● be related to some fundamental physical limitation of your measurement technique
    ● depend on some internal statistics associated with the measurement (e.g. you may take as your measurement the “actual time average” over some given time)

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    The Root Mean Square (RMS) deviation normalized to the measurement error should tend to one if things are working “correctly”. Here we define such a measure, the reduced Chi-Squared:

    χ²_ν = χ² / ν,   with ν = N − n

    where ν is the number of degrees of freedom, N is the number of collected samples, and n is the number of parameters used by “your regression fit” (your “model”).
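    A correspondingly small sketch (again with illustrative names, not the slides' own), showing the role of the degrees of freedom:

```python
import numpy as np

def reduced_chi_squared(y_data, y_model, sigma, n_params):
    """Chi-squared divided by the degrees of freedom nu = N - n."""
    y_data, y_model, sigma = map(np.asarray, (y_data, y_model, sigma))
    chi2 = np.sum(((y_data - y_model) / sigma) ** 2)
    nu = y_data.size - n_params        # N - n degrees of freedom
    return chi2 / nu
```

    For a well-chosen model and honest σ_i, this number should come out close to 1.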

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    Example 1:

    You have N measurements

    Your “best estimate” is the mean: ȳ = (1/N) Σ_i y_i

    Your “model” therefore has ONE parameter: n = 1

    And you have N − 1 degrees of freedom (or N − 1 independent values: knowing the mean and N − 1 of the values, you can deduce the Nth one).

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    Example 2:

    You have N measurements

    Your “best estimate” is a linear fit (regression)

    Your “model” has therefore TWO parameters: n=2

    And you have N-2 degrees of freedom

  • I. Basic Statistics used for Regression 2. The reduced Chi-Squared χ²

    If you are “doing a bad job” at collecting your measurements, or if your “model” is inappropriate, then your reduced chi-squared will have large values (much larger than 1).

    If you have been too pessimistic about your measurement errors, then your reduced chi-squared value will be very small (much smaller than 1).

  • I. Basic Statistics used for Regression 3. Look at the residuals

    A good chi-squared may not mean that you have a good fit (or a good “model”)

    Always look at the “shape” of the residuals
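    A small illustrative sketch (synthetic data and names are ours, not from the slides): the reduced chi-squared can look acceptable because the error bars are pessimistic, while the shape of the residuals betrays the wrong (straight-line) model:

```python
import numpy as np

# Synthetic data with mild curvature; the quoted error bars are pessimistic
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 0.8 * x + 0.05 * x**2 + rng.normal(0.0, 0.3, x.size)
sigma = np.full_like(x, 1.0)          # overestimated measurement uncertainty

# Straight-line fit (model has n = 2 parameters)
a2, a1 = np.polyfit(x, y, 1)          # slope, intercept
residuals = y - (a1 + a2 * x)

chi2_nu = np.sum((residuals / sigma) ** 2) / (x.size - 2)
print(f"reduced chi-squared = {chi2_nu:.2f}")        # looks fine (< 1)...

# ...but the residuals have a systematic shape: + at the ends, - in the middle
for third in np.array_split(residuals, 3):
    print(f"mean residual = {third.mean():+.2f}")
```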

  • Minimizing the chi-squared is the foundation of all the least squares regression techniques!

  • II. Least squares fitting a straight line Introduction

    The most common data regression model (aside from the mean and standard deviation) is the fit to a straight line.

    We therefore define the following “model”: y_i = a1 + a2·x_i

    This model is based on TWO parameters: a1 and a2

    y_i is the dependent variable

    x_i is the independent variable

  • II. Least squares fitting a straight line Introduction

    We want to find the “BEST” estimates for the two parameters a1 and a2

    TYPE I regression techniques: no uncertainty on the independent variable x: σ_x ~ 0

    The “BEST” estimates for the two parameters are the ones that minimize the chi-squared, i.e. the VERTICAL distance between the estimated y values and the measured y values.

  • II. Least squares fitting a straight line Introduction

  • II. Least squares fitting a straight line 1. The normal equations

  • II. Least squares fitting a straight line 1. The normal equations

    “normal” equations

  • Cramer's rule:
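    The equations themselves did not survive the transcript. In the weighted-sum notation that the later slides rely on (Sxx and Δ, the determinant of A), the standard reconstruction is: define

    S = Σ 1/σ_i²,   Sx = Σ x_i/σ_i²,   Sy = Σ y_i/σ_i²,   Sxx = Σ x_i²/σ_i²,   Sxy = Σ x_i·y_i/σ_i²   (sums over i).

    Setting ∂χ²/∂a1 = 0 and ∂χ²/∂a2 = 0 for the straight-line model gives the “normal” equations

    a1·S  + a2·Sx  = Sy
    a1·Sx + a2·Sxx = Sxy

    i.e. A·(a1, a2)ᵀ = (Sy, Sxy)ᵀ with A = [[S, Sx], [Sx, Sxx]]. Cramer's rule then gives

    Δ = S·Sxx − Sx²,   a1 = (Sxx·Sy − Sx·Sxy)/Δ,   a2 = (S·Sxy − Sx·Sy)/Δ.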

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Error on measurements

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    If there is no systematic error (uncorrelated noise) between TWO distinct measurements (taken at a different time or at a different location, i ≠ j), then the cross-terms cancel, and only the sample mean of the squared error (the “dispersion”) of each measurement, σ_i², remains.
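    The propagation formula itself is not in the transcript; the standard result consistent with the statement above is that for any fitted coefficient a_k, treated as a function of the measurements y_i,

    σ²_{a_k} = Σ_i σ_i² (∂a_k/∂y_i)²

    once the cross-terms ⟨ε_i ε_j⟩ (i ≠ j) have cancelled.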

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Back to the Type I regression fitting to a straight line:

    The amplitude of the error does NOT depend on the collected data (the y_i values).

    The amplitude of the error depends:
    - on where (or at which time) you made the measurements (the x_i values)
    - on the uncertainties in the measurements (the σ_i values)

    Maximizing Δ (the determinant of A) is a good thing!
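    Putting the normal equations, Cramer's rule and the coefficient uncertainties together, a minimal sketch (function names are ours; σ²_a1 = Sxx/Δ and σ²_a2 = S/Δ are the standard expressions implied by the discussion of Δ above):

```python
import numpy as np

def fit_line_type1(x, y, sigma):
    """Weighted (Type I) least-squares fit of y = a1 + a2*x.

    Returns (a1, a2, sigma_a1, sigma_a2). sigma are the uncertainties on
    the individual y measurements; x is assumed to be error-free.
    """
    x, y, sigma = map(np.asarray, (x, y, sigma))
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()

    delta = S * Sxx - Sx**2              # determinant of the normal matrix
    a1 = (Sxx * Sy - Sx * Sxy) / delta   # intercept
    a2 = (S * Sxy - Sx * Sy) / delta     # slope
    sigma_a1 = np.sqrt(Sxx / delta)      # does not depend on the y_i
    sigma_a2 = np.sqrt(S / delta)
    return a1, a2, sigma_a1, sigma_a2

# Quick usage example with synthetic data
x = np.linspace(0.0, 10.0, 25)
sigma = np.full_like(x, 0.4)
y = 1.0 + 0.7 * x + np.random.normal(0.0, 0.4, x.size)
print(fit_line_type1(x, y, sigma))
```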

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Maximizing Δ is equivalent to maximizing the spreading of the measurements in time or space (increasing the range of the x-values): the larger you spread the data, the lower the uncertainty on the intercept and the slope. We want to spread the cloud of data around the centroid.

    Increasing the number of measurements? It makes Sxx (which is always positive) grow. Therefore the determinant Δ also grows.

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Widely spread data reduces the uncertainty on the slope

  • II. Least squares fitting a straight line 2. Uncertainties in the coefficients

    Poorly spread data far away from the x = 0 axis leads to a large uncertainty on the intercept

  • II. Least squares fitting a straight line 3. Uncertainties in the estimated y-values
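    The slide's formula is missing from the transcript; the standard propagation for the straight-line case (using the same sums as above, and including the covariance of the two coefficients, cov(a1, a2) = −Sx/Δ) would be

    σ²_ŷ(x) = σ²_a1 + 2·x·cov(a1, a2) + x²·σ²_a2 = (Sxx − 2·x·Sx + x²·S)/Δ.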

  • II. Least squares fitting a straight line 4. Type 2 regression (for 2 independent variables)

    We have assumed so far that we know x infinitely well.

    What if we also have uncertainties on both the y_i values and the x_i values?

    Should we perform a fit of y against x, or of x against y? If you perform both on scattered data, you will get significant differences in the predicted slopes.

    Minimizing the vertical distances between the y-data and the fit is now INCORRECT.

    You should consider the “TRUE” distance, and minimize the perpendicular distance.

  • II. Least squares fitting a straight line 4. Type 2 regression (for 2 independent variables)

    For a straight line, it becomes:
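    The slide's expression is not in the transcript. For a straight line y = a1 + a2·x, the perpendicular distance of a point (x_i, y_i) to the line is (y_i − a1 − a2·x_i)/√(1 + a2²), so the quantity to minimize becomes Σ_i (y_i − a1 − a2·x_i)² / (1 + a2²), with appropriate weights when σ_x and σ_y differ. A minimal numerical sketch, assuming equal and uniform uncertainties on x and y (names are ours; this is the major-axis / orthogonal-distance variant, not necessarily the exact formulation on the slide):

```python
import numpy as np

def fit_line_orthogonal(x, y):
    """Straight-line fit minimizing perpendicular (orthogonal) distances.

    Assumes equal, uniform uncertainties on x and y. Returns (a1, a2) for
    y = a1 + a2*x, via the principal axis of the centered data cloud.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x.mean(), y.mean()
    cov = np.cov(x, y)                       # 2x2 covariance of the cloud
    eigvals, eigvecs = np.linalg.eigh(cov)
    vx, vy = eigvecs[:, np.argmax(eigvals)]  # direction of largest spread
    a2 = vy / vx                             # slope along the major axis
    a1 = ym - a2 * xm                        # line passes through the centroid
    return a1, a2

# Compare with the two Type I fits (y on x, and x on y) on scattered data
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40) + rng.normal(0, 1.0, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 40)
print(fit_line_orthogonal(x, y))
```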

  • III. General Least squares techniques

    You can derive the normal equations for any set of basis functions. Basis functions can be thought of as building blocks for describing your data.

    The more complicated the functions, the more difficult it is to write the normal equations, and the greater the risk that the solution to the normal equations becomes numerically ill-behaved.

    Example: polynomial sharpness

  • III. General Least squares techniques The design matrix approach

    Example:

    Linear in the parameter space: can be fit with linear least squares.

    Problem: A is not a square matrix. Where are the weighting factors, the σ_i?

  • Where are the weighting factors, the σ_i?
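    The answer on the original slide is presumably the usual one: fold the uncertainties into the design matrix and data vector, A_ij = X_j(x_i)/σ_i and b_i = y_i/σ_i, so that ordinary least squares on (A, b) minimizes the χ² defined earlier. A sketch for a polynomial basis (names are illustrative, not from the slides):

```python
import numpy as np

def weighted_design_matrix(x, sigma, degree):
    """Design matrix A_ij = X_j(x_i) / sigma_i for the polynomial basis X_j(x) = x**j."""
    x, sigma = np.asarray(x, float), np.asarray(sigma, float)
    A = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    return A / sigma[:, None]

x = np.linspace(0, 5, 30)
sigma = np.full_like(x, 0.2)
y = 0.5 + 1.2 * x - 0.3 * x**2 + np.random.normal(0, 0.2, x.size)

A = weighted_design_matrix(x, sigma, degree=2)
b = y / sigma                                        # weighted data vector
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)       # least-squares solution
print(coeffs)                                        # ~ [0.5, 1.2, -0.3]
```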

  • III. General Least squares techniques Solving the design matrix approach with SVD

    The problem amounts to minimizing the sum of the squared residuals:

    This is exactly what singular value decomposition does!


  • Singular Value Decomposition (SVD)

    For any matrix A (N rows × M columns), there exists a triple product A = U·S·Vᵀ, where:

    U: column-orthonormal matrix (i.e. any column vector is orthogonal to the others and the sum of the squares of its elements is ONE) of size N×M

    V: orthonormal square matrix of size M×M

    S: diagonal matrix of size M×M. The diagonal elements are called the singular values. These values may be zero if the matrix is rank deficient (i.e. the rank is less than the shortest dimension of the matrix A).

  • W is a diagonal matrix defined from diagonal matrix S

    with εw a small threshold value

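    The solution formula itself is missing from the transcript. With W defined from S by W_jj = 1/S_jj where S_jj exceeds the threshold εw and W_jj = 0 otherwise, the standard SVD recipe gives the least-squares solution a = V·W·Uᵀ·b (a hedged reconstruction, since only the W definition survives here). A sketch in NumPy:

```python
import numpy as np

def svd_least_squares(A, b, eps_w=1e-10):
    """Solve the least-squares problem min ||A a - b||^2 via SVD.

    Singular values below eps_w (relative to the largest one) are zeroed in W,
    which tames rank-deficient or ill-conditioned design matrices.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U * diag(s) * Vt
    w = np.where(s > eps_w * s.max(), 1.0 / s, 0.0)    # W_jj = 1/S_jj or 0
    return Vt.T @ (w * (U.T @ b))                      # a = V W U^T b

# Same weighted polynomial problem as before, now solved with SVD
x = np.linspace(0, 5, 30)
sigma = np.full_like(x, 0.2)
y = 0.5 + 1.2 * x - 0.3 * x**2 + np.random.normal(0, 0.2, x.size)
A = np.vander(x, 3, increasing=True) / sigma[:, None]
print(svd_least_squares(A, y / sigma))
```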

  • How do you compute the uncertainties ?

  • Covariance matrix of the uncertainties
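    The slide's expression is not in the transcript; with the weighted design matrix above, the standard SVD result for the covariance of the fitted parameters is

    Cov(a_j, a_k) = Σ_m V_jm·V_km / S_mm²   (summing only over the singular values that were kept),

    i.e. Cov(a) = V·W²·Vᵀ.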

  • USEFUL MATRIX ALGEBRA:

    Identity 1:

    Identity 2:

    Demo:

    k=1,...,N

  • Looking for an extremum, and anticipating that this extremum is a minimum...

    To have a solution, this matrix must be invertible. If not, use SVD methods.

    Generalization of the linear regression

    Residual Sum of Squares (RSS):
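    The slide's equations are missing from the transcript; in the usual matrix form of this generalization (with A the N×K design matrix, a the K parameters and ε the errors), consistent with the AᵀA and N − K notation of the following slides:

    y = A·a + ε,   RSS(a) = (y − A·a)ᵀ(y − A·a),

    and setting the gradient of the RSS to zero gives the normal equations AᵀA·â = Aᵀy, hence â = (AᵀA)⁻¹Aᵀy when AᵀA is invertible.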

  • 2 methods: AᵀA is invertible, or AᵀA is not invertible

    Covariance matrix of the estimated parameters

  • Covariance matrix of the estimated parameters

  • Estimator of σ_ε

  • Student distribution with n = N − K degrees of freedom
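    The expressions themselves are missing from the transcript; the standard results matching these two slides are the unbiased estimator

    σ̂_ε² = RSS / (N − K),

    and the fact that each standardized coefficient (â_k − a_k) / σ̂_{â_k} follows a Student t distribution with N − K degrees of freedom, which is what provides confidence intervals on the fitted parameters.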
