Machine Learning Seminar: Support Vector Regression


  • Machine Learning Seminar: Support Vector Regression

    Presented by: Heng Ji, 10/08/03

  • Outline

    Regression Background
    Linear ε-Insensitive Loss Algorithm: Primal Formulation, Dual Formulation, Kernel Formulation
    Quadratic ε-Insensitive Loss Algorithm
    Kernel Ridge Regression & Gaussian Processes

  • Regression = find a function that fits the observations

    Observations are (x, y) pairs:
    (1949, 100), (1950, 117), ..., (1996, 1462), (1997, 1469), (1998, 1467), (1999, 1474)

  • Linear fit... Not so good...

  • Better linear fit... Take the logarithm of y and fit a straight line

  • Transform back to the original scale... So-so. (A sketch of this log-transform fit follows below.)
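    A minimal sketch of this transform-and-fit step in Python (NumPy assumed; only the six data points shown on the observations slide are used, since the middle years are elided):

        import numpy as np

        # The (year, value) observations listed on the earlier slide
        x = np.array([1949, 1950, 1996, 1997, 1998, 1999], dtype=float)
        y = np.array([100, 117, 1462, 1469, 1467, 1474], dtype=float)

        # Fit a straight line to log(y): log(y) ~= a*x + b
        a, b = np.polyfit(x, np.log(y), deg=1)

        # Transform back to the original scale
        y_hat = np.exp(a * x + b)
        print(y_hat)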

  • So what is regression about? Construct a model of a process, using examples of the process.

    Input: x (possibly a vector)
    Output: f(x) (generated by the process)
    Examples: pairs of input and output {(x, y)}

    Our model: the function f(x) is our estimate of the true function g(x)

  • Assumption about the process: the fixed regressor model

    y(n) = g[x(n)] + e(n)

    x(n): observed input
    y(n): observed output
    g[x(n)]: true underlying function
    e(n): i.i.d. noise process with zero mean

    Data set: D = {(x(n), y(n)) : n = 1, ..., N}

  • Example (figure)
  • Model Sets (examples)

    True function: g(x) = 0.5 + x + x^2 + 6x^3

    F1 = {a + bx} (linear)
    F2 = {a + bx + cx^2} (quadratic)
    F3 = {a + bx + cx^2 + dx^3} (cubic)

    F1 ⊂ F2 ⊂ F3
    (A sketch fitting these nested families follows below.)
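    A minimal sketch of fitting these nested families in Python (NumPy assumed; the sample size, interval, and noise level are illustrative choices, not from the slides):

        import numpy as np

        rng = np.random.default_rng(0)

        # Noisy samples of the true function g(x) = 0.5 + x + x^2 + 6x^3
        x = rng.uniform(-1, 1, size=30)
        y = 0.5 + x + x**2 + 6 * x**3 + rng.normal(0, 0.1, size=30)

        # Fit F1 (linear), F2 (quadratic), F3 (cubic) by least squares
        for degree in (1, 2, 3):
            coeffs = np.polyfit(x, y, deg=degree)
            mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
            print(f"F{degree}: training MSE = {mse:.4f}")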

  • Idealized regression

    Model set (our hypothesis set): F
    Find an appropriate model family F, and find f_opt(x) ∈ F with minimum distance (error) to the true function g(x)

  • How do we measure distance?

    Q: What is the distance (difference) between the functions f and g? (Two standard choices are sketched below.)
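    The slides leave the formulas to later; as one illustration (an assumption, not from the slides), two standard empirical distance measures over the data set are, in LaTeX:

        % Empirical squared (L2) and absolute (L1) distances between f and g;
        % in practice g(x(n)) is only observed through y(n)
        L_2(f, g) = \frac{1}{N} \sum_{n=1}^{N} \bigl( f(x(n)) - g(x(n)) \bigr)^2,
        \qquad
        L_1(f, g) = \frac{1}{N} \sum_{n=1}^{N} \bigl| f(x(n)) - g(x(n)) \bigr|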

  • Margin Slack Variable

    For example (x_i, y_i), function f, target accuracy θ, and margin γ, the margin slack variable is

    ξ_i = max(0, |y_i − f(x_i)| − (θ − γ))

    θ: target accuracy in test
    θ − γ: difference between the target accuracy and the margin, i.e. the accuracy required in training

  • ε-Insensitive Loss Function

    Let ε = θ − γ. The margin slack variable is then ξ_i = max(0, |y_i − f(x_i)| − ε).

    Linear ε-insensitive loss:
    L^ε(x, y, f) = |y − f(x)|_ε = max(0, |y − f(x)| − ε)

    Quadratic ε-insensitive loss:
    L^ε_2(x, y, f) = (|y − f(x)|_ε)^2

    (A small Python sketch of both losses follows below.)
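    A minimal Python sketch of the two losses (NumPy assumed; the function names are mine):

        import numpy as np

        def linear_eps_loss(y, f_x, eps):
            # max(0, |y - f(x)| - eps): zero inside the eps-tube, linear outside
            return np.maximum(0.0, np.abs(y - f_x) - eps)

        def quadratic_eps_loss(y, f_x, eps):
            # The square of the linear eps-insensitive loss
            return linear_eps_loss(y, f_x, eps) ** 2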

  • Linear ε-insensitive loss ⇒ a linear SV machine (figure: the loss is zero inside the band y_i ± ε and grows linearly outside it)

  • Basic Idea of SV Regression

    Starting point: we have input data X = {(x_1, y_1), ..., (x_N, y_N)}.
    Goal: we want to find a robust function f(x) that has at most ε deviation from the targets y, while at the same time being as flat as possible.
    Idea: simple regression problem + optimization + kernel trick.
    (See the scikit-learn sketch after this slide.)
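    A minimal sketch of this idea with scikit-learn's SVR (an anachronism relative to the 2003 slides, used purely for illustration; the synthetic data and the values C=1.0, epsilon=0.1 are illustrative assumptions):

        import numpy as np
        from sklearn.svm import SVR

        rng = np.random.default_rng(0)

        # Synthetic 1-D regression data
        X = rng.uniform(-3, 3, size=(100, 1))
        y = np.sin(X).ravel() + rng.normal(0, 0.1, size=100)

        # epsilon sets the width of the insensitive tube; C trades off
        # flatness of f against deviations larger than epsilon
        model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
        model.fit(X, y)

        print("support vectors:", len(model.support_))
        print("predictions:", model.predict(X[:3]))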

  • Thus, setting f(x) = ⟨w, x⟩ + b, we arrive at the

    Primal Regression Problem

  • Linear ε-Insensitive Loss Regression

    min_{w,b,ξ,ξ*}  (1/2)‖w‖^2 + C Σ_{i=1}^{N} (ξ_i + ξ_i*)

    subject to
    y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i
    ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*
    ξ_i, ξ_i* ≥ 0

    ε decides the width of the insensitive zone; C sets the trade-off between the training error and ‖w‖. ε and C must be tuned simultaneously (see the grid-search sketch below). Is regression therefore more difficult than classification?
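    A sketch of tuning C and ε jointly by cross-validated grid search (scikit-learn assumed; the grid values and data are illustrative):

        import numpy as np
        from sklearn.svm import SVR
        from sklearn.model_selection import GridSearchCV

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(100, 1))
        y = np.sin(X).ravel() + rng.normal(0, 0.1, size=100)

        # C and epsilon interact, so search over them simultaneously
        param_grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1, 0.5]}
        search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
        search.fit(X, y)
        print(search.best_params_)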

  • Parameters used in SV Regression

  • Dual Formulation

    The Lagrangian function will help us to formulate the dual problem.

    ξ_i, ξ_i*: ε-insensitive loss slack variables
    α_i, α_i*: Lagrange multipliers
    α_i: for points above the band
    α_i*: for points below the band

    Optimality conditions:
    w = Σ_i (α_i − α_i*) x_i,  Σ_i (α_i − α_i*) = 0,  α_i, α_i* ∈ [0, C]

  • Dual Formulation (cont.)

    Dual Problem:

    max_{α,α*}  −(1/2) Σ_{i,j} (α_i − α_i*)(α_j − α_j*) ⟨x_i, x_j⟩ − ε Σ_i (α_i + α_i*) + Σ_i y_i (α_i − α_i*)

    subject to  Σ_i (α_i − α_i*) = 0,  α_i, α_i* ∈ [0, C]

    Solving it yields the regression function f(x) = Σ_i (α_i − α_i*) ⟨x_i, x⟩ + b

  • KKT Optimality Conditions and b

    KKT optimality conditions:
    α_i (ε + ξ_i − y_i + ⟨w, x_i⟩ + b) = 0
    α_i* (ε + ξ_i* + y_i − ⟨w, x_i⟩ − b) = 0
    (C − α_i) ξ_i = 0,  (C − α_i*) ξ_i* = 0

    b can be computed as follows: pick any i with α_i ∈ (0, C), so that ξ_i = 0; then b = y_i − ⟨w, x_i⟩ − ε (analogously, b = y_i − ⟨w, x_i⟩ + ε for α_i* ∈ (0, C)).

    This means that the Lagrange multipliers will only be non-zero for points outside the ε-band. Thus these points are the support vectors.

  • The Idea of SVM (figure: a nonlinear map from input space to feature space)

  • Kernel Version

    Why can we use a kernel? The complexity of a function's representation depends only on the number of SVs, and the complete algorithm can be described in terms of inner products: an implicit mapping to the feature space.

    Mapping via kernel: ⟨φ(x_i), φ(x_j)⟩ = K(x_i, x_j), so f(x) = Σ_i (α_i − α_i*) K(x_i, x) + b.
    (A sketch of this kernel expansion follows below.)
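    A sketch checking that the fitted machine predicts purely through kernel inner products with its support vectors (scikit-learn assumed; gamma is fixed explicitly so the RBF kernel can be recomputed by hand):

        import numpy as np
        from sklearn.svm import SVR

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(100, 1))
        y = np.sin(X).ravel() + rng.normal(0, 0.1, size=100)

        model = SVR(kernel="rbf", gamma=0.5, C=1.0, epsilon=0.1).fit(X, y)

        # f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b over support vectors
        x_new = np.array([[0.3]])
        K = np.exp(-0.5 * np.sum((model.support_vectors_ - x_new) ** 2, axis=1))
        f_manual = model.dual_coef_.ravel() @ K + model.intercept_[0]

        print(f_manual, model.predict(x_new)[0])  # the two values agree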

  • Quadratic ε-Insensitive Loss Regression

    Problem:
    min_{w,b,ξ,ξ*}  (1/2)‖w‖^2 + C Σ_{i=1}^{N} (ξ_i^2 + ξ_i*^2)

    subject to
    y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i
    ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*

    Kernel formulation: the dual is as in the linear case, except that the upper bound C on the multipliers disappears and the kernel matrix K is replaced by K + (1/C) I.

  • Kernel Ridge Regression & Gaussian Processes

    With ε = 0 this becomes least-squares linear regression, with the weight decay factor λ (~ 1/C) controlled by C:

    min  λ‖w‖^2 + Σ_i ξ_i^2
    subject to  y_i − ⟨w, x_i⟩ = ξ_i

    Kernel formulation (I: identity matrix):
    f(x) = y^T (K + λI)^{−1} k,  where K_ij = K(x_i, x_j) and k_i = K(x_i, x)

    f(x) is also the mean of the Gaussian process predictive distribution.
    (A NumPy sketch of this closed form follows below.)
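    A NumPy sketch of this closed form (gamma, the weight decay λ, and the data are illustrative assumptions):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(50, 1))
        y = np.sin(X).ravel() + rng.normal(0, 0.1, size=50)

        def rbf(A, B, gamma=0.5):
            # Gram matrix of the RBF kernel between the rows of A and B
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-gamma * d2)

        lam = 0.1                                  # weight decay, roughly 1/C
        K = rbf(X, X)
        alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # (K + lam I)^-1 y

        x_new = np.array([[0.3]])
        print(rbf(x_new, X) @ alpha)               # f(x) = k^T (K + lam I)^-1 y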

  • Architecture of SV Regression Machine

    (figure: the expansion f(x) = Σ_i (α_i − α_i*) K(x_i, x) + b drawn as a network)
    Similar to regression in a three-layered neural network!?

  • Conclusion

    SVM is a useful alternative to neural networks.
    Two key concepts of SVM: optimization and the kernel trick.
    Advantages of SV regression:
    Represents the solution by a small subset of training points
    Ensures the existence of a global minimum
    Ensures the optimization of a reliable generalization bound

  • Discussion 1: Influence of the insensitivity band on regression quality

    17 measured training data points are used.
    Left: ε = 0.1, 15 SVs are chosen.
    Right: ε = 0.5, the 6 chosen SVs produce a much better regression function.

  • Discussion 2: ε-Insensitive Loss

    Enables sparseness within the SVs (but does it guarantee sparseness?)
    Robust (to small changes in the data/model)
    Less sensitive to outliers