Posted: 03-Jan-2016

### Transcript of Machine Learning Seminar: Support Vector Regression

• Machine Learning Seminar: Support Vector Regression

Presented by: Heng Ji, 10/08/03

• Outline

- Regression Background
- Linear ε-Insensitive Loss Algorithm
  - Primal Formulation
  - Dual Formulation
  - Kernel Formulation
- Quadratic ε-Insensitive Loss Algorithm
- Kernel Ridge Regression & Gaussian Process

• Regression = find a function that fits the observations.

Observations, as (x, y) pairs:

(1949, 100), (1950, 117), ..., (1996, 1462), (1997, 1469), (1998, 1467), (1999, 1474)

• Linear fit... not so good.

• Better linear fit: take the logarithm of y and fit a straight line.

• Transform back to the original scale... so-so.
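The log-transform trick from these slides can be sketched in a few lines. This is an illustrative reconstruction: only the endpoints of the slide's data series are listed, so the fit below uses just those six points.

```python
import numpy as np

# The (year, count) observations listed on the earlier slide
# (only these six points are given; the middle of the series is elided).
years = np.array([1949, 1950, 1996, 1997, 1998, 1999], dtype=float)
counts = np.array([100, 117, 1462, 1469, 1467, 1474], dtype=float)

# Fit a straight line in log space: log(y) ~ a*x + b
a, b = np.polyfit(years, np.log(counts), deg=1)

def predict(x):
    """Map the log-space linear fit back to the original scale."""
    return np.exp(a * x + b)
```

Because the fit is linear in log(y), transforming back yields an exponential curve in the original scale, which is what the "better linear fit" slide relies on.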

• So what is regression about? Construct a model of a process, using examples of the process.

- Input: x (possibly a vector)
- Output: f(x) (generated by the process)
- Examples: pairs of input and output {y, x}

Our model: f(x). This function is our estimate of the true function g(x).

• Assumption about the process: the fixed regressor model

y(n) = g[x(n)] + e(n)

- x(n): observed input
- y(n): observed output
- g[x(n)]: true underlying function
- e(n): i.i.d. noise process with zero mean

Data set: {(x(n), y(n)) : n = 1, ..., N}

• Example
• Model Sets (examples)

g(x) = 0.5 + x + x² + 6x³

F1 = {a + bx}; F2 = {a + bx + cx²}; F3 = {a + bx + cx² + dx³}
(linear; quadratic; cubic), with F1 ⊂ F2 ⊂ F3.
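The nested model sets above can be checked numerically. A small sketch (noise-free sampling of g over [-1, 1] is my assumption; the slide does not specify the sample points):

```python
import numpy as np

def g(x):
    # True process from the slide: g(x) = 0.5 + x + x^2 + 6x^3
    return 0.5 + x + x**2 + 6 * x**3

x = np.linspace(-1.0, 1.0, 50)
y = g(x)

# Least-squares fit within F1 (linear), F2 (quadratic), F3 (cubic)
fits = {deg: np.polyfit(x, y, deg) for deg in (1, 2, 3)}
# Worst-case fit error for each family
errors = {deg: np.max(np.abs(np.polyval(c, x) - y)) for deg, c in fits.items()}
```

Since g itself lies in F3, the cubic fit recovers it essentially exactly, while F1 and F2 leave a residual: the model family must be rich enough to contain (or approach) the true function.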

• Idealized regression

Model set (our hypothesis set): F. Find an appropriate model family F and an f(x) ∈ F with minimum distance to g(x) (error); this minimizer is f_opt(x) ∈ F.

• How do we measure distance?

Q: What is the distance (difference) between functions f and g?

• Margin Slack Variable

For an example (x_i, y_i), a function f, target accuracy θ, and margin γ, the margin slack variable is

ξ_i = max(0, |y_i − f(x_i)| − (θ − γ))

- θ: target accuracy in test
- γ: difference between target accuracy and margin in training

• ε-Insensitive Loss Function

Let ε = θ − γ; the margin slack variable becomes ξ_i = max(0, |y_i − f(x_i)| − ε).

Linear ε-insensitive loss:

L(x, y, f) = |y − f(x)|_ε = max(0, |y − f(x)| − ε)

• Linear ε-insensitive loss gives a linear SV machine.
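The linear ε-insensitive loss defined above is a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Linear eps-insensitive loss: max(0, |y - f(x)| - eps), elementwise.
    Deviations inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)
```

For example, with eps = 0.1 a prediction off by 0.05 incurs zero loss, while one off by 0.3 incurs 0.2.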

• Basic Idea of SV Regression

Starting point: we have input data X = {(x1, y1), ..., (xN, yN)}.
Goal: we want to find a robust function f(x) that has at most ε deviation from the targets y, while at the same time being as flat as possible.
Idea: simple regression problem + optimization + kernel trick.

• Thus, setting f(x) = ⟨w, x⟩ + b:

Primal Regression Problem

• Linear ε-Insensitive Loss Regression

min over w, b, ξ, ξ*:  (1/2)||w||² + C Σᵢ (ξ_i + ξ_i*)

subject to

y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i
⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*
ξ_i, ξ_i* ≥ 0,  i = 1, ..., N

ε decides the width of the insensitive zone; C sets a trade-off between the error and ||w||. ε and C must be tuned simultaneously. Is regression therefore more difficult than classification?
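This primal problem is what off-the-shelf solvers optimize. A hedged sketch using scikit-learn's SVR (scikit-learn is not part of the original slides, and the synthetic data is my own):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: y ~ 2x plus small noise (illustrative only).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = 2.0 * X.ravel() + 0.05 * rng.standard_normal(40)

# C and epsilon correspond directly to the C and eps of the primal above.
model = SVR(kernel="linear", C=10.0, epsilon=0.1).fit(X, y)
slope = model.coef_.ravel()[0]  # the learned w (one-dimensional here)
```

As the slide notes, C and ε have to be tuned together: a wide tube with a small C pulls the solution toward flatness, while a large C forces the fit to track the targets.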

• Parameters used in SV Regression

• Dual Formulation

The Lagrangian function will help us formulate the dual problem.

- ε: insensitive-loss parameter
- α_i, α_i*: Lagrange multipliers
- ξ_i: difference value for points above the band
- ξ_i*: difference value for points below the band

Optimality conditions: ∂L/∂w = 0 ⇒ w = Σᵢ (α_i − α_i*) x_i;  ∂L/∂b = 0 ⇒ Σᵢ (α_i − α_i*) = 0.

• Dual Formulation (cont.)

Dual problem:

max over α, α*:  −(1/2) Σᵢⱼ (α_i − α_i*)(α_j − α_j*) ⟨x_i, x_j⟩ − ε Σᵢ (α_i + α_i*) + Σᵢ y_i (α_i − α_i*)

subject to  Σᵢ (α_i − α_i*) = 0  and  α_i, α_i* ∈ [0, C]

Solving yields w = Σᵢ (α_i − α_i*) x_i, hence f(x) = Σᵢ (α_i − α_i*) ⟨x_i, x⟩ + b.

• KKT Optimality Conditions and b

KKT optimality conditions:

α_i (ε + ξ_i − y_i + ⟨w, x_i⟩ + b) = 0
α_i* (ε + ξ_i* + y_i − ⟨w, x_i⟩ − b) = 0
(C − α_i) ξ_i = 0,  (C − α_i*) ξ_i* = 0

b can be computed as follows: for any i with α_i ∈ (0, C), b = y_i − ⟨w, x_i⟩ − ε (and for α_i* ∈ (0, C), b = y_i − ⟨w, x_i⟩ + ε).

This means that the Lagrange multipliers are non-zero only for points on or outside the ε band. These points are the support vectors.

• The Idea of SVM: map from the input space to a feature space.

• Kernel Version

Why can we use a kernel?

- The complexity of a function's representation depends only on the number of SVs.
- The complete algorithm can be described in terms of inner products: an implicit mapping to the feature space.
- Mapping via kernel: K(x, z) = ⟨φ(x), φ(z)⟩, so f(x) = Σᵢ (α_i − α_i*) K(x_i, x) + b.
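The kernel expansion f(x) = Σᵢ (α_i − α_i*) K(x_i, x) + b can be verified against a fitted model. A sketch with scikit-learn's RBF-kernel SVR (an illustrative tool choice, not from the slides):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

X = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
y = np.sin(X).ravel()

model = SVR(kernel="rbf", gamma=0.5, C=1.0, epsilon=0.05).fit(X, y)

# Rebuild predictions from the expansion: only support vectors contribute.
# dual_coef_ holds (alpha_i - alpha_i*) for the support vectors.
K = rbf_kernel(X, model.support_vectors_, gamma=0.5)
manual = K @ model.dual_coef_.ravel() + model.intercept_[0]
```

`manual` agrees with `model.predict(X)`: the whole algorithm really is expressed through inner products K(x_i, x), with the feature map never computed explicitly.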

• Quadratic ε-Insensitive Loss Regression

Problem:

min  (1/2)||w||² + C Σᵢ (ξ_i² + ξ_i*²)

subject to

y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i
⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*

Kernel formulation: the dual is the same as for the linear loss, with ⟨x_i, x_j⟩ replaced by K(x_i, x_j) + δ_ij/C and the upper bound α_i, α_i* ≤ C removed.

• Kernel Ridge Regression & Gaussian Processes

With ε = 0 this becomes least-squares linear regression; the weight-decay factor λ (~1/C) is controlled by C:

min  λ||w||² + Σᵢ ξ_i²

subject to  y_i − ⟨w, x_i⟩ = ξ_i

Kernel formulation: α = (K + λI)⁻¹ y (I: identity matrix), giving f(x) = Σᵢ α_i K(x_i, x). This prediction is also the mean of a Gaussian-process posterior distribution.
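The closed form α = (K + λI)⁻¹ y can be checked directly. A sketch comparing it with scikit-learn's KernelRidge (tool choice and data are my own; KernelRidge's `alpha` parameter plays the role of λ):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

X = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
y = np.cos(3.0 * X).ravel()
lam = 0.1  # weight-decay factor (the slide's ~1/C)

# Closed form: alpha = (K + lam*I)^{-1} y, predictions f = K @ alpha
K = rbf_kernel(X, X, gamma=1.0)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
closed_form_pred = K @ alpha

model = KernelRidge(alpha=lam, kernel="rbf", gamma=1.0).fit(X, y)
```

Unlike SVR, every training point gets a non-zero coefficient here: dropping the ε-tube trades away sparseness for a one-shot linear solve.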

• Architecture of the SV Regression Machine

f(x) = Σᵢ (α_i − α_i*) K(x_i, x) + b: similar to regression in a three-layered neural network!?

• Conclusion

SVM is a useful alternative to neural networks. Two key concepts of SVM: optimization and the kernel trick.

Advantages of SV Regression:
- Represents the solution by a small subset of training points
- Ensures the existence of a global minimum
- Ensures the optimization of a reliable generalization bound

• Discussion 1: Influence of the insensitivity band on regression quality

17 measured training data points are used.
Left: ε = 0.1; 15 SVs are chosen.
Right: ε = 0.5; the 6 chosen SVs produce a much better regression function.
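The sparsity effect described above is easy to reproduce. A sketch with 17 synthetic points (the slide's measured data is not available, so the SV counts will differ from the reported 15 and 6):

```python
import numpy as np
from sklearn.svm import SVR

# 17 noisy samples of a smooth target (synthetic stand-in for the
# slide's measured data).
rng = np.random.default_rng(1)
X = np.linspace(-2.0, 2.0, 17).reshape(-1, 1)
y = np.sinc(X).ravel() + 0.1 * rng.standard_normal(17)

# Count support vectors for a narrow and a wide insensitivity band.
n_sv = {}
for eps in (0.1, 0.5):
    model = SVR(kernel="rbf", gamma=1.0, C=10.0, epsilon=eps).fit(X, y)
    n_sv[eps] = len(model.support_)
```

Widening the ε-tube leaves more points strictly inside it, so fewer points end up with non-zero multipliers and the solution gets sparser.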

• Discussion 2: ε-Insensitive Loss

- Enables sparseness within the SVs, but does it guarantee sparseness?
- Robust (robust to small changes in data/model)
- Less sensitive to outliers