Smooth ε-Insensitive Regression by Loss Symmetrization


Transcript of Smooth ε-Insensitive Regression by Loss Symmetrization

Page 1: Smooth ε-Insensitive Regression by Loss Symmetrization

Smooth ε-Insensitive Regression by Loss Symmetrization

Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer

School of Computer Science and Engineering
The Hebrew University

{oferd,shais,singer}@cs.huji.ac.il

COLT 2003: The Sixteenth Annual Conference on Learning Theory

Page 2: Smooth ε-Insensitive Regression by Loss Symmetrization

Before We Begin …

Linear Regression: given
find such that

Least Squares: minimize

Support Vector Regression:
minimize
s.t.
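The formulas on this slide did not survive the transcript. As a hedged reconstruction, the standard formulations (writing λ for the weight vector, as in the paper, and ε for the insensitivity parameter) would read:

\text{Linear regression: given } \{(x_i, y_i)\}_{i=1}^m \subset \mathbb{R}^n \times \mathbb{R}, \text{ find } \lambda \in \mathbb{R}^n \text{ such that } \lambda \cdot x_i \approx y_i
\text{Least squares: } \min_{\lambda} \sum_{i=1}^m (\lambda \cdot x_i - y_i)^2
\text{SVR (hard constraints, slack omitted): } \min_{\lambda} \tfrac{1}{2}\|\lambda\|^2 \quad \text{s.t.} \quad |\lambda \cdot x_i - y_i| \le \epsilon \ \ \forall i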

Page 3: Smooth ε-Insensitive Regression by Loss Symmetrization

Loss Symmetrization

Loss functions used in classification Boosting:

Symmetric versions of these losses can be used for regression:
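The loss formulas themselves are missing from the transcript; presumably the slide showed the standard boosting losses and their symmetrized, ε-insensitive counterparts, along the lines of:

\text{Classification (margin } z\text{): exp-loss } e^{-z}, \qquad \text{log-loss } \log(1 + e^{-z})
\text{Regression (discrepancy } \delta = \lambda \cdot x - y\text{):}
\text{symmetric exp-loss } e^{\delta - \epsilon} + e^{-\delta - \epsilon}, \qquad \text{symmetric log-loss } \log(1 + e^{\delta - \epsilon}) + \log(1 + e^{-\delta - \epsilon})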

Page 4: Smooth ε-Insensitive Regression by Loss Symmetrization

A General Reduction

• Begin with a regression training set where ,
• Generate 2m classification training examples of dimension n+1:
• Learn while maintaining by minimizing a margin-based classification loss
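The examples generated by the reduction are not legible in the transcript. One way to realize such a reduction, consistent with the symmetric log-loss above (the exact construction on the slide may differ), is to map each regression example (x_i, y_i) to the two (n+1)-dimensional classification examples

z_i^+ = (x_i,\; \epsilon - y_i) \text{ with label } +1, \qquad z_i^- = (x_i,\; -(y_i + \epsilon)) \text{ with label } -1.

Learning an augmented vector (λ, λ_{n+1}) while keeping the last coordinate fixed at 1 gives margins ε + δ_i and ε − δ_i on the two examples, so summing a margin-based classification loss (e.g. the log-loss) over the 2m examples reproduces the symmetric ε-insensitive regression loss.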

Page 5: Smooth ε-Insensitive Regression by Loss Symmetrization

A Batch Algorithm

An illustration of a single batch iteration

Simplifying assumptions (just for the demo):
– Instances are in
– Set
– Use the Symmetric Log-loss

Page 6: Smooth ε-Insensitive Regression by Loss Symmetrization

A Batch Algorithm

Calculate discrepancies and weights:

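The formulas for the discrepancies and weights are not recoverable from the transcript; for the symmetric log-loss used in this demo they would presumably be the derivatives of the two loss terms:

\delta_i = \lambda \cdot x_i - y_i, \qquad q_i^+ = \frac{1}{1 + e^{\epsilon - \delta_i}}, \qquad q_i^- = \frac{1}{1 + e^{\epsilon + \delta_i}}

Here q_i^+ grows when the prediction overshoots y_i by more than ε, and q_i^- grows when it undershoots by more than ε.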

Page 7: Smooth ε-Insensitive Regression by Loss Symmetrization

A Batch Algorithm

Cumulative weights:
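Again the formula is missing; a natural reading (assuming non-negative features) is that the per-example weights are accumulated coordinate-wise:

W_j^+ = \sum_{i=1}^m q_i^+ \, x_{i,j}, \qquad W_j^- = \sum_{i=1}^m q_i^- \, x_{i,j}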

Page 8: Smooth ε-Insensitive Regression by Loss Symmetrization

A Batch Algorithm

Update the regressor:

Two Batch Algorithms
Log-Additive update
or Additive update
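The two update rules did not survive extraction. As an illustrative sketch only (not the authors' code), the following Python puts the pieces of one batch iteration together with a log-additive-style step; the 0.5 step factor and the assumption that features are non-negative with each row of X summing to at most 1 are part of the sketch, not taken from the slide:

import numpy as np

def batch_iteration(X, y, lam, eps):
    # One batch iteration for the symmetric log-loss (illustrative sketch).
    # Assumes X >= 0 with each row summing to at most 1 (a boosting-style
    # normalization); this is an assumption of the sketch, not the slide.
    delta = X @ lam - y                            # discrepancies delta_i
    q_plus = 1.0 / (1.0 + np.exp(eps - delta))     # pressure to lower predictions
    q_minus = 1.0 / (1.0 + np.exp(eps + delta))    # pressure to raise predictions
    W_plus = X.T @ q_plus                          # cumulative weights per feature
    W_minus = X.T @ q_minus
    # Log-additive step: move each coordinate toward the side with more weight.
    return lam + 0.5 * np.log((W_minus + 1e-12) / (W_plus + 1e-12))

# Example usage: repeat lam = batch_iteration(X, y, lam, eps=0.1)
# until the symmetric loss stops decreasing.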

Page 9: Smooth ε-Insensitive Regression by Loss Symmetrization

Progress Bounds

Theorem (Log-Additive update):

Theorem (Additive update):

Lemma: Both bounds are non-negative and equal zero only at the optimum.

Page 10: Smooth ε-Insensitive Regression by Loss Symmetrization

Boosting Regularization

A new form of regularization for regression and classification Boosting

Can be implemented by adding pseudo-examples where

* Communicated by Rob Schapire

Page 11: Smooth ε-Insensitive Regression by Loss Symmetrization

Regularization Contd.

• Regularization ⇒ compactness of the feasible set for
• Regularization ⇒ a unique attainable optimizer of the loss function

Proof of Convergence

Progress + compactness + uniqueness = asymptotic convergence to the optimum

Page 12: Smooth ε-Insensitive Regression by Loss Symmetrization

Exp-loss vs. Log-loss

• Two synthetic datasets

[Figure panels: Log-loss, Exp-loss]

Page 13: Smooth ε-Insensitive Regression by Loss Symmetrization

Extensions

• Parallel vs. Sequential updates
  – Parallel: update all elements of the regressor in parallel
  – Sequential: update the weight of a single weak regressor on each round (like classic boosting)
• Another loss function – the “Combined Loss”

[Figure panels: Log-loss, Exp-loss, Comb-loss]

Page 14: Smooth ε-Insensitive Regression by Loss Symmetrization

On-line Algorithms
• GD and EG online algorithms for Log-loss
• Relative loss bounds
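The updates themselves are not in the transcript; standard GD and EG steps for the symmetric log-loss (a sketch, not necessarily the paper's exact variants) would be, with δ_t = λ_t·x_t − y_t and q_t^± defined as above:

\text{GD:} \quad \lambda_{t+1} = \lambda_t - \eta\,(q_t^+ - q_t^-)\, x_t
\text{EG:} \quad \lambda_{t+1,j} \propto \lambda_{t,j}\, e^{-\eta\,(q_t^+ - q_t^-)\, x_{t,j}} \quad \text{(normalized to the simplex)}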

Future Directions
• Regression tree learning
• Solving one-class and various ranking problems using similar constructions
• Regression generalization bounds based on natural regularization