Smooth ε-Insensitive Regression by Loss Symmetrization
Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
School of Computer Science and Engineering, The Hebrew University
{oferd,shais,singer}@cs.huji.ac.il
COLT 2003: The Sixteenth Annual Conference on Learning Theory
Before We Begin …
Linear Regression: given a training set {(x_i, y_i)}_{i=1}^m, with x_i ∈ ℝⁿ and y_i ∈ ℝ,
find w ∈ ℝⁿ such that w·x_i ≈ y_i
Least Squares: minimize Σ_i (w·x_i − y_i)²
Support Vector Regression:
minimize ½‖w‖²
s.t. |w·x_i − y_i| ≤ ε for all i
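As a concrete baseline for the two classical formulations, here is a minimal least-squares sketch in NumPy; the data, dimensions, and noise level are made up for illustration:

```python
import numpy as np

# Hypothetical data: m = 100 instances in R^3, targets linear plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

# Least Squares: minimize sum_i (w . x_i - y_i)^2, solved in closed form.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_ls, w_true, atol=0.1))
```

Support Vector Regression replaces the squared penalty with the hard ε-tube constraints above and is typically solved as a quadratic program rather than in closed form.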
Loss Symmetrization
Loss functions used in classification Boosting: the Exp-loss e^{−y(w·x)} and the Log-loss log(1 + e^{−y(w·x)})
Symmetric versions of these losses can be used for regression: with the discrepancy δ = w·x − y, e.g. the symmetric Log-loss log(1 + e^{δ−ε}) + log(1 + e^{−δ−ε})
• Begin with a regression training set {(x_i, y_i)}_{i=1}^m, where x_i ∈ ℝⁿ, y_i ∈ ℝ
• Generate 2m classification training examples of dimension n+1:
  ((x_i, y_i − ε), +1) and ((x_i, y_i + ε), −1)
• Learn u = (w, −1) while maintaining u_{n+1} = −1, by minimizing a margin-based classification loss
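The reduction on this slide can be sketched as follows. The precise placement of ±ε and the fixed −1 coordinate are my reconstruction, so treat those details as assumptions; the point of the sketch is that the classification log-loss on the 2m generated examples coincides with the symmetric ε-insensitive log-loss of the regressor:

```python
import numpy as np

def sym_log_loss(w, X, y, eps):
    """Symmetric epsilon-insensitive log-loss of the regressor w."""
    delta = X @ w - y
    return np.sum(np.log1p(np.exp(delta - eps)) + np.log1p(np.exp(-delta - eps)))

def reduce_to_classification(X, y, eps):
    """Each regression example (x_i, y_i) yields two (n+1)-dim classification
    examples; the appended coordinate carries y_i -/+ eps."""
    pos = np.hstack([X, (y - eps)[:, None]])   # label +1
    neg = np.hstack([X, (y + eps)[:, None]])   # label -1
    Z = np.vstack([pos, neg])
    labels = np.concatenate([np.ones(len(y)), -np.ones(len(y))])
    return Z, labels

def clf_log_loss(u, Z, labels):
    """Standard margin-based log-loss: sum_i log(1 + exp(-label_i * u . z_i))."""
    margins = labels * (Z @ u)
    return np.sum(np.log1p(np.exp(-margins)))

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3)); y = rng.normal(size=5)
w = rng.normal(size=3); eps = 0.1
Z, labels = reduce_to_classification(X, y, eps)
u = np.append(w, -1.0)  # last coordinate held fixed at -1
print(np.isclose(sym_log_loss(w, X, y, eps), clf_log_loss(u, Z, labels)))
```

The margin of a positive example is δ_i + ε and of a negative one is −δ_i + ε, so the two classification losses sum exactly to the symmetric regression loss.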
A General Reduction
An illustration of a single batch iteration
Simplifying assumptions (just for the demo):
– Instances are in ℝ
– Set ε to a fixed value
– Use the Symmetric Log-loss
A Batch Algorithm
Calculate discrepancies and weights:
Cumulative weights:
A Batch Algorithm
Update the regressor:
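Putting the three demo steps together (discrepancies → example weights → cumulative weights → update), one batch iteration for the symmetric log-loss might look like the sketch below. The ½·log(W⁻/W⁺) update and the nonnegative, row-normalized feature assumption are borrowed from parallel boosting-style updates and are my reconstruction, not the slide's verbatim algorithm:

```python
import numpy as np

def sym_log_loss(lam, X, y, eps):
    d = X @ lam - y
    return np.sum(np.log1p(np.exp(d - eps)) + np.log1p(np.exp(-d - eps)))

def batch_iteration(lam, X, y, eps):
    """One parallel, log-additive update. Assumes nonnegative features whose
    rows sum to at most 1 (a standard boosting-style condition)."""
    d = X @ lam - y                         # discrepancies delta_i
    qp = 1.0 / (1.0 + np.exp(eps - d))      # weights q_i^+ = sigma(delta_i - eps)
    qm = 1.0 / (1.0 + np.exp(eps + d))      # weights q_i^- = sigma(-delta_i - eps)
    Wp, Wm = qp @ X, qm @ X                 # cumulative weights W_j^+, W_j^-
    return lam + 0.5 * np.log(Wm / Wp)      # pull each delta_i toward the tube

rng = np.random.default_rng(2)
X = rng.uniform(size=(20, 4))
X /= X.sum(axis=1, keepdims=True)           # enforce the row-sum assumption
y = X @ np.array([2.0, -1.0, 0.5, 1.5])     # realizable synthetic targets
lam, eps = np.zeros(4), 0.1

losses = [sym_log_loss(lam, X, y, eps)]
for _ in range(10):
    lam = batch_iteration(lam, X, y, eps)
    losses.append(sym_log_loss(lam, X, y, eps))
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

On this synthetic data the loss drops across the iterations, which is the behavior the progress bounds below guarantee.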
Two Batch Algorithms
Log-Additive update
or Additive update
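The update formulas themselves did not survive extraction. Writing $W_j^{\pm}=\sum_i q_i^{\pm}x_{i,j}$ for the cumulative weights of the example weights $q_i^{\pm}$, the two updates plausibly take the following shapes; this is a hedged reconstruction, not the slide's math:

```latex
% Log-additive (boosting-style) update:
\lambda_j \;\leftarrow\; \lambda_j + \frac{1}{2}\log\frac{W_j^-}{W_j^+}
% Additive (gradient-style) update, step size tied to the feature bounds:
\lambda_j \;\leftarrow\; \lambda_j + \eta\,\bigl(W_j^- - W_j^+\bigr)
```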
Theorem (Log-Additive update): a lower bound on the per-iteration decrease in loss
Theorem (Additive update): a corresponding lower bound for the additive update
Lemma: Both bounds are non-negative and equal zero only at the optimum
Progress Bounds
Boosting Regularization
A new form of regularization for regression and classification Boosting*
Can be implemented by adding pseudo-examples
* Communicated by Rob Schapire
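One plausible reading of the pseudo-example idea (my assumption; the slide's formula is missing): append, for each coordinate j, a pseudo-example (e_j, 0), so that the symmetric log-loss itself contributes a smooth penalty term minimized at λ_j = 0:

```python
import numpy as np

def sym_log_loss(lam, X, y, eps):
    d = X @ lam - y
    return np.sum(np.log1p(np.exp(d - eps)) + np.log1p(np.exp(-d - eps)))

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
lam, eps = rng.normal(size=3), 0.1

# Pseudo-examples (e_j, 0): each adds log(1+e^{lam_j-eps}) + log(1+e^{-lam_j-eps}),
# a symmetric, smooth penalty on lam_j that is minimized at lam_j = 0.
X_reg = np.vstack([X, np.eye(3)])
y_reg = np.concatenate([y, np.zeros(3)])

penalty = np.sum(np.log1p(np.exp(lam - eps)) + np.log1p(np.exp(-lam - eps)))
print(np.isclose(sym_log_loss(lam, X_reg, y_reg, eps),
                 sym_log_loss(lam, X, y, eps) + penalty))
```

Because the penalty grows without bound as any |λ_j| grows, the regularized objective has a compact feasible set, which is exactly what the next slide needs.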
• Regularization ⇒ compactness of the feasible set for the regressor
• Regularization ⇒ a unique attainable optimizer of the loss function
Regularization Contd.
Proof of Convergence
Progress + compactness + uniqueness ⇒ asymptotic convergence to the optimum
• Two synthetic datasets
Exp-loss vs. Log-loss
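The qualitative difference between the two symmetric losses is easy to check numerically: outside the ε-tube the log-loss grows roughly linearly in the discrepancy, while the exp-loss grows exponentially, making it far more sensitive to outliers. The choice ε = 0.1 below is arbitrary:

```python
import numpy as np

def sym_log_loss_pt(delta, eps):
    # log(1 + e^{delta-eps}) + log(1 + e^{-delta-eps}): ~linear for large |delta|
    return np.log1p(np.exp(delta - eps)) + np.log1p(np.exp(-delta - eps))

def sym_exp_loss_pt(delta, eps):
    # e^{delta-eps} + e^{-delta-eps}: exponential for large |delta|
    return np.exp(delta - eps) + np.exp(-delta - eps)

for d in (0.0, 1.0, 5.0, 10.0):
    print(f"delta={d:5.1f}  log-loss={sym_log_loss_pt(d, 0.1):8.3f}  "
          f"exp-loss={sym_exp_loss_pt(d, 0.1):12.3f}")
```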
Extensions
• Parallel vs. Sequential updates
– Parallel – update all elements of the weight vector in parallel
– Sequential – update the weight of a single weak regressor on each round (like classic boosting)
• Another loss function – the “Combined Loss”
[Plot: Log-loss, Exp-loss, and Comb-loss]
On-line Algorithms
• GD and EG online algorithms for the Log-loss
• Relative loss bounds
Future Directions
• Regression tree learning
• Solving one-class and various ranking problems using similar constructions
• Regression generalization bounds based on natural regularization