Smooth ε-Insensitive Regression by Loss Symmetrization

Smooth ε-Insensitive Regression by Loss Symmetrization. Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer. School of Computer Science and Engineering, The Hebrew University {oferd,shais,singer}. COLT 2003: The Sixteenth Annual Conference on Learning Theory.

Transcript of Smooth ε-Insensitive Regression by Loss Symmetrization

  • Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
    School of Computer Science and Engineering, The Hebrew University
    {oferd,shais,singer}

    COLT 2003: The Sixteenth Annual Conference on Learning Theory
    Smooth ε-Insensitive Regression by Loss Symmetrization

  • Before We Begin
    Linear Regression: given {(x_i, y_i)}_{i=1}^{m} with x_i ∈ R^n, y_i ∈ R, find w ∈ R^n such that w·x_i ≈ y_i

    Least Squares: minimize Σ_i (y_i − w·x_i)²

    Support Vector Regression: minimize ½‖w‖² s.t. |y_i − w·x_i| ≤ ε for all i
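The two objectives above can be contrasted on toy data. A minimal sketch; the dataset, the ε value, and all variable names here are illustrative assumptions, not from the slides:

```python
import numpy as np

# Toy regression data: y is a noisy linear function of x (all values assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=20)

# Least Squares: minimize sum_i (y_i - w.x_i)^2, solved in closed form.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

def eps_insensitive_loss(w, eps=0.1):
    # SVR's data term: residuals inside the eps-tube incur no loss at all.
    return np.maximum(np.abs(y - X @ w) - eps, 0.0).sum()
```

The ε-insensitive term is what the paper smooths: it is piecewise linear and non-differentiable at the tube boundary, whereas the symmetrized losses below are smooth everywhere.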

  • Loss Symmetrization
    Loss functions used in classification Boosting, as functions of the margin y(w·x):
    Exp-loss: exp(−y(w·x))
    Log-loss: log(1 + exp(−y(w·x)))
    Symmetric versions of these losses can be used for regression, as functions of the discrepancy δ = y − w·x:
    Symmetric Exp-loss: exp(δ − ε) + exp(−δ − ε)
    Symmetric Log-loss: log(1 + exp(δ − ε)) + log(1 + exp(−δ − ε))
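The four losses can be written down directly; a short sketch in which the function names are mine, and each symmetric loss is the classification loss evaluated at the two shifted margins ε − δ and ε + δ:

```python
import math

def exp_loss(margin):
    # Classification exp-loss, as used by AdaBoost-style algorithms.
    return math.exp(-margin)

def log_loss(margin):
    # Classification log-loss (logistic loss).
    return math.log(1.0 + math.exp(-margin))

def sym_exp_loss(delta, eps=0.0):
    # exp(delta - eps) + exp(-delta - eps): symmetric in delta.
    return exp_loss(eps - delta) + exp_loss(eps + delta)

def sym_log_loss(delta, eps=0.0):
    # log(1 + e^{delta-eps}) + log(1 + e^{-delta-eps}): symmetric in delta.
    return log_loss(eps - delta) + log_loss(eps + delta)
```

Both symmetric losses are even functions of δ, so over- and under-prediction are penalized identically, and both are smooth, unlike the hard ε-insensitive loss of SVR.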

  • A General Reduction
    Begin with a regression training set {(x_i, y_i)}_{i=1}^{m} where x_i ∈ R^n, y_i ∈ R
    Generate 2m classification training examples of dimension n+1:
    ((x_i, y_i − ε), +1) and ((x_i, y_i + ε), −1)

    Learn λ = (w, −1) while maintaining λ_{n+1} = −1, by minimizing a margin-based classification loss
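A sketch of the reduction; the helper name and the exact sign conventions (which copy gets ±ε and which label) are my reading of the construction. With λ = (w, −1), the margin on the +1 copy is w·x_i − y_i + ε and on the −1 copy is y_i − w·x_i + ε, so the classification log-loss summed over the pair equals the symmetric ε-insensitive log-loss of w on (x_i, y_i):

```python
import numpy as np

def reduce_to_classification(X, y, eps):
    """Turn m regression pairs into 2m classification examples in R^{n+1}."""
    m = X.shape[0]
    X_pos = np.hstack([X, (y - eps)[:, None]])  # copies labeled +1
    X_neg = np.hstack([X, (y + eps)[:, None]])  # copies labeled -1
    X_cls = np.vstack([X_pos, X_neg])
    y_cls = np.concatenate([np.ones(m), -np.ones(m)])
    return X_cls, y_cls
```

Holding the last coordinate of λ fixed at −1 is what lets a classifier over the augmented examples be read back as a regressor w over the original instances.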

  • A Batch Algorithm
    An illustration of a single batch iteration
    Simplifying assumptions (just for the demo): instances are in R^1; use the symmetric Log-loss

  • A Batch Algorithm
    Calculate discrepancies δ_i = y_i − w·x_i and example weights q_i^+, q_i^− [figure of the demo data omitted]

  • A Batch Algorithm
    Cumulative weights W_j^+, W_j^− [figure omitted]

  • Two Batch Algorithms
    Update the regressor: Log-Additive update [figure omitted]
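The three steps above can be sketched for the symmetric log-loss. The weight formulas, the sign-split cumulative weights, and the ½·log(W⁺/W⁻) step follow the AdaBoost-style construction as I read it; the variable names, running with ε > 0, and the normalization assumption Σ_j |x_ij| ≤ 1 for every instance are mine:

```python
import numpy as np

def sym_log_loss_total(w, X, y, eps):
    # Total symmetric eps-insensitive log-loss over the training set.
    d = y - X @ w
    return np.sum(np.log1p(np.exp(d - eps)) + np.log1p(np.exp(-d - eps)))

def batch_iteration(w, X, y, eps):
    # Step 1: discrepancies and per-example weights (the two loss derivatives).
    delta = y - X @ w
    q_plus = 1.0 / (1.0 + np.exp(eps - delta))   # pressure to raise w.x_i
    q_minus = 1.0 / (1.0 + np.exp(eps + delta))  # pressure to lower w.x_i
    # Step 2: cumulative weights per coordinate, split by the sign of x_ij.
    Xp, Xn = np.maximum(X, 0.0), np.maximum(-X, 0.0)
    W_plus = Xp.T @ q_plus + Xn.T @ q_minus      # evidence for increasing w_j
    W_minus = Xp.T @ q_minus + Xn.T @ q_plus     # evidence for decreasing w_j
    # Step 3: log-additive update (relies on sum_j |x_ij| <= 1 for every i).
    return w + 0.5 * np.log(W_plus / W_minus)
```

The additive variant of the update would instead take a scaled gradient step along Σ_i (q_i^+ − q_i^−) x_i; both drive the same loss down.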

  • Progress Bounds
    Theorem (Log-Additive update): (bound shown on slide)

    Theorem (Additive update): (bound shown on slide)

    Lemma: both bounds are non-negative and equal zero only at the optimum

  • Boosting Regularization
    A new form of regularization* for regression and classification Boosting

    Can be implemented by adding pseudo-examples

    * Communicated by Rob Schapire

  • Regularization Contd.
    Regularization ⇒ compactness of the feasible set for w
    Regularization ⇒ a unique attainable optimizer of the loss function

    Proof of Convergence:
    Progress + compactness + uniqueness ⇒ asymptotic convergence to the optimum

  • Exp-loss vs. Log-loss
    Two synthetic datasets [figures comparing Log-loss and Exp-loss fits omitted]

  • Extensions
    Parallel vs. Sequential updates:
    Parallel: update all elements of w in parallel
    Sequential: update the weight of a single weak regressor on each round (like classic boosting)
    Another loss function: the Combined Loss [figure comparing Log-loss, Exp-loss and Comb-loss omitted]

  • On-line Algorithms
    GD and EG online algorithms for the Log-loss
    Relative loss bounds

  • Future Directions
    Regression tree learning
    Solving one-class and various ranking problems using similar constructions
    Regression generalization bounds based on natural regularization
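The GD variant mentioned above can be sketched as a per-example gradient step on the symmetric log-loss. The step size, data, and names are my assumptions; the EG variant would instead apply a multiplicative (exponentiated-gradient) update to the weights:

```python
import numpy as np

def gd_online_step(w, x, y, eps, eta):
    """One GD step on the symmetric eps-insensitive log-loss of example (x, y)."""
    delta = y - np.dot(w, x)
    q_plus = 1.0 / (1.0 + np.exp(eps - delta))
    q_minus = 1.0 / (1.0 + np.exp(eps + delta))
    # The per-example gradient w.r.t. w is -(q_plus - q_minus) * x,
    # so gradient descent moves w.x toward y.
    return w + eta * (q_plus - q_minus) * x
```

Repeating the step on a stream of examples drifts the prediction w·x toward the target, with the pull vanishing as the discrepancy enters the ε-tube.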