Logistic Regression: Behind the Scenes


  • Logistic Regression: Behind the Scenes

    Chris White

    Capital One

    October 9, 2016


  • Logistic Regression: A quick refresher

    Generative Model

    $y_i \mid \beta, x_i \sim \mathrm{Bernoulli}(\sigma(\beta, x_i))$

    where

    $\sigma(\beta, x) := \frac{1}{1 + \exp(-\beta^\top x)}$

    is the sigmoid function.
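
    As a concrete illustration, a minimal sketch of sampling from this generative model (the coefficients, sample size, and feature distribution are made up for the example):

        import numpy as np
        from scipy.special import expit  # numerically stable sigmoid

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 3))         # illustrative features x_i
        beta = np.array([0.5, -1.0, 2.0])      # illustrative "true" coefficients
        p = expit(X @ beta)                    # P(y_i = 1) = sigma(beta, x_i)
        y = rng.binomial(1, p)                 # y_i ~ Bernoulli(sigma(beta, x_i))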

  • Logistic Regression: A quick refresher

    Interpretation

    This setup implies that the log-odds are a linear function of the inputs:

    $\log\left(\frac{P[y = 1]}{P[y = 0]}\right) = \beta^\top x$

    This linear relationship makes the individual coefficients $\beta_k$ easy to interpret: a unit increase in $x_k$ increases the odds of $y = 1$ by a factor of $e^{\beta_k}$ (all else held equal).

    Question: Given data, how do we determine the best values for $\beta$?
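
    A quick numeric check of this odds-ratio interpretation (reusing the illustrative coefficients above; the feature values are arbitrary):

        import numpy as np
        from scipy.special import expit

        beta = np.array([0.5, -1.0, 2.0])
        x = np.array([1.0, 0.0, -0.5])
        x_plus = x.copy()
        x_plus[2] += 1.0                       # unit increase in the third input

        odds = lambda p: p / (1 - p)
        p0, p1 = expit(beta @ x), expit(beta @ x_plus)
        print(odds(p1) / odds(p0))             # ~ 7.389
        print(np.exp(beta[2]))                 # e^{2.0} ~ 7.389, as claimed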

  • Logistic Regression: A quick refresher

    Typical Answer

    Maximum Likelihood Estimation:

    $\hat{\beta} = \arg\max_{\beta} \prod_i \sigma(\beta, x_i)^{y_i} \left(1 - \sigma(\beta, x_i)\right)^{1 - y_i}$

    The log-likelihood is concave.

    Maximum Likelihood Estimation allows us to do classical statistics (i.e., we know the asymptotic distribution of the estimator).

    Note: the likelihood is a function of the predicted probabilities and the observed response.
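
    As a rough sketch of what an off-the-shelf solver does with this objective, one can maximize the log-likelihood directly (reusing the simulated X and y from the first sketch; the choice of BFGS is illustrative):

        import numpy as np
        from scipy.optimize import minimize
        from scipy.special import expit

        def neg_log_likelihood(beta, X, y):
            """Negative Bernoulli log-likelihood for logistic regression."""
            p = np.clip(expit(X @ beta), 1e-12, 1 - 1e-12)  # avoid log(0)
            return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

        # Concavity of the log-likelihood means a generic solver finds the
        # global maximum whenever a finite maximizer exists.
        result = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]),
                          args=(X, y), method="BFGS")
        beta_hat = result.x                    # close to [0.5, -1.0, 2.0]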

  • This is not the end of the story

    Now we are confronted with an optimization problem...

    The story doesn't end here!

    Uncritically throwing your data into an off-the-shelf solver could result in a bad model.
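
    One concrete way this can happen is (quasi-)separable data, where the likelihood has no finite maximizer. A hypothetical sketch, reusing neg_log_likelihood from above:

        import numpy as np
        from scipy.optimize import minimize

        # Perfectly separated toy data: y = 1 exactly when x > 0.
        X_sep = np.array([[-2.0], [-1.0], [1.0], [2.0]])
        y_sep = np.array([0.0, 0.0, 1.0, 1.0])

        # The likelihood keeps improving as the coefficient grows, so the
        # solver reports "convergence" at a large coefficient determined by
        # its stopping tolerance rather than a meaningful estimate.
        res = minimize(neg_log_likelihood, x0=np.zeros(1),
                       args=(X_sep, y_sep), method="BFGS")
        print(res.x)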

  • This is not the end of the story

    Important Decisions

    Inference vs. Prediction: your goals should influence your implementation choices (see the sketch after this slide)

    Coefficient accuracy vs. speed vs. predictive accuracy

    Multicollinearity leads to both statistical and numerical issues

    Desired Input (e.g., missing values?)

    Desired Output (e.g., p-values)

    Handling of edge cases (e.g., quasi-separable data)

    Regularization and Bayesian Inference

    Goal

    We want to explore these questions with an eye on statsmodels, scikit-learn, and SAS's PROC LOGISTIC.
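
    One concrete example of how these decisions surface in software: the two Python packages named in the Goal make different default trade-offs. A small sketch, reusing the simulated X and y from earlier (scikit-learn's LogisticRegression applies L2 regularization by default, while statsmodels fits the plain MLE):

        import statsmodels.api as sm
        from sklearn.linear_model import LogisticRegression

        # Penalized by default (strength set by C): coefficients are biased
        # toward zero -- often fine for prediction, misleading for inference.
        sk_fit = LogisticRegression(C=1.0).fit(X, y)

        # Unpenalized MLE with inferential output such as p-values.
        sm_fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

        print(sk_fit.coef_)      # regularized coefficients
        print(sm_fit.params)     # MLE coefficients
        print(sm_fit.pvalues)    # one of the "Desired Outputs" above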

  • Under the Hood: How Needs Translate to Implementation Choices

    Prediction vs. Inference

    Coefficients and Convergence

    In the case of inference, coefficient sign and precision are important.

    Example: A model includes economic variables and will be used for stress-testing adverse economic scenarios.

  • Under the Hood: How Needs Translate to Implementation Choices

    Prediction vs. Inference, cont'd

    Coefficients and Multicollinearity

    Linear models are invariant under affine transformations
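
    To make this invariance concrete, a small sketch (reusing the simulated X and y from the earlier sketches; the rescaling factor is arbitrary):

        import statsmodels.api as sm

        X1 = sm.add_constant(X)
        X2 = X1.copy()
        X2[:, 1] *= 100.0        # rescale one input, e.g. meters -> centimeters

        fit1 = sm.Logit(y, X1).fit(disp=0)
        fit2 = sm.Logit(y, X2).fit(disp=0)

        # The coefficient absorbs the rescaling; the fitted model is the same.
        print(fit1.params[1] / fit2.params[1])                    # ~ 100
        print(abs(fit1.predict(X1) - fit2.predict(X2)).max())     # ~ 0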