Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture...

59
1 Generalized Linear Models Lecture 10: Nonparametric regression

Transcript of Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture...

Page 1: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

1

GeneralizedLinear Models

Lecture 10: Nonparametric regression

Page 2: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Nonparametric regression

yi = f(xi) + εi

How to estimate f?Could assume f(x) = β0 + β1x + β2x2 + β3x3

Or f(x) = β0 + β1xβ2

OK if you know the right form.But often better to assume only that f is continuous and smooth.

2

Page 3: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Examples

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

−1

0

1

0.00 0.25 0.50 0.75 1.00

x

y

Example A

●●

0

4

8

12

0.00 0.25 0.50 0.75 1.00

x

y

Example B

● ●

●●

● ●

●●

● ●

●●

● ●

50

60

70

80

90

2 3 4 5

eruptions

wai

ting

Old Faithful

3

Page 4: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Outline

1 Kernel estimators

2 Local polynomials

3 Splines

4 Multivariate predictors

4

Page 5: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Kernel estimators

In briefFor any point x0, the value of the function at that point f(x0) is somecombination of the (nearby) observations.

Kernel functionThe contribution of each observation xj , f(xj) to f(x0) is calculated using aweighting function or Kernel Kh(x0, xj). h is the width of theneighborhood.

5

Page 6: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

K-Nearest-Neighbor (KNN) Average

A simple estimate of f(x0) at any point x0 is the mean of the k pointsclosest to x0.

f̂(x0) = Ave(yi|xi ∈ Nk(x0)).

6

Page 7: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

KNN example

7

Page 8: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Problem with KNN

Problem

Regression function f̂(x) is discontinuous.

SolutionWeight all points such that their contribution drop off smoothly withdistance.

8

Page 9: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Kernel estimators

Nadaraya–Watson estimator

f̂h(x) =

n∑j=1

K(x− xj

h

)yj

n∑j=1

K(x− xj

h

)

K is a kernel function where∫K = 1, K(a) = K(−a), and K(0) ≥ K(a),

for all a.f̂ is a weighted moving averageNeed to choose K and h.

9

Page 10: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Common kernels

Uniform

K(x) =

12 −1 < x < 10 otherwise.

Epanechnikov

K(x) =

34(1− x2) −1 < x < 10 otherwise.

Tri-cube

K(x) =c(1− |x|

3)3 −1 < x < 10 otherwise.

GaussianK(x) =

1√2π

e−12x

2

10

Page 11: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Common kernels

0.00

0.25

0.50

0.75

−2 0 2

x

K(x

)

KernelEpanechnikov

Gaussian

Tri−cube

Uniform

11

Page 12: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

KNN vs Smooth Kernel Comparison

12

Page 13: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Old Faithful

13

Page 14: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Kernel smoothing

A smooth kernel is better, but otherwise the choice of kernel makeslittle difference.Optimal kernel (minimizing MSE) is Epanechnikov. It is also fast.The choice of h is crucial.

14

Page 15: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Example A

15

Page 16: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Example B

16

Page 17: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Cross-validation

CV(h) =1n

n∑j=1

(yj − f̂(−j)h (xj))2

(−j) indicates jth point omitted from estimatePick h that minimizes CV.Works ok provided there are no duplicate (x, y) pairs. Butoccasionally odd results.

17

Page 18: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Kernel smoothing in R

Many packages available. One of the better ones is KernSmooth:

fit <- locpoly(faithful$eruptions, faithful$waiting,degree=0, bandwidth=0.3) %>% as.tibble

ggplot(faithful) +geom_point(aes(x=eruptions,y=waiting)) +geom_line(data=fit, aes(x=x,y=y), col='blue')

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

50

60

70

80

90

2 3 4 5

eruptions

wai

ting

18

Page 19: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Problems with the Smooth Weighted Average

19

Page 20: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Outline

1 Kernel estimators

2 Local polynomials

3 Splines

4 Multivariate predictors

20

Page 21: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Local linear regression

Kernel smoothingEquivalent to local constant regression at each point.

minn∑j=1

wj(x)(yj − a0)2

Local Linear RegressionFit a line at each point instead.

minn∑j=1

wj(x)(yj − a0 − a1xj)2

21

Page 22: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Local polynomials

22

Step 1

World pulp price

Pul

p sh

ipm

ents

1015

2025

3035

500 600 700 800

Step 2

World pulp price

Wei

ght f

unct

ion

500 600 700 800

Step 3

World pulp price

Pul

p sh

ipm

ents

1015

2025

3035

500 600 700 800

Result

World pulp price

Pul

p sh

ipm

ents

1015

2025

3035

500 600 700 800

Page 23: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Local polynomials

minn∑j=1

wj(x)(yj −p∑k=0

akxpj )2

Local linear and local quadratic are commonly used.Robust regression can be used insteadLess biased at boundaries than kernel smoothingLocal quadratic less biased at peaks and troughs than local linear orkernel

23

Page 24: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Local polynomials in R

One useful implementation isKernSmooth::locpoly(x, y, degree, bandwidth)dpill can be used to choose the bandwidth h if degree=1.Otherwise, h could be selected by cross-validation.But most people seem to use trial and error — finding the largest hthat captures what they think they see by eye.

24

Page 25: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess

Best known implementation is loess (locally quadratic)

fit <- loess(y ~ x, span=0.75, degree=2,family="gaussian", data)

Uses tri-cube kernel and variable bandwidth.span controls bandwidth. Specified in terms of percentage of datacovered.degree is order of polynomialUse family="symmetric" for a robust fit

25

Page 26: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess

● ●

●●

● ●

●●

● ●

●●

● ●

50

60

70

80

90

2 3 4 5

eruptions

wai

ting

Old Faithful (Loess, span=0.75)

26

Page 27: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess in R

smr <- loess(waiting ~ eruptions, data=faithful)ggplot(faithful) +

geom_point(aes(x=eruptions,y=waiting)) +ggtitle("Old Faithful (Loess, span=0.75)") +geom_line(aes(x=eruptions, y=fitted(smr)),

col='blue')

27

Page 28: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

−1

0

1

0.00 0.25 0.50 0.75 1.00

x

y

Example A (Loess, span=0.75)

28

Page 29: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

−1

0

1

0.00 0.25 0.50 0.75 1.00

x

y

Example A (Loess, span=0.22)

29

Page 30: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess

●●

0

4

8

12

0.00 0.25 0.50 0.75 1.00

x

y

Example B (Robust Loess, span=0.75)

30

Page 31: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess and geom_smooth()

ggplot(exa) +geom_point(aes(x=x,y=y)) +geom_smooth(aes(x=x,y=y), method='loess',

span=0.22)

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

−1

0

1

0.00 0.25 0.50 0.75 1.00

x

y

31

Page 32: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Loess and geom_smooth()

Because local polynomials use local linear models, we can easily findstandard errors for the fitted values.

Connected together, these form a pointwise confidence band.

Automatically produced using geom_smooth

32

Page 33: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Outline

1 Kernel estimators

2 Local polynomials

3 Splines

4 Multivariate predictors

33

Page 34: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Splines

34

Page 35: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Splines

A spline is a continuous function f(x) interpolating all points (κj, yj) forj = 1, . . . , K and consisting of polynomials between each consecutive pairof ‘knots’ κj and κj+1.

Parameters constrained so that f(x) is continuous.Further constraints imposed to give continuous derivatives.Cubic splines most common, with f′, f′′ continuous.

35

Page 36: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Splines

A spline is a continuous function f(x) interpolating all points (κj, yj) forj = 1, . . . , K and consisting of polynomials between each consecutive pairof ‘knots’ κj and κj+1.

Parameters constrained so that f(x) is continuous.Further constraints imposed to give continuous derivatives.Cubic splines most common, with f′, f′′ continuous.

35

Page 37: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Smoothing splines

Let y = f(x) + ε where ε ∼ IID(0, σ2). ThenChoose f̂ to minimize

1n∑i(yi − f(xi))2 + λ

∫[f′′(x)]2dx

λ is smoothing parameter to be chosen∫[f′′(x)]2dx is a measure of roughness.

Solution: f̂ is a cubic spline with knots κi = xi, i = 1, . . . , n (ignoringduplicates).Other penalties lead to higher order splinesCross-validation can be used to select λ.

36

Page 38: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Smoothing splines

37

Page 39: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Smoothing splines

● ●

●●

● ●

●●

● ●

●●

● ●

50

60

70

80

90

2 3 4 5

eruptions

wai

ting

Old Faithful (Smoothing spline, lambda chosen by CV)

38

Page 40: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Smoothing splines

smr <- smooth.spline(faithful$eruptions, faithful$waiting,

cv=TRUE)

smr <- data.frame(x=smr$x,y=smr$y)

ggplot(faithful) +

geom_point(aes(x=eruptions,y=waiting)) +

ggtitle("Old Faithful (Smoothing spline, lambda chosen by CV)") +

geom_line(data=smr, aes(x=x, y=y), col='blue')

39

Page 41: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Smoothing splines

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

−1

0

1

0.00 0.25 0.50 0.75 1.00

x

y

Example A (Smoothing spline, lambda chosen by CV)

40

Page 42: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Smoothing splines

●●

0

4

8

12

0.00 0.25 0.50 0.75 1.00

x

y

Example B (Smoothing spline, lambda chosen by CV)

41

Page 43: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Regression splines

Fewer knots than smoothing splines.Need to choose the knots rather than a smoothing parameter.Can be estimated as a linear model once knots are selected.

42

Page 44: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

General cubic regression splines

Let κ1 < κ2 < · · · < κK be “knots” in interval (a, b).

Let x1 = x, x2 = x2, x3 = x3, xj = (x− κj−3)3+ for j = 4, . . . , K + 3.

Then the regression of y on x1, . . . , xK+3 is piecewise cubic, butsmooth at the knots.

Choice of knots can be difficult and arbitrary.

Automatic knot selection algorithms very slow.

Often use equally spaced knots. Then only need to choose K.

43

Page 45: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Natural splines in R

fit <- lm(waiting ~ ns(eruptions, df=6), faithful)ggplot(faithful) +

geom_point(aes(x=eruptions,y=waiting)) +ggtitle("Old Faithful (Natural splines, 6 df)") +geom_line(aes(x=eruptions, y=fitted(fit)), col='blue')

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

50

60

70

80

90

2 3 4 5

eruptions

wai

ting

Old Faithful (Natural splines, 6 df)

44

Page 46: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Natural splines in R

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

−1

0

1

0.00 0.25 0.50 0.75 1.00

x

y

Example A (Natural splines, 12 df)

45

Page 47: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Natural splines in R

●●

0

4

8

12

0.00 0.25 0.50 0.75 1.00

x

y

Example B (Natural splines, 3 df)

46

Page 48: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Natural splines in R

●●

0

4

8

12

0.00 0.25 0.50 0.75 1.00

x

y

Example B (Natural splines, 10 df)

47

Page 49: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Splines and geom_smooth()

ggplot(exa) +geom_point(aes(x=x,y=y)) +geom_smooth(aes(x=x,y=y), method='gam',

formula = y ~ s(x,k=12))

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

−1

0

1

0.00 0.25 0.50 0.75 1.00

x

y

48

Page 50: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Splines and geom_smooth()

Because regression splines use local linear models, we can easily findstandard errors for the fitted values.

Connected together, these form a pointwise confidence band.

Automatically produced using geom_smooth

49

Page 51: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Outline

1 Kernel estimators

2 Local polynomials

3 Splines

4 Multivariate predictors

50

Page 52: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Multivariate predictors

yi = f(xi) + εi, x ∈ Rd

Most methods extend naturally to higher dimensions.

Multivariate kernel methodsMultivariate local quadratic surfacesThin-plate splines (2-d version of smoothing splines)

ProblemThe curse of dimensionality!

51

Page 53: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Multivariate predictors

yi = f(xi) + εi, x ∈ Rd

Most methods extend naturally to higher dimensions.

Multivariate kernel methodsMultivariate local quadratic surfacesThin-plate splines (2-d version of smoothing splines)

ProblemThe curse of dimensionality!

51

Page 54: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Curse of dimensionality

Most data lie near the boundary

x <- matrix(runif(1e6,-1,1), ncol=100)boundary <- function(z) { any(abs(z) > 0.95) }

mean(apply(x[,1,drop=FALSE], 1, boundary))

## [1] 0.0494

mean(apply(x[,1:2], 1, boundary))

## [1] 0.0971

mean(apply(x[,1:5], 1, boundary))

## [1] 0.2276

52

Page 55: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Curse of dimensionality

Most data lie near the boundary

x <- matrix(runif(1e6,-1,1), ncol=100)boundary <- function(z) { any(abs(z) > 0.95) }

mean(apply(x[,1:10], 1, boundary))

## [1] 0.4052

mean(apply(x[,1:50], 1, boundary))

## [1] 0.9205

mean(apply(x[,1:100], 1, boundary))

## [1] 0.9938

53

Page 56: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Curse of dimensionality

Data are sparse

x <- matrix(runif(1e6,-1,1), ncol=100)nearby <- function(z) { all(abs(z) < 0.5) }mean(apply(x[,1,drop=FALSE], 1, nearby))

## [1] 0.4953

mean(apply(x[,1:2], 1, nearby))

## [1] 0.2512

mean(apply(x[,1:5], 1, nearby))

## [1] 0.0291

54

Page 57: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Curse of dimensionality

Data are sparse

x <- matrix(runif(1e6,-1,1), ncol=100)nearby <- function(z) { all(abs(z) < 0.5) }mean(apply(x[,1:10], 1, nearby))

## [1] 5e-04

mean(apply(x[,1:50], 1, nearby))

## [1] 0

mean(apply(x[,1:100], 1, nearby))

## [1] 0

55

Page 58: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Bivariate smoothing

lomod <- loess(sr ~ pop15 + ddpi, data=savings)

pop15

25

3035

4045ddpi 5

10

15

savings rate

10

20

30

40

56

Page 59: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local

Bivariate smoothing

library(mgcv)smod <- gam(sr ~ s(pop15, ddpi), data=savings)

pop15

25

3035

4045

ddpi

5

10

15

linear predictor 10

15

57