Linear regression
By gradient descent (with thanks to Prof. Ng’s machine learning course)
Extending single-variable linear regression to multivariate linear regression
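Single variable: $h_\Theta(x) = \Theta_0 + \Theta_1 x$
Multivariate (n features): $h_\Theta(x) = \Theta_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \dots + \Theta_n x_n$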
For example, start by modeling house price against square footage alone, then model house price against square footage, number of bedrooms, and age of the house.
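$h_\Theta(x) = \Theta_0 x_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \dots + \Theta_n x_n$
With $x_0 = 1$, this is an inner product:
$h_\Theta(x) = \Theta^T x$, where $\Theta = [\Theta_0, \dots, \Theta_n]^T$ and $x = [x_0, \dots, x_n]^T$ are both (n+1)-vectors.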
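To make the role of $x_0 = 1$ concrete, here is a minimal Python sketch (stdlib only) of $h_\Theta(x) = \Theta^T x$. The helper names `add_bias` and `hypothesis` and the numeric values are hypothetical, chosen only for illustration.

```python
def add_bias(features):
    """Prepend x0 = 1 to a raw feature vector [x1, ..., xn]."""
    return [1.0] + list(features)


def hypothesis(theta, x):
    """h_Theta(x) = Theta^T x; x must already contain x0 = 1."""
    if len(theta) != len(x):
        raise ValueError("theta and x must both have length n + 1")
    return sum(t * xi for t, xi in zip(theta, x))


# Illustrative values: Theta = [Theta0, Theta1, Theta2], raw features x1 = 2, x2 = 3
theta = [1.0, 0.5, -0.25]
x = add_bias([2.0, 3.0])
print(hypothesis(theta, x))  # 1.0 + 0.5*2 - 0.25*3 = 1.25
```

Because $x_0$ is always 1, $\Theta_0$ acts as the intercept without needing a special case in the formula or the code.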
Cost function:
$J(\Theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2$
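A direct, loop-based Python sketch of $J(\Theta)$ may help check the formula by hand. It assumes each row of the hypothetical list `X` already starts with $x_0 = 1$; the tiny dataset is made up for illustration.

```python
def cost(X, y, theta):
    """J(Theta) = (1/2m) * sum_i (h_Theta(x^(i)) - y^(i))^2.

    X is a list of m rows, each [x0 = 1, x1, ..., xn]; y is a list of m targets.
    """
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        error = sum(t * xj for t, xj in zip(theta, row)) - target  # h(x^(i)) - y^(i)
        total += error ** 2
    return total / (2 * m)


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
print(cost(X, y, [0.0, 2.0]))  # 0.0: the line y = 2x fits every point
print(cost(X, y, [0.0, 0.0]))  # (2^2 + 4^2 + 6^2) / 6 = 56 / 6
```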
Gradient descent:
Repeat {
    $\Theta_j := \Theta_j - \alpha \frac{\partial J(\Theta)}{\partial \Theta_j}$
} for all j simultaneously
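Taking the partial derivative of J gives the update rule:
$\Theta_j := \Theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
$\Theta_0 := \Theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$, where $x_0^{(i)} = 1$
$\Theta_1 := \Theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$
$\Theta_2 := \Theta_2 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_2^{(i)}$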
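The word "simultaneously" matters: every partial derivative must be computed from the old Θ before any $\Theta_j$ changes. The Python sketch below (stdlib only; `gradient_descent` and the toy data are hypothetical) does exactly that by computing the whole gradient list first and only then building the new Θ.

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent; each row of X starts with x0 = 1."""
    m = len(y)
    theta = list(theta)
    for _ in range(num_iters):
        # Errors h(x^(i)) - y^(i), all computed with the *old* theta
        errors = [sum(t * xj for t, xj in zip(theta, row)) - target
                  for row, target in zip(X, y)]
        # dJ/dTheta_j = (1/m) * sum_i error_i * x_j^(i), for every j before updating
        grads = [sum(e * row[j] for e, row in zip(errors, X)) / m
                 for j in range(len(theta))]
        # Simultaneous update of all Theta_j
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]  # generated from y = 1 + 2x
theta = gradient_descent(X, y, [0.0, 0.0], alpha=0.1, num_iters=5000)
print(theta)  # close to [1.0, 2.0]
```

Updating `theta[0]` in place and then reusing it to compute the error for `theta[1]` would be a different (non-simultaneous) algorithm, which is why the gradient is collected into a separate list.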
What the Equations Mean
The matrices: y (the vector of targets) and X (the design matrix, one row per training example, first column x0 = 1)
PRICE (y)   x0   SQFT   AGE   FEATS
2050        1    2650   13    7
2150        1    2664   6     5
2150        1    2921   3     6
1999        1    2580   4     4
1900        1    2580   4     4
1800        1    2774   2     4
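As a sketch of how the housing data above (PRICE, x0, SQFT, AGE, FEATS) maps onto y and X, the Python snippet below transcribes the six rows and shows that multiplying X by Θ yields one prediction per example. The function name `predict_all` and the choice of Θ are illustrative only.

```python
# (PRICE, x0, SQFT, AGE, FEATS) rows transcribed from the table
rows = [
    (2050, 1, 2650, 13, 7),
    (2150, 1, 2664, 6, 5),
    (2150, 1, 2921, 3, 6),
    (1999, 1, 2580, 4, 4),
    (1900, 1, 2580, 4, 4),
    (1800, 1, 2774, 2, 4),
]

y = [float(r[0]) for r in rows]                # m targets (PRICE)
X = [[float(v) for v in r[1:]] for r in rows]  # m x (n+1) design matrix; column 0 is x0 = 1


def predict_all(X, theta):
    """X * Theta: one prediction h_Theta(x^(i)) per row of X."""
    return [sum(t * xj for t, xj in zip(theta, row)) for row in X]


m, n_plus_1 = len(X), len(X[0])
print(m, n_plus_1)  # 6 4
print(predict_all(X, [0.0, 1.0, 0.0, 0.0]))  # this Theta just copies out the SQFT column
```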
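Feature Scaling
Ideally, all features fall roughly into the range -1 ≤ x ≤ +1, which helps gradient descent converge faster.
Replace each $x_i$ with $(x_i - \mu_i)/s_i$, where $\mu_i$ is the mean of feature i and $s_i$ is its range (max - min); alternatively, use the standard deviation for $s_i$.
Don’t scale $x_0$; it stays equal to 1.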
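Converting results back: Θ is learned on the scaled features, so a new input must be scaled with the same $\mu_i$ and $s_i$ before predicting (or Θ can be rewritten in terms of the original, unscaled features).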
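Here is a minimal Python sketch of both steps, assuming mean/standard-deviation scaling. `statistics.stdev` is the sample standard deviation, which matches MATLAB's default `std`. The helpers `feature_normalize` and `unscale_theta` and the value of `theta_scaled` are hypothetical. The unscaling follows from expanding $\Theta_0 + \sum_j \Theta_j (x_j - \mu_j)/s_j$.

```python
from statistics import mean, stdev


def feature_normalize(columns):
    """Scale each raw feature column (x1..xn, never x0) to (x - mu) / s.

    A constant column would give s = 0 and must be handled separately.
    """
    mus = [mean(col) for col in columns]
    sigmas = [stdev(col) for col in columns]  # sample std, like MATLAB's std(X)
    scaled = [[(v - mu) / s for v in col]
              for col, mu, s in zip(columns, mus, sigmas)]
    return scaled, mus, sigmas


def unscale_theta(theta_scaled, mus, sigmas):
    """Rewrite Theta learned on scaled features in terms of raw features.

    h = T0 + sum_j Tj (xj - mu_j) / s_j
      = (T0 - sum_j Tj mu_j / s_j) + sum_j (Tj / s_j) xj
    """
    t0 = theta_scaled[0] - sum(t * mu / s for t, mu, s
                               in zip(theta_scaled[1:], mus, sigmas))
    return [t0] + [t / s for t, s in zip(theta_scaled[1:], sigmas)]


sqft = [2650, 2664, 2921, 2580, 2580, 2774]  # SQFT column from the table
scaled, mus, sigmas = feature_normalize([sqft])
theta_scaled = [2000.0, 50.0]  # hypothetical Theta, for illustration only
theta_raw = unscale_theta(theta_scaled, mus, sigmas)

x_new = 2700.0
via_scaled = theta_scaled[0] + theta_scaled[1] * (x_new - mus[0]) / sigmas[0]
via_raw = theta_raw[0] + theta_raw[1] * x_new
print(abs(via_scaled - via_raw) < 1e-9)  # True: both routes give the same prediction
```

Either route works; keeping `mus` and `sigmas` around and scaling new inputs is usually simpler than converting Θ.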
Learning Rate and Debugging
With a sufficiently small α, J(Θ) should decrease on every iteration: this is the first debugging check. If α is too large, a step can overshoot the minimum and climb the other side of the curve, so J may oscillate or grow.
If α is too small, convergence is slow.
Try a series of α values, each roughly three times the previous one: 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
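The α sweep can be automated. The Python sketch below (stdlib only; `run_gradient_descent`, `sweep`, and the toy data are hypothetical) runs gradient descent for each α, records J after every iteration, and reports whether J decreased every time. A deliberately too-large α = 2.5 is added to the listed values so the divergent case is visible.

```python
def cost(X, y, theta):
    m = len(y)
    return sum((sum(t * xj for t, xj in zip(theta, row)) - target) ** 2
               for row, target in zip(X, y)) / (2 * m)


def run_gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent from Theta = 0; returns J after every iteration."""
    m = len(y)
    theta = [0.0] * len(X[0])
    history = []
    for _ in range(num_iters):
        errors = [sum(t * xj for t, xj in zip(theta, row)) - target
                  for row, target in zip(X, y)]
        grads = [sum(e * row[j] for e, row in zip(errors, X)) / m
                 for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
        history.append(cost(X, y, theta))
    return history


def sweep(X, y, alphas, num_iters=20):
    """For each alpha: (did J decrease on every iteration?, final J)."""
    results = {}
    for alpha in alphas:
        history = run_gradient_descent(X, y, alpha, num_iters)
        decreasing = all(later < earlier for earlier, later in zip(history, history[1:]))
        results[alpha] = (decreasing, history[-1])
    return results


# Toy data with x1 already scaled to [-1, 1]; y = 2 + x1 exactly
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 2.5]  # 2.5 is deliberately too large
for alpha, (decreasing, final_j) in sweep(X, y, alphas).items():
    print(f"alpha={alpha:<6} decreasing={decreasing} final J={final_j:.3g}")
```

Plotting `history` against the iteration number gives the same information visually; the boolean check is just a quick first test.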
Matlab Implementation
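Feature Normalization
function [X_norm, mu, sigma] = featureNormalize(X)
    % Scale each column of X to zero mean and unit standard deviation.
    % Apply to the raw features, before the x0 = 1 column is added.
    mu = mean(X);       % 1 x n row of column means
    sigma = std(X);     % 1 x n row of column standard deviations
    m = size(X, 1);
    A = repmat(mu, m, 1);
    X_norm = X - A;
    A = repmat(sigma, m, 1);
    X_norm = X_norm ./ A;
end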
Gradient Descent
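function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y);                      % number of training examples
    J_history = zeros(num_iters, 1);    % cost after each iteration
    for iter = 1:num_iters
        A = X*theta - y;                    % m x 1 vector of errors h(x(i)) - y(i)
        deltatheta = (alpha/m) * (A' * X);  % 1 x (n+1) row of scaled partial derivatives
        theta = theta - deltatheta';        % simultaneous update of every theta(j)
        J_history(iter) = computeCostMulti(X, y, theta);
    end
end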
Cost Function
function J = computeCostMulti(X, y, theta)
m = length(y); % number of training examples
A = X*theta - y;            % m x 1 vector of errors
J = (1/(2*m)) * (A' * A);   % sum of squared errors, scaled by 1/(2m)
end
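Polynomials
$h_\Theta(x) = \Theta_0 + \Theta_1 x + \Theta_2 x^2 + \Theta_3 x^3$
Replace $x$ with $x_1$, $x^2$ with $x_2$, and $x^3$ with $x_3$; the model is then ordinary multivariate linear regression.
Scale the $x$, $x^2$, $x^3$ features: their ranges differ enormously (if x runs from 1 to 1000, $x^3$ runs from 1 to $10^9$).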
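A short Python sketch (stdlib only) of the substitution and scaling steps: each x becomes the feature row $[x, x^2, x^3]$, and each column is then scaled to zero mean and unit standard deviation. The helpers `poly_features` and `scale_columns` and the input values are hypothetical.

```python
from statistics import mean, stdev


def poly_features(xs, degree):
    """Map each x to [x, x^2, ..., x^degree], i.e. x1 = x, x2 = x^2, x3 = x^3."""
    return [[x ** p for p in range(1, degree + 1)] for x in xs]


def scale_columns(rows):
    """Column-wise (v - mean) / stdev over a list of feature rows."""
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [[(v - mu) / s for v, (mu, s) in zip(row, stats)] for row in rows]


xs = [1.0, 10.0, 100.0, 1000.0]  # illustrative inputs
raw = poly_features(xs, 3)
for j, col in enumerate(zip(*raw), start=1):
    print(f"x{j}: range {min(col):g} .. {max(col):g}")  # x3 reaches 1e+09
scaled = scale_columns(raw)
for j, col in enumerate(zip(*scaled), start=1):
    print(f"x{j} scaled: mean {mean(col):.2g}, stdev {stdev(col):.2g}")
```

After scaling, all three columns live on comparable ranges, so a single α works for every $\Theta_j$.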
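Normal Equations
$\Theta = (A^T A)^{-1} A^T y$, where A is the design matrix (one row per training example, including the $x_0 = 1$ column)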
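The formula follows from the cost function by setting its gradient to zero. Writing the sum of squared errors in matrix form with the design matrix A, the derivation is:

```latex
\begin{aligned}
J(\Theta) &= \frac{1}{2m} (A\Theta - y)^T (A\Theta - y) \\
\nabla_\Theta J(\Theta) &= \frac{1}{m} A^T (A\Theta - y) = 0 \\
\Rightarrow\; A^T A \, \Theta &= A^T y \\
\Rightarrow\; \Theta &= (A^T A)^{-1} A^T y \quad \text{(when } A^T A \text{ is invertible)}
\end{aligned}
```

This solves for Θ in one step, with no α to choose and no iterations, at the cost of forming and solving an (n+1) x (n+1) linear system.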
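For a polynomial of degree n in a column vector x, build A with columns x^n, ..., x, 1 (highest power first) and solve:
A(:,n+1) = ones(length(x), 1, class(x));   % last column: x.^0 = 1
for j = n:-1:1
    A(:,j) = x .* A(:,j+1);                 % column j holds x.^(n-j+1)
end
W = A'*A;
Y = A'*y;
theta = W\Y;   % solves W*theta = Y without explicitly inverting W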
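For readers without MATLAB, here is a stdlib-only Python sketch of the same normal-equation fit. It forms $A^T A$ and $A^T y$, then solves the system with Gaussian elimination and partial pivoting, playing the role that `W\Y` plays above. Unlike the MATLAB snippet, the hypothetical helper `poly_design_matrix` orders columns by ascending power, so `theta[k]` multiplies $x^k$ (matching $\Theta_0, \Theta_1, \dots$ in the text). `normal_equation` is likewise a hypothetical name.

```python
def normal_equation(A, y):
    """Theta solving (A^T A) Theta = A^T y, without forming an explicit inverse."""
    n = len(A[0])
    W = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]  # A'A
    Y = [sum(row[i] * t for row, t in zip(A, y)) for i in range(n)]                # A'y
    # Gaussian elimination with partial pivoting on the augmented matrix [W | Y]
    M = [W[i] + [Y[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # crude absolute singularity check
            raise ValueError("A'A is singular; drop redundant features")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):
        theta[i] = (M[i][n] - sum(M[i][k] * theta[k] for k in range(i + 1, n))) / M[i][i]
    return theta


def poly_design_matrix(xs, degree):
    """Rows [1, x, x^2, ..., x^degree] (ascending powers, so theta[k] multiplies x^k)."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1 + 2 * x + 3 * x ** 2 for x in xs]  # exact quadratic, so the fit should recover it
theta = normal_equation(poly_design_matrix(xs, 2), y)
print(theta)  # approximately [1.0, 2.0, 3.0]
```

Feature scaling is not needed for correctness here, though very different column magnitudes can make $A^T A$ poorly conditioned for high-degree polynomials.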