Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the...

38
Linear Rergression Time Complexity Calculation Nipun Batra January 30, 2020 IIT Gandhinagar

Transcript of Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the...

Page 1: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Linear Rergression Time Complexity

Calculation

Nipun Batra

January 30, 2020

IIT Gandhinagar

Page 2: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• Consider X ∈ RN×D

• N examples and D dimensions

• What is the time complexity of solving the normal equation

θ̂ = (XTX )−1XT y?

1

Page 3: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• Consider X ∈ RN×D

• N examples and D dimensions

• What is the time complexity of solving the normal equation

θ̂ = (XTX )−1XT y?

1

Page 4: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• Consider X ∈ RN×D

• N examples and D dimensions

• What is the time complexity of solving the normal equation

θ̂ = (XTX )−1XT y?

1

Page 5: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• X has dimensions N × D, XT has dimensions D × N

• XTX is a matrix product of matrices of size: D × N and

N × D, which is O(D2N)

• Inversion of XTX is an inversion of a D × D matrix, which is

O(D3)

• XT y is a matrix vector product of size D × N and N × 1,

which is O(DN)

• (XTX )−1XT y is a matrix product of a D × D matrix and

D × 1 matrix, which is O(D2)

• Overall complexity: O(D2N) + O(D3) + O(DN) + O(D2)

= O(D2N) + O(D3)

• Scales cubic in the number of columns/features of X

2

Page 6: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• X has dimensions N × D, XT has dimensions D × N

• XTX is a matrix product of matrices of size: D × N and

N × D, which is O(D2N)

• Inversion of XTX is an inversion of a D × D matrix, which is

O(D3)

• XT y is a matrix vector product of size D × N and N × 1,

which is O(DN)

• (XTX )−1XT y is a matrix product of a D × D matrix and

D × 1 matrix, which is O(D2)

• Overall complexity: O(D2N) + O(D3) + O(DN) + O(D2)

= O(D2N) + O(D3)

• Scales cubic in the number of columns/features of X

2

Page 7: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• X has dimensions N × D, XT has dimensions D × N

• XTX is a matrix product of matrices of size: D × N and

N × D, which is O(D2N)

• Inversion of XTX is an inversion of a D × D matrix, which is

O(D3)

• XT y is a matrix vector product of size D × N and N × 1,

which is O(DN)

• (XTX )−1XT y is a matrix product of a D × D matrix and

D × 1 matrix, which is O(D2)

• Overall complexity: O(D2N) + O(D3) + O(DN) + O(D2)

= O(D2N) + O(D3)

• Scales cubic in the number of columns/features of X

2

Page 8: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• X has dimensions N × D, XT has dimensions D × N

• XTX is a matrix product of matrices of size: D × N and

N × D, which is O(D2N)

• Inversion of XTX is an inversion of a D × D matrix, which is

O(D3)

• XT y is a matrix vector product of size D × N and N × 1,

which is O(DN)

• (XTX )−1XT y is a matrix product of a D × D matrix and

D × 1 matrix, which is O(D2)

• Overall complexity: O(D2N) + O(D3) + O(DN) + O(D2)

= O(D2N) + O(D3)

• Scales cubic in the number of columns/features of X

2

Page 9: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• X has dimensions N × D, XT has dimensions D × N

• XTX is a matrix product of matrices of size: D × N and

N × D, which is O(D2N)

• Inversion of XTX is an inversion of a D × D matrix, which is

O(D3)

• XT y is a matrix vector product of size D × N and N × 1,

which is O(DN)

• (XTX )−1XT y is a matrix product of a D × D matrix and

D × 1 matrix, which is O(D2)

• Overall complexity: O(D2N) + O(D3) + O(DN) + O(D2)

= O(D2N) + O(D3)

• Scales cubic in the number of columns/features of X

2

Page 10: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• X has dimensions N × D, XT has dimensions D × N

• XTX is a matrix product of matrices of size: D × N and

N × D, which is O(D2N)

• Inversion of XTX is an inversion of a D × D matrix, which is

O(D3)

• XT y is a matrix vector product of size D × N and N × 1,

which is O(DN)

• (XTX )−1XT y is a matrix product of a D × D matrix and

D × 1 matrix, which is O(D2)

• Overall complexity: O(D2N) + O(D3) + O(DN) + O(D2)

= O(D2N) + O(D3)

• Scales cubic in the number of columns/features of X

2

Page 11: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Normal Equation

• X has dimensions N × D, XT has dimensions D × N

• XTX is a matrix product of matrices of size: D × N and

N × D, which is O(D2N)

• Inversion of XTX is an inversion of a D × D matrix, which is

O(D3)

• XT y is a matrix vector product of size D × N and N × 1,

which is O(DN)

• (XTX )−1XT y is a matrix product of a D × D matrix and

D × 1 matrix, which is O(D2)

• Overall complexity: O(D2N) + O(D3) + O(DN) + O(D2)

= O(D2N) + O(D3)

• Scales cubic in the number of columns/features of X

2

Page 12: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

Start with random values of θ0 and θ1

Till convergence

• θ0 = θ0 − α∂

∂θ0(∑ε2i )

• θ1 = θ1 − α∂

∂θ1(∑ε2i )

• Question: Can you write the above for D dimensional data in

vectorised form?

• θ0 = θ0 − α ∂∂θ0

(y − Xθ)> (y − Xθ)

θ1 = θ1 − α ∂∂θ1

(y − Xθ)> (y − Xθ)...

θD = θD − α ∂∂θD

(y − Xθ)> (y − Xθ)

• θ = θ − α ∂∂θ (y − Xθ)> (y − Xθ)

3

Page 13: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

Start with random values of θ0 and θ1

Till convergence

• θ0 = θ0 − α∂

∂θ0(∑ε2i )

• θ1 = θ1 − α∂

∂θ1(∑ε2i )

• Question: Can you write the above for D dimensional data in

vectorised form?

• θ0 = θ0 − α ∂∂θ0

(y − Xθ)> (y − Xθ)

θ1 = θ1 − α ∂∂θ1

(y − Xθ)> (y − Xθ)...

θD = θD − α ∂∂θD

(y − Xθ)> (y − Xθ)

• θ = θ − α ∂∂θ (y − Xθ)> (y − Xθ)

3

Page 14: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

Start with random values of θ0 and θ1

Till convergence

• θ0 = θ0 − α∂

∂θ0(∑ε2i )

• θ1 = θ1 − α∂

∂θ1(∑ε2i )

• Question: Can you write the above for D dimensional data in

vectorised form?

• θ0 = θ0 − α ∂∂θ0

(y − Xθ)> (y − Xθ)

θ1 = θ1 − α ∂∂θ1

(y − Xθ)> (y − Xθ)...

θD = θD − α ∂∂θD

(y − Xθ)> (y − Xθ)

• θ = θ − α ∂∂θ (y − Xθ)> (y − Xθ)

3

Page 15: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

Start with random values of θ0 and θ1

Till convergence

• θ0 = θ0 − α∂

∂θ0(∑ε2i )

• θ1 = θ1 − α∂

∂θ1(∑ε2i )

• Question: Can you write the above for D dimensional data in

vectorised form?

• θ0 = θ0 − α ∂∂θ0

(y − Xθ)> (y − Xθ)

θ1 = θ1 − α ∂∂θ1

(y − Xθ)> (y − Xθ)...

θD = θD − α ∂∂θD

(y − Xθ)> (y − Xθ)

• θ = θ − α ∂∂θ (y − Xθ)> (y − Xθ)

3

Page 16: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

Start with random values of θ0 and θ1

Till convergence

• θ0 = θ0 − α∂

∂θ0(∑ε2i )

• θ1 = θ1 − α∂

∂θ1(∑ε2i )

• Question: Can you write the above for D dimensional data in

vectorised form?

• θ0 = θ0 − α ∂∂θ0

(y − Xθ)> (y − Xθ)

θ1 = θ1 − α ∂∂θ1

(y − Xθ)> (y − Xθ)...

θD = θD − α ∂∂θD

(y − Xθ)> (y − Xθ)

• θ = θ − α ∂∂θ (y − Xθ)> (y − Xθ)

3

Page 17: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

∂∂θ (y − Xθ)>(y − Xθ)

= ∂∂θ

(y> − θ>X>

)(y − Xθ)

= ∂∂θ

(y>y − θ>X>y − y>xθ + θ>X>Xθ

)= −2X>y + 2X>xθ

= 2X>(Xθ − y)

4

Page 18: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

We can write the vectorised update equation as follows, for each

iteration

θ = θ − αX>(Xθ − y)

For t iterations, what is the computational complexity of our

gradient descent solution?

Hint, rewrite the above as: θ = θ − αX>Xθ + αX>y

Complexity of computing X>y is O(DN)

Complexity of computing αX>y once we have X>y is O(D) since

X>y has D entries

Complexity of computing X>X is O(D2N) and then multiplying

with α is O(D2)

All of the above need only be calculated once!

5

Page 19: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

We can write the vectorised update equation as follows, for each

iteration

θ = θ − αX>(Xθ − y)

For t iterations, what is the computational complexity of our

gradient descent solution?

Hint, rewrite the above as: θ = θ − αX>Xθ + αX>y

Complexity of computing X>y is O(DN)

Complexity of computing αX>y once we have X>y is O(D) since

X>y has D entries

Complexity of computing X>X is O(D2N) and then multiplying

with α is O(D2)

All of the above need only be calculated once!

5

Page 20: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

We can write the vectorised update equation as follows, for each

iteration

θ = θ − αX>(Xθ − y)

For t iterations, what is the computational complexity of our

gradient descent solution?

Hint, rewrite the above as: θ = θ − αX>Xθ + αX>y

Complexity of computing X>y is O(DN)

Complexity of computing αX>y once we have X>y is O(D) since

X>y has D entries

Complexity of computing X>X is O(D2N) and then multiplying

with α is O(D2)

All of the above need only be calculated once!

5

Page 21: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

We can write the vectorised update equation as follows, for each

iteration

θ = θ − αX>(Xθ − y)

For t iterations, what is the computational complexity of our

gradient descent solution?

Hint, rewrite the above as: θ = θ − αX>Xθ + αX>y

Complexity of computing X>y is O(DN)

Complexity of computing αX>y once we have X>y is O(D) since

X>y has D entries

Complexity of computing X>X is O(D2N) and then multiplying

with α is O(D2)

All of the above need only be calculated once!

5

Page 22: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

We can write the vectorised update equation as follows, for each

iteration

θ = θ − αX>(Xθ − y)

For t iterations, what is the computational complexity of our

gradient descent solution?

Hint, rewrite the above as: θ = θ − αX>Xθ + αX>y

Complexity of computing X>y is O(DN)

Complexity of computing αX>y once we have X>y is O(D) since

X>y has D entries

Complexity of computing X>X is O(D2N) and then multiplying

with α is O(D2)

All of the above need only be calculated once!

5

Page 23: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

We can write the vectorised update equation as follows, for each

iteration

θ = θ − αX>(Xθ − y)

For t iterations, what is the computational complexity of our

gradient descent solution?

Hint, rewrite the above as: θ = θ − αX>Xθ + αX>y

Complexity of computing X>y is O(DN)

Complexity of computing αX>y once we have X>y is O(D) since

X>y has D entries

Complexity of computing X>X is O(D2N) and then multiplying

with α is O(D2)

All of the above need only be calculated once!

5

Page 24: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

We can write the vectorised update equation as follows, for each

iteration

θ = θ − αX>(Xθ − y)

For t iterations, what is the computational complexity of our

gradient descent solution?

Hint, rewrite the above as: θ = θ − αX>Xθ + αX>y

Complexity of computing X>y is O(DN)

Complexity of computing αX>y once we have X>y is O(D) since

X>y has D entries

Complexity of computing X>X is O(D2N) and then multiplying

with α is O(D2)

All of the above need only be calculated once! 5

Page 25: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

For each of the t iterations, we now need to first multiply αX>X

with θ which is matrix multiplication of a D × D matrix with a

D × 1, which is O(D2)

The remaining subtraction/addition can be done in O(D) for each

iteration.

What is overall computational complexity?

O(tD2) + O(D2N) = O((t + N)D2)

6

Page 26: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

For each of the t iterations, we now need to first multiply αX>X

with θ which is matrix multiplication of a D × D matrix with a

D × 1, which is O(D2)

The remaining subtraction/addition can be done in O(D) for each

iteration.

What is overall computational complexity?

O(tD2) + O(D2N) = O((t + N)D2)

6

Page 27: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

For each of the t iterations, we now need to first multiply αX>X

with θ which is matrix multiplication of a D × D matrix with a

D × 1, which is O(D2)

The remaining subtraction/addition can be done in O(D) for each

iteration.

What is overall computational complexity?

O(tD2) + O(D2N) = O((t + N)D2)

6

Page 28: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

For each of the t iterations, we now need to first multiply αX>X

with θ which is matrix multiplication of a D × D matrix with a

D × 1, which is O(D2)

The remaining subtraction/addition can be done in O(D) for each

iteration.

What is overall computational complexity?

O(tD2) + O(D2N) = O((t + N)D2)

6

Page 29: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent

For each of the t iterations, we now need to first multiply αX>X

with θ which is matrix multiplication of a D × D matrix with a

D × 1, which is O(D2)

The remaining subtraction/addition can be done in O(D) for each

iteration.

What is overall computational complexity?

O(tD2) + O(D2N) = O((t + N)D2)

6

Page 30: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 31: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 32: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 33: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 34: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 35: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 36: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 37: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7

Page 38: Linear Rergression Time Complexity Calculation · 2021. 3. 21. · For t iterations, what is the computational complexity of our gradient descent solution? Hint, rewrite the above

Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX>(Xθ − y)

For each iteration, we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX> is O(ND)

• Computing αX>(Xθ − y) is O(ND)

• Computing θ = θ − αX>(Xθ − y) is O(N)

What is overall computational complexity?

O(NDt)

7