
Linear Regression Time Complexity Calculation

Nipun Batra

January 30, 2020

IIT Gandhinagar

Normal Equation

• Consider X ∈ R^{N×D}

• N examples and D dimensions

• What is the time complexity of solving the normal equation θ̂ = (X^T X)^{−1} X^T y?


Normal Equation

• X has dimensions N × D; X^T has dimensions D × N

• X^T X is a product of a D × N matrix and an N × D matrix, which is O(D²N)

• Inverting X^T X is the inversion of a D × D matrix, which is O(D³)

• X^T y is a matrix-vector product of a D × N matrix and an N × 1 vector, which is O(DN)

• (X^T X)^{−1} X^T y is a product of a D × D matrix and a D × 1 vector, which is O(D²)

• Overall complexity: O(D²N) + O(D³) + O(DN) + O(D²) = O(D²N) + O(D³)

• Scales cubically in the number of columns/features of X
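As a sketch of the cost breakdown above, the normal equation can be evaluated step by step in NumPy (the sizes N and D below are illustrative assumptions, not from the slides):

```python
import numpy as np

# Illustrative sizes (assumed for this sketch).
N, D = 500, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)

XtX = X.T @ X                       # (D x N)(N x D) product: O(D^2 N)
Xty = X.T @ y                       # (D x N)(N x 1) product: O(D N)
theta = np.linalg.inv(XtX) @ Xty    # inversion O(D^3), then product O(D^2)

# Equivalent and numerically preferable: solve the D x D system directly
# (still O(D^3)) instead of forming the explicit inverse.
theta_solve = np.linalg.solve(XtX, Xty)
```

Both match the least-squares solution returned by `np.linalg.lstsq`; in practice solving the linear system (or using a QR/Cholesky factorisation) is preferred over forming an explicit inverse.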


Gradient Descent

Start with random values of θ₀ and θ₁

Till convergence

• θ₀ = θ₀ − α ∂/∂θ₀ (Σ εᵢ²)

• θ₁ = θ₁ − α ∂/∂θ₁ (Σ εᵢ²)

• Question: Can you write the above for D-dimensional data in vectorised form?

• θ₀ = θ₀ − α ∂/∂θ₀ (y − Xθ)^T (y − Xθ)

  θ₁ = θ₁ − α ∂/∂θ₁ (y − Xθ)^T (y − Xθ)

  ⋮

  θ_D = θ_D − α ∂/∂θ_D (y − Xθ)^T (y − Xθ)

• θ = θ − α ∂/∂θ (y − Xθ)^T (y − Xθ)
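The coordinate-wise updates can be sketched directly for the single-feature case; the synthetic data, learning rate α, and iteration count here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = 3.0 + 2.0 * x + 0.01 * rng.standard_normal(100)  # true theta0=3, theta1=2

theta0, theta1 = 0.0, 0.0
alpha = 0.001   # must be small for the (unscaled) sum-of-squares loss
for _ in range(2000):
    eps = y - (theta0 + theta1 * x)   # residuals eps_i
    # d/dtheta0 sum(eps_i^2) = -2 sum(eps_i); d/dtheta1 = -2 sum(eps_i * x_i)
    theta0 = theta0 - alpha * (-2.0 * eps.sum())
    theta1 = theta1 - alpha * (-2.0 * (eps * x).sum())
```

Because the loss is a sum over all N points, the usable α shrinks as N grows; dividing the loss by N (mean squared error) decouples the step size from N.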


Gradient Descent

∂/∂θ (y − Xθ)^T (y − Xθ)

= ∂/∂θ (y^T − θ^T X^T)(y − Xθ)

= ∂/∂θ (y^T y − θ^T X^T y − y^T Xθ + θ^T X^T Xθ)

= −2X^T y + 2X^T Xθ

= 2X^T (Xθ − y)
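As a quick sanity check (all sizes below are illustrative), the closed-form gradient 2X^T(Xθ − y) can be compared against a central finite-difference approximation of the loss:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 50, 4
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
theta = rng.standard_normal(D)

def loss(th):
    r = y - X @ th
    return r @ r                    # (y - X theta)^T (y - X theta)

grad_analytic = 2.0 * X.T @ (X @ theta - y)

# Central differences, one coordinate at a time.
h = 1e-6
grad_numeric = np.zeros(D)
for j in range(D):
    e = np.zeros(D)
    e[j] = h
    grad_numeric[j] = (loss(theta + e) - loss(theta - e)) / (2 * h)
```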

Gradient Descent

We can write the vectorised update equation as follows, for each iteration:

θ = θ − αX^T (Xθ − y)

For t iterations, what is the computational complexity of our gradient descent solution?

Hint: rewrite the above as θ = θ − αX^T Xθ + αX^T y

Complexity of computing X^T y is O(DN)

Complexity of computing αX^T y, once we have X^T y, is O(D), since X^T y has D entries

Complexity of computing X^T X is O(D²N), and then multiplying by α is O(D²)

All of the above need only be calculated once!


Gradient Descent

For each of the t iterations, we now need to multiply αX^T X with θ, a matrix multiplication of a D × D matrix with a D × 1 vector, which is O(D²)

The remaining subtraction/addition can be done in O(D) per iteration.

What is the overall computational complexity?

O(tD²) + O(D²N) = O((t + N)D²)
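A minimal sketch of this precomputed form (the sizes, α, t, and noiseless synthetic data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, t = 200, 5, 500
X = rng.standard_normal((N, D))
theta_true = rng.standard_normal(D)
y = X @ theta_true                  # noiseless target for illustration

alpha = 0.001
# One-time work: O(D^2 N) for X^T X, O(D N) for X^T y.
aXtX = alpha * (X.T @ X)            # D x D
aXty = alpha * (X.T @ y)            # D entries

theta = np.zeros(D)
for _ in range(t):                  # each iteration costs O(D^2)
    theta = theta - aXtX @ theta + aXty
```

With noiseless data the iterates converge to the exact least-squares solution, provided α is below the stability threshold set by the largest eigenvalue of X^T X.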


Gradient Descent (Alternative)

If we do not rewrite the expression θ = θ − αX^T (Xθ − y), then for each iteration we have:

• Computing Xθ is O(ND)

• Computing Xθ − y is O(N)

• Computing αX^T is O(ND)

• Computing αX^T (Xθ − y) is O(ND)

• Computing θ = θ − αX^T (Xθ − y) is O(D), since αX^T (Xθ − y) has D entries

What is the overall computational complexity?

O(NDt)
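A sketch of this unrewritten form, where the O(ND) matrix-vector products happen inside the loop (sizes, α, t, and the noiseless synthetic data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, t = 200, 5, 500
X = rng.standard_normal((N, D))
theta_true = rng.standard_normal(D)
y = X @ theta_true                  # noiseless target for illustration

alpha = 0.001
theta = np.zeros(D)
for _ in range(t):
    residual = X @ theta - y                    # O(ND), then O(N) subtraction
    theta = theta - alpha * (X.T @ residual)    # O(ND) product, O(D) update
```

The two updates are mathematically identical; when D ≪ N and t is large, the precomputed O((t + N)D²) form does less work per iteration (O(D²) versus O(ND) here).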
