LECTURE NOTES IN LINEAR ALGEBRA - Pokrovka11's Blog




International College of Economics and Finance

LECTURE NOTES

IN LINEAR ALGEBRA

α11 β12 γ13 δ14 ε15

ζ21 η22 θ23 ι24 κ25

λ31 µ32 ν33 ξ34 o35

π41 ρ42 σ43 τ44 υ45

ϕ51 χ52 ψ53 ω54

ICEF MOSCOW 2011



Contents

Preface

1 Systems of linear equations
  1.1 A few introductory examples
  1.2 Transformations of matrix rows
  1.3 Gauss elimination
  1.4 Geometric interpretation of SLE
  1.5 Exercises

2 Linear vector space
  2.1 Cartesian product of sets
  2.2 Linear vector space
  2.3 Linear independence
  2.4 Linear hull and bases
  2.5 Dimension
  2.6 Exercises

3 Fundamental set of solutions
  3.1 Homogeneity
  3.2 Rank of a matrix
  3.3 Fundamental set of solutions
  3.4 Exercises

4 The determinant and its applications
  4.1 Special cases 2×2 and 3×3
  4.2 Properties of the determinant
  4.3 Formal definition
  4.4 Laplace's theorem
  4.5 Minors and ranks
  4.6 Cramer's rule
  4.7 Exercises

5 Matrix algebra
  5.1 Matrix operations
  5.2 Inverse matrix
  5.3 Inverse matrix by Gauss elimination
  5.4 Inverse matrix by algebraic complements
  5.5 Matrix equations
  5.6 Elementary transformations matrices
  5.7 Exercises




6 Linear operators
  6.1 Matrix of a linear operator
  6.2 Change of base
  6.3 Eigenvectors and eigenvalues
  6.4 Diagonalizable matrices
  6.5 Exercises

7 Quadratic forms
  7.1 Matrix of a quadratic form
  7.2 Change of base
  7.3 Canonical diagonal form
  7.4 Sylvester's criterion
  7.5 Exercises

8 Euclidean vector spaces
  8.1 Linear spaces with dot product
  8.2 Gram-Schmidt orthogonalization
  8.3 Projection of a vector on a linear subspace
  8.4 Orthogonal complement
  8.5 Intersection and sum of vector spaces
  8.6 Exercises

9 Answers to problems

10 Sample Midterm Exam I

11 Sample Midterm Exam II

12 Sample Midterm Exam III

13 Sample Final Exam I

14 Sample Final Exam II

15 Sample Final Exam III

Index




Preface

This study guide is based on the lecture notes taken by a group of students in my Linear Algebra class at ICEF in 2009. Several handwritten lecture notes were compiled into one big piece, with the addition of numerical exercises and extended explanations of the parts of the lectures that were not explained clearly in class. Numerical exercises are listed at the end of each chapter; their main purpose is to help you master your algebraic skills and gain enough confidence that you have worked out each individual topic well. The pieces missing from the original lecture notes were, in most cases, proofs of theorems, which I had to skip mostly due to time constraints and which are now included here. They are still essential for students who are used to studying mathematics with proofs. Additionally, I have included six full-length practice exams (three midterm exams and three final exams) to give you an idea of what awaits you at the end of the course and also to help you get prepared and pass these important academic checkpoints.

It is very important for you to keep in mind that this study guide is not a replacement for attendance of lectures and seminars, but rather a manual containing a detailed inventory of all class topics, which you can use in case you missed something in class or simply need extra explanation of a topic. This manual was not designed to be a textbook, and you should not use it as one. But you can use it as lecture notes (in the original sense of the term), and you can also take advantage of the additional problems to check your performance. Each chapter contains a number of "typical" problems which occur frequently in exams. These problems are discussed in detail in examples, which are followed by plenty of exercises. I strongly recommend that every student go through these exercises before each exam.

Some students complained about the technical nature of the problems in this study guide and about the lack of economically oriented examples. Here I have to remind you that the Linear Algebra class was designed to be an instrumental complement to the other quantitative subjects studied at ICEF. In this light, and also taking into account the very strict time frame we have for teaching this class, obtaining numerical proficiency becomes our first priority, at least for the range of topics we discuss here. Since we teach you very basic matrix techniques, the numerical details of these techniques are what actually matters; that is why we ask you to work without any aid of computers at home and on exams, and that is why the material becomes very technical. As to economic applications and examples, most of them are related to optimization problems and differential equations, which are the objects of study in the other classes that Linear Algebra complements. I thus refer you to the other quantitative subjects




since they have a better opportunity to provide you with entertaining and relevant economic examples.

I would like to acknowledge the efforts of all students who took lecture notes, helped me typeset this study guide in LaTeX, and participated in hunting for typos. I decided not to publish the list of names here because it would either have to be too long to include everyone, or be unfair. Once again, I thank everyone who contributed to this manual and hope that the ICEF readership will appreciate our work. However, even after extensive checks, some typos might have escaped our collective attention, and I encourage you to send your comments, suggestions, or simply questions to the lecturer of the class by email.

D.D. Pervouchine

December 4, 2011




1 Systems of linear equations

1.1 A few introductory examples

Many problems in mathematics reduce to solving systems of simultaneous linear equations. The technique for doing so, called Gauss elimination, is one of the most frequently used methods in Linear Algebra.

Example 1.1. Find the set of solutions to the following system of linear equations:

  2x + y  = 3
  x  + 3y = 4

Solution. By expressing x from the second equation and substituting into the first equation, we get

  x = 4 − 3y

  2(4 − 3y) + y = 3

  8 − 6y + y = 3

  y = 1, x = 1.

Essentially, we solved this problem by using equivalent transformations of linear equations.

Example 1.2. Solve the following system of linear equations.

  x + 2y + 3z = 6
  2x + y + z  = 4
  x − y + z   = 1

Solution. Again, we express x and substitute into the second and the third equation:

  x = 6 − 2y − 3z,

  2(6 − 2y − 3z) + y + z = 4        12 − 3y − 5z = 4        3y + 5z = 8
  6 − 2y − 3z − y + z = 1           6 − 3y − 2z = 1         3y + 2z = 5

Subtracting the second equation from the first gives

  3z = 3,  z = 1,  x = 1,  y = 1.




One could notice that all these operations apply only to the coefficients at x, y, and z. We thus can save some paper and ink by writing the system of linear equations (SLE) in the following extended matrix form:

  ( 1  2  3 | 6 )
  ( 2  1  1 | 4 )
  ( 1 −1  1 | 1 )

in which the coefficients at x, y, and z form a square table, with the free terms attached to the right after a vertical bar.

In principle, matrices are rectangular tables of numbers. From now on, we reserve capital Latin letters (typically from the beginning of the alphabet) for matrix names, such as

  A = ( a11 a12 ... a1n )
      ( a21 a22 ... a2n )
      ( ...  ...     ... )
      ( am1 am2 ... amn )

while the corresponding small letters denote matrix elements. The same notation is abbreviated by A = (aij), where i and j are running indices which denote the position of the element aij in the matrix (i and j are not fixed to any particular number). The element aij is located at the intersection of the ith row and the jth column.

1.2 Transformations of matrix rows

In what follows we will use the following types of transformations which are applied to matrixrows.

I. The jth row is multiplied by some constant c and added to the ith row.

II. The ith row is multiplied by a non-zero constant c.

III. The ith row and the jth row are interchanged.

Type I transformations will be denoted by the expression [i] → [i] + [j] × c. For instance, [2] → [2] − [1] × 2 means that the first row, multiplied by 2, is subtracted from the second row1. Note that in every Type I transformation only the ith row is affected, while the jth row is unchanged. For instance, in [2] → [2] − [1] × 2, the first row stays unchanged.

Type II transformations will be denoted by [i] → [i] × c. For instance, [2] → [2] × (−1) means that the second row is multiplied by −1. Multiplication of a row by a constant means that every element of that row is multiplied by the constant.

Similarly, Type III transformations are denoted by [i] ↔ [j]. The three types of transfor-mations and their application to systems of linear equations will be exemplified further in thenext section.

1Recall that the addition or subtraction of two rows means that their components are added or subtracted at the respective positions.
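As a side illustration (not part of the original notes), the three types of row transformations can be written in a few lines of Python with numpy; the array A below is the extended matrix from the beginning of this section, and the function names are my own, purely for demonstration:

```python
import numpy as np

# Extended matrix of the system in example 1.2 (last column holds the free terms).
A = np.array([[1.0, 2, 3, 6],
              [2, 1, 1, 4],
              [1, -1, 1, 1]])

# Type I: [i] -> [i] + [j] * c  (row j itself stays unchanged)
def add_row(A, i, j, c):
    A[i] += c * A[j]

# Type II: [i] -> [i] * c  (c must be non-zero)
def scale_row(A, i, c):
    A[i] *= c

# Type III: [i] <-> [j]
def swap_rows(A, i, j):
    A[[i, j]] = A[[j, i]]

add_row(A, 1, 0, -2)   # [2] -> [2] - [1] * 2 (rows are 0-indexed in numpy)
print(A[1])            # the second row becomes [0, -3, -5, -8]
print(A[0])            # the first row is unchanged: [1, 2, 3, 6]
```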




1.3 Gauss elimination

The Gauss elimination algorithm is used very frequently for solving systems of linear equations and other related problems. The strategy of Gauss elimination is to create zeroes under the matrix's main diagonal2, thereby transforming the matrix into so-called upper-triangular form. The elements below the diagonal of an upper-triangular matrix must be equal to zero. Similarly, all elements above the diagonal of a lower-triangular matrix must be equal to zero. A matrix which is both upper-triangular and lower-triangular is called a diagonal matrix. All elements outside of the diagonal of a diagonal matrix are equal to zero. The n×n diagonal matrix with the numbers λ1, . . . , λn on the diagonal will be denoted by diag(λ1, . . . , λn). The upper-triangular matrix is then transformed further to obtain the vector of solutions, as will be shown below.

The essential steps of Gauss elimination are illustrated by the following examples. Note that the elementary transformations described in section 1.2 do not change the set of solutions to the system of linear equations and thus create an equivalent system of equations at each step.

Example 1.3. Solve the following system of linear equations (same as in example 1.2):

  x + 2y + 3z = 6
  2x + y + z  = 4
  x − y + z   = 1

Solution.

  ( 1  2  3 |  6 )
  ( 2  1  1 |  4 )
  ( 1 −1  1 |  1 )

[2] → [2] − [1] × 2:

  ( 1  2  3 |  6 )
  ( 0 −3 −5 | −8 )
  ( 1 −1  1 |  1 )

[2] → [2] × (−1):

  ( 1  2  3 |  6 )
  ( 0  3  5 |  8 )
  ( 1 −1  1 |  1 )

[3] → [3] − [1]:

  ( 1  2  3 |  6 )
  ( 0  3  5 |  8 )
  ( 0 −3 −2 | −5 )

[3] → [3] + [2]:

  ( 1  2  3 |  6 )
  ( 0  3  5 |  8 )
  ( 0  0  3 |  3 )

[3] → [3] / 3:

  ( 1  2  3 |  6 )
  ( 0  3  5 |  8 )
  ( 0  0  1 |  1 )

[1] → [1] − [3] × 3:

  ( 1  2  0 |  3 )
  ( 0  3  5 |  8 )
  ( 0  0  1 |  1 )

[2] → [2] − [3] × 5:

  ( 1  2  0 |  3 )
  ( 0  3  0 |  3 )
  ( 0  0  1 |  1 )

[2] → [2] / 3:

  ( 1  2  0 |  3 )
  ( 0  1  0 |  1 )
  ( 0  0  1 |  1 )

[1] → [1] − [2] × 2:

  ( 1  0  0 |  1 )
  ( 0  1  0 |  1 )
  ( 0  0  1 |  1 )

Hence x = y = z = 1.

2The main diagonal of a matrix goes south-east from the upper left corner.




That is, the strategy of Gauss elimination is the following.

• Look at the first column and, if necessary, use a type III transformation to make the upper-left corner element non-zero. This non-zero element will be called the leading or pivot element. Proceed to the second column if all elements of the first column are equal to zero.

• For each of the rows below the row containing the leading element, use type I transfor-mations with suitable constants to make all elements below the pivot element be equalto zero.

• Proceed to the second column and, if necessary, use a type III transformation to make the element in position (2, 2) non-zero. It will be the next pivot element. If all the elements of the second column below the pivot element are equal to zero, proceed to the next column.

• Repeat these steps with other columns until the matrix is in the upper-triangular form.

When transformed to the upper-triangular form, the matrix of a system of linear equations does not immediately give you the set of solutions. For that we need the backward steps of Gauss elimination, which can be applied to a matrix with non-zero elements on the diagonal.

• Since we assumed that all the elements on the diagonal are non-zero, the element in position (n, n) is also non-zero. Therefore, a type II transformation can make it equal to one. Then, we can apply type I transformations with suitable constants to make all elements above the pivot element equal to zero.

• Similarly, we can make all the elements outside of the diagonal be equal to zero, andall elements on the diagonal be equal to one. Then, the set of solutions appears in thecolumn of free terms. Note that this strategy works only when all the diagonal elementsafter the forward Gauss elimination were non-zero.
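The forward and backward passes described above can be sketched in a short numpy routine. This is only an illustration under assumptions of my own, not the notes' prescribed hand procedure: the name gauss_solve is invented, the system is assumed to have a unique solution, and the pivot is chosen as the largest available entry (a common numerical refinement of the "swap if zero" rule):

```python
import numpy as np

def gauss_solve(M):
    """Solve an SLE given its extended matrix M = (A | b) by Gauss elimination."""
    M = M.astype(float).copy()
    n = M.shape[0]
    # Forward pass: create zeroes under the main diagonal.
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))     # pick a non-zero pivot (type III)
        M[[k, p]] = M[[p, k]]
        for i in range(k + 1, n):
            M[i] -= M[k] * (M[i, k] / M[k, k])  # type I transformations
    # Backward pass: make the matrix diagonal with ones on the diagonal.
    for k in range(n - 1, -1, -1):
        M[k] /= M[k, k]                         # type II transformation
        for i in range(k):
            M[i] -= M[k] * M[i, k]              # type I, eliminating above the pivot
    return M[:, -1]                             # column of free terms = solution

M = np.array([[1, 2, 3, 6],
              [2, 1, 1, 4],
              [1, -1, 1, 1]])
print(gauss_solve(M))   # example 1.3: x = y = z = 1
```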

The following example shows that the pivot element does not always have to be equal to one, although it is very convenient not to deal with fractions in Gauss elimination.

Example 1.4. Solve the following system of linear equations:

  2x + 3y + z  = 3
  3x − y + 2z  = 5
  2x + 2y + z  = 3

Solution.

  ( 2  3  1 | 3 )
  ( 3 −1  2 | 5 )
  ( 2  2  1 | 3 )

[2] → [2] − [1] × 3/2:

  ( 2     3    1  |  3  )
  ( 0 −11/2  1/2 | 1/2 )
  ( 2     2    1  |  3  )

[3] → [3] − [1]:

  ( 2     3    1  |  3  )
  ( 0 −11/2  1/2 | 1/2 )
  ( 0    −1    0  |  0  )

[2] ↔ [3]:

  ( 2     3    1  |  3  )
  ( 0    −1    0  |  0  )
  ( 0 −11/2  1/2 | 1/2 )

[2] → [2] × (−1):

  ( 2     3    1  |  3  )
  ( 0     1    0  |  0  )
  ( 0 −11/2  1/2 | 1/2 )

[3] → [3] + [2] × 11/2:

  ( 2  3    1  |  3  )
  ( 0  1    0  |  0  )
  ( 0  0  1/2 | 1/2 )

[3] → [3] × 2:

  ( 2  3  1 | 3 )
  ( 0  1  0 | 0 )
  ( 0  0  1 | 1 )

[1] → [1] − [3]:

  ( 2  3  0 | 2 )
  ( 0  1  0 | 0 )
  ( 0  0  1 | 1 )

[1] → [1] − [2] × 3:

  ( 2  0  0 | 2 )
  ( 0  1  0 | 0 )
  ( 0  0  1 | 1 )

[1] → [1] / 2:

  ( 1  0  0 | 1 )
  ( 0  1  0 | 0 )
  ( 0  0  1 | 1 )

Hence x = 1, y = 0, z = 1.

1.4 Geometric interpretation of SLE

The geometry behind systems of linear equations is very simple and natural. Consider the system in example 1.1. The first equation, 2x + y = 3, defines a line in the xy-plane. The second equation, x + 3y = 4, defines another line. The set of solutions to the SLE is the set of points in the xy-plane satisfying both equations, i.e., the intersection of the two lines. In principle, there are three options for the mutual positioning of two lines: intersecting (the generic case), parallel, or coinciding. Thus an SLE with two variables may have either one, or zero, or infinitely many solutions. Similarly, the set of solutions to the SLE with three variables shown in example 1.3 is the intersection of three planes in 3-D space. The generic case is again when the three planes intersect at one point (the intersection of two planes is a line, and that line intersects the third plane at one point), but there are cases with infinitely many solutions (the three planes have a common line or coincide) or no solutions (parallel planes or lines). Some of these arrangements are shown in Figure 1.

Repeating the same argument for an arbitrary number of variables (or reasoning by induction), we arrive at the following conclusion.

Theorem 1. A system of linear equations can have either no solutions, exactly one solution, or infinitely many solutions.

The proof of this statement is not very difficult. However, we will need to introduce thenotion of a linear vector space to talk more generally about the set of solutions to an arbitrarysystem of linear equations. We will do it in the next section.

In both example 1.3 and example 1.4 we dealt with SLEs having a unique solution, and thus the final matrix of such an SLE was diagonal. This is not true for degenerate systems (i.e., those with no or infinitely many solutions). However, Gauss elimination is also applicable to degenerate systems.
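The trichotomy of Theorem 1 can also be checked numerically. The sketch below compares the rank of the coefficient matrix with the rank of the extended matrix, a criterion that chapter 3 makes precise; the function name is my own, and the three systems are the line arrangements discussed above:

```python
import numpy as np

def count_solutions(A, b):
    """Return 'none', 'one', or 'infinitely many' for the SLE Ax = b."""
    r = np.linalg.matrix_rank(A)
    r_ext = np.linalg.matrix_rank(np.column_stack([A, b]))
    if r < r_ext:
        return "none"              # inconsistent: the extended matrix has larger rank
    return "one" if r == A.shape[1] else "infinitely many"

# The two lines of example 1.1 intersect at a single point.
print(count_solutions(np.array([[2, 1], [1, 3]]), np.array([3, 4])))   # one
# Two parallel lines never meet.
print(count_solutions(np.array([[2, 1], [2, 1]]), np.array([3, 4])))   # none
# Two coinciding lines share infinitely many points.
print(count_solutions(np.array([[2, 1], [4, 2]]), np.array([3, 6])))   # infinitely many
```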




(a) ∞ solutions    (b) only 1 solution    (c) no solutions    (d) also no solutions

Figure 1: The intersection of three planes in 3D may consist of zero, one, or infinitely many points.

Example 1.5. Solve the following system of linear equations:

  2x + 3y + z  = 3
  2x + 3y + 2z = 6

Solution.

  ( 2  3  1 | 3 )
  ( 2  3  2 | 6 )

[2] → [2] − [1]:

  ( 2  3  1 | 3 )
  ( 0  0  1 | 3 )

[1] → [1] − [2]:

  ( 2  3  0 | 0 )
  ( 0  0  1 | 3 )

We have z = 3 and x = −(3/2)y. Here y can be considered as a parameter, and x can be found once y is known.

Note that at the second step the matrix is also upper-triangular, but this time one "step" of the "ladder" is longer than in the former examples. This type of representation, in which the first non-zero element of each row appears to the right of and below the first non-zero element of the previous row, is called the matrix's echelon form.

1.5 Exercises

Problem 1.1. Solve the following systems of linear equations.




(a)
  −3x1 + 3x3 = −3
  −x1 + 2x2 + 2x4 = −8
  −2x2 + 2x3 − 2x4 = 8
  2x1 + 4x2 − 2x3 + 2x4 = 0

(b)
  −5x1 − 2x2 + 4x3 + 3x4 = −14
  −3x1 − 4x2 − 4x3 = 10
  −5x1 + 4x2 + 4x3 − 4x4 = −18
  −2x2 + 3x3 + x4 = 1

(c)
  4x1 − x2 + 2x3 − 3x4 − x5 = 10
  −x2 + 2x3 − 3x4 + x5 = 10
  −2x1 − 5x2 + 2x3 − 4x4 + 2x5 = 14
  3x1 + 4x2 + 2x3 − x4 + 4x5 = −16
  x1 + 4x2 − 5x3 − x4 − 4x5 = 13

(d)
  4x1 + 3x2 − 4x3 + 3x4 = −3
  −4x1 − 3x2 + 3x4 = 21
  −x1 + 2x2 + 4x3 + 3x4 = 1
  −5x1 − 4x2 − x3 + 4x4 = 28

(e)
  3x1 − 5x2 + 3x3 + 4x4 = −27
  4x1 − 3x2 − 2x3 − 3x4 = 0
  4x1 − 4x2 + 3x4 = −18
  −x1 − 3x4 = 7

(f)
  −3x1 + 3x3 − x4 = −3
  3x1 − 2x2 − 2x3 + 4x4 = 20
  4x1 + 4x2 + 2x3 + 3x4 = 25
  −4x1 + 2x2 − 4x3 − x4 = −39

(g)
  x1 − 2x2 + 2x3 − 5x4 = 4
  9x1 + 2x2 − 10x3 − 6x4 = 10
  3x1 + 5x2 − 3x3 + 3x4 = 11
  11x1 + 5x2 + 10x4 = 0

(h)
  3x1 + 11x2 + 7x3 − 9x4 = −1
  12x1 + 10x2 − x3 + 4x4 = −8
  12x1 + 3x2 − x4 = 3
  4x1 + 2x2 + x3 + 2x4 = −10

(i)
  5x1 + 10x2 + 11x3 + 9x4 = −10
  5x1 − 4x2 + 12x3 + 6x4 = 3
  3x1 + 5x2 − 3x3 + 3x4 = −6
  6x1 + 3x2 − 5x3 + 5x4 = −7

(j)
  x1 + 12x3 − 2x4 = −1
  5x1 − 8x2 + 10x3 + x4 = 8
  x1 − 4x2 − 4x3 + 10x4 = −1
  10x1 − 8x2 − 8x3 − x4 = −5

2 Linear vector space

2.1 Cartesian product of sets

Let A and B be two sets (not necessarily sets of numbers). We define the union A ∪ B, intersection A ∩ B, difference A\B, and symmetric difference A △ B of A and B to be

A ∪ B = {x | x ∈ A or x ∈ B},

A ∩ B = {x | x ∈ A and x ∈ B},

A\B = {x | x ∈ A and x ∉ B},

A △ B = (A\B) ∪ (B\A).

Additionally, we define another operation on sets, called the Cartesian product, which describes the set composed of all ordered pairs of elements of A and B:

A×B = {(x, y) | x ∈ A, y ∈ B}.

For instance, if A = B = [0, 1] then A × B can be thought of as a square in the xy-plane formed by the pairs (x, y) such that 0 ≤ x, y ≤ 1. If A = R and B = [0, 1] then A × B is a horizontal stripe of width 1 in the y-direction. If A = R and B = Z then A × B is a grid of horizontal lines crossing the y-axis at every integer number. Following this definition, the set formed by n-tuples




of real numbers, {(x1, . . . , xn) | ∀i : xi ∈ R}, is exactly the same set as Rn = R × · · · × R. Previously, we did not define functions, assuming that a function is a "genuine" object which doesn't need a definition. In fact, one can use the Cartesian product to give a rigorous definition of a function. This is achieved in two steps.

First, let us assume we are given two sets, X and Y, called the domain set and the image set, respectively. We define a relation F on X and Y to be a generic subset of X × Y. For instance, the set B2(1) = {(x, y) | x2 + y2 ≤ 1}, which corresponds to a circle on the plane, can be considered to be a relation. However, it differs from our conventional understanding of a function because for a given x such that (x, y) ∈ F, y can take more than one value. Now we will define a function or, more generally, a mapping3, to be a relation F on X and Y such that (x, y1) ∈ F and (x, y2) ∈ F necessarily implies y1 = y2. Restricting this construction to a subset of X, if necessary, we can also assume that for every x ∈ X there exists y ∈ Y such that (x, y) ∈ F. This fact is denoted by F : X → Y, which reads as "the function F maps the set X to the set Y".

The definition of a function ensures that there is one and only one y for every x such that (x, y) ∈ F. As you know, this y is denoted by y = F(x). This links the formal set-theoretic construct, which defines a function as a special class of relations, to our colloquial understanding, in which a function is a rule that assigns to every value of x from the domain one and only one value of y from the image. Functions can be of different nature, and the same set-theoretic formalism also allows defining functions of several variables. For instance, a function of two variables, x1 and x2, both from the set X, and taking values in the set Y, can be thought of as F : X × X → Y. We need functions to give a formal definition of a linear vector space.
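These set constructions are easy to play with for small finite sets; the following Python sketch (illustrative only, with is_function being my own helper name) builds a Cartesian product and tests whether a relation is a function:

```python
from itertools import product

A = {0, 1}
B = {'a', 'b'}

# Cartesian product: all ordered pairs (x, y) with x in A and y in B.
AxB = set(product(A, B))
print(len(AxB))   # four pairs: |A| * |B| = 2 * 2

# A relation F on X and Y is just a subset of X x Y; it is a function
# when no x is paired with two different y's.
def is_function(F):
    firsts = [x for (x, _) in F]
    return len(firsts) == len(set(firsts))

F1 = {(0, 'a'), (1, 'b')}            # a function
F2 = {(0, 'a'), (0, 'b'), (1, 'a')}  # not a function: 0 is paired with both 'a' and 'b'
print(is_function(F1), is_function(F2))   # True False
```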

2.2 Linear vector space

Definition 1. A vector space over the field of real numbers consists of three objects: (i) aset V called the ground set, (ii) the mapping f : V × V → V , which is a binary operation,and (iii) the mapping g : R × V → V . The mapping f is called addition; the mapping g iscalled multiplication by scalars. The mapping f(x, y) is usually denoted by the + sign, i.e.,f(x, y) = x+y, while x and y could be objects which you don’t normally add to each other (forinstance, students or teachers). Similarly, the mapping g(λ, x) is often denoted by g(λ, x) = λx,although it might have nothing to do with the conventional multiplication. The elements of Vwill be referred to as vectors although they might not be geometric vectors. The object (V,+, ∗)with the attached operations must obey the following axioms.

1. ∀x, y ∈ V : x+ y = y + x

2. ∀x, y, z ∈ V : (x+ y) + z = x+ (y + z)

3. ∃0 ∈ V : ∀x ∈ V : x+ 0 = 0 + x = x

4. ∀x ∈ V ∃(−x) ∈ V : x+ (−x) = (−x) + x = 0

3Speaking rigorously, the term “function” means a mapping which takes numeric values, while for generalmappings the image can be any set.




5. ∀x, y ∈ V, ∀λ ∈ R : λ(x+ y) = λx+ λy

6. ∀x ∈ V, ∀λ, µ ∈ R : λ(µx) = (λµ)x

7. ∀x ∈ V, ∀λ, µ ∈ R : (λ+ µ)x = λx+ µx

8. ∀x ∈ V : 1x = x

The following examples give an idea of which sets form linear vector spaces and which do not.

1. The set R forms a linear vector space with respect to conventional addition of numbersand multiplication by scalars.

2. The set R2 = {(x, y)} forms a linear vector space with respect to the component-wiseaddition and multiplication by scalars, i.e., (x1, x2) + (y1, y2) = (x1 + y1, x2 + y2) andλ(x1, x2) = (λx1, λx2).

3. Similarly, the set Rn forms a linear vector space with respect to component-wise additionand multiplication by scalars, i.e., (x1, . . . , xn)+(y1, . . . , yn) = (x1+y1, x2+y2, . . . , xn+yn)and λ(x1, . . . , xn) = (λx1, . . . , λxn);

4. The set Pdeg=n of n-th degree polynomials is not a linear vector space because 0 /∈ Pdeg=n (moreover, the sum of two n-th degree polynomials may have a smaller degree).

5. The set Pdeg≤n of all polynomials with degree not greater than n does form a linear vector space.

6. The set C[a, b] of all continuous functions defined on [a, b] does form a linear vector space(the sum of continuous functions is again a continuous function).

7. The set Cn[a, b] of all real functions on the segment [a, b] which are continuous up to theirn-th derivative (in particular, n could be ∞) forms a linear vector space.

Note that in most cases the operations come naturally from the existing addition and multiplication by scalars, while being or not being a linear vector space depends on whether or not these operations stay within the same set V, i.e., whether or not f(x, y) still belongs to V. This motivates the following definition.

Definition 2. Let (V,+, ∗) be a linear vector space. A subset4 U ⊆ V is called a subspace of V if (U,+, ∗) forms a linear vector space with respect to the operations + and ∗ inherited from V (by restriction to the smaller domain U). In other words, U is a subspace of V if x, y ∈ U =⇒ x+ y ∈ U , x ∈ U =⇒ (−x) ∈ U , and λ ∈ R, x ∈ U =⇒ λx ∈ U.

Examples of subspaces of linear vector spaces are given below.

1. R ⊂ R2 can be considered as a linear subspace.

2. Pdeg≤n ⊂ C(−∞,+∞) is a linear subspace.

3. Cn+1[a, b] ⊂ Cn[a, b] is a linear subspace.

4. The set B2(1) defined in section 2.1 is not a linear subspace of R2.

4Please pay attention: U ⊆ V means that U is a subset of V, while U ⊂ V means that U is a proper subset of V, i.e., U ⊆ V but U ≠ V. Note that the difference is similar to that between "≤" and "<".




2.3 Linear independence

In this section we introduce the notions of linear dependence and linear independence, which are critical for understanding the rest of the course.

Definition 3. Let V be a linear vector space over the field of real numbers and let x1, x2, . . . , xn be its vectors. Let also λ1, λ2, . . . , λn be a set of real numbers. The expression λ1x1 + λ2x2 + · · ·+ λnxn is called a linear combination of the vectors x1, x2, . . . , xn with factors λ1, λ2, . . . , λn. The linear combination is called trivial if λ1 = λ2 = · · · = λn = 0.

Definition 4. Let x1, x2, . . . , xn be a set of vectors in a linear space V . The set x1, x2, . . . , xnis called linearly dependent if there exist numbers λ1, λ2, . . . , λn, not all equal to zero, such thatλ1x1 + λ2x2 + · · · + λnxn = 0. In other words, the set x1, x2, . . . , xn is linearly dependent ifsome non-trivial linear combination of x1, x2, . . . , xn is equal to zero. The set of vectors that isnot linearly dependent is called linearly independent.

Example 2.1. Consider the following three vectors: x1 = (1, 2, 3), x2 = (0, 1, 2), x3 = (1, 4, 7). We need to check whether or not they are linearly independent. They are linearly dependent if there exist numbers λ1, λ2, λ3, not all equal to zero, such that λ1x1 + λ2x2 + λ3x3 = 0. That is, (λ1, 2λ1, 3λ1) + (0, λ2, 2λ2) + (λ3, 4λ3, 7λ3) = (0, 0, 0). This happens if and only if

  λ1 + λ3 = 0
  2λ1 + λ2 + 4λ3 = 0
  3λ1 + 2λ2 + 7λ3 = 0

Solving this SLE by Gauss elimination, we get

  ( 1  0  1 | 0 )
  ( 2  1  4 | 0 )
  ( 3  2  7 | 0 )

[2] → [2] − [1] × 2:

  ( 1  0  1 | 0 )
  ( 0  1  2 | 0 )
  ( 3  2  7 | 0 )

[3] → [3] − [1] × 3:

  ( 1  0  1 | 0 )
  ( 0  1  2 | 0 )
  ( 0  2  4 | 0 )

[3] → [3] × 1/2:

  ( 1  0  1 | 0 )
  ( 0  1  2 | 0 )
  ( 0  1  2 | 0 )

[3] → [3] − [2]:

  ( 1  0  1 | 0 )
  ( 0  1  2 | 0 )
  ( 0  0  0 | 0 )

As you can see, the third equation degenerates to 0 = 0, i.e., effectively we have only two equations. This implies that the SLE has infinitely many solutions. Hence, there exists at least one non-trivial solution. Thus, we conclude that x1, x2, x3 is a linearly dependent set of vectors.
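The conclusion of Example 2.1 can also be checked numerically: three vectors in R³ are linearly independent exactly when the matrix with these vectors as rows has rank 3. A minimal sketch with NumPy (the library choice is ours, not part of the notes):

```python
import numpy as np

# Vectors from Example 2.1, stacked as rows of a matrix.
x1 = np.array([1, 2, 3])
x2 = np.array([0, 1, 2])
x3 = np.array([1, 4, 7])
A = np.vstack([x1, x2, x3])

# Three vectors are linearly independent iff this matrix has rank 3.
rank = np.linalg.matrix_rank(A)
print(rank)  # 2, so the set is linearly dependent

# The echelon form found above gives the dependence explicitly: x3 = x1 + 2*x2.
assert (x3 == x1 + 2 * x2).all()
```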

Definition 5. We say that the vector x is expressed linearly through the vectors x1, x2, . . . , xn if x = λ1x1 + λ2x2 + · · · + λnxn for some λ1, λ2, . . . , λn ∈ R. In other words, x is expressed linearly through x1, x2, . . . , xn if x is equal to a linear combination of these vectors.


Theorem 2. The set x1, x2, . . . , xn is linearly dependent if and only if one of x1, x2, . . . , xn is expressed linearly through the other vectors.

Proof. Assume that xi is expressed linearly through x1, . . . , xi−1, xi+1, . . . , xn. That is, xi = λ1x1 + · · · + λi−1xi−1 + λi+1xi+1 + · · · + λnxn. But then it means that λ1x1 + · · · + λi−1xi−1 − xi + λi+1xi+1 + · · · + λnxn = 0, i.e., at least one non-trivial (the factor at xi is equal to −1) linear combination of x1, . . . , xn is equal to zero.

Conversely, let λ1x1 + λ2x2 + · · · + λnxn = 0 be a non-trivial linear combination, i.e., λi ≠ 0 for some i. Then −λixi = λ1x1 + · · · + λi−1xi−1 + λi+1xi+1 + · · · + λnxn, and thus
$$x_i = \frac{\lambda_1}{-\lambda_i}x_1 + \cdots + \frac{\lambda_{i-1}}{-\lambda_i}x_{i-1} + \frac{\lambda_{i+1}}{-\lambda_i}x_{i+1} + \cdots + \frac{\lambda_n}{-\lambda_i}x_n,$$
i.e., xi is expressed linearly through x1, . . . , xi−1, xi+1, . . . , xn.

Let us comment on the properties of linearly dependent vectors.

1. If the set of vectors x1, x2, . . . , xn−1 is linearly dependent, then the set x1, x2, . . . , xn−1, xn is also linearly dependent.

2. If the set of vectors x1, x2, . . . , xn contains the zero vector, then it is linearly dependent.

3. If the set of vectors x1, x2, . . . , xn is linearly independent, then the set x1, x2, . . . , xn−1 must also be linearly independent.

Spend some time with paper and pencil to prove these properties.

2.4 Linear hull and bases

Definition 6. Consider x1, x2, . . . , xn, a set of vectors in a linear vector space V . The set L(x1, x2, . . . , xn) = {λ1x1 + λ2x2 + · · · + λnxn | λ1, λ2, . . . , λn ∈ R} is called the linear hull of x1, x2, . . . , xn. In other words, the linear hull of x1, x2, . . . , xn is the set of all their possible linear combinations.

Definition 7. The set x1, x2, . . . , xn of vectors in a linear vector space V is said to generate V if L(x1, x2, . . . , xn) = V . In other words, x1, x2, . . . , xn generate V if every vector of V can be expressed linearly through x1, x2, . . . , xn.

Remark. Note that the notion of linear independence defined in the previous section was not related to the enveloping vector space, i.e., the property of linear independence of a set of vectors is their internal business. In contrast, the property to generate V has a lot to do with the enveloping vector space.

Definition 8. The set x1, x2, . . . , xn of vectors of a linear vector space V is said to be a base in V if it generates V and is linearly independent. Sometimes a base is also called a basis.

Theorem 3. Let the sets x1, x2, . . . , xn and y1, y2, . . . , ym be two bases in a linear vector space V . Then n = m.


Proof. Assume the contrary, i.e., n ≠ m. Without loss of generality, n > m. Then every vector x1, x2, . . . , xn can be expressed linearly through y1, y2, . . . , ym, as y1, y2, . . . , ym is a base. That is,
$$\begin{aligned} x_1 &= c_{11}y_1 + c_{12}y_2 + \cdots + c_{1m}y_m,\\ x_2 &= c_{21}y_1 + c_{22}y_2 + \cdots + c_{2m}y_m,\\ &\;\;\vdots\\ x_n &= c_{n1}y_1 + c_{n2}y_2 + \cdots + c_{nm}y_m. \end{aligned}$$
Consider now an arbitrary linear combination of x1, x2, . . . , xn that is equal to zero. We get
$$\lambda_1 x_1 + \cdots + \lambda_n x_n = \lambda_1(c_{11}y_1 + \cdots + c_{1m}y_m) + \cdots + \lambda_n(c_{n1}y_1 + \cdots + c_{nm}y_m) = (c_{11}\lambda_1 + c_{21}\lambda_2 + \cdots + c_{n1}\lambda_n)y_1 + \cdots + (c_{1m}\lambda_1 + c_{2m}\lambda_2 + \cdots + c_{nm}\lambda_n)y_m = 0.$$
Since y1, y2, . . . , ym are linearly independent, this condition is equivalent to the following SLE
$$\begin{cases} c_{11}\lambda_1 + c_{21}\lambda_2 + \cdots + c_{n1}\lambda_n = 0,\\ c_{12}\lambda_1 + c_{22}\lambda_2 + \cdots + c_{n2}\lambda_n = 0,\\ \quad\vdots\\ c_{1m}\lambda_1 + c_{2m}\lambda_2 + \cdots + c_{nm}\lambda_n = 0, \end{cases}$$
which must have at least one non-zero solution, since it has fewer equations than unknowns (m < n by our supposition). This implies that x1, x2, . . . , xn cannot be linearly independent, contradicting the assumption that they form a base. Thus m = n.

Corollary 3.1. Let x = λ1x1 + λ2x2 + · · · + λnxn = µ1x1 + µ2x2 + · · · + µnxn be two linear decompositions of the same vector x in the base x1, x2, . . . , xn. Then ∀i : λi = µi. That is, the coefficients of a linear decomposition of a vector x in a given base are defined uniquely. These coefficients are called the coordinates of the vector x in that base. Note that the coordinates depend on the order in which x1, x2, . . . , xn are written.

Example 2.2. Find the coordinates of the vector x = (1, 2, 3) in the base e1 = (1, 1, 0), e2 = (1, 0, 1), and e3 = (0, 1, 1).

Solution. From now on we will follow the notation in which the components of a vector are written in columns, not rows. It helps to save space on the page and also makes sense in terms of the formal framework, in which a row and a column are considered to be particular cases of a rectangular matrix. By definition, we get
$$\begin{pmatrix}1\\2\\3\end{pmatrix} = \lambda_1\begin{pmatrix}1\\1\\0\end{pmatrix} + \lambda_2\begin{pmatrix}1\\0\\1\end{pmatrix} + \lambda_3\begin{pmatrix}0\\1\\1\end{pmatrix} = \begin{pmatrix}\lambda_1+\lambda_2\\ \lambda_1+\lambda_3\\ \lambda_2+\lambda_3\end{pmatrix}.$$

We thus have the following SLE
$$\begin{cases} \lambda_1 + \lambda_2 = 1,\\ \lambda_1 + \lambda_3 = 2,\\ \lambda_2 + \lambda_3 = 3, \end{cases}$$
which is solved by Gauss elimination:
$$\left(\begin{array}{ccc|c} 1&1&0&1\\ 1&0&1&2\\ 0&1&1&3 \end{array}\right) \to \left(\begin{array}{ccc|c} 1&1&0&1\\ 0&-1&1&1\\ 0&1&1&3 \end{array}\right) \to \left(\begin{array}{ccc|c} 1&1&0&1\\ 0&-1&1&1\\ 0&0&1&2 \end{array}\right) \to \left(\begin{array}{ccc|c} 1&1&0&1\\ 0&1&0&1\\ 0&0&1&2 \end{array}\right) \to \left(\begin{array}{ccc|c} 1&0&0&0\\ 0&1&0&1\\ 0&0&1&2 \end{array}\right).$$

Thus, the coordinates of x in e1, e2, e3 are (0, 1, 2). Indeed, x = 0 · e1 + 1 · e2 + 2 · e3.
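Equivalently, the coordinates are the solution of a linear system whose coefficient matrix has the base vectors as columns. A quick NumPy check of Example 2.2 (our own sketch, not from the notes):

```python
import numpy as np

# Base vectors from Example 2.2, written as the columns of a matrix.
E = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
x = np.array([1, 2, 3], dtype=float)

# The coordinates of x in the base solve E @ lam = x.
lam = np.linalg.solve(E, x)
print(lam)  # [0. 1. 2.]
```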

Example 2.3. Find the coordinates of the function f(x) = 1/(x² − 5x + 6), considered as a vector in the respective linear vector space, with the base e1 = 1/(x − 2) and e2 = 1/(x − 3).

Solution.
$$\frac{1}{x^2-5x+6} = \frac{\lambda_1}{x-2} + \frac{\lambda_2}{x-3} = \frac{\lambda_1(x-3)+\lambda_2(x-2)}{x^2-5x+6}.$$

In order for this to hold, the free term of the polynomial in the numerator of the right-hand side of the equation must be equal to the free term of the numerator of the left-hand side, and the same must hold for the terms containing x. That is,
$$\begin{cases} \lambda_1 + \lambda_2 = 0,\\ -3\lambda_1 - 2\lambda_2 = 1. \end{cases}$$

Thus, we have
$$\left(\begin{array}{cc|c} 1&1&0\\ -3&-2&1 \end{array}\right) \to \left(\begin{array}{cc|c} 1&1&0\\ 0&1&1 \end{array}\right) \to \left(\begin{array}{cc|c} 1&0&-1\\ 0&1&1 \end{array}\right).$$

Thus, the coordinates of f(x) in e1, e2 are (−1, 1).
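The result of Example 2.3 can be sanity-checked numerically by comparing the two sides of the decomposition at a few sample points away from the poles x = 2 and x = 3 (a sketch of ours, not part of the notes):

```python
# f(x) = 1/(x^2 - 5x + 6) should equal -1*e1(x) + 1*e2(x)
# with e1(x) = 1/(x - 2) and e2(x) = 1/(x - 3).
f  = lambda x: 1.0 / (x**2 - 5*x + 6)
e1 = lambda x: 1.0 / (x - 2)
e2 = lambda x: 1.0 / (x - 3)

for x in [0.0, 1.0, 4.5, 10.0]:   # sample points away from the poles 2 and 3
    assert abs(f(x) - (-1*e1(x) + 1*e2(x))) < 1e-12
print("coordinates (-1, 1) verified")
```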

2.5 Dimension

The notion of dimension also plays a key role in many branches of mathematics. The intuitive meaning of dimension is that the line is 1-dimensional, the plane is 2-dimensional, the space is 3-dimensional. A bit of generalization takes us to a 4-D space if we agree to include time as the fourth dimension. The dimensions five and higher are more difficult to imagine. However, in all cases the quantity standing behind the notion of dimension is the number of independent directions you can go in your vector space.

Definition 9. The dimension of a vector space V is the number of vectors in any of its bases (see theorem 3). The dimension of a vector space V is denoted by dim(V ).

The dimension of a vector space can be either finite or infinite. In this course we deal only with finite-dimensional spaces. For instance, the set of (not more than) quadratic polynomials is three-dimensional because it is spanned by {1, x, x²}. The dimension of C[a, b], in contrast, is infinite because the set {1, x, . . . , xⁿ} is linearly independent for arbitrarily large n.

Theorem 4. Let U and V be finite-dimensional vector spaces such that U ⊆ V . Then dim(U) ≤ dim(V ), and dim(U) = dim(V ) if and only if U = V .


Proof. By theorem 3, if dim(U) = dim(V ), then every base e1, e2, . . . , en of U is also a base of V . Then we have U ⊆ V and V ⊆ U , i.e., U = V . Now consider the case when U ⊆ V and U ≠ V . Let e1, e2, . . . , en be a base of U . Since V \ U ≠ ∅, there exists at least one y ∈ V \ U . Then e1, e2, . . . , en and y are linearly independent, because otherwise either y ∈ U or e1, e2, . . . , en are linearly dependent. We now form the linear space U1 = L(e1, e2, . . . , en, y). Similarly, if U1 ≠ V , then there exists z ∈ V which is not a linear combination of e1, e2, . . . , en, and y, so we obtain U2 = L(e1, e2, . . . , en, y, z). For some k we will get Uk = V , because dim(V ) is finite and the growing chain of subspaces must end at some point. Thus, a base of V consists of e1, e2, . . . , en and some other vectors, i.e., dim(U) < dim(V ).

The procedure described in this theorem is known as complementation or completion of bases. It says that if one vector space is contained in another, then every base of the smaller vector space can be complemented to a base of the bigger vector space. Note that it doesn’t work for infinite-dimensional vector spaces: for instance, the space P of all polynomials satisfies P ⊂ C(−∞, +∞), but their dimensions are both infinite.

2.6 Exercises

Problem 2.1. Determine which of the following sets of vectors are linearly independent. If the set is linearly dependent, choose a set of base vectors and express the rest of the vectors through the base chosen.

(a) a1 = (2, 0, −3, 1), a2 = (−2, 3, 1, −5), a3 = (−3, −5, 4, −1), a4 = (4, 1, 4, 4)

(b) a1 = (3, 0, −4, −1, −1), a2 = (−3, 1, −5, −3, 3), a3 = (−1, 1, −4, 1, −1), a4 = (−2, 0, −5, 3, −4)

(c) a1 = (−3, −4, 3), a2 = (−1, −3, 4), a3 = (4, 3, −3), a4 = (−1, 1, 4)

(d) a1 = (4, 1, 2, −4), a2 = (−2, −1, −4, 3), a3 = (3, 4, 2, 2), a4 = (−5, 2, 2, −3), a5 = (−5, 4, 2, 1)

Problem 2.2. Find the coordinates of the vector x in the corresponding bases


(a) e1 = (1, 2, 7), e2 = (1, 1, −1), e3 = (−1, −2, −8), x = (−3, −5, 1)

(b) e1 = (6, 3, 4), e2 = (9, 4, 8), e3 = (−4, −2, −3), x = (0, 1, −4)

(c) x = (3, 4, 6, 5), e1 = (5, 1, 2, 2), e2 = (1, −4, −5, 0), e3 = (4, 2, −1, −1), e4 = (−7, −1, −6, −2)

(d) x = (−6, 2, 1, −1), e1 = (3, 1, 4, 4), e2 = (3, 1, 1, −3), e3 = (7, −2, 0, −5), e4 = (4, −1, 2, −2)

(e) x = (1, 2, −4, 3), e1 = (3, 5, 2, 7), e2 = (−5, −6, 2, −7), e3 = (3, 4, 5, 8), e4 = (1, 1, −7, −3)

3 Fundamental set of solutions

3.1 Homogeneity

We will now give a formal overview of the method for solving arbitrary systems of linear equations. Consider a system of linear equations with n variables and m equations of the following general form:

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1,\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2,\\ \quad\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m, \end{cases}$$
which is written in the extended matrix form as
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_1\\ a_{21} & a_{22} & \dots & a_{2n} & b_2\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{array}\right).$$

Definition 10. A system of linear equations is called homogeneous if b1 = b2 = · · · = bm = 0. Otherwise, it is called non-homogeneous. Homogeneous and non-homogeneous systems of linear equations will be abbreviated as HSLE and NHSLE, respectively.


Theorem 5. The set of solutions to a homogeneous system of linear equations is a linear vector space.

Proof. Let x = (x1, . . . , xn) and y = (y1, . . . , yn) be two solutions to a HSLE. Since ai1(x1 + y1) + · · · + ain(xn + yn) = (ai1x1 + · · · + ainxn) + (ai1y1 + · · · + ainyn) = 0 + 0 = 0 for all i, x + y is a solution, too. Similarly, λx = (λx1, . . . , λxn) and 0 = (0, . . . , 0) are also solutions to the same SLE. Therefore, the set of solutions to a homogeneous system of linear equations forms a linear vector space.
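Theorem 5 is easy to illustrate numerically. In the sketch below the coefficient matrix is an illustrative choice of ours, not taken from the notes:

```python
import numpy as np

# An illustrative homogeneous system:
#    x1 + x2 - 2*x3 = 0
#   2*x1 - x2 +  x3 = 0
A = np.array([[1, 1, -2],
              [2, -1, 1]], dtype=float)

x = np.array([1.0, 5.0, 3.0])   # one solution: A @ x == 0
y = 4 * x                       # another solution

assert np.allclose(A @ x, 0)
assert np.allclose(A @ y, 0)
# Theorem 5 in action: sums and scalar multiples of solutions are solutions.
assert np.allclose(A @ (x + y), 0)
assert np.allclose(A @ (2.5 * x), 0)
print("solution set is closed under + and scalar *")
```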

We will sometimes refer to a general solution to a system of linear equations as an expression describing all possible solutions, as it may have many, while a particular solution refers to one particularly chosen solution.

Theorem 6. A general solution to a NHSLE can be expressed as the sum of a particular solution to the NHSLE and a general solution to the respective HSLE.

Proof. This statement follows from the observation that if x = (x1, . . . , xn) and y = (y1, . . . , yn) are two solutions to the NHSLE, then x − y is a solution to the HSLE. Indeed, ai1(x1 − y1) + · · · + ain(xn − yn) = (ai1x1 + · · · + ainxn) − (ai1y1 + · · · + ainyn) = bi − bi = 0 for all i. Similarly, if x = (x1, . . . , xn) is a solution to the NHSLE and y = (y1, . . . , yn) is a solution to the HSLE, then x + y is a solution to the NHSLE because ai1(x1 + y1) + · · · + ain(xn + yn) = (ai1x1 + · · · + ainxn) + (ai1y1 + · · · + ainyn) = bi + 0 = bi.

3.2 Rank of a matrix

Recall that by theorem 5 in the previous section, the set of solutions to a homogeneous system of linear equations is a linear vector space. The dimension of this vector space is related to the dimension of the vector space generated by the rows of the matrix of the SLE.

Definition 11. Let A be an m×n matrix. The maximum number of linearly independent rows of A is called the rank of matrix A and is denoted by rk(A).

It follows immediately from this definition and from the Gauss elimination method that the rank of a matrix is equal to the number of non-zero rows in the matrix echelon form (page 12).

Definition 12. Let A be an m×n matrix. The transpose of A is the matrix Aᵀ which is obtained by writing the columns of A as rows or, equivalently, by writing the rows of A as columns. That is, if
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}, \quad\text{then}\quad A^T = \begin{pmatrix} a_{11} & a_{21} & \dots & a_{m1}\\ a_{12} & a_{22} & \dots & a_{m2}\\ \vdots & \vdots & \ddots & \vdots\\ a_{1n} & a_{2n} & \dots & a_{mn} \end{pmatrix}.$$

It turns out that the maximum number of linearly independent rows and the maximum number of linearly independent columns of a matrix are the same. That is, rk(A) is invariant under transposition, i.e., rk(Aᵀ) = rk(A).
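Both facts — rank as the number of independent rows, and its invariance under transposition — are easy to sanity-check numerically. A sketch with NumPy, on a matrix of our own whose third row is the sum of the first two:

```python
import numpy as np

# A 3x4 matrix whose third row is the sum of the first two,
# so only two rows are linearly independent.
A = np.array([[1, 2, 0, 1],
              [1, 1, -1, 0],
              [2, 3, -1, 1]], dtype=float)

r = np.linalg.matrix_rank(A)
print(r)  # 2

# The rank does not change under transposition.
assert np.linalg.matrix_rank(A.T) == r
```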


Consider the set of vectors a1 = (a11, a12, . . . , a1n), a2 = (a21, a22, . . . , a2n), . . . , am = (am1, am2, . . . , amn) in Rⁿ formed by the rows of matrix A. Then rk(A) = dim L(a1, a2, . . . , am). The following is the fundamental theorem relating the rank of the matrix that defines an SLE to the dimension of the solution space.

Theorem 7. Let V be the linear space of solutions to the HSLE defined by the matrix A. Then dim(V ) = n − rk(A).

Before we proceed with the proof of the theorem, let us consider the following example.

Example 3.1. Describe the set of solutions of
$$\begin{cases} x_1 + 2x_2 + x_4 = 0,\\ x_1 + x_2 - x_3 = 0. \end{cases}$$

Solution. The Gauss elimination
$$\left(\begin{array}{cccc|c} 1&2&0&1&0\\ 1&1&-1&0&0 \end{array}\right) \to \left(\begin{array}{cccc|c} 1&1&-1&0&0\\ 0&1&1&1&0 \end{array}\right) \to \left(\begin{array}{cccc|c} 1&0&-2&-1&0\\ 0&1&1&1&0 \end{array}\right)$$
brings us to the SLE defined by
$$\begin{cases} x_1 - 2x_3 - x_4 = 0,\\ x_2 + x_3 + x_4 = 0. \end{cases}$$

Note that the variables x3 and x4 are “free”, i.e., they can take any values without restrictions, while x1 and x2 can be expressed through x3 and x4, as their pivot coefficients are not equal to zero. We thus can use the following strategy: give one of the free variables, x3 or x4, the value of 1, and set the other free variables to zero. Then x1 and x2 can be found from the matrix of the SLE. Namely, the values of x1 and x2 stand in the corresponding column of the matrix, up to multiplication by −1. This is illustrated by the following schema.

x1:  2   1
x2: −1  −1
x3:  1   0
x4:  0   1

Theorem 7 says that dim(V ) = n − rk(A) = 4 − 2 = 2. On the other hand, the two vectors found above, e1 = (2, −1, 1, 0) and e2 = (1, −1, 0, 1), are linearly independent by construction. Therefore, the set of solutions is spanned by e1 and e2, i.e., every solution to the original system of linear equations can be expressed as a linear combination of e1 and e2:

$$x = \lambda_1\begin{pmatrix}2\\-1\\1\\0\end{pmatrix} + \lambda_2\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}.$$
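The two fundamental solutions of Example 3.1 can be verified by substituting them back into the homogeneous system; a short NumPy sketch:

```python
import numpy as np

# Coefficient matrix of the HSLE from Example 3.1.
A = np.array([[1, 2, 0, 1],
              [1, 1, -1, 0]], dtype=float)

# Fundamental solutions read off from the echelon form.
e1 = np.array([2, -1, 1, 0], dtype=float)
e2 = np.array([1, -1, 0, 1], dtype=float)

assert np.allclose(A @ e1, 0)
assert np.allclose(A @ e2, 0)

# Any linear combination is again a solution (dim V = 4 - rk(A) = 2).
lam1, lam2 = 3.0, -7.0
assert np.allclose(A @ (lam1 * e1 + lam2 * e2), 0)
print("e1 and e2 span the solution space")
```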


Proof of theorem 7. First, note that elementary transformations of rows do not affect the set of solutions because they give an equivalent SLE on each step. It is enough to consider the case of an under-defined system (m < n), as the statement of the theorem is trivial for the defined and over-defined systems. Essentially, the idea of the proof follows the previous example, where we used Gauss elimination to transform the matrix A to the form
$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1m} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2m} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mm} & \dots & a_{mn} \end{pmatrix} \to \begin{pmatrix} * & * & \dots & * & * & \dots & *\\ 0 & * & \dots & * & * & \dots & *\\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\ 0 & 0 & \dots & * & * & \dots & *\\ 0 & 0 & \dots & 0 & 0 & \dots & 0 \end{pmatrix},$$
if necessary exchanging the order of variables. Then the dimension of the solution space is exactly the number of free variables, which is equal to the total number of variables minus the number of non-zero rows in the transformed matrix. The number of variables is equal to n, and the number of non-zero rows in the transformed matrix is equal to rk(A), i.e., dim(V ) = n − rk(A).

3.3 Fundamental set of solutions

The technique explained in example 3.1 in the previous section gives rise to an algorithm called the fundamental system of solutions. First, the matrix of the SLE is transformed to the echelon form. The elements of the matrix that were used to create zeros in other rows are called pivoting elements. The columns which do not contain pivoting elements are declared to correspond to free variables. Then, one by one, we give the value of 1 to each of the free variables and assign zero to the rest of the free variables; the dependent variables are calculated from these data. Taking into account the conclusion of theorem 6, this also allows solving a NHSLE, as we show below.

Example 3.2. Find the fundamental set of solutions to
$$\begin{cases} x_1 + 2x_2 + x_3 - x_4 + x_5 = 4,\\ x_1 + x_2 - x_3 + 2x_4 - x_5 = 2. \end{cases}$$

Solution.
$$\left(\begin{array}{ccccc|c} 1&2&1&-1&1&4\\ 1&1&-1&2&-1&2 \end{array}\right) \to \left(\begin{array}{ccccc|c} 1&2&1&-1&1&4\\ 0&1&2&-3&2&2 \end{array}\right) \to \left(\begin{array}{ccccc|c} 1&0&-3&5&-3&0\\ 0&1&2&-3&2&2 \end{array}\right)$$

      e1   e2   e3   NHSLE
x1:    3   −5    3     0
x2:   −2    3   −2     2
x3:    1    0    0     0
x4:    0    1    0     0
x5:    0    0    1     0

24

Page 25: LECTURE NOTES IN LINEAR ALGEBRA - Pokrovka11's Blog · PDF fileInternational College of Economics and Finance LECTURE NOTES IN LINEAR ALGEBRA 0 B B B B @ 11 12 13 14 " 15 21 22 23

3.4 Exercises 3 FUNDAMENTAL SET OF SOLUTIONS

First, we set x3 = 1, x4 = 0, and x5 = 0, and compute x1 and x2. The values for x1 and x2 are the negatives of the numbers in the third column of the transformed matrix. Then we set x3 = 0, x4 = 1, and x5 = 0, and compute x1 and x2 again. This time, x1 and x2 are the negatives of the numbers in the fourth column. When we need a particular solution to the NHSLE, we can set x3 = 0, x4 = 0, and x5 = 0 and compute x1 and x2 again, this time by using the numbers to the right of the bar. Finally,

$$x = \lambda_1\begin{pmatrix}3\\-2\\1\\0\\0\end{pmatrix} + \lambda_2\begin{pmatrix}-5\\3\\0\\1\\0\end{pmatrix} + \lambda_3\begin{pmatrix}3\\-2\\0\\0\\1\end{pmatrix} + \begin{pmatrix}0\\2\\0\\0\\0\end{pmatrix}.$$
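The general solution of Example 3.2 can be verified directly: each fundamental vector must solve the homogeneous system, and the last vector the non-homogeneous one. A NumPy sketch:

```python
import numpy as np

# NHSLE from Example 3.2: A @ x = b.
A = np.array([[1, 2, 1, -1, 1],
              [1, 1, -1, 2, -1]], dtype=float)
b = np.array([4, 2], dtype=float)

# Fundamental solutions of the HSLE and a particular solution of the NHSLE.
e1 = np.array([3, -2, 1, 0, 0], dtype=float)
e2 = np.array([-5, 3, 0, 1, 0], dtype=float)
e3 = np.array([3, -2, 0, 0, 1], dtype=float)
xp = np.array([0, 2, 0, 0, 0], dtype=float)

for e in (e1, e2, e3):
    assert np.allclose(A @ e, 0)   # solves the homogeneous system
assert np.allclose(A @ xp, b)      # solves the non-homogeneous system

# Theorem 6: particular solution + any combination of e1, e2, e3.
x = xp + 1.0 * e1 - 2.0 * e2 + 0.5 * e3
assert np.allclose(A @ x, b)
print("general solution verified")
```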

3.4 Exercises

Problem 3.1. Find the ranks of the following matrices:

(a) $\begin{pmatrix} 4&8&-4&16&-8\\ -1&-2&1&-4&2\\ 3&6&-3&12&-6\\ 2&4&-2&8&-4 \end{pmatrix}$  (b) $\begin{pmatrix} -2&-10&-10&2&4\\ -2&-10&-10&2&4\\ 1&5&5&-1&-2\\ 2&10&10&-2&-4 \end{pmatrix}$

(c) $\begin{pmatrix} 9&10&-16&11&3\\ 9&8&-8&7&-1\\ -22&-18&-20&15&11\\ 4&-3&-1&12&-10 \end{pmatrix}$  (d) $\begin{pmatrix} 12&-2&-3&12&-2\\ 12&8&3&12&-4\\ -20&0&3&-20&4\\ 12&-12&-9&12&0 \end{pmatrix}$

(e) $\begin{pmatrix} 4&-8&-4&-16&-12\\ -4&8&4&16&12\\ -3&6&3&12&9\\ 3&-6&-3&-12&-9 \end{pmatrix}$  (f) $\begin{pmatrix} 0&1&-3&-1&-4\\ -16&16&4&0&-8\\ 16&-13&-13&-3&-4\\ -8&6&8&2&4 \end{pmatrix}$

(g) $\begin{pmatrix} 14&-14&4&22&14\\ -8&-16&20&-4&-8\\ 0&-14&13&5&0\\ 6&-16&11&13&6 \end{pmatrix}$  (h) $\begin{pmatrix} -3&-3&9&-15&-9\\ 3&3&-9&15&9\\ 5&5&-15&25&15\\ -2&-2&6&-10&-6 \end{pmatrix}$

(i) $\begin{pmatrix} 5&-2&4&-4&-4\\ 12&-9&16&-8&-14\\ -1&-8&12&4&-8\\ -9&12&-20&4&16 \end{pmatrix}$  (j) $\begin{pmatrix} 10&-14&-2&-6&-4\\ -19&11&5&21&4\\ -15&8&4&17&3\\ 13&-13&-3&-11&-4 \end{pmatrix}$

Problem 3.2. Find the fundamental system of solutions to the following systems of linear equations:

(a) $\begin{cases} x_1 - x_2 + 5x_3 - 13x_4 = 3,\\ x_1 + x_2 - x_3 + x_4 = -3 \end{cases}$


(b) $\begin{cases} 3x_1 - 5x_2 - 3x_3 + 13x_4 = 10,\\ 2x_1 - x_2 + 5x_3 + 4x_4 = -5 \end{cases}$

(c) $\begin{cases} 2x_1 - x_2 - 2x_3 - 5x_4 = -7,\\ 2x_1 + x_2 - 6x_3 + x_4 = -1 \end{cases}$

(d) $\begin{cases} 3x_1 + x_2 + x_3 + 6x_5 = -6,\\ 2x_1 + 3x_2 + 2x_3 - 4x_4 + 5x_5 = -4,\\ 3x_1 + 3x_2 + 5x_3 + 8x_4 - 16x_5 = -6 \end{cases}$

(e) $\begin{cases} -2x_2 + x_3 - 7x_4 - 4x_5 = 7,\\ x_1 + 2x_2 - x_3 + x_4 - 2x_5 = -7,\\ 3x_1 + 4x_2 - 2x_3 - 4x_4 - 10x_5 = -14 \end{cases}$

(f) $\begin{cases} x_1 - 6x_2 - 3x_3 - 17x_4 - 17x_5 = -1,\\ 5x_2 + 7x_3 - 8x_4 - 17x_5 = -5,\\ 3x_2 + 5x_3 - 8x_4 - 15x_5 = -3 \end{cases}$

(g) $\begin{cases} 2x_1 + x_2 + 9x_3 - 15x_4 = 9,\\ x_1 + 2x_3 - 6x_4 + x_5 = 1 \end{cases}$

(h) $\begin{cases} x_1 + x_2 + 2x_3 + 2x_4 - 2x_5 = 0,\\ x_2 + 4x_3 + 7x_4 - 5x_5 = -2 \end{cases}$

(i) $\begin{cases} 3x_1 - 5x_2 + 18x_3 + x_4 - 3x_5 = 2,\\ x_1 + x_2 - 2x_3 + 11x_4 - x_5 = 6 \end{cases}$

(j) $\begin{cases} x_1 - x_2 - 3x_3 + 7x_5 = 0,\\ x_1 + 2x_3 - x_4 + 7x_5 = 1 \end{cases}$

4 The determinant and its applications

The determinant of an n×n matrix A is a value that can be obtained by multiplying certain sets of entries of A, and adding and subtracting such products, according to the rules that are discussed below.

4.1 Special cases 2×2 and 3×3

The determinant det(A) of a 2×2 matrix A is defined by
$$\det(A) = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

The vectors (a11, a12) and (a21, a22) form a parallelogram in the xy-plane with vertices at (0, 0), (a11, a12), (a21, a22), and (a11 + a21, a12 + a22). The absolute value of a11a22 − a12a21 is the


area of the parallelogram. That is, the determinant of a 2×2 matrix is somewhat similar to the area. The determinant itself, i.e., the absolute value together with the sign, is the oriented area of the parallelogram. The oriented area is the same as the usual area, except that it changes its sign when the two vectors forming the parallelogram are interchanged.

The determinant det(A) of a 3×3 matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$
is defined by
$$\det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{21}a_{32}a_{13} - a_{31}a_{22}a_{13} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32}.$$

Similarly to the 2×2 case, the geometric interpretation of the determinant is the volume of the parallelepiped spanned by the vectors (a11, a12, a13), (a21, a22, a23), and (a31, a32, a33). Again, the absolute value of the determinant together with the sign gives the oriented volume of the parallelepiped.

The determinant expression grows rapidly with the size of the matrix (an n×n matrix contributes n! terms). In the next section we give the formal expression for det(A) for matrices of arbitrary size, which subsumes these two cases; for now we list the properties of the determinant, which actually determine it as a function uniquely, and continue the sequence of terms “length–area–volume” to something that can be called n-dimensional volume.

Example 4.1.
$$\det\begin{pmatrix}1&2\\3&4\end{pmatrix} = 1\times 4 - 2\times 3 = 4 - 6 = -2.$$

Example 4.2.
$$\det\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix} = 1\times5\times9 + 4\times8\times3 + 2\times6\times7 - 7\times5\times3 - 4\times2\times9 - 1\times8\times6 = 45 + 96 + 84 - 105 - 72 - 48 = 0.$$
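The two explicit formulas can be wrapped as functions and compared with NumPy’s determinant; the sketch below reproduces Examples 4.1 and 4.2:

```python
import numpy as np

# 2x2 formula: a11*a22 - a12*a21.
def det2(a):
    return a[0][0]*a[1][1] - a[0][1]*a[1][0]

# 3x3 formula, term by term as in the text.
def det3(a):
    return (a[0][0]*a[1][1]*a[2][2] + a[0][1]*a[1][2]*a[2][0]
            + a[1][0]*a[2][1]*a[0][2] - a[2][0]*a[1][1]*a[0][2]
            - a[0][1]*a[1][0]*a[2][2] - a[0][0]*a[1][2]*a[2][1])

A2 = [[1, 2], [3, 4]]
A3 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(det2(A2))  # -2
print(det3(A3))  # 0
assert det2(A2) == round(np.linalg.det(np.array(A2, dtype=float)))
```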

4.2 Properties of the determinant

Recall the three types of transformations we introduced to operate with matrices of linear equations (page 8). In a type I transformation, the jth row is multiplied by some constant c and added to the ith row. In a type II transformation, the ith row is multiplied by a non-zero constant c. A type III transformation interchanges the ith row and the jth row of the matrix. The following are the properties of the determinant.

1. det(A) doesn’t change under type I transformations, i.e., the addition of a row (or column), multiplied by a number, to another row (or column) doesn’t change det(A).

2. det(A) is multiplied by c under type II transformations, i.e., if a row or column is multiplied by a number, then det(A) is multiplied by the same number.

3. det(A) is multiplied by −1 under type III transformations, i.e., det(A) changes sign if two rows or two columns are interchanged.


These three properties hold for the area of a parallelogram and for the volume of a parallelepiped. They must also hold for higher-dimensional analogs of these measures, and these requirements define det(A) uniquely up to a constant factor. If we additionally require that det(E) = 1, where

$$E = \begin{pmatrix} 1 & 0 & \dots & 0\\ 0 & 1 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & 1 \end{pmatrix}$$
is the n×n identity matrix, then together with requirements 1–3 above this can be used as a definition of the determinant.

4.3 Formal definition

We define a permutation (i1, i2, . . . , in) of the set of integer numbers (1, 2, . . . , n) as a mapping f : (1, 2, . . . , n) → (1, 2, . . . , n) such that no pair of numbers i and j is mapped to the same number, i.e., f(i) = f(j) implies i = j. It means that i1, i2, . . . , in is the same set of numbers as 1, 2, . . . , n, simply written in a different order. For instance, the permutation (2, 3, 1, 4) of the numbers (1, 2, 3, 4) is the mapping f such that f(1) = 2, f(2) = 3, f(3) = 1, and f(4) = 4, whereas if f(1) = f(2) then f is not a permutation.

Each permutation (i1, i2, . . . , in) contains orders and inversions. For instance, (2, 3, 1, 4) contains four orders, (2, 3), (2, 4), (3, 4), and (1, 4), and two inversions, (2, 1) and (3, 1). An inversion happens when a bigger number precedes a smaller number; all other pairs are called orders. We define the sign of a permutation, σ(i1, i2, . . . , in), to be equal to −1 to the power of the number of inversions. For instance, σ(2, 3, 1, 4) = 1 and σ(2, 1, 3, 4) = −1.

The determinant of an arbitrary n×n matrix is defined by
$$\det\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix} = \sum_{(i_1, i_2, \dots, i_n)} \sigma(i_1, i_2, \dots, i_n)\, a_{1 i_1} a_{2 i_2} \dots a_{n i_n},$$

where the sum runs over all possible permutations of the set (1, 2, . . . , n). Clearly, there are n! = 1 · 2 · · · n such permutations and, thus, the general determinant expression contains n! terms. One can also say that the determinant of a matrix A = (aij) is the sum of products of matrix elements such that in every product term one and only one element is taken from each row and column.
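The permutation-sum definition translates directly into (very inefficient, n!-term) code. A sketch in Python, where permutations are written as 0-based tuples:

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation: (-1) to the power of the number of inversions."""
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(a):
    """Determinant via the permutation-sum definition (n! terms)."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for row in range(n):
            term *= a[row][p[row]]   # one element from each row and column
        total += term
    return total

print(sign((1, 2, 0, 3)))  # 1: the permutation (2, 3, 1, 4) has two inversions
print(det([[1, 2], [3, 4]]))                    # -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0
```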

Theorem 8. The determinant of an upper-triangular square matrix is equal to the product ofits diagonal entries.


Proof. Let A be an upper-triangular matrix, i.e.,
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n}\\ 0 & a_{22} & a_{23} & \dots & a_{2n}\\ 0 & 0 & a_{33} & \dots & a_{3n}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \dots & a_{nn} \end{pmatrix}.$$
In the sum of products of matrix elements such that one and only one element is taken from each row and column, every term contains an element located below the diagonal, and all such elements are equal to zero; the only exception is the term which is the product of the diagonal entries, a11a22 . . . ann. This completes the proof.

4.4 Laplace’s theorem

The minor Mij of an n×n matrix A = (akl) is defined to be the determinant of the (n−1)×(n−1) matrix that is obtained from A by removing the ith row and the jth column:
$$M_{ij} = \det\begin{pmatrix} a_{11} & \dots & a_{1,j-1} & a_{1,j+1} & \dots & a_{1n}\\ \vdots & \ddots & \vdots & \vdots & & \vdots\\ a_{i-1,1} & \dots & a_{i-1,j-1} & a_{i-1,j+1} & \dots & a_{i-1,n}\\ a_{i+1,1} & \dots & a_{i+1,j-1} & a_{i+1,j+1} & \dots & a_{i+1,n}\\ \vdots & & \vdots & \vdots & \ddots & \vdots\\ a_{n1} & \dots & a_{n,j-1} & a_{n,j+1} & \dots & a_{nn} \end{pmatrix}.$$
The expression (−1)^{i+j}Mij is also known as the algebraic complement (cofactor) of aij. The determinant of A is then given by the following Laplace's formula.

Theorem 9.
$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij} \quad\text{(decomposition by the j-th column)}$$
$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij} \quad\text{(decomposition by the i-th row)}$$
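Laplace expansion along the first row gives a simple recursive determinant routine; the sketch below recomputes the determinant of the matrix from Example 4.3:

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j."""
    return [row[:j] + row[j+1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by Laplace expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    # Row index is 0, so the sign of the j-th cofactor is (-1)**j.
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(n))

# The matrix of Example 4.3; its determinant is -1.
A = [[1, 0, 1, 2],
     [2, 0, 1, 1],
     [-1, -1, 3, 1],
     [2, 0, 1, 2]]
print(det(A))  # -1
```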


Laplace’s theorem gives a useful tool for the calculation of determinants, one in which the nth order determinant is expressed through lower order minors recursively. Below we go through examples of these recursive calculations.

Example 4.3.
$$\begin{vmatrix} 1&0&1&2\\ 2&0&1&1\\ -1&-1&3&1\\ 2&0&1&2 \end{vmatrix} = (-1)^{1+2}\times 0\times \begin{vmatrix} 2&1&1\\ -1&3&1\\ 2&1&2 \end{vmatrix} + (-1)^{2+2}\times 0\times \begin{vmatrix} 1&1&2\\ -1&3&1\\ 2&1&2 \end{vmatrix} + (-1)^{3+2}\times(-1)\times \begin{vmatrix} 1&1&2\\ 2&1&1\\ 2&1&2 \end{vmatrix} + 0 = 2 + 2 + 4 - 4 - 4 - 1 = -1.$$

This example shows that Laplace’s formula is most useful for decomposition by a column or a row which contains many zeros. If there is no such row or column, you may create one by using type I transformations, as they don’t affect the determinant.

Example 4.4.
$$\begin{vmatrix} 1&1&1&-1\\ 2&1&1&2\\ 3&-1&1&1\\ 5&2&1&-1 \end{vmatrix} = \begin{vmatrix} 1&1&1&-1\\ 1&0&0&3\\ 3&-1&1&1\\ 5&2&1&-1 \end{vmatrix} = 1\times(-1)^{2+1}\times \begin{vmatrix} 1&1&-1\\ -1&1&1\\ 2&1&-1 \end{vmatrix} + 0\times(-1)^{2+2} \begin{vmatrix} 1&1&-1\\ 3&1&1\\ 5&1&-1 \end{vmatrix} + 0\times(-1)^{2+3} \begin{vmatrix} 1&1&-1\\ 3&-1&1\\ 5&2&-1 \end{vmatrix} + 3\times(-1)^{2+4} \begin{vmatrix} 1&1&1\\ 3&-1&1\\ 5&2&1 \end{vmatrix} = -2 + 30 = 28.$$

4.5 Minors and ranks

The notion of a minor can be used to assess the rank of an arbitrary n×m matrix. Consider two sets of indices, 1 ≤ i1 < i2 < · · · < ik ≤ n and 1 ≤ j1 < j2 < · · · < jk ≤ m, and consider the submatrix of A which consists of the elements located on the intersection of rows i1, i2, . . . , ik and columns j1, j2, . . . , jk:
$$\begin{pmatrix} a_{i_1 j_1} & a_{i_1 j_2} & \dots & a_{i_1 j_k}\\ a_{i_2 j_1} & a_{i_2 j_2} & \dots & a_{i_2 j_k}\\ \vdots & \vdots & \ddots & \vdots\\ a_{i_k j_1} & a_{i_k j_2} & \dots & a_{i_k j_k} \end{pmatrix}.$$

Its determinant is called a kth order minor of the matrix A. For instance, if
$$A = \begin{pmatrix} 1&-1&-8&1&2\\ 0&8&-4&-5&6\\ -6&9&0&6&-1\\ 2&1&2&3&6 \end{pmatrix},$$
(i1, i2, i3) = (1, 2, 4), and (j1, j2, j3) = (2, 3, 5), then the corresponding 3rd order minor is
$$\begin{vmatrix} -1&-8&2\\ 8&-4&6\\ 1&2&6 \end{vmatrix} = 24 - 48 + 32 + 8 + 384 + 12 = 412.$$


Theorem 10. Let A be an n×m matrix. If at least one kth order minor of A is not equal to zero, then rk(A) ≥ k. If all kth order minors of A are equal to zero, then rk(A) < k.

Example 4.5.
$$\mathrm{rk}\begin{pmatrix} 2&1&0&0&3&7&-2\\ 1&0&1&0&-1&2&-1\\ -3&0&0&1&2&3&1 \end{pmatrix} = \;?$$

Solution. The matrix contains the minor
$$\begin{vmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{vmatrix} = 1 \neq 0$$
(columns 2, 3, and 4). Since this 3rd order minor is non-zero, the rank of the matrix is not smaller than three; on the other hand, a matrix with three rows cannot have rank greater than three. Hence, the rank is equal to three.

4.6 Cramer’s rule

Consider the general form of a system of n linear equations with n variables
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1,\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2,\\ \quad\vdots\\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n, \end{cases}$$
and denote
$$\Delta = \det\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix},$$
$$\Delta_1 = \det\begin{pmatrix} b_1 & a_{12} & \dots & a_{1n}\\ b_2 & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ b_n & a_{n2} & \dots & a_{nn} \end{pmatrix}, \quad \Delta_2 = \det\begin{pmatrix} a_{11} & b_1 & \dots & a_{1n}\\ a_{21} & b_2 & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & b_n & \dots & a_{nn} \end{pmatrix},$$
and so on for all columns, where the b-column replaces the ith column in ∆i.

Theorem 11 (Cramer’s rule). Let (x1, x2, . . . , xn) be a solution to the system of linear equations given above. If ∆ ≠ 0, then (x1, x2, . . . , xn) is unique and xi = ∆i/∆ for all i.

Example 4.6. Consider the system of linear equations given by the augmented matrix
$$\left(\begin{array}{cc|c} 2 & 3 & 5 \\ 2 & -1 & 1 \end{array}\right).$$

Solution.
$$\Delta = \begin{vmatrix} 2 & 3 \\ 2 & -1 \end{vmatrix} = -2 - 6 = -8, \quad \Delta_1 = \begin{vmatrix} 5 & 3 \\ 1 & -1 \end{vmatrix} = -5 - 3 = -8, \quad \Delta_2 = \begin{vmatrix} 2 & 5 \\ 2 & 1 \end{vmatrix} = 2 - 10 = -8,$$

$$x_1 = \frac{\Delta_1}{\Delta} = \frac{-8}{-8} = 1, \quad x_2 = \frac{\Delta_2}{\Delta} = \frac{-8}{-8} = 1.$$

Example 4.7. Consider the system of linear equations given by

$$\begin{cases} x_1 + 2x_2 + x_3 = 3 \\ x_1 - x_2 + 2x_3 = 0 \\ 3x_1 - x_2 + 7x_3 = 2. \end{cases}$$

Solution. The augmented matrix is
$$\left(\begin{array}{ccc|c} 1 & 2 & 1 & 3 \\ 1 & -1 & 2 & 0 \\ 3 & -1 & 7 & 2 \end{array}\right),$$

$$\Delta = \begin{vmatrix} 1 & 2 & 1 \\ 1 & -1 & 2 \\ 3 & -1 & 7 \end{vmatrix} = -7 + 12 - 1 + 3 - 14 + 2 = -5,$$

$$\Delta_1 = \begin{vmatrix} 3 & 2 & 1 \\ 0 & -1 & 2 \\ 2 & -1 & 7 \end{vmatrix} = -21 + 8 + 0 + 2 - 0 + 6 = -5,$$

$$\Delta_2 = \begin{vmatrix} 1 & 3 & 1 \\ 1 & 0 & 2 \\ 3 & 2 & 7 \end{vmatrix} = 0 + 18 + 2 - 0 - 21 - 4 = -5,$$

$$\Delta_3 = \begin{vmatrix} 1 & 2 & 3 \\ 1 & -1 & 0 \\ 3 & -1 & 2 \end{vmatrix} = -2 + 0 - 3 + 9 - 4 - 0 = 0,$$

$$x_1 = \frac{\Delta_1}{\Delta} = \frac{-5}{-5} = 1, \quad x_2 = \frac{\Delta_2}{\Delta} = \frac{-5}{-5} = 1, \quad x_3 = \frac{\Delta_3}{\Delta} = \frac{0}{-5} = 0.$$
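Cramer's rule translates directly into code. A minimal sketch (numpy, with a helper name `cramer` of our own), applied to the system of example 4.7:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; assumes det(A) != 0."""
    A = np.asarray(A, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                    # the b-column replaces the ith column
        x[i] = np.linalg.det(Ai) / d    # x_i = Delta_i / Delta
    return x

# The system of example 4.7:
A = [[1, 2, 1], [1, -1, 2], [3, -1, 7]]
b = [3, 0, 2]
print(cramer(A, b))  # [1. 1. 0.]
```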

An extension of Cramer's rule is sometimes used to distinguish the case of no solutions from the case of infinitely many solutions when the system of linear equations is degenerate.

Example 4.8.
$$\begin{cases} x_1 + x_2 = 2 \\ x_1 + x_2 = 3. \end{cases}$$

Solution. This system of linear equations has no solutions since 2 ≠ 3. Look what happens when we apply Cramer's rule to the augmented matrix
$$\left(\begin{array}{cc|c} 1 & 1 & 2 \\ 1 & 1 & 3 \end{array}\right):$$

$$\Delta = 0, \quad \Delta_1 = \begin{vmatrix} 2 & 1 \\ 3 & 1 \end{vmatrix} = -1 \neq 0.$$

This situation corresponds to having two parallel lines, x₁ + x₂ = 2 and x₁ + x₂ = 3, which "intersect" at x = ∞, and thus Cramer's rule gives Δ₁/Δ = Δ₁/0 = ∞.

Example 4.9.
$$\begin{cases} x_1 + x_2 = 1 \\ 2x_1 + 2x_2 = 2. \end{cases}$$

Solution. This system of linear equations has infinitely many solutions since x₁ + x₂ = 1 and 2x₁ + 2x₂ = 2 have the same set of solutions. Then, for the augmented matrix
$$\left(\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 2 & 2 \end{array}\right)$$
we get
$$\Delta = 0, \quad \Delta_1 = \Delta_2 = \begin{vmatrix} 1 & 1 \\ 2 & 2 \end{vmatrix} = 0.$$

Thus, we have infinitely many solutions, as hinted by the formal application of Cramer's rule, which gives "x = 0/0".

Theorem 12 (Kronecker-Capelli). A system of linear equations has a solution if and only ifthe rank of the coefficient matrix A is equal to the rank of the extended matrix (A|b).

Proof. Assume that the system of linear equations has at least one solution. This means that b is expressed as a linear combination of the columns of A; appending to A a column that is a linear combination of its columns doesn't change the rank, so rk(A|b) = rk(A). Vice versa, assume that rk(A|b) = rk(A) = k. Then the echelon form of the extended matrix contains exactly k non-zero rows and no contradictory row of the form (0 . . . 0 | b′) with b′ ≠ 0, and its last non-zero equation reads a_k x_k + a_{k+1} x_{k+1} + · · · + a_n x_n = b_k. We can set x_{k+1} = · · · = x_n = 0 and use theorem 11 to find the solution for x₁, . . . , x_k, as rk(A) = k and the corresponding minor is non-zero.

Corollary 12.1. Consider the notation of theorem 11. If Δ = 0 but Δᵢ ≠ 0 for some i, then the system of linear equations has no solutions.

The connection between the number of solutions to a system of linear equations and the determinant of the matrix of that system leads us to the following important definition.

Definition 13. A square n×n matrix A is called regular (or non-degenerate) if det(A) ≠ 0. Otherwise, it is called singular (or degenerate).


Out of these four terms, we will preferentially use the term non-degenerate when describing the properties of the matrix determinant. The term itself stems from the property of the system of linear equations, which degenerates when some of its equations become linearly dependent. If you pick a set of n² numbers at random and put them into a square table, in "most cases" (that is, with probability 1) it will give a non-degenerate matrix, because planes in general position intersect in a single point, but "sometimes" (with probability 0) the matrix will become special, i.e. singular, and the system will have many or no solutions. This explains the other pair of terms. It follows immediately from definition 13 and theorem 12 that

Corollary 12.2. A system of linear equations with n variables and n equations has a unique solution if and only if its matrix is non-degenerate.

Corollary 12.3. An n×n matrix A is non-degenerate if and only if A has the maximum possible rank, i.e., if rk(A) = n.

Corollary 12.4. An n×n matrix A is non-degenerate if and only if its rows (or columns) are linearly independent.
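The Kronecker-Capelli criterion can be sketched in code (numpy; the function name `classify` and the returned strings are ours), distinguishing the three cases of examples 4.6, 4.8, and 4.9:

```python
import numpy as np

def classify(A, b):
    """Classify an SLE by comparing rk(A) with rk(A|b) (Kronecker-Capelli)."""
    A = np.asarray(A, dtype=float)
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rA < rAb:
        return "no solutions"
    if rA == A.shape[1]:
        return "unique solution"
    return "infinitely many solutions"

print(classify([[1, 1], [1, 1]], [2, 3]))   # example 4.8: no solutions
print(classify([[1, 1], [2, 2]], [1, 2]))   # example 4.9: infinitely many solutions
print(classify([[2, 3], [2, -1]], [5, 1]))  # example 4.6: unique solution
```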

4.7 Exercises

Problem 4.1. Evaluate the following determinants:

(a) $\begin{vmatrix} -4 & -4 & 0 & 0 & 4 \\ 6 & -3 & -12 & 8 & 4 \\ 0 & -9 & -12 & 0 & 0 \\ 6 & 4 & 12 & 8 & 2 \\ 12 & 6 & 12 & 12 & 0 \end{vmatrix}$ (b) $\begin{vmatrix} 9 & -7 & -2 \\ -10 & -12 & 2 \\ 4 & -11 & -1 \end{vmatrix}$ (c) $\begin{vmatrix} 9 & 4 & 1 \\ -12 & -4 & 4 \\ -12 & 0 & 12 \end{vmatrix}$

(d) $\begin{vmatrix} 1 & 4 & 11 & -9 & 4 \\ -1 & -4 & -2 & 6 & 0 \\ -9 & -4 & -2 & -2 & 0 \\ -8 & 1 & 5 & 7 & -4 \\ 7 & -4 & -11 & 5 & 0 \end{vmatrix}$ (e) $\begin{vmatrix} 0 & 0 & -6 \\ 1 & -1 & 8 \\ 10 & -2 & 4 \end{vmatrix}$ (f) $\begin{vmatrix} -2 & 11 & 1 \\ 1 & 9 & -8 \\ 2 & 6 & -10 \end{vmatrix}$

(g) $\begin{vmatrix} 0 & 3 & 0 & -11 & -1 \\ 12 & -2 & -2 & 10 & 2 \\ -6 & -3 & 0 & 7 & -4 \\ -6 & -4 & 1 & 12 & 7 \\ 6 & -2 & -1 & 10 & -10 \end{vmatrix}$ (h) $\begin{vmatrix} 8 & -2 & 1 \\ -9 & 3 & 12 \\ 12 & -3 & 3 \end{vmatrix}$ (i) $\begin{vmatrix} -10 & 4 & 0 \\ 1 & -7 & -1 \\ -7 & -10 & -2 \end{vmatrix}$

(j) $\begin{vmatrix} 7 & 10 & -4 & -4 \\ 1 & -12 & 8 & 6 \\ -3 & 6 & 12 & 6 \\ -5 & -8 & 12 & 8 \end{vmatrix}$ (k) $\begin{vmatrix} 5 & 11 & 7 \\ -2 & 3 & -10 \\ -4 & -4 & -10 \end{vmatrix}$ (l) $\begin{vmatrix} 3 & -4 & 0 & -6 \\ 12 & 1 & 0 & -2 \\ -12 & -2 & 0 & 0 \\ -6 & 10 & 4 & -1 \end{vmatrix}$

Problem 4.2. Evaluate the ranks of the following matrices by any appropriate methods.


(a) $\begin{pmatrix} 12 & 0 & 7 & 5 & 4 \\ 24 & 0 & 14 & 10 & 8 \\ -8 & 16 & -2 & -6 & 0 \\ -12 & -6 & -8 & -4 & -5 \end{pmatrix}$ (b) $\begin{pmatrix} -20 & -10 & 13 & -1 & 13 \\ -12 & -9 & 12 & 0 & 9 \\ 12 & 4 & -5 & 1 & -7 \\ 20 & -10 & 15 & 5 & -5 \end{pmatrix}$

(c) $\begin{pmatrix} 16 & 18 & -6 & -5 & 8 \\ -16 & -18 & 6 & 8 & -14 \\ 19 & 12 & -8 & 3 & -8 \\ 5 & -9 & -17 & -14 & -3 \end{pmatrix}$ (d) $\begin{pmatrix} 3 & -2 & -4 & 2 & 0 \\ -10 & 2 & 8 & -4 & 2 \\ -19 & -6 & 4 & -2 & 8 \\ -17 & -12 & -4 & 2 & 10 \end{pmatrix}$

(e) $\begin{pmatrix} -15 & -11 & 9 & 0 & -16 \\ -8 & -10 & 6 & -2 & -8 \\ -22 & -12 & 12 & 2 & -24 \\ 5 & 14 & -6 & 5 & 4 \end{pmatrix}$ (f) $\begin{pmatrix} -12 & -8 & -16 & 5 & -11 \\ 6 & -2 & 11 & 2 & -2 \\ -10 & 6 & -18 & 8 & 10 \\ -3 & -11 & -1 & -4 & -17 \end{pmatrix}$

Problem 4.3. Solve the following SLEs by using Cramer's rule, when possible.

(a) $\begin{cases} x_1 + 3x_2 + 2x_3 = 6 \\ -4x_1 + x_2 + 3x_3 = 7 \\ -5x_1 + x_2 - x_3 = -10 \end{cases}$ (b) $\begin{cases} -3x_1 + 4x_2 - 5x_3 = -2 \\ -4x_1 - 2x_2 - 5x_3 = 7 \\ 4x_1 + x_2 - 5x_3 = 25 \end{cases}$ (c) $\begin{cases} -x_1 + x_2 + 2x_3 = -5 \\ 4x_1 + 3x_2 - 5x_3 = 12 \\ -4x_1 - 5x_2 = 11 \end{cases}$

(d) $\begin{cases} -5x_1 - 3x_2 + 2x_3 - 3x_4 = -11 \\ 2x_1 + x_2 + 4x_3 + 2x_4 = -10 \\ 4x_1 - 4x_2 - x_3 + 4x_4 = -26 \\ -5x_1 + 4x_2 + 2x_3 + 2x_4 = -8 \end{cases}$ (e) $\begin{cases} 3x_2 - 5x_3 - 2x_4 = -24 \\ 3x_1 + x_2 + 3x_3 + 2x_4 = 1 \\ -5x_1 + x_2 - 3x_3 - 3x_4 = 7 \\ 4x_1 + x_2 + 2x_3 - 5x_4 = -22 \end{cases}$

(f) $\begin{cases} -4x_1 - 4x_2 - 5x_3 = -6 \\ -3x_1 + 2x_2 - 5x_3 = 13 \\ -x_1 - 3x_2 + x_3 = -12 \end{cases}$ (g) $\begin{cases} -4x_1 - 2x_3 = -4 \\ -3x_1 - 2x_2 - 3x_3 = 1 \\ -x_1 - 2x_2 + x_3 = 3 \end{cases}$ (h) $\begin{cases} 3x_1 - x_2 - 5x_3 + 4x_4 = -2 \\ -x_1 - 3x_2 - 3x_3 = 4 \\ x_2 - 4x_3 - 5x_4 = -11 \\ 2x_1 + x_2 - x_3 + 3x_4 = -2 \end{cases}$

5 Matrix algebra

5.1 Matrix operations

The set of n×m matrices M_{n,m}(ℝ) is naturally equipped with the structure of a linear vector space with respect to the operation of matrix addition that is defined below. Let A, B ∈ M_{n,m}(ℝ),

$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1m} \\ a_{21} & a_{22} & \dots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nm} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} & \dots & b_{1m} \\ b_{21} & b_{22} & \dots & b_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nm} \end{pmatrix}.$$


The matrices A + B and λA are defined by component-wise addition of the elements of A and B and component-wise multiplication of the elements of A by λ, respectively, i.e.,

$$A + B = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \dots & a_{1m}+b_{1m} \\ a_{21}+b_{21} & a_{22}+b_{22} & \dots & a_{2m}+b_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}+b_{n1} & a_{n2}+b_{n2} & \dots & a_{nm}+b_{nm} \end{pmatrix}, \quad \lambda A = \begin{pmatrix} \lambda a_{11} & \lambda a_{12} & \dots & \lambda a_{1m} \\ \lambda a_{21} & \lambda a_{22} & \dots & \lambda a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda a_{n1} & \lambda a_{n2} & \dots & \lambda a_{nm} \end{pmatrix}.$$

Example 5.1.
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad A + B = \begin{pmatrix} 2 & 2 & 2 \\ 3 & 4 & 6 \end{pmatrix}.$$

The following properties of matrix addition and scaling are evident.

1. ∀A,B,C ∈Mn,m(R) (A+B) + C = A+ (B + C)

2. ∃0 ∈Mn,m(R) : ∀A ∈Mn,m(R) A+ 0 = 0 + A = A

3. ∀A ∈Mn,m(R) ∃(−A) ∈Mn,m(R) : A+ (−A) = (−A) + A = 0

4. A+B = B + A for ∀A,B ∈Mn,m(R)

5. ∀λ ∈ R ∀A,B ∈Mn,m(R) λ(A+B) = λA+ λB

6. ∀λ, µ ∈ R ∀A ∈ Mn,m(R) (λµ)A = λ(µA)

Thus, Mn,m(R) is a linear vector space⁵ with dim Mn,m(R) = nm. In contrast to matrix addition, matrix multiplication is generally defined for matrices of different sizes.

Let A ∈ M_{n,k}(ℝ) and B ∈ M_{k,m}(ℝ). Note that the number of columns of the first matrix is equal to the number of rows of the second matrix. Such matrices are called compatible. Then the matrix C = A × B = (c_{ij}) is defined by the formula

$$c_{ij} = \sum_{s=1}^{k} a_{is} \, b_{sj}.$$

Example 5.2.
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix},$$

$$c_{11} = \sum_{s=1}^{2} a_{1s}b_{s1} = a_{11}b_{11} + a_{12}b_{21} = 1 \times 1 + 2 \times 4 = 9,$$
$$c_{12} = \sum_{s=1}^{2} a_{1s}b_{s2} = a_{11}b_{12} + a_{12}b_{22} = 1 \times 2 + 2 \times 5 = 12,$$
$$\dots$$
$$c_{23} = \sum_{s=1}^{2} a_{2s}b_{s3} = a_{21}b_{13} + a_{22}b_{23} = 3 \times 3 + 4 \times 6 = 33,$$

$$A \times B = \begin{pmatrix} 9 & 12 & 15 \\ 19 & 26 & 33 \end{pmatrix}.$$

⁵Can you find a natural base in this space?

Matrix multiplication is a binary operation which can be thought of as a mapping f : M_{n,k}(ℝ) × M_{k,m}(ℝ) → M_{n,m}(ℝ) with the following properties.

1. ∀A ∈ M_{n,k}(ℝ) ∀B ∈ M_{k,l}(ℝ) ∀C ∈ M_{l,m}(ℝ)  (A × B) × C = A × (B × C)

2. ∀A ∈ M_{n,k}(ℝ) ∀B, C ∈ M_{k,m}(ℝ)  A × (B + C) = A × B + A × C

3. ∀λ ∈ ℝ ∀A ∈ M_{n,k}(ℝ) ∀B ∈ M_{k,m}(ℝ)  (λA) × B = λ(A × B)

Note that, in general, A × B ≠ B × A, i.e., matrix multiplication is not commutative. This is the most striking difference between matrix multiplication and the multiplication of numbers. So, from now on we will pay special attention to the order of terms in matrix products. For instance, we distinguish between left and right multiplication: A multiplied by B from the right means A × B, while A multiplied by B from the left means B × A.

Example 5.3.
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad A \times B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B \times A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Lemma 13. (A × B)^T = B^T × A^T.

Proof. Consider A × B = C = (c_{ij}), that is, c_{ij} = Σ_k a_{ik}b_{kj}. The (i, j) element of C^T is c_{ji} = Σ_k a_{jk}b_{ki}. On the other hand, the (i, j) element of B^T × A^T is Σ_k (B^T)_{ik}(A^T)_{kj} = Σ_k b_{ki}a_{jk}, which is the same sum. Therefore, C^T = B^T × A^T.
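Lemma 13 is easy to check on a random pair of compatible matrices (a sketch using numpy, not part of the original notes); note that for non-square factors B^T × A^T is the only order of transposes that is even defined:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 4))
B = rng.integers(-5, 5, size=(4, 2))

# (A x B)^T and B^T x A^T agree entry by entry, as stated in lemma 13,
# while A^T x B^T would not even be a valid product for these shapes.
assert np.array_equal((A @ B).T, B.T @ A.T)
print("lemma 13 holds for a random 3x4 / 4x2 pair")
```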

5.2 Inverse matrix

In this section we consider only square matrices. Let A ∈ M_{n,n}(ℝ) be such a matrix. The notion of the inverse matrix is very similar to the notion of the inverse (reciprocal) number. The reciprocal of x is x⁻¹, which is defined as a number such that x × x⁻¹ = 1. For matrices, the role of 1 is played by the identity matrix

$$E_n = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix} \in M_{n,n}(\mathbb{R}),$$

which is the neutral element of matrix multiplication, i.e., A × E = E × A = A for every A ∈ M_{n,n}(ℝ).

Definition 14. Let A be a square n×n matrix. The matrix A⁻¹ ∈ M_{n,n}(ℝ) is called the inverse matrix for A if A × A⁻¹ = A⁻¹ × A = E, where E is the identity matrix.

Lemma 14. (A × B)⁻¹ = B⁻¹ × A⁻¹.

Proof. (B⁻¹ × A⁻¹) × (A × B) = B⁻¹ × (A⁻¹ × A) × B = B⁻¹ × B = E. Similarly, (A × B) × (B⁻¹ × A⁻¹) = A × (B × B⁻¹) × A⁻¹ = A × A⁻¹ = E.

Lemma 15. (A^T)⁻¹ = (A⁻¹)^T.

Proof. According to lemma 13, (A⁻¹)^T × A^T = (A × A⁻¹)^T = E and A^T × (A⁻¹)^T = (A⁻¹ × A)^T = E. Thus, (A⁻¹)^T is the inverse of A^T.

5.3 Inverse matrix by Gauss elimination

The following theorem gives a necessary and sufficient condition for the existence of the inverse matrix.

Theorem 16. A⁻¹ exists if and only if det(A) ≠ 0.

Proof. The necessary part follows from the definition: det(A) det(A⁻¹) = det(A × A⁻¹) = det(E) = 1 and, therefore, det(A) cannot be equal to zero.

Now we prove the sufficient condition. Assume that

$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}, \quad A^{-1} = B = \begin{pmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nn} \end{pmatrix},$$

and C = A × B. We need C = E. Thus, for the first column of C we get

$$c_{11} = a_{11}b_{11} + a_{12}b_{21} + \dots + a_{1n}b_{n1} = 1,$$
$$c_{21} = a_{21}b_{11} + a_{22}b_{21} + \dots + a_{2n}b_{n1} = 0,$$
$$\vdots$$
$$c_{n1} = a_{n1}b_{11} + a_{n2}b_{21} + \dots + a_{nn}b_{n1} = 0.$$

That is, in order to find b_{11}, . . . , b_{n1}, we have to solve the SLE with the augmented matrix

$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & 1 \\ a_{21} & a_{22} & \dots & a_{2n} & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & 0 \end{array}\right).$$

Similarly, the second column of C gives

$$c_{12} = a_{11}b_{12} + a_{12}b_{22} + \dots + a_{1n}b_{n2} = 0,$$
$$c_{22} = a_{21}b_{12} + a_{22}b_{22} + \dots + a_{2n}b_{n2} = 1,$$
$$\vdots$$
$$c_{n2} = a_{n1}b_{12} + a_{n2}b_{22} + \dots + a_{nn}b_{n2} = 0,$$

i.e., in order to get b_{12}, . . . , b_{n2}, we need to solve the SLE with the augmented matrix

$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & 0 \\ a_{21} & a_{22} & \dots & a_{2n} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & 0 \end{array}\right).$$

A similar SLE can be written for b_{1n}, . . . , b_{nn}. Note that all these SLEs have the same coefficient matrix and differ only by the column of free terms. Thus, we can solve them all simultaneously, because the sequence of operations applied to the matrix of the SLE will be the same for each column in the right-hand side. In other words, we can write

$$\left(\begin{array}{cccc|cccc} a_{11} & a_{12} & \dots & a_{1n} & 1 & 0 & \dots & 0 \\ a_{21} & a_{22} & \dots & a_{2n} & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & 0 & 0 & \dots & 1 \end{array}\right),$$

or, shortly, (A|E), and apply the Gauss elimination procedure to transform A to E. Then the matrix to the right of the vertical bar will be A⁻¹. According to theorem 11, if det(A) ≠ 0 then each of these SLEs has a unique solution.

Remark. The proof of this theorem can be used to calculate A⁻¹ numerically.

Example 5.4. Given A = $\begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}$, compute A⁻¹.

Solution.
$$\left(\begin{array}{cc|cc} 2 & 3 & 1 & 0 \\ 3 & 5 & 0 & 1 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 2 & -1 & 1 \\ 2 & 3 & 1 & 0 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 2 & -1 & 1 \\ 0 & -1 & 3 & -2 \end{array}\right) \to$$
$$\to \left(\begin{array}{cc|cc} 1 & 2 & -1 & 1 \\ 0 & 1 & -3 & 2 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 0 & 5 & -3 \\ 0 & 1 & -3 & 2 \end{array}\right).$$

Verify by multiplication:
$$\begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix} \times \begin{pmatrix} 5 & -3 \\ -3 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

That is, A⁻¹ = $\begin{pmatrix} 5 & -3 \\ -3 & 2 \end{pmatrix}$.

Example 5.5. Given A = $\begin{pmatrix} 2 & 1 & -1 \\ -1 & 1 & 2 \\ 0 & 3 & 1 \end{pmatrix}$, find A⁻¹.

Solution.
$$\left(\begin{array}{ccc|ccc} 2 & 1 & -1 & 1 & 0 & 0 \\ -1 & 1 & 2 & 0 & 1 & 0 \\ 0 & 3 & 1 & 0 & 0 & 1 \end{array}\right) \to \left(\begin{array}{ccc|ccc} 1 & -1 & -2 & 0 & -1 & 0 \\ 0 & 3 & 3 & 1 & 2 & 0 \\ 0 & 3 & 1 & 0 & 0 & 1 \end{array}\right) \to$$
$$\to \left(\begin{array}{ccc|ccc} 1 & -1 & -2 & 0 & -1 & 0 \\ 0 & 1 & 1 & \frac{1}{3} & \frac{2}{3} & 0 \\ 0 & 0 & 1 & \frac{1}{2} & 1 & -\frac{1}{2} \end{array}\right) \to \left(\begin{array}{ccc|ccc} 1 & -1 & 0 & 1 & 1 & -1 \\ 0 & 1 & 0 & -\frac{1}{6} & -\frac{1}{3} & \frac{1}{2} \\ 0 & 0 & 1 & \frac{1}{2} & 1 & -\frac{1}{2} \end{array}\right) \to$$
$$\to \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{5}{6} & \frac{2}{3} & -\frac{1}{2} \\ 0 & 1 & 0 & -\frac{1}{6} & -\frac{1}{3} & \frac{1}{2} \\ 0 & 0 & 1 & \frac{1}{2} & 1 & -\frac{1}{2} \end{array}\right).$$

Check:
$$\begin{pmatrix} 2 & 1 & -1 \\ -1 & 1 & 2 \\ 0 & 3 & 1 \end{pmatrix} \times \begin{pmatrix} \frac{5}{6} & \frac{2}{3} & -\frac{1}{2} \\ -\frac{1}{6} & -\frac{1}{3} & \frac{1}{2} \\ \frac{1}{2} & 1 & -\frac{1}{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

5.4 Inverse matrix by algebraic complements

In this section we discuss an algorithm for the calculation of A⁻¹ which is based on the use of determinants. The inverse matrix can be calculated in five steps.

1. Calculate det(A). Proceed to the next step unless det(A) = 0.

2. For each i and j, compute the minor M_{ij} (see the definition in section 4.4 on page 29) and collect these values in a new matrix.

3. Transpose the matrix obtained in step 2.

4. Multiply each element of the transposed matrix by (−1)^{i+j}, i.e., change signs according to the chess-board pattern:
$$\begin{pmatrix} + & - & + & - & + \\ - & + & - & + & - \\ + & - & + & - & + \\ - & + & - & + & - \end{pmatrix}$$

5. Divide each element of the obtained matrix by det(A).


Example 5.6. Find the inverse matrix for A = $\begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 0 \\ 1 & 0 & 3 \end{pmatrix}$.

Solution. 1. det(A) = 3 + 0 + 0 − 1 − 12 − 0 = −10.

2. Compute the minors (together with the signs of step 4, these form the so-called cofactor matrix):
$$M_{11} = \begin{vmatrix} 1 & 0 \\ 0 & 3 \end{vmatrix} = 3, \quad M_{12} = 6, \quad M_{13} = -1,$$
$$M_{21} = \begin{vmatrix} 2 & 1 \\ 0 & 3 \end{vmatrix} = 6, \quad M_{22} = 2, \quad M_{23} = -2,$$
$$M_{31} = \begin{vmatrix} 2 & 1 \\ 1 & 0 \end{vmatrix} = -1, \quad M_{32} = -2, \quad M_{33} = -3.$$

3. Transpose (the matrix happens to be symmetric):
$$\begin{pmatrix} 3 & 6 & -1 \\ 6 & 2 & -2 \\ -1 & -2 & -3 \end{pmatrix}^T = \begin{pmatrix} 3 & 6 & -1 \\ 6 & 2 & -2 \\ -1 & -2 & -3 \end{pmatrix}.$$

4. Change signs:
$$\begin{pmatrix} 3 & -6 & -1 \\ -6 & 2 & 2 \\ -1 & 2 & -3 \end{pmatrix}.$$

5. Divide each element by det(A):
$$A^{-1} = \begin{pmatrix} -0.3 & 0.6 & 0.1 \\ 0.6 & -0.2 & -0.2 \\ 0.1 & -0.2 & 0.3 \end{pmatrix}.$$

6. Check:
$$\begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 0 \\ 1 & 0 & 3 \end{pmatrix} \times \begin{pmatrix} -0.3 & 0.6 & 0.1 \\ 0.6 & -0.2 & -0.2 \\ 0.1 & -0.2 & 0.3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

5.5 Matrix equations

A matrix equation is written similarly to a numeric equation, like AX = B, with the exception that A and B are matrices, and the unknown X is a matrix, too (for simplicity of notation, we omit the multiplication sign, ×, as we often do for multiplication of numbers). When you solve a numeric equation, for instance 2x = 3, you simply divide the right-hand side by the factor in the left-hand side, i.e., x = 3/2. Division is not that easy for matrices, because it is equivalent to multiplication by the inverse matrix, which can be done from the right or from the left, each with a different outcome.

In order to "cancel out" the matrix A in the left-hand side, one needs to multiply the matrix equation AX = B by A⁻¹ from the left, not from the right. Then A⁻¹AX = EX = X = A⁻¹B. Similarly, multiplying the equation XA = B by A⁻¹ from the right gives X = XAA⁻¹ = BA⁻¹.

Example 5.7. Solve the matrix equation
$$\begin{pmatrix} 5 & -1 & 3 \\ 5 & 0 & 4 \\ 6 & -1 & 4 \end{pmatrix} X = \begin{pmatrix} -2 & 4 \\ 4 & 6 \\ 6 & 0 \end{pmatrix}.$$

Solution. First, we calculate the inverse matrix
$$A^{-1} = \begin{pmatrix} 4 & 1 & -4 \\ 4 & 2 & -5 \\ -5 & -1 & 5 \end{pmatrix}.$$

Then,
$$X = A^{-1}B = \begin{pmatrix} 4 & 1 & -4 \\ 4 & 2 & -5 \\ -5 & -1 & 5 \end{pmatrix} \times \begin{pmatrix} -2 & 4 \\ 4 & 6 \\ 6 & 0 \end{pmatrix} = \begin{pmatrix} -28 & 22 \\ -30 & 28 \\ 36 & -26 \end{pmatrix}.$$

One can check by direct calculation that
$$\begin{pmatrix} 5 & -1 & 3 \\ 5 & 0 & 4 \\ 6 & -1 & 4 \end{pmatrix} \times \begin{pmatrix} -28 & 22 \\ -30 & 28 \\ 36 & -26 \end{pmatrix} = \begin{pmatrix} -2 & 4 \\ 4 & 6 \\ 6 & 0 \end{pmatrix}.$$

There is another way to solve matrix equations, one that is based on direct calculations. Consider, for instance, the matrix equation AX = B. Similarly to what we did in section 5.3, the matrix product of A with the first column of X gives the following linear equations:

$$\begin{cases} a_{11}x_{11} + a_{12}x_{21} + \dots + a_{1n}x_{n1} = b_{11} \\ a_{21}x_{11} + a_{22}x_{21} + \dots + a_{2n}x_{n1} = b_{21} \\ \quad\vdots \\ a_{n1}x_{11} + a_{n2}x_{21} + \dots + a_{nn}x_{n1} = b_{n1}. \end{cases}$$

Thus, in order to solve for the first column of X, we need to solve the SLE with the augmented matrix
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_{11} \\ a_{21} & a_{22} & \dots & a_{2n} & b_{21} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & b_{n1} \end{array}\right).$$

Similarly, to solve for the second column of X, we need to solve the equations
$$\begin{cases} a_{11}x_{12} + a_{12}x_{22} + \dots + a_{1n}x_{n2} = b_{12} \\ a_{21}x_{12} + a_{22}x_{22} + \dots + a_{2n}x_{n2} = b_{22} \\ \quad\vdots \\ a_{n1}x_{12} + a_{n2}x_{22} + \dots + a_{nn}x_{n2} = b_{n2}, \end{cases}$$

which yields an SLE matrix that differs from the previous one only by the column of free terms:
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_{12} \\ a_{21} & a_{22} & \dots & a_{2n} & b_{22} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & b_{n2} \end{array}\right).$$

We can solve all these systems simultaneously by writing all the b-columns to the right of the vertical bar (as in section 5.3) and applying Gauss elimination to
$$\left(\begin{array}{cccc|cccc} a_{11} & a_{12} & \dots & a_{1n} & b_{11} & b_{12} & \dots & b_{1m} \\ a_{21} & a_{22} & \dots & a_{2n} & b_{21} & b_{22} & \dots & b_{2m} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & b_{n1} & b_{n2} & \dots & b_{nm} \end{array}\right).$$

Example 5.8. Solve the matrix equation from example 5.7 by Gauss elimination.

Solution.
$$\left(\begin{array}{ccc|cc} 5 & -1 & 3 & -2 & 4 \\ 5 & 0 & 4 & 4 & 6 \\ 6 & -1 & 4 & 6 & 0 \end{array}\right) \xrightarrow{[3]\mapsto[3]-[1]} \left(\begin{array}{ccc|cc} 1 & 0 & 1 & 8 & -4 \\ 5 & -1 & 3 & -2 & 4 \\ 5 & 0 & 4 & 4 & 6 \end{array}\right)$$
(the new row has been moved to the top)
$$\xrightarrow{[2]\mapsto[2]-[1]\times 5} \left(\begin{array}{ccc|cc} 1 & 0 & 1 & 8 & -4 \\ 0 & -1 & -2 & -42 & 24 \\ 5 & 0 & 4 & 4 & 6 \end{array}\right) \xrightarrow{[3]\mapsto[3]-[1]\times 5} \left(\begin{array}{ccc|cc} 1 & 0 & 1 & 8 & -4 \\ 0 & 1 & 2 & 42 & -24 \\ 0 & 0 & 1 & 36 & -26 \end{array}\right)$$
(rows 2 and 3 have also been multiplied by −1)
$$\xrightarrow{[1]\mapsto[1]-[3]} \left(\begin{array}{ccc|cc} 1 & 0 & 0 & -28 & 22 \\ 0 & 1 & 2 & 42 & -24 \\ 0 & 0 & 1 & 36 & -26 \end{array}\right) \xrightarrow{[2]\mapsto[2]-[3]\times 2} \left(\begin{array}{ccc|cc} 1 & 0 & 0 & -28 & 22 \\ 0 & 1 & 0 & -30 & 28 \\ 0 & 0 & 1 & 36 & -26 \end{array}\right).$$

5.6 Elementary transformation matrices

All of the transformations discussed in section 1.2 can be expressed in terms of matrix multi-plication with the aid of special matrices called elementary transformation matrices.


The Type I transformation, in which the jth row is multiplied by some constant λ and added to the ith row, is equivalent to multiplication (from the left) of the original matrix by the matrix

$$U_{ij}(\lambda) = \begin{pmatrix} 1 & 0 & \lambda & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{pmatrix},$$

where λ is located at the intersection of the ith row and jth column; for instance, in the case of 3×3 matrices,
$$U_{13}(\lambda) = \begin{pmatrix} 1 & 0 & \lambda \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The Type II transformation, in which the ith row is multiplied by a non-zero constant λ, is equivalent to multiplication (again, from the left) of the original matrix by
$$P_i(\lambda) = \begin{pmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & \lambda & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{pmatrix},$$

where λ is located in the ith row on the diagonal; for instance,
$$P_2(\lambda) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The Type III transformation, in which the ith row and the jth row of the original matrix are interchanged, can be expressed as multiplication (again, from the left) by
$$T_{ij} = \begin{pmatrix} 1 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 0 & 1 & \dots & 0 \\ 0 & 0 & \dots & 1 & 0 & \dots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 & 0 & \dots & 1 \end{pmatrix},$$

where the interchanged 1's are located at the intersections of the ith and jth rows with the ith and jth columns; for instance,
$$T_{12} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$


Example 5.9. Let A be a 3×3 matrix of the general form
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Then
$$U_{13}(\lambda) \times A = \begin{pmatrix} 1 & 0 & \lambda \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11}+\lambda a_{31} & a_{12}+\lambda a_{32} & a_{13}+\lambda a_{33} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix},$$
$$P_2(\lambda) \times A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ \lambda a_{21} & \lambda a_{22} & \lambda a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \text{ and}$$
$$T_{12} \times A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$

Similarly, one can show that multiplication from the right by U_{ji}(λ), P_i(λ), and T_{ij} leads to Type I, Type II, and Type III transformations of columns, respectively⁶. Note that multiplication by U_{ij}(λ) doesn't change det(A) because it is a Type I transformation, while multiplication by T_{ij} changes the sign of det(A). For this reason, we will sometimes use the matrix

$$R_{ij} = \begin{pmatrix} 1 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 0 & -1 & \dots & 0 \\ 0 & 0 & \dots & 1 & 0 & \dots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 & 0 & \dots & 1 \end{pmatrix},$$

where the 1 and −1 are located at the intersections of the ith and jth rows with the ith and jth columns. Its important feature is that multiplication of a matrix by R_{ij} from the left or right doesn't change its determinant. We will use the elementary transformation matrices to prove the following theorem.

Theorem 17. det(A × B) = det(A) det(B).

Proof. First, the proof is trivial when A or B is a degenerate matrix. Indeed, if A or B is degenerate then A × B is degenerate, too, because its rows are linear combinations of a linearly dependent set of vectors, and both det(A × B) and det(A) det(B) are equal to zero.

⁶Spend some time with paper and pencil to check that it is, indeed, true!


Next, we prove this statement in the particular case when both A and B are upper-triangular matrices. Assume that

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ 0 & a_{22} & a_{23} & \dots & a_{2n} \\ 0 & 0 & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & a_{nn} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ 0 & b_{22} & b_{23} & \dots & b_{2n} \\ 0 & 0 & b_{33} & \dots & b_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & b_{nn} \end{pmatrix}.$$

Then, by the definition of matrix multiplication, we get

$$A \times B = \begin{pmatrix} a_{11}b_{11} & * & * & \dots & * \\ 0 & a_{22}b_{22} & * & \dots & * \\ 0 & 0 & a_{33}b_{33} & \dots & * \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & a_{nn}b_{nn} \end{pmatrix},$$

and, since the determinant of an upper-triangular matrix is the product of its diagonal entries (theorem 8), we get det(A × B) = a₁₁b₁₁a₂₂b₂₂ . . . a_{nn}b_{nn} = a₁₁a₂₂ . . . a_{nn} b₁₁b₂₂ . . . b_{nn} = det(A) det(B).

The general case can be reduced to the case of upper-triangular matrices by elementary transformations of rows, for the matrix A, and by elementary transformations of columns, for the matrix B. Indeed, the matrix A can be represented as a product A = A₁ × A₂ × · · · × A_k × X, where each Aᵢ is one of the matrices U_{ij}(λ) or R_{ij}, which don't affect the determinant, while X is an upper-triangular matrix such that det(A) = det(X). Similarly, we can transform B by elementary transformations of columns to the form Y × B_l × · · · × B₂ × B₁, where the B_j are matrices U_{ij}(λ) or R_{ij} and Y is also an upper-triangular matrix such that det(B) = det(Y). Then, det(A × B) = det(A₁ × A₂ × · · · × A_k × X × Y × B_l × · · · × B₂ × B₁) = det(X × Y) = det(X) det(Y) = det(A) det(B).

5.7 Exercises

Problem 5.1. Find the inverse matrix for each of the following matrices:

(a) $\begin{pmatrix} -1 & -2 & 4 & 11 & -2 \\ -4 & 0 & 9 & -7 & -3 \\ 0 & 1 & -1 & -8 & 0 \\ -5 & -1 & 12 & -3 & -5 \\ 4 & 0 & -9 & 9 & 4 \end{pmatrix}$ (b) $\begin{pmatrix} 2 & 0 & -7 & -1 \\ 0 & 0 & 1 & 0 \\ -5 & 3 & -9 & 2 \\ -12 & 5 & -3 & 5 \end{pmatrix}$ (c) $\begin{pmatrix} -6 & 7 & 7 & -6 \\ -1 & 2 & 0 & 0 \\ -6 & 7 & 8 & -6 \\ 2 & -3 & -1 & 1 \end{pmatrix}$

(d) $\begin{pmatrix} 7 & 4 & 6 & 3 \\ 4 & -2 & -2 & 1 \\ 12 & 6 & 9 & 5 \\ 10 & 3 & 5 & 4 \end{pmatrix}$ (e) $\begin{pmatrix} 3 & -2 & -4 \\ -2 & 1 & 3 \\ -6 & 3 & 8 \end{pmatrix}$ (f) $\begin{pmatrix} 11 & -11 & 10 & 10 \\ 0 & -1 & 1 & 1 \\ -3 & 5 & -4 & -5 \\ 7 & -4 & 4 & 3 \end{pmatrix}$ (g) $\begin{pmatrix} -11 & -10 & -6 \\ -6 & -5 & -3 \\ -10 & -8 & -5 \end{pmatrix}$

(h) $\begin{pmatrix} -3 & 10 & -1 & 0 & -1 \\ 4 & -12 & 1 & 0 & 1 \\ 1 & -2 & 3 & 0 & -2 \\ -1 & 0 & 5 & 1 & -2 \\ 0 & -3 & 7 & 1 & -3 \end{pmatrix}$ (i) $\begin{pmatrix} 3 & -7 & -11 & -7 & 5 \\ 7 & -1 & 11 & -8 & 3 \\ -5 & 2 & -5 & 6 & -3 \\ 5 & 1 & 12 & -5 & 1 \\ 1 & 4 & 12 & 1 & -2 \end{pmatrix}$ (j) $\begin{pmatrix} 1 & 2 & -1 & -1 & -2 \\ -1 & -8 & 10 & 0 & 5 \\ 1 & 1 & -2 & -4 & -2 \\ -1 & 0 & 2 & 6 & 2 \\ 1 & 2 & -5 & -6 & -3 \end{pmatrix}$

Problem 5.2. Solve the following matrix equations:

(a) $\begin{pmatrix} -3 & 3 & 5 \\ 1 & -1 & -2 \\ -7 & 6 & 10 \end{pmatrix} X = \begin{pmatrix} -1 & -5 & -4 \\ 3 & 2 & -2 \\ -1 & 2 & 2 \end{pmatrix}$

(b) $\begin{pmatrix} 6 & 4 & 1 & -6 \\ -5 & -4 & -1 & 6 \\ 8 & 3 & 1 & -4 \\ 10 & 0 & 1 & 1 \end{pmatrix} X = \begin{pmatrix} 1 & -1 & 0 & -4 \\ -4 & -5 & -4 & -2 \\ -2 & -2 & 2 & 1 \\ -1 & 2 & 1 & 3 \end{pmatrix}$

(c) $X \begin{pmatrix} 0 & -3 & 11 & 0 & 9 \\ 1 & 0 & -5 & -1 & -3 \\ 1 & 0 & 5 & 2 & 5 \\ -1 & 4 & -2 & 3 & -3 \\ -1 & -5 & 2 & -5 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 3 & -1 & 3 \\ 4 & 1 & -3 & 2 & 3 \\ -4 & 1 & -4 & -1 & -4 \\ -2 & 4 & 1 & 0 & 3 \end{pmatrix}$

(d) $X \begin{pmatrix} 7 & -6 & -8 \\ -1 & 1 & 1 \\ 3 & -2 & -3 \end{pmatrix} = \begin{pmatrix} 1 & -2 & -3 \\ 1 & 2 & -3 \\ -4 & 3 & 2 \\ 4 & 1 & -3 \\ 1 & -5 & -5 \end{pmatrix}$

(e) $\begin{pmatrix} 3 & -3 & -1 \\ 2 & 1 & 0 \\ 4 & -3 & -1 \end{pmatrix} X = \begin{pmatrix} -4 & -2 & 1 \\ 2 & 0 & -2 \\ -3 & -3 & -1 \end{pmatrix}$

(f) $\begin{pmatrix} 8 & -12 & -1 & 0 \\ -3 & 7 & 2 & -1 \\ -7 & 6 & -2 & 2 \\ 0 & 5 & 3 & -2 \end{pmatrix} X = \begin{pmatrix} 2 & 3 & 2 & -2 & -4 \\ 4 & -3 & 1 & 1 & 4 \\ -5 & -3 & 2 & 2 & 4 \\ 3 & 4 & -2 & -5 & -2 \end{pmatrix}$

6 Linear operators

The notion of a linear operator is very general and has many applications in all parts of mathematics, including calculus, statistics, and optimization theory. Commonly used synonyms of a linear operator are linear map, linear mapping, linear transformation, or linear function.

Definition 15. Let V and W be linear vector spaces. The mapping f : V → W is said to bea linear operator if

1. ∀x, y ∈ V f(x+ y) = f(x) + f(y)

2. ∀x ∈ V, ∀λ ∈ R f(λx) = λf(x),

i.e., f preserves the operations of vector addition and scalar multiplication. Below we discuss several examples of mappings which are (or are not) linear operators.

1. f : R → R, f(x) = cx is a linear operator, as c(x + y) = cx + cy and c(λx) = λ(cx).

2. f : R→ R, f(x) = x2 is NOT a linear operator because (x+ y)2 6= x2 + y2.

3. f : R→ R, f(x) = x+ 1 is NOT a linear operator because (x+ 1) + (y+ 1) 6= (x+ y) + 1.

4. The operator of differentiation by x, i.e., ∂/∂x : C^∞[a, b] → C^∞[a, b] such that ∂/∂x(f) = f′(x), which is defined on the set of infinitely smooth functions (page 15), is a linear operator.

5. The operator of integration over a segment [a, b], i.e., I : C[a, b] → R such that I(f) = ∫ₐᵇ f(x)dx, is a linear operator, as ∫ₐᵇ (f + g)(x)dx = ∫ₐᵇ f(x)dx + ∫ₐᵇ g(x)dx and ∫ₐᵇ (λf)(x)dx = λ∫ₐᵇ f(x)dx.

6. The operator of expected value maps the linear space of random variables to R, so that E(X + Y) = E(X) + E(Y) and E(λX) = λE(X); it is a linear operator.

7. f : R2 → R2, f(x, y) = (y, x) is a linear operator of symmetry with respect to the bisectorline of the first quadrant, as the vector symmetric to a sum of two vectors is equal to thesum of their symmetric images (the same is true for scaling).

8. f : R2 → R2, f(x, y) = (−y, x) is a linear operator of rotation by 90° counterclockwise (check it!), as the rotated sum of two vectors is equal to the sum of the rotations.

9. f : R2 → R, f(x, y) = x is a linear operator of projection onto the x-axis, because theprojection of the sum of two vectors is equal to the sum of their projections.


10. f : R2 → R, f(x, y) = √(x² + y²) is NOT a linear operator, because the length of the sum of two vectors is not equal to the sum of their lengths.

6.1 Matrix of a linear operator

We now explore the connection between linear operators and matrices. It turns out that each linear operator can be described by a matrix, but the problem is that this matrix is basis-dependent.

Consider a vector space V with a base e₁, e₂, . . . , e_n. Since each vector of V can be expressed as a linear combination of e₁, e₂, . . . , e_n, we get x = x₁e₁ + x₂e₂ + · · · + x_ne_n, where x₁, x₂, . . . , x_n are the coordinates of x in the base e₁, e₂, . . . , e_n. If A : V → W is a linear operator, then A(x) = A(x₁e₁ + x₂e₂ + · · · + x_ne_n) = x₁A(e₁) + x₂A(e₂) + · · · + x_nA(e_n). Here, A(e₁), A(e₂), . . . , A(e_n) are the images of the base vectors under the operator A. Note that they are the same for all vectors x. Choose a base f₁, f₂, . . . , f_m in the vector space W and decompose the vectors A(e₁), A(e₂), . . . , A(e_n) by that base⁷:

$$\begin{aligned} A(e_1) &= a_{11}f_1 + a_{21}f_2 + \dots + a_{m1}f_m, \\ A(e_2) &= a_{12}f_1 + a_{22}f_2 + \dots + a_{m2}f_m, \\ &\;\;\vdots \\ A(e_n) &= a_{1n}f_1 + a_{2n}f_2 + \dots + a_{mn}f_m. \end{aligned}$$

We have A(x) = x₁A(e₁) + x₂A(e₂) + · · · + x_nA(e_n) = x₁(a₁₁f₁ + a₂₁f₂ + · · · + a_{m1}f_m) + x₂(a₁₂f₁ + a₂₂f₂ + · · · + a_{m2}f_m) + · · · + x_n(a_{1n}f₁ + a_{2n}f₂ + · · · + a_{mn}f_m) = (a₁₁x₁ + a₁₂x₂ + · · · + a_{1n}x_n)f₁ + (a₂₁x₁ + a₂₂x₂ + · · · + a_{2n}x_n)f₂ + · · · + (a_{m1}x₁ + a_{m2}x₂ + · · · + a_{mn}x_n)f_m. Thus, we can write the action of y = A(x) by using matrix multiplication as

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \times \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$

where x₁, x₂, . . . , x_n are the coordinates of x in the base e₁, e₂, . . . , e_n and y₁, y₂, . . . , y_m are the coordinates of A(x) in the base f₁, f₂, . . . , f_m. Note that it is the same base which gave rise to the coordinates (a_{ij}) of the vectors A(eᵢ). The matrix formed by (a_{ij}) will be called the matrix of the linear operator A and will be denoted by the same capital letter as the linear operator. This matrix is specific to the bases e₁, e₂, . . . , e_n and f₁, f₂, . . . , f_m. In order to construct the matrix of a linear operator, one needs to take the images of all base vectors of the space V, find their coordinates in the base of the space W, and write down those coordinates as columns. This gives the matrix of the linear operator.

Example 6.1. Let A : R2 → R2 be the linear operator of rotation by 90° counterclockwise in the plane. Consider e₁ = f₁ = (1, 0) and e₂ = f₂ = (0, 1). Then A(e₁) = f₂ = 0 · f₁ + 1 · f₂ and A(e₂) = −f₁ = (−1) · f₁ + 0 · f₂. Therefore, the matrix of A in these bases is

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

⁷Please pay attention to the notation of the indices, which has changed to make the expression below more convenient.


Example 6.2. Let V be the vector space of polynomials of degree 2 or less and let A : V → V be the operator of differentiation by x. Consider the bases e₁, e₂, e₃ and f₁, f₂, f₃ such that e₁ = f₁ = 1, e₂ = f₂ = x, and e₃ = f₃ = x². Then A(e₁) = 0 = 0 · f₁ + 0 · f₂ + 0 · f₃, A(e₂) = f₁ = 1 · f₁ + 0 · f₂ + 0 · f₃, and A(e₃) = 2f₂ = 0 · f₁ + 2 · f₂ + 0 · f₃. Hence, the matrix of A is

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$

Example 6.3. Let A : R3 → R3 be the linear operator of rotation in 3D which induces a cyclic permutation of the axes x, y, and z. In other words, it is a rotation around the line x = y = z by 120◦ clockwise when looking at the origin from the first octant⁸. Consider e1 = f1 = (1, 0, 0), e2 = f2 = (0, 1, 0), and e3 = f3 = (0, 0, 1). Then, A(e1) = f2 = 0 · f1 + 1 · f2 + 0 · f3, A(e2) = f3 = 0 · f1 + 0 · f2 + 1 · f3, and A(e3) = f1 = 1 · f1 + 0 · f2 + 0 · f3. Therefore, the matrix of A in these bases is

\[
A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
\]

6.2 Change of base

In the previous section we considered two different vector spaces, V and W, each endowed with its own base. Consider now a linear vector space V with two bases, e1, e2, . . . , en and e′1, e′2, . . . , e′n. In what follows, they will be called the old base and the new base, respectively, reflecting our intention to change from the old base to the new base. Let x = x1e1 + x2e2 + · · · + xnen = x′1e′1 + x′2e′2 + · · · + x′ne′n be two coordinate decompositions of x over the old and the new base. Assume that the vectors of the old base are decomposed by the new base as

\[
\begin{aligned}
e_1 &= c_{11}e'_1 + c_{21}e'_2 + \dots + c_{n1}e'_n, \\
e_2 &= c_{12}e'_1 + c_{22}e'_2 + \dots + c_{n2}e'_n, \\
&\;\;\vdots \\
e_n &= c_{1n}e'_1 + c_{2n}e'_2 + \dots + c_{nn}e'_n.
\end{aligned}
\]

Then, we get x = x1e1 + x2e2 + · · · + xnen = x1(c11e′1 + c21e′2 + · · · + cn1e′n) + x2(c12e′1 + c22e′2 + · · · + cn2e′n) + · · · + xn(c1ne′1 + c2ne′2 + · · · + cnne′n) = (c11x1 + c12x2 + · · · + c1nxn)e′1 + (c21x1 + c22x2 + · · · + c2nxn)e′2 + · · · + (cn1x1 + cn2x2 + · · · + cnnxn)e′n = x′1e′1 + x′2e′2 + · · · + x′ne′n. Since the coordinates of a vector in a base are defined uniquely, we have

\[
\begin{aligned}
x'_1 &= c_{11}x_1 + c_{12}x_2 + \dots + c_{1n}x_n, \\
x'_2 &= c_{21}x_1 + c_{22}x_2 + \dots + c_{2n}x_n, \\
&\;\;\vdots \\
x'_n &= c_{n1}x_1 + c_{n2}x_2 + \dots + c_{nn}x_n.
\end{aligned}
\]

Thus, a change of base entails a change of coordinates, which has the form

x′ = Cx,

⁸Can you rigorously show that the rotation angle is 120◦?


where x′ is the column-vector of coordinates in the new base, x is the column-vector of coordinates in the old base, and the (square) matrix C contains the coordinates of the old base vectors in the new base. The backward transformation of coordinates is expressed by using the inverse matrix, i.e.,

\[
x = C^{-1}x'.
\]

The matrix C is called the transition matrix from the base e1, e2, . . . , en to the base e′1, e′2, . . . , e′n.

Consider now a linear operator A : V → V with the matrix A in the old base (which plays the role of both ei and fi in the definition of a matrix of a linear operator) and with the matrix A′ in the new base. That is, y = Ax and y′ = A′x′. Besides that, x′ = Cx and y′ = Cy with the same transition matrix C. Thus, y′ = Cy = A′x′ = A′Cx, and then y = C⁻¹A′Cx = Ax. Since the matrix of a linear operator is defined uniquely by construction, we get

\[
A = C^{-1}A'C, \quad\text{or}\quad A' = CAC^{-1}.
\]

This is the rule by which the matrix of a linear operator is transformed under a change of base.

Definition 16. Two n×n matrices, A and B, are called conjugate if there exists a non-degenerate matrix C such that B = CAC⁻¹.

Thus, the matrix of a linear operator looks different in different bases. Two matrices of the same linear operator in different bases are conjugate to each other. An obvious question is to what "standard" form one can transform the matrix of a linear operator by using conjugation. An incomplete, but still reasonable, answer to this question will be given in the next sections.
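The transformation rule can be verified numerically; the sketch below (with an assumed operator matrix and an assumed invertible transition matrix) checks that applying the operator in either base describes the same vector:

```python
import numpy as np

A = np.array([[3., 4.], [5., 2.]])       # operator matrix in the old base
C = np.array([[1., 1.], [0., 1.]])       # an (assumed) invertible transition matrix
A_new = C @ A @ np.linalg.inv(C)         # matrix of the same operator in the new base

x = np.array([2., -1.])                  # coordinates of a vector in the old base
x_new = C @ x                            # coordinates of the same vector in the new base
y_new = C @ (A @ x)                      # image y = Ax, re-expressed in the new base

# A' x' must give the same new-base coordinates of the image:
assert np.allclose(A_new @ x_new, y_new)
```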

6.3 Eigenvectors and eigenvalues

Definition 17. Let V be a vector space and let A : V → V be a linear operator. A vector x ≠ 0 is said to be an eigenvector of A with the eigenvalue λ if A(x) = λx. In other words, A acts on x as multiplication by a scalar.

Definition 18. Let A be an n×n matrix. A non-zero vector x ∈ Rn is called an eigenvector of A with the eigenvalue λ if Ax = λx.

Theorem 18. The number λ ∈ R is an eigenvalue of the matrix A if and only if det(A − λE) = 0.

Proof. Assume λ is an eigenvalue of the matrix A. Then, for some x ≠ 0 we have Ax = λx, i.e.,

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= \lambda x_1, \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= \lambda x_2, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= \lambda x_n.
\end{aligned}
\]

That is,

\[
\begin{aligned}
(a_{11} - \lambda)x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= 0, \\
a_{21}x_1 + (a_{22} - \lambda)x_2 + \dots + a_{2n}x_n &= 0, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + (a_{nn} - \lambda)x_n &= 0,
\end{aligned}
\]


which in matrix form is (A − λE)x = 0. The latter is a homogeneous system of linear equations, which always has the trivial solution x = 0. In order for it to have a non-zero solution, the system must be degenerate (theorem 11), i.e., det(A − λE) = 0.

Conversely, if det(A − λE) = 0 then the system (A − λE)x = 0 is degenerate and must have a non-zero solution x. Then, we have Ax − λx = 0, concluding that x is an eigenvector of A with the eigenvalue λ.

Note that the final result of the expansion of det(A − λE) is a polynomial in λ, i.e.,

\[
\begin{vmatrix}
(a_{11} - \lambda) & a_{12} & \dots & a_{1n} \\
a_{21} & (a_{22} - \lambda) & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & (a_{nn} - \lambda)
\end{vmatrix}
= (a_{11} - \lambda)(a_{22} - \lambda)(a_{33} - \lambda)\dots(a_{nn} - \lambda) - a_{12}a_{21}(a_{33} - \lambda)\dots(a_{nn} - \lambda) + \dots
= (-1)^n\lambda^n + c_{n-1}\lambda^{n-1} + c_{n-2}\lambda^{n-2} + \dots + c_1\lambda + c_0.
\]

Definition 19. The polynomial function fA(λ) = det(A − λE) is called the characteristicpolynomial of the matrix A.

Theorem 19. Characteristic polynomials of conjugate matrices are equal.

Proof. Let B be the matrix that is conjugate to A, i.e., B = CAC−1. Then, by the propertiesof matrix multiplication fB(λ) = det(CAC−1 − λE) = det(CAC−1 − λCEC−1) = det(C(A −λE)C−1) = det(C) det(A− λE) det(C−1) = det(A− λE) = fA(λ).

Corollary 19.1. The sum of diagonal elements of a matrix doesn’t change under conjugation.

Proof. Note that the entire characteristic polynomial doesn't change under conjugation. In particular, the coefficient c_{n−1} at λ^{n−1} doesn't change either. Now, the product of n − 1 copies of λ can only appear from the term (a11 − λ)(a22 − λ)(a33 − λ) . . . (ann − λ), because the other terms contain λ to the power of n − 2 or less. Then, c_{n−1}λ^{n−1} = (−1)^{n−1}λ^{n−1}(a11 + a22 + · · · + ann). Thus, the sum of the diagonal elements of a matrix doesn't change under conjugation.

Definition 20. The sum of the diagonal elements of a matrix is called its trace and is denoted by tr(A).
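Both invariance claims are easy to test numerically; the sketch below (with an assumed matrix A and an assumed non-degenerate C) uses `numpy.poly`, which returns the coefficients of the characteristic polynomial of a square matrix:

```python
import numpy as np

A = np.array([[3., 4.], [5., 2.]])
C = np.array([[2., 1.], [1., 1.]])            # any non-degenerate matrix
B = C @ A @ np.linalg.inv(C)                  # a matrix conjugate to A

assert np.isclose(np.trace(B), np.trace(A))   # corollary 19.1: trace is invariant
assert np.allclose(np.poly(B), np.poly(A))    # theorem 19: the whole characteristic polynomial is
```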

Example 6.4. Find eigenvectors and eigenvalues of the matrix

\[
A = \begin{pmatrix} 3 & 4 \\ 5 & 2 \end{pmatrix}.
\]

Solution.

\[
A - \lambda E
= \begin{pmatrix} 3 & 4 \\ 5 & 2 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 3 & 4 \\ 5 & 2 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}
= \begin{pmatrix} 3-\lambda & 4 \\ 5 & 2-\lambda \end{pmatrix}.
\]

Thus, we get

\[
\begin{vmatrix} 3-\lambda & 4 \\ 5 & 2-\lambda \end{vmatrix}
= (3-\lambda)(2-\lambda) - 20 = \lambda^2 - 5\lambda - 14 = 0.
\]

Solving the quadratic equation, we get λ1 = 7 and λ2 = −2.

Case 1: λ1 = 7.

\[
A - 7E = \begin{pmatrix} -4 & 4 \\ 5 & -5 \end{pmatrix},
\qquad
\begin{pmatrix} -4 & 4 \\ 5 & -5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\]

\[
\begin{pmatrix} -4 & 4 & 0 \\ 5 & -5 & 0 \end{pmatrix}
\xrightarrow{[1]\mapsto[1]/(-4)}
\begin{pmatrix} 1 & -1 & 0 \\ 5 & -5 & 0 \end{pmatrix}
\xrightarrow{[2]\mapsto[2]-[1]\times 5}
\begin{pmatrix} 1 & -1 & 0 \end{pmatrix}.
\]

There exists a non-zero solution: the vector $x = (1, 1)^T$ (and every vector proportional to it) is an eigenvector with the eigenvalue λ1 = 7.

Case 2: λ2 = −2.

\[
A + 2E = \begin{pmatrix} 5 & 4 \\ 5 & 4 \end{pmatrix}.
\]

Similarly, we get

\[
\begin{pmatrix} 5 & 4 \\ 5 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\]

and the vector $y = (4, -5)^T$ is an eigenvector with the eigenvalue λ2 = −2.

Example 6.5. Find eigenvectors and eigenvalues of the matrix

\[
A = \begin{pmatrix} 5 & -7 & 7 \\ -8 & 3 & -2 \\ -8 & -1 & 2 \end{pmatrix}.
\]

Solution.

\[
\det\begin{pmatrix} 5-\lambda & -7 & 7 \\ -8 & 3-\lambda & -2 \\ -8 & -1 & 2-\lambda \end{pmatrix}
= (5-\lambda)(3-\lambda)(2-\lambda) - 7\times 8\times 2 + 8\times 1\times 7 + 8\times 7\times(3-\lambda) - 8\times 7\times(2-\lambda) - 1\times 2\times(5-\lambda)
= (5-\lambda)(1-\lambda)(4-\lambda) = 0.
\]

Case 1: λ1 = 5.

\[
A - 5E = \begin{pmatrix} 0 & -7 & 7 \\ -8 & -2 & -2 \\ -8 & -1 & -3 \end{pmatrix}
\rightarrow
\begin{pmatrix} 0 & 1 & -1 \\ -8 & -2 & -2 \\ -8 & -1 & -3 \end{pmatrix}
\rightarrow
\begin{pmatrix} 0 & 1 & -1 \\ -8 & 0 & -4 \\ -8 & 0 & -4 \end{pmatrix}
\rightarrow
\begin{pmatrix} 0 & 1 & -1 \\ 2 & 0 & 1 \end{pmatrix}.
\]

Then, letting x3 = −2 we get x2 = −2 and x1 = 1. The first eigenvector is $x = (1, -2, -2)^T$.

Case 2: λ2 = 1.

\[
A - E = \begin{pmatrix} 4 & -7 & 7 \\ -8 & 2 & -2 \\ -8 & -1 & 1 \end{pmatrix}
\rightarrow
\begin{pmatrix} 4 & -7 & 7 \\ -8 & 2 & -2 \\ 1 & 0 & 0 \end{pmatrix}
\rightarrow
\begin{pmatrix} 0 & -7 & 7 \\ 0 & 1 & -1 \\ 1 & 0 & 0 \end{pmatrix}
\rightarrow
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \end{pmatrix}.
\]

Letting x3 = 1 we get x2 = 1 and x1 = 0. The second eigenvector is $y = (0, 1, 1)^T$.

Case 3: λ3 = 4.

\[
A - 4E = \begin{pmatrix} 1 & -7 & 7 \\ -8 & -1 & -2 \\ -8 & -1 & -2 \end{pmatrix}
\rightarrow
\begin{pmatrix} 1 & -7 & 7 \\ 8 & 1 & 2 \end{pmatrix}
\rightarrow
\begin{pmatrix} 1 & -7 & 7 \\ 0 & 57 & -54 \end{pmatrix}
\rightarrow
\begin{pmatrix} 1 & -7 & 7 \\ 0 & 19 & -18 \end{pmatrix}.
\]

Setting x3 = 19, we get x2 = 18 and x1 = −7. Therefore, the last eigenvector is $z = (-7, 18, 19)^T$.

In the example above, one may notice that the transition matrix

\[
C = \begin{pmatrix} 1 & 0 & -7 \\ -2 & 1 & 18 \\ -2 & 1 & 19 \end{pmatrix},
\]

which is obtained by writing the coordinates of the vectors x, y, and z as columns, transforms the matrix A by the rule C⁻¹AC into the diagonal matrix

\[
\begin{pmatrix} 1 & -7 & 7 \\ 2 & 5 & -4 \\ 0 & -1 & 1 \end{pmatrix}
\times
\begin{pmatrix} 5 & -7 & 7 \\ -8 & 3 & -2 \\ -8 & -1 & 2 \end{pmatrix}
\times
\begin{pmatrix} 1 & 0 & -7 \\ -2 & 1 & 18 \\ -2 & 1 & 19 \end{pmatrix}
=
\begin{pmatrix} 5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix},
\]

where the diagonal entries are the eigenvalues of A. This happens because x, y, and z are linearly independent and form a base in R3, while the matrix of a linear operator in a base consisting of eigenvectors is always diagonal (by construction).

Theorem 20. Let A be an n×n matrix. Eigenvectors of A corresponding to different eigenvalues are linearly independent.


Proof. Let x1, x2, . . . , xk be the eigenvectors corresponding to the eigenvalues λ1, λ2, . . . , λk. We need to prove that if λi ≠ λj for all i ≠ j then x1, x2, . . . , xk are linearly independent.

The proof is by induction on k. The base of the induction (k = 1) is obvious because every eigenvector is non-zero by definition and, therefore, forms a linearly independent set by itself. Assume that we have proven the statement for every set of k − 1 eigenvectors with different eigenvalues, and let us proceed to k eigenvectors.

Suppose, on the contrary, that x1, x2, . . . , xk are linearly dependent, i.e., there exist coefficients µ1, µ2, . . . , µk, some of which are non-zero, such that

\[
\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k = 0.
\]

By applying A to both sides of the equation, we get A(µ1x1 + µ2x2 + · · · + µkxk) = A(µ1x1) + A(µ2x2) + · · · + A(µkxk) = µ1A(x1) + µ2A(x2) + · · · + µkA(xk) = µ1λ1x1 + µ2λ2x2 + · · · + µkλkxk = 0. This, in combination with the original linear combination multiplied by λ1, gives

\[
\begin{cases}
\mu_1\lambda_1 x_1 + \mu_2\lambda_2 x_2 + \dots + \mu_k\lambda_k x_k = 0, \\
\mu_1\lambda_1 x_1 + \mu_2\lambda_1 x_2 + \dots + \mu_k\lambda_1 x_k = 0,
\end{cases}
\]

which implies that

\[
\mu_2(\lambda_2 - \lambda_1)x_2 + \mu_3(\lambda_3 - \lambda_1)x_3 + \dots + \mu_k(\lambda_k - \lambda_1)x_k = 0.
\]

Thus, we have a linear combination of x2, . . . , xk which consists of k − 1 eigenvectors with different eigenvalues. By the induction hypothesis, these vectors are linearly independent, so the combination must be trivial, i.e., µ2(λ2 − λ1) = µ3(λ3 − λ1) = · · · = µk(λk − λ1) = 0. Since all the λ's are different, we have λi − λ1 ≠ 0 and, therefore, µ2 = µ3 = · · · = µk = 0.

Repeating the same argument with the linear combination of x1, x2, . . . , xk multiplied by λ2 instead of λ1 leads to the conclusion that µ1 = 0, too. Therefore, µ1 = µ2 = · · · = µk = 0 and the linear combination we chose is trivial, i.e., x1, x2, . . . , xk are linearly independent.

6.4 Diagonalizable matrices

The property of a linear operator to have a base consisting of eigenvectors (example 6.5) is veryuseful. Such “good” linear operators play an important role in many applications, for instance,in linear ordinary differential equations and in linear differential-difference equations.

Definition 21. The linear operator A : V → V is called diagonalizable if its matrix is diagonalin some base of V . Equivalently, one can find a base of V which consists of eigenvectors of A.

Definition 22. An n×n matrix A is called diagonalizable⁹ if there exist a non-degenerate matrix C and a diagonal matrix D such that CAC⁻¹ = D.

Obviously, a linear operator is diagonalizable if and only if its matrix (in some base) is diagonalizable. Considering that the characteristic polynomial fA(λ) has degree n and, therefore, can have at most n roots, the following statement follows immediately from theorem 20.

⁹Note that the statements CAC⁻¹ = D and C⁻¹AC = D are equivalent up to the difference between C and C⁻¹. Since the transition matrix C is not specified in this definition, we ignore this difference from now on.


Theorem 21. An n×n matrix which has n distinct real eigenvalues is diagonalizable.

However, this sufficient condition is not necessary, i.e., there exist diagonalizable matriceswith coinciding eigenvalues.

Example 6.6. Let

\[
A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Then det(A − λE) = (1 − λ)³, i.e., λ = 1. However, A − λE = 0, so every non-zero vector in R3 is an eigenvector of A, i.e., one can find a base consisting of eigenvectors of A. That is, A is diagonalizable (it is already in diagonal form).

Example 6.7. The matrix

\[
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}
\]

is not diagonalizable. Indeed, det(A − λE) = (2 − λ)³, i.e., λ = 2. Then, all eigenvectors of A must satisfy the SLE given by

\[
\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]

However, all its solutions are proportional to the vector x = (1, 0, 0) and thus cannot form a base in R3. Therefore, A is not diagonalizable.

The problem with the matrix A in example 6.7 is that the multiplicity of the root λ is equal to three, i.e., the characteristic polynomial contains three copies of the corresponding linear term, while the corresponding SLE has only a one-dimensional subspace of eigenvectors. In example 6.6, the multiplicity of the root was also equal to three, but the corresponding SLE had a three-dimensional space of solutions.

For each eigenvalue λ0 of a matrix, there are two numbers measuring, roughly speaking, the number of eigenvectors corresponding to λ0. These numbers are called the multiplicities of λ0.

Definition 23. Let λ0 be a root of the characteristic polynomial of an n×n matrix A. The algebraic multiplicity of λ0 is the highest power k such that (λ − λ0)^k is a factor of fA(λ). The geometric multiplicity of λ0 is the number of linearly independent solutions of (A − λ0E)x = 0, i.e., the geometric multiplicity is equal to n − rk(A − λ0E).
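Both multiplicities of example 6.7 can be computed directly from the definition (a numerical sketch; the variable names are ours):

```python
import numpy as np

A = np.array([[2., 1., 0.], [0., 2., 1.], [0., 0., 2.]])  # the matrix of example 6.7
lam, n = 2.0, 3

# Algebraic multiplicity: how many times lam occurs among the roots of f_A.
algebraic = int(np.isclose(np.linalg.eigvals(A), lam).sum())
# Geometric multiplicity: n - rk(A - lam E).
geometric = n - np.linalg.matrix_rank(A - lam * np.eye(n))

print(algebraic, geometric)   # 3 1: the multiplicities differ, so A is not diagonalizable
```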

Lemma 22. The algebraic multiplicity of a root is greater than or equal to its geometric multiplicity.

Theorem 23. An n×n matrix is diagonalizable if and only if all roots of fA(λ) are real and their algebraic multiplicities are equal to their geometric multiplicities.

The proofs of these two statements fall beyond the scope of this chapter.


Diagonalizable matrices are very special because they are conjugate to diagonal matrices. Geometrically, a diagonalizable matrix acts as an inhomogeneous dilation: it scales the vector space, as a homogeneous dilation does, but by a different factor in each dimension, determined by the scale factors on each axis (the diagonal entries).

Diagonalization can be used to compute the powers of a matrix A, provided that the matrix is diagonalizable. Suppose we have found that

\[
A = CDC^{-1},
\]

and we need to compute A^k. Diagonal matrices are especially easy to handle: their eigenvalues and eigenvectors are known, and one can raise a diagonal matrix to a power by simply raising the diagonal entries to the same power, i.e., if D = diag(λ1, . . . , λn) then D^k = diag(λ1^k, . . . , λn^k). Then,

\[
A^k = CD(C^{-1}C)D(C^{-1}\dots C)DC^{-1} = CD^kC^{-1}.
\]

That is, in order to raise a diagonalizable matrix to a power, one transforms it to diagonal form, takes the desired power of the diagonal form, and transforms back by using the same transition matrix.

Example 6.8. Compute $A^{20}$ for

\[
A = \begin{pmatrix} 5 & -5 & -8 \\ -8 & 2 & 8 \\ -1 & -5 & -2 \end{pmatrix}.
\]

Solution. First, since A has three distinct eigenvalues (λ1 = −3, λ2 = 2, λ3 = 6), it is diagonalizable. Next, we find the diagonalization

\[
\begin{pmatrix} 5 & -5 & -8 \\ -8 & 2 & 8 \\ -1 & -5 & -2 \end{pmatrix}
=
\begin{pmatrix} 1 & -1 & -2 \\ 0 & 1 & 2 \\ 1 & -1 & -1 \end{pmatrix}
\times
\begin{pmatrix} -3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{pmatrix}
\times
\begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & -2 \\ -1 & 0 & 1 \end{pmatrix}.
\]

Finally,

\[
A^{20} =
\begin{pmatrix} 1 & -1 & -2 \\ 0 & 1 & 2 \\ 1 & -1 & -1 \end{pmatrix}
\times
\begin{pmatrix} 3^{20} & 0 & 0 \\ 0 & 2^{20} & 0 \\ 0 & 0 & 6^{20} \end{pmatrix}
\times
\begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & -2 \\ -1 & 0 & 1 \end{pmatrix}.
\]
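The same computation can be cross-checked numerically against direct repeated multiplication (a sketch; `numpy.linalg.matrix_power` serves as the reference):

```python
import numpy as np

A = np.array([[5., -5., -8.], [-8., 2., 8.], [-1., -5., -2.]])  # matrix of example 6.8
w, C = np.linalg.eig(A)                        # A = C diag(w) C^{-1}

# Raise the diagonal part to the 20th power and transform back:
A20 = C @ np.diag(w ** 20) @ np.linalg.inv(C)
assert np.allclose(A20, np.linalg.matrix_power(A, 20))
```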

Computation of matrix’s power can be extended to define a function of a matrix for everyanalytical real function10. Let

ϕ(x) =∞∑k=0

ϕ(k)(0)

k!xk

be such a function. Then, having defined the convergence of matrix series, we can define

ϕ(A) =∞∑k=0

ϕ(k)(0)

k!Ak,

10A function is called analytical if it is equal to its (absolutely converging) Mc’Laurin series.


since matrix addition and matrix multiplication are correctly defined. Additionally, if A is diagonalizable then A = CDC⁻¹ and

\[
\varphi(A) = \varphi(CDC^{-1})
= \sum_{k=0}^{\infty} \frac{\varphi^{(k)}(0)}{k!}\,(CDC^{-1})^k
= C\left(\sum_{k=0}^{\infty} \frac{\varphi^{(k)}(0)}{k!}\, D^k\right)C^{-1}.
\]

If D = diag(λ1, λ2, . . . , λn) then ϕ(D) = diag(ϕ(λ1), ϕ(λ2), . . . , ϕ(λn)) and

\[
\varphi(A) = C \times \mathrm{diag}(\varphi(\lambda_1), \varphi(\lambda_2), \dots, \varphi(\lambda_n)) \times C^{-1}.
\]

Example 6.9. Compute exp(A) for

\[
A = \begin{pmatrix} -7 & -8 & -8 \\ 4 & 5 & 4 \\ 2 & 4 & -1 \end{pmatrix}.
\]

Solution. First, we compute the eigenvalues λ1 = −1, λ2 = −3, λ3 = 1 and note that A is diagonalizable because it has three distinct eigenvalues. Next,

\[
A = \begin{pmatrix} -4 & -2 & 3 \\ 2 & 1 & -2 \\ 1 & 0 & -1 \end{pmatrix}
\times
\begin{pmatrix} -1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\times
\begin{pmatrix} -1 & -2 & 1 \\ 0 & 1 & -2 \\ -1 & -2 & 0 \end{pmatrix},
\]

and

\[
\exp(A) = \begin{pmatrix} -4 & -2 & 3 \\ 2 & 1 & -2 \\ 1 & 0 & -1 \end{pmatrix}
\times
\begin{pmatrix} e^{-1} & 0 & 0 \\ 0 & e^{-3} & 0 \\ 0 & 0 & e \end{pmatrix}
\times
\begin{pmatrix} -1 & -2 & 1 \\ 0 & 1 & -2 \\ -1 & -2 & 0 \end{pmatrix}.
\]

6.5 Exercises

Problem 6.1. Find the eigenvalues of the following matrices:

\[
\text{(a)}\ \begin{pmatrix} 5 & -2 \\ 1 & 8 \end{pmatrix} \quad
\text{(b)}\ \begin{pmatrix} 7 & 6 \\ -8 & -7 \end{pmatrix} \quad
\text{(c)}\ \begin{pmatrix} 4 & -8 \\ 4 & -8 \end{pmatrix} \quad
\text{(d)}\ \begin{pmatrix} -8 & -1 \\ 6 & -3 \end{pmatrix}
\]

\[
\text{(e)}\ \begin{pmatrix} 1 & -7 & -7 \\ 7 & -5 & 1 \\ -7 & 7 & 1 \end{pmatrix} \quad
\text{(f)}\ \begin{pmatrix} -6 & 2 & -4 \\ -8 & -3 & 2 \\ -4 & 2 & -6 \end{pmatrix} \quad
\text{(g)}\ \begin{pmatrix} -5 & -2 & -2 \\ -8 & 1 & -2 \\ 8 & -8 & -5 \end{pmatrix}
\]

\[
\text{(h)}\ \begin{pmatrix} -4 & -4 & -4 & -4 \\ 0 & -2 & 5 & 5 \\ 2 & 4 & -7 & -5 \\ -2 & -4 & 6 & 4 \end{pmatrix} \quad
\text{(i)}\ \begin{pmatrix} -8 & 7 & -1 & -3 \\ -2 & 1 & 2 & -4 \\ 4 & -4 & 1 & -1 \\ 4 & -4 & 6 & -6 \end{pmatrix} \quad
\text{(j)}\ \begin{pmatrix} 6 & 0 & -4 \\ 24 & 6 & -44 \\ 8 & 0 & -6 \end{pmatrix}
\]

Problem 6.2. Find the eigenvalues and eigenvectors of the following matrices and perform diagonalization, when possible:

\[
\text{(a)}\ \begin{pmatrix} 3 & -6 \\ 4 & -7 \end{pmatrix} \quad
\text{(b)}\ \begin{pmatrix} 5 & 4 \\ 0 & 7 \end{pmatrix} \quad
\text{(c)}\ \begin{pmatrix} -6 & -2 \\ 3 & -1 \end{pmatrix} \quad
\text{(d)}\ \begin{pmatrix} 6 & 14 \\ 0 & -8 \end{pmatrix}
\]

\[
\text{(e)}\ \begin{pmatrix} -2 & -6 & -2 \\ 2 & 2 & -2 \\ -2 & -6 & -2 \end{pmatrix} \quad
\text{(f)}\ \begin{pmatrix} 3 & -8 & 3 \\ -3 & -2 & 3 \\ 2 & -8 & 4 \end{pmatrix} \quad
\text{(g)}\ \begin{pmatrix} -7 & 0 & -10 \\ 0 & 7 & 4 \\ 0 & 0 & 3 \end{pmatrix}
\]

\[
\text{(h)}\ \begin{pmatrix} -2 & 5 & -8 \\ 2 & 1 & -8 \\ -6 & 6 & -1 \end{pmatrix} \quad
\text{(i)}\ \begin{pmatrix} -4 & 2 & -2 \\ -1 & -7 & 2 \\ -1 & -2 & -3 \end{pmatrix} \quad
\text{(j)}\ \begin{pmatrix} 3 & 7 & -3 \\ 3 & -1 & 3 \\ -5 & -7 & 1 \end{pmatrix}
\]

\[
\text{(k)}\ \begin{pmatrix} 3 & -2 & 3 & -2 \\ -2 & 6 & 0 & 2 \\ 2 & -8 & -2 & -2 \\ 3 & -6 & 3 & -2 \end{pmatrix} \quad
\text{(l)}\ \begin{pmatrix} 0 & 6 & 1 & 8 \\ -8 & -8 & 1 & 0 \\ -6 & 0 & -7 & -2 \\ 3 & 3 & 4 & 5 \end{pmatrix}
\]

7 Quadratic forms

Let x1, x2, . . . , xn be a set of variables. A monomial is a product of non-negative integer powers of x1, x2, . . . , xn. For instance, $x_1^2 x_2 x_3^4$ is a monomial. The degree of a monomial with respect to the variable xi is the power at which xi occurs in it. For instance, the degree of $x_1^2 x_2 x_3^4$ with respect to x1 is equal to two. The total degree of a monomial is the sum of its degrees with respect to all the variables. For instance, the total degree of $x_1^2 x_2 x_3^4$ is equal to seven.

A homogeneous polynomial is a sum of monomials all of which have the same total degree. For example, $x_1^5 + 2x_1^3 x_2^2 + 9x_2^5$ is a homogeneous polynomial of degree five. The polynomial $x_1^3 + 3x_1^2 x_2^3 + x_2^7$ is not homogeneous, because the total degree does not match from term to term.

A quadratic form is a homogeneous polynomial of degree two in n variables. Quadratic formsplay an important role in many branches of mathematics, including calculus and optimizationtheory.

7.1 Matrix of a quadratic form

Each quadratic form in n variables can be written in a canonical form which involves matrix notation. For instance, $Q(x_1, x_2) = x_1^2 + 6x_1x_2 + 5x_2^2$ can be written as the matrix product

\[
(x_1, x_2)\begin{pmatrix} 1 & 3 \\ 3 & 5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1, x_2)\begin{pmatrix} x_1 + 3x_2 \\ 3x_1 + 5x_2 \end{pmatrix}
= x_1^2 + 6x_1x_2 + 5x_2^2.
\]

Note that the matrix product contains two copies of the monomial 3x1x2, which sum up to 6x1x2. In general, the matrix product of this form can be transformed as

\[
(x_1\ x_2\ \dots\ x_n)
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
(x_1\ x_2\ \dots\ x_n)
\begin{pmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n
\end{pmatrix}
=
\]
\[
= x_1(a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n) + x_2(a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n) + \dots + x_n(a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n)
= a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + \dots + a_{22}x_2^2 + \dots + (a_{1n} + a_{n1})x_1x_n + \dots + a_{nn}x_n^2.
\]

Since the coefficients aij are yet to be defined, it is convenient to assume that aij = aji and to include a factor of 2 in the general expression of a quadratic form:

\[
Q(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} a_{ii}x_i^2 + 2\sum_{i<j} a_{ij}x_ix_j.
\]

Definition 24. An n×n matrix A = (aij) is called symmetric (anti-symmetric) if aij = aji for all i, j (respectively, aij = −aji for all i, j). In other words, A is symmetric (anti-symmetric) if A = Aᵀ (A = −Aᵀ).

Thus, every symmetric matrix defines a quadratic form and, vice versa, every quadratic form can be defined by a symmetric matrix via the identity

\[
Q(x) = x^T A x,
\]

where x = (x1 x2 . . . xn)ᵀ is a column vector. As a rule, one puts the coefficients of the squared terms on the diagonal of the matrix, while each coefficient of a cross-product term is divided by two and placed in the two corresponding cells, one above and one below the diagonal.

Example 7.1. $Q(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + 3x_3^2 + 4x_1x_3$ has the matrix

\[
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 2 & 0 \\ 2 & 0 & 3 \end{pmatrix}.
\]

Example 7.2. $Q(x_1, x_2, x_3) = 4x_1x_2 + 2x_1x_3 + 10x_2x_3$ has the matrix

\[
\begin{pmatrix} 0 & 2 & 1 \\ 2 & 0 & 5 \\ 1 & 5 & 0 \end{pmatrix}.
\]

Definition 25. A quadratic form Q(x1, x2, . . . , xn) is called positive definite¹¹ if Q(x1, x2, . . . , xn) > 0 for every non-zero vector (x1, x2, . . . , xn) ∈ Rn. Similarly, Q(x1, x2, . . . , xn) is called negative definite (positive semidefinite, negative semidefinite) if Q(x1, x2, . . . , xn) < 0 (Q(x1, x2, . . . , xn) ≥ 0, Q(x1, x2, . . . , xn) ≤ 0, respectively). Otherwise, a quadratic form is called not definite.

Figure 2 gives a simple and intuitive explanation of what a positive definite quadratic form is. First of all, if z = f(x, y) is a quadratic form then its planar sections z(y) = f(0, y) and z(x) = f(x, 0) both have to be quadratic functions.

For instance, the 3D-graph of z = x² + y² shown in part (a) is a surface with concave-up planar cross-sections for both x = 0 (z = y²) and y = 0 (z = x²). The quadratic form z = x² + y² is positive definite because it takes only positive values for non-zero pairs (x, y) and is equal to zero if and only if both x and y are equal to zero. Now, if we don't require z to be non-zero for every non-zero pair (x, y), we get a positive semidefinite (non-negative definite) quadratic form, whose surface is shown in part (b). An example of such a function is z = x² − 2xy + y² = (x − y)² ≥ 0, which is zero, e.g., for x = y = 1. Similarly, the negative definite quadratic function z = −x² − y² has planar cross-sections for both x = 0 (z = −y²) and y = 0 (z = −x²), but this time the cross-sections are concave down (figure 2 (c)). A non-definite quadratic form is such that some of its


Figure 2: 3D shapes of (a) positive definite, (b) non-negative definite, (c) negative definite, and (d) not definite quadratic functions.

cross-sections are concave up, while others are concave down. For instance, the quadratic form z = x² − y² is not definite, as z = x² for y = 0 and z = −y² for x = 0.

One characteristic spot on the surface of a non-definite quadratic function is called a saddle point. A saddle point has zero slope, but the surface is concave up in one direction and concave down in the other direction. It is the point located right in the middle of the surface shown in figure 2 (d). One can think of a saddle point as an unstable equilibrium: it is a local minimum in one cross-section but at the same time a local maximum in another cross-section. Generic surfaces may have various numbers of maxima, minima, and saddle points, and quadratic forms are often used as a tool to approximate the local behavior of generic surfaces. For instance, the surface shown in figure 3 (a) can be described as two hills (or mountains) and two respective valleys. The 'best' path from one valley to the other passes through the saddle point. Saddle points play an important role in multi-dimensional calculus and in the theory of ordinary differential equations.

7.2 Change of base

We have seen in section 6.2 that the matrix of a linear operator changes according to the rule A ↦ CAC⁻¹ under the change of base induced by the matrix C. In this section we investigate how the matrix of a quadratic form changes under a similar

¹¹Do not confuse the adjective definite with the past participle defined.


Figure 3: Examples of extreme points on generic surfaces. (a) two local maxima separated by a saddle point, (b) a local maximum and a local minimum, no saddle point in between, (c) one local minimum and degenerate local maxima and minima, (d) four local maxima, one local minimum, and four saddle points.

transformation.

Consider two bases, e1, e2, . . . , en and e′1, e′2, . . . , e′n, of the vector space V and let x = x1e1 + x2e2 + · · · + xnen = x′1e′1 + x′2e′2 + · · · + x′ne′n be two coordinate decompositions of the vector x over these two bases. In section 6.2 we saw that the new coordinates are expressed through the old coordinates as

\[
\begin{aligned}
x'_1 &= c_{11}x_1 + c_{12}x_2 + \dots + c_{1n}x_n, \\
x'_2 &= c_{21}x_1 + c_{22}x_2 + \dots + c_{2n}x_n, \\
&\;\;\vdots \\
x'_n &= c_{n1}x_1 + c_{n2}x_2 + \dots + c_{nn}x_n,
\end{aligned}
\]

where the matrix C consists of the coordinates of the old base vectors in the new base, i.e., x′ = Cx.

Let A and A′ be the matrices of a quadratic form in the old and the new base, respectively. We have

\[
Q(x) = (x')^T A' x' = (Cx)^T A' Cx = x^T (C^T A' C) x = x^T A x.
\]

Therefore, the matrix of a quadratic form transforms as A ↦ CᵀAC, where C is a transition matrix¹². In the next section we will ask to what standard form one can transform a quadratic form by a change of base.
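The rule A ↦ CᵀAC can be illustrated numerically: substituting x = Cy into Q must give the same value as evaluating the transformed matrix at y (a sketch with an assumed transition matrix):

```python
import numpy as np

A = np.array([[1., 3.], [3., 5.]])    # Q(x) = x1^2 + 6 x1 x2 + 5 x2^2
C = np.array([[1., 2.], [0., 1.]])    # an (assumed) transition matrix
B = C.T @ A @ C                       # matrix of the quadratic form in the new coordinates

y = np.array([3., -1.])
# Evaluating Q with the transformed matrix agrees with substituting x = C y:
assert np.isclose(y @ B @ y, (C @ y) @ A @ (C @ y))
```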

¹²Perhaps we should be more careful with notation, because the matrix C here is inverse to the matrix C discussed in the previous paragraph. However, since C is unknown anyway, we will keep using C instead of C⁻¹ for simplicity of notation.


7.3 Canonical diagonal form

Theorem 24. Let Q(x) be a quadratic form in n variables. There exists a change of base such that in the new base

\[
Q(x) = x_1^2 + x_2^2 + \dots + x_k^2 - x_{k+1}^2 - x_{k+2}^2 - \dots - x_{k+l}^2.
\]

Proof. Assume that

\[
Q(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} a_{ii}x_i^2 + 2\sum_{i<j} a_{ij}x_ix_j
\]

and suppose that a11 ≠ 0. We will look for a change of coordinates of the form x1 = p1y1 + p2y2 + · · · + pnyn and xi = yi for i ≥ 2. We get

\[
Q(x_1, x_2, \dots, x_n) = a_{11}x_1^2 + 2a_{12}x_1x_2 + \dots + 2a_{1n}x_1x_n + \sum_{i=2}^{n} a_{ii}x_i^2 + 2\sum_{2\le i<j} a_{ij}x_ix_j
= a_{11}x_1^2 + 2x_1(a_{12}x_2 + \dots + a_{1n}x_n) + R(x_2, \dots, x_n),
\]

where R(x2, . . . , xn) is a quadratic form in the n − 1 variables x2, . . . , xn.

Then, Q(y) = a11(p1y1 + p2y2 + · · · + pnyn)² + 2p1y1(a12y2 + · · · + a1nyn) + R1(y2, . . . , yn), where the terms containing only y2, . . . , yn went to R1(y2, . . . , yn). Expanding the square, Q(y) = a11(p1²y1² + 2p1p2y1y2 + 2p1p3y1y3 + · · · + 2p1pny1yn) + 2(p1a12y1y2 + p1a13y1y3 + · · · + p1a1ny1yn) + R2(y2, . . . , yn), where R2(y2, . . . , yn) contains all the terms with only y2, . . . , yn. The coefficient at y1yi is 2p1(a11pi + a1i), so if we set p2 = −a12/a11, p3 = −a13/a11, . . . , pn = −a1n/a11 and p1 = 1/√|a11|, then all cross-terms with y1 vanish and Q(y) = ±y1² + R2(y2, . . . , yn), where the sign at y1² is the same as the sign of a11. The same procedure can now be applied to R2.

Now consider the case when a11 = 0 but at least one of the a1i ≠ 0. Without loss of generality, assume that a12 ≠ 0. Consider the change of coordinates of the form x1 = y1 + y2, x2 = y1 − y2, and xi = yi for i ≥ 3. We are now back to the case when a11 ≠ 0 because Q(y) = 2a12(y1 + y2)(y1 − y2) + · · · = 2a12(y1² − y2²) + . . . contains y1².

Finally, if a1i = 0 for all i, then x1 does not occur in Q at all, and we can proceed to a quadratic form with a smaller number of variables.

The pair of numbers (k, l), which are the numbers of positive and negative terms in the canonical representation, respectively, is defined uniquely and is called the signature of the quadratic form. When there are no zero terms in the canonical form (i.e., k + l = n) and the dimension n of the enveloping vector space is known, it is enough to know the difference k − l, which is sometimes also called the signature¹³. For instance, the signature (3, 1) in R4 (also known as Minkowski space) is also called "signature 2". We will no longer pay any attention to the distinction between these two formulations.

The proof of this theorem contains the procedure called the completion of full squares, whichcan be used to find the canonical base.

Example 7.3. Find the canonical form of

\[
Q(x_1, x_2, x_3) = x_1^2 - 4x_1x_2 + 2x_1x_3 + 6x_2^2 + 2x_3^2.
\]

¹³This second notation is found frequently in physics books.


Solution. $Q(x_1, x_2, x_3) = (x_1^2 - 4x_1x_2 + 4x_2^2) + 2x_1x_3 + 2x_2^2 + 2x_3^2 = (x_1 - 2x_2)^2 + 2x_1x_3 + 2x_2^2 + 2x_3^2$. Denote y1 = x1 − 2x2, y2 = x2, and y3 = x3. Then, x1 = y1 + 2y2 and we get

\[
Q(y_1, y_2, y_3) = y_1^2 + 2(y_1 + 2y_2)y_3 + 2y_2^2 + 2y_3^2
= y_1^2 + 2y_1y_3 + 4y_2y_3 + 2y_2^2 + 2y_3^2
= (y_1^2 + 2y_1y_3 + y_3^2) + 4y_2y_3 + 2y_2^2 + y_3^2
= (y_1 + y_3)^2 + 4y_2y_3 + 2y_2^2 + y_3^2.
\]

Now denote z1 = y1 + y3, z2 = y2, and z3 = y3. We get

\[
Q(z_1, z_2, z_3) = z_1^2 + 2z_2^2 + 4z_2z_3 + z_3^2
= z_1^2 + 2(z_2^2 + 2z_2z_3 + z_3^2) - z_3^2
= z_1^2 + 2(z_2 + z_3)^2 - z_3^2.
\]

Finally, denote t1 = z1, t2 = z2 + z3, and t3 = z3. We get $Q(t_1, t_2, t_3) = t_1^2 + 2t_2^2 - t_3^2$. Thus, the signature of this quadratic form is (2, 1).

Theorem 25. A quadratic form Q(x1, x2, . . . , xn) is positive definite (negative definite, positive semidefinite, negative semidefinite) if and only if its signature is (n, 0) (respectively, (0, n), (k, 0), and (0, k), where k ≤ n). A quadratic form Q(x1, x2, . . . , xn) is not definite if and only if its signature is (k, l), where k > 0 and l > 0.

The proof follows immediately from the definition.

7.4 Sylvester’s criterion

Sometimes the completion of full squares is not the most convenient way to check whether or not a quadratic form is positive definite. The following theorem allows one to check the definiteness of a quadratic form without cumbersome coordinate changes.

Theorem 26 (Sylvester's criterion). Let Q(x) = xᵀAx be a quadratic form in n variables, where A = Aᵀ. The form Q(x) is positive definite if and only if all upper-left corner k×k minors of A are positive, i.e.,

\[
\Delta_1 = a_{11} > 0, \quad
\Delta_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad
\Delta_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0, \quad \dots, \quad
\Delta_n = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} > 0.
\]

This theorem follows from the following lemma.

Lemma 27. Let x1 = p1y1 + p2y2 + · · ·+ pnyn and xi = yi for i ≥ 2 be the change of variablesin the proof of theorem 24. All upper-left corner k×k minors of A do not change if p1 = 1.

Corollary 27.1. Let Q(x) = xTAx be a quadratic form in n variables and let λ1x1² + λ2x2² + · · · + λnxn² be its diagonal representation obtained by a series of full square completion steps of the form x1 = p1y1 + p2y2 + · · · + pnyn, xi = yi for i ≥ 2 with p1 = 1. Assume that ∆i ≠ 0 for all i. Then,

λi = ∆i / ∆i−1,

where ∆0 = 1.

Example 7.4. Consider Q(x1, x2, x3) = x1² − 4x1x2 + 2x1x3 + 6x2² + 2x3². Its matrix is

A = (  1  −2  1
      −2   6  0
       1   0  2 ),

∆1 = 1,

∆2 = |  1 −2 |
     | −2  6 | = 6 − 4 = 2,

∆3 = |  1 −2  1 |
     | −2  6  0 |
     |  1  0  2 | = 12 − 6 − 8 = −2,

λ1 = 1, λ2 = 2/1 = 2, λ3 = −2/2 = −1,

in accordance with the canonical form obtained in example 7.3.
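Corollary 27.1 is easy to check numerically on this matrix; a sketch assuming numpy, computing the leading principal minors and their ratios:

```python
import numpy as np

A = np.array([[1.0, -2.0, 1.0],
              [-2.0, 6.0, 0.0],
              [1.0, 0.0, 2.0]])

# leading principal minors Δ1, Δ2, Δ3, prefixed with Δ0 = 1
deltas = [1.0] + [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
# λk = Δk / Δ(k-1)
lambdas = [deltas[k] / deltas[k - 1] for k in (1, 2, 3)]
print([round(d) for d in deltas])   # [1, 1, 2, -2]
print([round(l) for l in lambdas])  # [1, 2, -1]
```

The ratios reproduce the diagonal coefficients λ = (1, 2, −1) found above.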

Note that every positive definite quadratic form Q(x1, x2, . . . , xn) is turned into a negative definite quadratic form when multiplied by −1 (indeed, if Q(x1, x2, . . . , xn) > 0 for (x1, x2, . . . , xn) ≠ 0 then −Q(x1, x2, . . . , xn) < 0 for (x1, x2, . . . , xn) ≠ 0). The matrix of the quadratic form is also multiplied by −1, and since multiplying each of the k rows of a k×k determinant by −1 multiplies the determinant by (−1)^k, we get ∆k(−Q) = (−1)^k ∆k(Q). We thus get the following re-formulation of theorem 26 for negative definite quadratic forms.

Theorem 28. Let Q(x) = xTAx be a quadratic form in n variables where A = AT. The form Q(x) is negative definite if and only if the upper left corner k×k minors of A alternate in sign starting with a negative number, i.e., ∆1 < 0, ∆2 > 0, ∆3 < 0, etc.

Note that Sylvester's criterion is a useful tool for the analysis of quadratic forms, but it is not always applicable because some of the upper left corner minors can be equal to zero. In the latter case one should use the completion of full squares.

Example 7.5. Consider Q(x1, x2) = x1x2. Sylvester's criterion cannot be used as ∆1 = 0. However, the linear transformation x1 = y1 − y2 and x2 = y1 + y2 gives Q(y1, y2) = (y1 − y2)(y1 + y2) = y1² − y2², and we can safely conclude that the quadratic form Q(x1, x2) is not definite.


7.5 Exercises

Problem 7.1. Use the completion of full squares to find the canonical representation of the following quadratic forms.

(a) −4x1² + 8x1x2 + 10x1x3 − 2x2² − 10x2x3 − 5x3²

(b) −x1² − 6x1x2 − 2x1x3 + 4x2² + 8x2x3 + 2x3²

(c) −5x1² − 4x1x2 + 10x1x3 − 2x2² + 10x2x3 − 5x3²

(d) x1² + 4x1x2 + 10x1x3 + 2x2² + 8x2x3 + 5x3²

(e) 2x1² − 4x1x2 + 8x1x3 − 4x1x4 − x2² + 4x2x3 + 4x2x4 − x3² − 2x3x4

(f) −x1² − 6x1x2 − 6x1x3 − 8x1x4 + 5x2² + 6x2x4 + 3x3² + 6x3x4 + 4x4²

(g) x1² − 4x1x2 − 2x1x3 + 5x2² − 4x2x3 + 4x2x4 − x3² − 4x3x4 + 4x4²

(h) 2x1² + 6x1x2 − 2x1x3 − 8x1x4 − 3x2² − 6x2x3 + x3² + 8x3x4 + 5x4²

Problem 7.2. Use Sylvester's criterion to find the signatures of the following quadratic forms.

(a) 5x1² − 6x1x2 − 6x1x3 + 3x2² + 6x2x3 + 3x3²

(b) 3x1² − 6x1x2 + 8x1x3 + x2² − 6x2x3 + 4x3²

(c) −3x1² − 6x1x2 + 6x1x3 + 6x2x3 + x3²

(d) −x1² + 8x1x2 − 8x1x3 − 4x2² + 8x2x3 − x3²

(e) −x1² − 2x1x2 − 8x1x3 − 6x1x4 − 2x2² + 2x2x3 + 4x3² + 6x3x4 − 3x4²

(f) −2x1² − 4x1x2 − 4x1x3 + 2x2² + 4x2x3 − 8x2x4 + x3² − 10x3x4 + x4²

(g) −x1² − 2x1x3 − 4x2² − 8x2x4 − 3x3² − 4x3x4 − 9x4²

(h) −4x2² + 16x2x4 − 3x3² − 20x4²

8 Euclidean vector spaces

The notion of a bilinear form occupies a central place in the theory of Euclidean vector spaces.

Definition 26. A bilinear form on a real vector space V is a mapping b : V × V → R such that

1. ∀x1, x2, y ∈ V b(x1 + x2, y) = b(x1, y) + b(x2, y),

2. ∀x, y1, y2 ∈ V b(x, y1 + y2) = b(x, y1) + b(x, y2),


3. ∀x, y ∈ V, λ ∈ R b(λx, y) = b(x, λy) = λb(x, y).

In other words, b(x, y) is a linear function of both its arguments. The bilinear form b(x, y) is called symmetric if

4. ∀x, y ∈ V b(x, y) = b(y, x).

The bilinear form b(x, y) is called positive definite if

5. ∀x ∈ V b(x, x) ≥ 0 and b(x, x) = 0 only if x = 0.

Similarly to what was discussed in the previous section, each bilinear form admits a coordinate representation, in which it is expressed by using matrix products:

b(x, y) = xTBy,

where x, y ∈ Rn, and B is an n×n matrix, which is called the matrix of the bilinear form. Obviously, a bilinear form is symmetric if and only if its matrix is symmetric. The change of base for bilinear forms occurs similarly to what we saw for quadratic forms. If x = Cx′ and y = Cy′ then b(x, y) = xTBy = (x′)TCTBCy′, i.e., the matrix of a bilinear form also transforms by the rule B ↦ CTBC.

Each symmetric bilinear form b(x, y) defines a quadratic form with the same matrix by the identity Q(x) = b(x, x). Conversely, if Q(x) = xTAx is a quadratic form then we can define a bilinear form by using the same matrix, i.e., b(x, y) = xTAy. This can be done even without matrices: since b(x + y, x + y) = b(x, x) + b(x, y) + b(y, x) + b(y, y) = Q(x) + Q(y) + 2b(x, y) and also b(x + y, x + y) = Q(x + y), we get

b(x, y) = (1/2)(Q(x + y) − Q(x) − Q(y)).

Thus, each quadratic form defines a symmetric bilinear form and, vice versa, each symmetric bilinear form defines a quadratic form.
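This recovery of b from Q (the polarization identity) is easy to verify numerically; a sketch assuming numpy, with a randomly generated symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2           # symmetric matrix of the quadratic form
Q = lambda v: v @ A @ v     # Q(x) = x^T A x
b = lambda u, v: u @ A @ v  # the associated symmetric bilinear form

x, y = rng.standard_normal(4), rng.standard_normal(4)
# polarization identity: b(x, y) = (Q(x + y) - Q(x) - Q(y)) / 2
lhs = b(x, y)
rhs = (Q(x + y) - Q(x) - Q(y)) / 2
print(np.isclose(lhs, rhs))  # True
```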

8.1 Linear spaces with dot product

A symmetric, positive definite bilinear form b(x, y) on a real vector space V is called a dot product (also called a scalar product). By theorem 24, there is a base in V such that the quadratic form Q(x) corresponding to b(x, y) (the one with the same symmetric matrix) has the form x1² + x2² + · · · + xn², i.e., in some base of V , the matrix of Q(x) is the identity matrix. In the same base the matrix of b(x, y) must also be the identity matrix, i.e., the bilinear form is expressed as14

b(x, y) = xTy = x1y1 + x2y2 + · · ·+ xnyn.

A linear vector space, along with a dot product defined on it, is called a Euclidean vector space. The scalar product in a Euclidean vector space can be considered as a “source of geometry”.

Indeed, the colloquial dot product of vectors can be used to define lengths (by setting l(x) = √Q(x) = √b(x, x)) and angles (by setting the angle between vectors x and y equal to α(x, y) = cos−1(b(x, y)/(l(x)l(y)))).

In what follows we will discuss well-known Euclidean vector spaces with a dot product which is so “standard” that we omit the name of the function, i.e., instead of b(x, y) we write (x, y).

14Note that the product of a 1×n (row) and an n×1 (column) matrix gives a 1×1 matrix, i.e. a number.

Example 8.1. The n-dimensional real vector space Rn is a Euclidean vector space with respect to the dot product defined by

(x, y) = x1y1 + x2y2 + · · ·+ xnyn.

Example 8.2. The vector space C[a, b] of continuous real functions defined on [a, b] is a Euclidean vector space with respect to the dot product defined by

(f, g) = ∫_a^b f(x)g(x) dx.

Indeed, it is linear, symmetric, and positive definite because

(f, f) = ∫_a^b f²(x) dx ≥ 0,

and it can be equal to zero only if f(x) differs from zero on a set of measure zero, which for a continuous function implies that f(x) = 0 everywhere.

Example 8.3. Consider the space V of discrete random variables (a discrete random variable consists of a finite collection of values, each endowed with a number called its probability, such that all probabilities are positive and sum up to 1). It is a linear vector space with respect to the addition of random variables and multiplication by scalars. There is a linear operator of expected value acting on V . It is used to introduce variance and covariance as follows:

cov(X, Y ) = E(X · Y ) − E(X) · E(Y ),

var(X) = cov(X, X) = E(X²) − (E(X))².

Here, cov(X, Y ) brings Euclidean structure to the vector space V , thus establishing an association between vectors and random variables, one in which the length of a random variable (as of a vector) is √cov(X, X) = √var(X) = σ(X), i.e., the length is the standard deviation, while the cosine of the angle between two random variables is their correlation. Here, independent random variables, which necessarily have zero covariance, turn out to be orthogonal, i.e., the angle between them is 90°.
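The angle interpretation can be illustrated with a small numerical sketch (the two random variables and their distribution below are made up for illustration; numpy is assumed):

```python
import numpy as np

# Two discrete random variables on 4 equally likely outcomes
p = np.full(4, 0.25)
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 1.0, 4.0, 3.0])

E = lambda Z: np.sum(p * Z)                # expected value
cov = lambda U, V: E(U * V) - E(U) * E(V)  # covariance as a dot product
sigma = lambda Z: np.sqrt(cov(Z, Z))       # "length" = standard deviation

# cosine of the angle between X and Y = correlation coefficient
cos_angle = cov(X, Y) / (sigma(X) * sigma(Y))
print(round(cos_angle, 3))  # 0.6
```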

8.2 Gram-Schmidt orthogonalization

Definition 27. Vectors x and y of a Euclidean vector space are called orthogonal if their dot product is equal to zero, i.e. if (x, y) = 0. A base of V consisting of vectors e1, e2, . . . , en such that (ei, ej) = 0 for i ≠ j is called an orthogonal base. An orthogonal base is called orthonormal if, additionally, (ei, ei) = 1 for all i.


An orthogonal base consists of vectors that are perpendicular to each other. The vectors forming an orthonormal base are, in addition to this, each of length one. Obviously, each orthogonal base can be transformed into an orthonormal base by dividing its vectors by their lengths. It is very convenient to deal with an orthonormal base because the components of a vector x = x1e1 + x2e2 + · · · + xnen in this base can be conveniently calculated by using the dot product:

(x, ei) = (x1e1 + x2e2 + · · · + xnen, ei) = 0 + · · · + xi(ei, ei) + · · · + 0 = xi.

If e1, e2, . . . , en is a (not necessarily orthogonal) base in V then the matrix G = (gij), where gij = (ei, ej), is called the Gram matrix of this base. Each Gram matrix is symmetric and positive definite because so is the dot product. The Gram matrix of an orthogonal base is diagonal; the Gram matrix of an orthonormal base is the identity matrix. That is, if e1, e2, . . . , en is an orthonormal base then (ei, ej) = δij, where δij is the so-called Kronecker symbol: δij = 1 if i = j and δij = 0 otherwise.

Let e1, e2, . . . , en be a base in V which is not orthogonal. The procedure below, called the Gram-Schmidt orthogonalization process, can be used to transform (e1, e2, . . . , en) into an orthogonal base (f1, f2, . . . , fn) so that the flags of linear hulls generated by these two bases coincide, i.e.,

L(e1) = L(f1),
L(e1, e2) = L(f1, f2),
. . .
L(e1, . . . , ei) = L(f1, . . . , fi)

for all i. We will construct (f1, f2, . . . , fn) in a series of steps. First, let f1 = e1. Assume that f1, f2, . . . , fk−1 have been constructed and we want to find fk such that f1, f2, . . . , fk are pairwise orthogonal and L(f1, . . . , fk) = L(e1, . . . , ek). We will search for fk in the form of a linear combination of ek and the already constructed vectors, i.e.,

fk = ek − λ1f1 − λ2f2 − · · · − λk−1fk−1.

We need that (fk, f1) = (fk, f2) = · · · = (fk, fk−1) = 0. Then,

(ek − λ1f1 − λ2f2 − · · · − λk−1fk−1, f1) = (ek, f1) − λ1(f1, f1) = 0,
(ek − λ1f1 − λ2f2 − · · · − λk−1fk−1, f2) = (ek, f2) − λ2(f2, f2) = 0,
. . .
(ek − λ1f1 − λ2f2 − · · · − λk−1fk−1, fk−1) = (ek, fk−1) − λk−1(fk−1, fk−1) = 0.

That is,

λi = (ek, fi)/(fi, fi).

Example 8.4. Use the Gram-Schmidt orthogonalization process to find an orthogonal base in the linear hull of the following system of vectors:

e1 = (1, 1, 1), e2 = (3, 1, 0), e3 = (1, −4, 0).


Solution. As explained above, we first set f1 = (1, 1, 1). Then, we look for f2 in the form

f2 = e2 − λ1f1.

We have (f2, f1) = (e2, f1) − λ1(f1, f1) = 0. Then, (e2, f1) = 4, (f1, f1) = 3, λ1 = 4/3, and f2 = e2 − (4/3)f1 = (5/3, −1/3, −4/3). One may want to stretch f2 three times to obtain integer components, f2 = (5, −1, −4).

Next, we look for f3 as f3 = e3 − λ1f1 − λ2f2. Then, (f3, f1) = (e3, f1) − λ1(f1, f1) − λ2(f2, f1) = (e3, f1) − λ1(f1, f1) = 0. We have (e3, f1) = −3, (f1, f1) = 3. Therefore, λ1 = −1. Similarly, (f3, f2) = (e3, f2) − λ1(f1, f2) − λ2(f2, f2) = (e3, f2) − λ2(f2, f2) = 0. Here, (e3, f2) = 9, (f2, f2) = 42, and λ2 = 9/42 = 3/14. Then, we have

f3 = e3 + f1 − (3/14)f2 = (2, −3, 1) − (15/14, −3/14, −12/14) = (13/14, −39/14, 26/14),

which gives f3 = (1, −3, 2) after stretching 14/13 times.
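The computation above can be automated; a minimal Gram-Schmidt sketch in Python (numpy assumed), checked against this example:

```python
import numpy as np

def gram_schmidt(vectors):
    # f_k = e_k - λ1 f1 - ... - λ(k-1) f(k-1), with λi = (e_k, f_i)/(f_i, f_i)
    basis = []
    for e in vectors:
        f = np.asarray(e, dtype=float)
        for prev in basis:
            f = f - (np.dot(e, prev) / np.dot(prev, prev)) * prev
        basis.append(f)
    return basis

f1, f2, f3 = gram_schmidt([[1, 1, 1], [3, 1, 0], [1, -4, 0]])
print(f1, 3 * f2, (14 / 13) * f3)
```

Up to floating-point rounding, the printed vectors are (1, 1, 1), (5, −1, −4), and (1, −3, 2), i.e., the rescaled base found by hand above.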

8.3 Projection of a vector on a linear subspace

In many parts of mathematics one encounters the problem of finding an optimal solution or, more naively, the problem of approximation, of finding the “closest point”. For instance, linear regression in statistics deals with fitting a line through points which do not lie on the same line; still, there exists an “optimal” line, the one closest among all lines in the plane to the given set of measurements.

Below we provide a general framework in the context of a Euclidean space. Suppose you have a vector x in a Euclidean space V and a vector subspace U ⊆ V generated by a set of vectors, U = L(a1, a2, . . . , ak). If x ∉ U then it cannot be represented as a linear combination λ1a1 + λ2a2 + · · · + λkak. However, one can ask for which coefficients the length of the difference x − λ1a1 − λ2a2 − · · · − λkak is minimal. This happens if and only if the vector x − λ1a1 − λ2a2 − · · · − λkak is orthogonal15 to the subspace U . Denote the difference vector by n, i.e.,

n = x − λ1a1 − λ2a2 − · · · − λkak.

It will be called the normal vector16. The normal vector must be orthogonal to each of the

15Can you explain why? Note that colloquial reasoning such as “the shortest path is the perpendicular” in general fails to work.
16Here “normal” is the opposite of “tangent”.


vectors in U , including a1, a2, . . . , ak. Then, we have

(n, a1) = (x, a1) − λ1(a1, a1) − λ2(a2, a1) − · · · − λk(ak, a1) = 0,
(n, a2) = (x, a2) − λ1(a1, a2) − λ2(a2, a2) − · · · − λk(ak, a2) = 0,
. . .
(n, ak) = (x, ak) − λ1(a1, ak) − λ2(a2, ak) − · · · − λk(ak, ak) = 0.

Therefore,

λ1(a1, a1) + λ2(a2, a1) + · · · + λk(ak, a1) = (x, a1),
λ1(a1, a2) + λ2(a2, a2) + · · · + λk(ak, a2) = (x, a2),
. . .
λ1(a1, ak) + λ2(a2, ak) + · · · + λk(ak, ak) = (x, ak),

that is, the set of coefficients λ1, λ2, . . . , λk is obtained by solving the system of linear equations whose matrix is the Gram matrix of a1, a2, . . . , ak and whose column of free terms consists of the dot products of x with a1, a2, . . . , ak.

Example 8.5. For the vectors x, e1, and e2 below, find the projection of the vector x onto the linear hull of e1 and e2. Evaluate the distance between the endpoint of x and the linear hull.

x = (−1, −7, −1), e1 = (2, 1, 2), e2 = (−2, 0, −6).

Solution. Compute the Gram matrix: (e1, e1) = 9, (e1, e2) = −16, (e2, e2) = 40, (x, e1) = −11, (x, e2) = 8. Then,

(9 −16 | −11; −16 40 | 8) → (2 8 | −14; −16 40 | 8) → (1 4 | −7; −16 40 | 8) → (1 4 | −7; 0 104 | −104) → (1 4 | −7; 0 1 | −1) → (1 0 | −3; 0 1 | −1).

Therefore, λ1 = −3, λ2 = −1, and

λ1e1 + λ2e2 = −3(2, 1, 2) − (−2, 0, −6) = (−4, −3, 0),

n = (−1, −7, −1) − (−4, −3, 0) = (3, −4, −1),

and d² = 3² + 4² + 1² = 26.
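The same computation in matrix form: the Gram system above is solved directly (a numpy sketch, not part of the original notes):

```python
import numpy as np

x = np.array([-1.0, -7.0, -1.0])
E = np.array([[2.0, 1.0, 2.0],     # e1 as a row
              [-2.0, 0.0, -6.0]])  # e2 as a row

G = E @ E.T                  # Gram matrix: [[9, -16], [-16, 40]]
rhs = E @ x                  # free terms: [-11, 8]
lam = np.linalg.solve(G, rhs)
proj = lam @ E               # projection of x onto L(e1, e2)
n = x - proj                 # normal vector
print(lam, proj, n @ n)      # λ = (-3, -1), projection (-4, -3, 0), d² = 26
```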

Example 8.6. For the vectors x, e1, e2, and e3 below, find the projection of the vector x onto the linear hull of e1, e2, and e3. Evaluate the distance between the endpoint of x and the linear hull.

x = (1, 6, −1, 1), e1 = (−2, 2, −1, −3), e2 = (−4, 4, −3, −5), e3 = (−3, 4, −3, −5).


Solution. Compute the Gram matrix: (e1, e1) = 18, (e1, e2) = 34, (e1, e3) = 32, (e2, e2) = 66, (e2, e3) = 62, (e3, e3) = 59, (x, e1) = 8, (x, e2) = 18, (x, e3) = 19. Then,

(18 34 32 | 8; 34 66 62 | 18; 32 62 59 | 19) → (9 17 16 | 4; 17 33 31 | 9; 32 62 59 | 19) →[2·[1]−[2]] · · ·

λ1e1 + λ2e2 + λ3e3 = (1, 2, −3, −1), n = (0, 4, 2, 2), and d² = 24.

Example 8.7. For the function y = x³, find its projection onto the linear space generated by 1, x, and x² with respect to the Euclidean structure defined by

(f, g) = ∫_0^1 f(x)g(x) dx.

Solution. (1, 1) = 1, (1, x) = ∫_0^1 x dx = 1/2, (x, x) = (1, x²) = ∫_0^1 x² dx = 1/3, (1, x³) = (x, x²) = ∫_0^1 x³ dx = 1/4, (x², x²) = (x, x³) = ∫_0^1 x⁴ dx = 1/5, (x², x³) = ∫_0^1 x⁵ dx = 1/6.

(1 1/2 1/3 | 1/4; 1/2 1/3 1/4 | 1/5; 1/3 1/4 1/5 | 1/6) → (12 6 4 | 3; 30 20 15 | 12; 40 30 24 | 20) → (12 6 4 | 3; 30 20 15 | 12; 1 1 0.9 | 0.8) → · · ·

λ1 = 0.05, λ2 = −0.6, λ3 = 1.5,

i.e., the projection of y = x³ onto the linear space generated by 1, x, and x² is y = 0.05 − 0.6x + 1.5x². Spend some time with your graphing calculator to draw the graphs of these two functions.
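The Gram system of this example can also be set up and solved programmatically; a sketch assuming numpy, using (x^i, x^j) = 1/(i + j + 1) on [0, 1]:

```python
import numpy as np

# Gram matrix of 1, x, x^2 under (f, g) = ∫_0^1 f g dx: (x^i, x^j) = 1/(i+j+1)
G = np.array([[1 / (i + j + 1) for j in range(3)] for i in range(3)])
rhs = np.array([1 / (i + 4) for i in range(3)])  # (x^i, x^3) = 1/(i+4)

lam = np.linalg.solve(G, rhs)
print(np.round(lam, 6))  # coefficients 0.05, -0.6, 1.5 of the projection
```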

8.4 Orthogonal complement

Euclidean structure on a linear vector space gives rise to the notion of orthogonal subspaces, which in our colloquial understanding corresponds to perpendicular planes. Let V be a Euclidean vector space and let U ⊆ V be its subspace.

Definition 28. The orthogonal complement to U in V is a subset

U⊥ = {x ∈ V |∀y ∈ U : (x, y) = 0},

i.e., U⊥ consists of all vectors of V that are orthogonal to every vector in U .

72

Page 73: LECTURE NOTES IN LINEAR ALGEBRA - Pokrovka11's Blog · PDF fileInternational College of Economics and Finance LECTURE NOTES IN LINEAR ALGEBRA 0 B B B B @ 11 12 13 14 " 15 21 22 23

8.4 Orthogonal complement 8 EUCLIDEAN VECTOR SPACES

For instance, if U is the line defined by x = y in R2 then U⊥ is the line x + y = 0, which is perpendicular to the original line. If U is the line defined by x = y = z in R3 then U⊥ is the plane that passes through the origin at 90° to the original line.

It follows immediately from this definition that U⊥ is a correctly defined subspace of V : if x1 and x2 are such that ∀y ∈ U : (x1, y) = (x2, y) = 0 then x1 ± x2 and λx1 have the same property due to the linear nature of the dot product.

Theorem 29. dim(U) + dim(U⊥) = dim(V ).

Proof. One can choose an arbitrary base e1, . . . , ek, where k = dim(U), in U and complement it by vectors ek+1, . . . , en so that e1, . . . , en is a base of V . Applying the Gram-Schmidt orthogonalization procedure, we get an orthogonal base e′1, . . . , e′n of V such that L(e′1, . . . , e′k) = L(e1, . . . , ek) = U . Let x be a vector that is orthogonal to all vectors in U and let x = x1e′1 + x2e′2 + · · · + xne′n be its decomposition over this base of V . Then, (x, e′i) = xi(e′i, e′i) = 0, hence xi = 0 for all 1 ≤ i ≤ k. Therefore, x = xk+1e′k+1 + · · · + xne′n, i.e., e′k+1, . . . , e′n is a base of U⊥. Therefore, dim(U⊥) = n − k = dim(V ) − dim(U).

Corollary 29.1. (U⊥)⊥ = U .

Proof. It is obvious that U is a subset of (U⊥)⊥. By the previous theorem, dim((U⊥)⊥) = dim(V ) − dim(U⊥) = dim(V ) − (dim(V ) − dim(U)) = dim(U), i.e., U = (U⊥)⊥.

Theorem 30. If U1 ⊆ U2 then U2⊥ ⊆ U1⊥.

Proof. If x ∈ U2⊥ then (x, y) = 0 for all y ∈ U2. Since every vector of U1 also belongs to U2, (x, y) = 0 for all y ∈ U1 as well, i.e., x ∈ U1⊥. Therefore, U2⊥ ⊆ U1⊥.

Example 8.8. Find the orthogonal complement to the linear vector space generated by e1 and e2 in R3, where e1 = (1, 3, −2) and e2 = (5, −1, −10).

Solution. If x = (x1, x2, x3) is a vector in the orthogonal complement, then

x1 + 3x2 − 2x3 = 0,
5x1 − x2 − 10x3 = 0,

as x must be orthogonal to both e1 and e2. We get

(1 3 −2; 5 −1 −10) → (1 3 −2; 0 −16 0) → (1 3 −2; 0 1 0) → (1 0 −2; 0 1 0).

The free variable is x3. By setting x3 = 1, we get x1 = 2 and x2 = 0. Hence, the vectors that belong to the orthogonal complement must be proportional to x, i.e., the orthogonal complement is the line generated by x = (2, 0, 1).
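Orthogonal complements can be computed numerically as null spaces; a sketch using numpy's SVD (the `null_space` helper is an assumption of this sketch, not part of the notes):

```python
import numpy as np

def null_space(A, tol=1e-10):
    # rows of vh whose singular values vanish span the kernel of A
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vh[rank:]

E = np.array([[1.0, 3.0, -2.0],     # e1
              [5.0, -1.0, -10.0]])  # e2
basis = null_space(E)               # one row spanning the complement
v = 2 * basis[0] / basis[0][0]      # rescale so the first component is 2
print(np.round(v, 6))               # proportional to (2, 0, 1)
```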


8.5 Intersection and sum of vector spaces

Given two linear subspaces, U1 and U2, of a vector space V , one may want to consider their union and intersection, U1 ∪ U2 and U1 ∩ U2. While the latter is a correctly defined linear subspace (if x, y ∈ U1 ∩ U2 then x, y ∈ U1 and x, y ∈ U2, and thus x + y ∈ U1 and x + y ∈ U2, i.e., x + y ∈ U1 ∩ U2), the former is not a linear subspace because x ∈ U1 and y ∈ U2 do not imply that x + y belongs to either of the two subspaces.

However, one can define the sum U1 + U2 of two linear subspaces to be not precisely the union, but the linear hull of U1 ∪ U2. This way we “force” it to be a linear subspace. Following this definition, U1 + U2 is a bigger set than U1 ∪ U2, but it is the smallest of all linear subspaces of V that contain U1 ∪ U2.

If U1 and U2 are defined by their bases, it is easy to find a base of U1 + U2. Indeed, if U1 = L(e1, e2, . . . , ek) and U2 = L(f1, f2, . . . , fl) then U1 + U2 is generated by e1, e2, . . . , ek, f1, f2, . . . , fl. If these vectors are not linearly independent, we already know how to select a base (section 2.4).

Example 8.9. Find the sum of the vector spaces U1 = L(e1, e2) and U2 = L(f1, f2), where

e1 = (2, 1, 1, 0), e2 = (0, −1, 1, 1), f1 = (1, 1, 2, −1), f2 = (2, −1, 3, 2).

Solution. The problem is equivalent to building a base from a set of vectors, i.e., we have to examine whether or not e1, e2, f1, f2 are linearly independent and, if not, choose a base of their linear hull. Writing the vectors as columns:

(2 0 1 2; 1 −1 1 −1; 1 1 2 3; 0 1 −1 2) → (1 −1 1 −1; 0 2 −1 4; 0 2 1 4; 0 1 −1 2) → (1 −1 1 −1; 0 1 −1 2; 0 0 1 0; 0 0 3 0) → (1 −1 1 −1; 0 1 −1 2; 0 0 1 0; 0 0 0 0).

In this case, the vectors e1, e2, f1, f2 are linearly dependent, but e1, e2, f1 form a base, so the sum of the two 2-dimensional subspaces U1 and U2 is a three-dimensional subspace spanned by e1, e2, and f1. It means that U1 and U2 are “planes” in the 4-dimensional space situated so that they have a common “line”, and U1 + U2 is three-dimensional.

Theorem 31. dim(U1 + U2) = dim(U1) + dim(U2) − dim(U1 ∩ U2).

Proof. Choose a base in U1 ∩ U2. One can add vectors to this base to make it a base of U1 since U1 ∩ U2 ⊆ U1. Similarly, one can add vectors to the base of U1 ∩ U2 to make it a base of U2 as U1 ∩ U2 ⊆ U2, too. If we now combine the two bases, but without counting the base vectors of U1 ∩ U2 twice, we get a base of U1 + U2, which implies that dim(U1 + U2) + dim(U1 ∩ U2) = dim(U1) + dim(U2).

74

Page 75: LECTURE NOTES IN LINEAR ALGEBRA - Pokrovka11's Blog · PDF fileInternational College of Economics and Finance LECTURE NOTES IN LINEAR ALGEBRA 0 B B B B @ 11 12 13 14 " 15 21 22 23

8.5 Intersection and sum of vector spaces 8 EUCLIDEAN VECTOR SPACES

In the particular case when dim(U1 ∩ U2) = 0, which is possible only when U1 ∩ U2 = {0}, i.e., the zero vector is the only common element of U1 and U2, the sum of vector spaces is called a direct sum and is denoted by U1 ⊕ U2. For the direct sum of vector spaces, the dimension formula becomes

dim(U1 ⊕ U2) = dim(U1) + dim(U2).
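Both dimension formulas are easy to check numerically with matrix ranks; a sketch (numpy assumed) using the vectors of example 8.9:

```python
import numpy as np

U1 = np.array([[2, 1, 1, 0], [0, -1, 1, 1]])   # rows span U1
U2 = np.array([[1, 1, 2, -1], [2, -1, 3, 2]])  # rows span U2

dim_u1 = np.linalg.matrix_rank(U1)
dim_u2 = np.linalg.matrix_rank(U2)
dim_sum = np.linalg.matrix_rank(np.vstack([U1, U2]))  # dim(U1 + U2)
dim_int = dim_u1 + dim_u2 - dim_sum                   # dim(U1 ∩ U2)
print(dim_u1, dim_u2, dim_sum, dim_int)  # 2 2 3 1
```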

Computations are not that straightforward for the intersection. Indeed, if U1 and U2 are the same as in example 8.9, the base vectors of U1 ∩ U2 need not be among e1, e2, f1, f2. However, the intersection can be found very easily by using orthogonal complements and corollary 29.1.

Theorem 32. (U1 + U2)⊥ = U1⊥ ∩ U2⊥.

Proof. Since U1 ⊆ U1 ∪ U2 ⊆ U1 + U2, we have (U1 + U2)⊥ ⊆ U1⊥. Similarly, we have (U1 + U2)⊥ ⊆ U2⊥ and, therefore, (U1 + U2)⊥ ⊆ U1⊥ ∩ U2⊥. On the other hand, if x ∈ U1⊥ ∩ U2⊥ then (x, y1) = 0 for every y1 ∈ U1 and also (x, y2) = 0 for every y2 ∈ U2, so (x, y1 + y2) = 0 for every vector y1 + y2 ∈ U1 + U2. Therefore, U1⊥ ∩ U2⊥ ⊆ (U1 + U2)⊥, i.e., the inclusion is actually an equality.

Theorem 33. (U1 ∩ U2)⊥ = U1⊥ + U2⊥.

Proof. Similar to the proof of theorem 32.

Example 8.10. Find the intersection of the vector spaces U1 = L(e1, e2) and U2 = L(f1, f2), where e1, e2, f1, and f2 are the same as in example 8.9.

Solution. First, let us find U1⊥ and U2⊥.

(2 1 1 0; 0 −1 1 1) → (2 0 2 1; 0 −1 1 1) → (1 0 1 1/2; 0 1 −1 −1).

The base of U1⊥ is a1 = (−1, 1, 1, 0) and a2 = (−1, 2, 0, 2). Similarly,

(1 1 2 −1; 2 −1 3 2) → (1 1 2 −1; 0 −3 −1 4) → (3 3 6 −3; 0 −3 −1 4) → (3 0 5 1; 0 3 1 −4).

The base of U2⊥ is b1 = (−5, −1, 3, 0) and b2 = (−1, 4, 0, 3). Now we find the base of their sum and, simultaneously, the base of (U1⊥ + U2⊥)⊥:

(−1 1 1 0; −1 2 0 2; −5 −1 3 0; −1 4 0 3) → (1 −1 −1 0; 0 1 −1 2; 0 −6 −2 0; 0 3 −1 3) → (1 −1 −1 0; 0 1 −1 2; 0 0 −8 12; 0 0 2 −3) → (2 −2 −2 0; 0 2 −2 4; 0 0 2 −3; 0 0 0 0) → (2 −2 0 −3; 0 2 0 1; 0 0 2 −3) → (2 0 0 −2; 0 2 0 1; 0 0 2 −3).

Thus, the base of (U1⊥ + U2⊥)⊥ = U1 ∩ U2 is x = (2, −1, 3, 2), which is the same as the vector f2, and which turns out to be the direction vector of the line that is common to both U1 and U2.
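The double-complement recipe used above can be mechanized; a numpy sketch (the SVD-based `null_space` helper is an assumption of this sketch, not part of the notes):

```python
import numpy as np

def null_space(A, tol=1e-10):
    # rows of vh with (near-)zero singular values span the kernel of A
    _, s, vh = np.linalg.svd(A)
    return vh[int(np.sum(s > tol)):]

U1 = np.array([[2.0, 1, 1, 0], [0, -1, 1, 1]])   # rows span U1
U2 = np.array([[1.0, 1, 2, -1], [2, -1, 3, 2]])  # rows span U2

# U1 ∩ U2 = (U1⊥ + U2⊥)⊥: stack bases of the complements, take the
# null space again.
comp = np.vstack([null_space(U1), null_space(U2)])
inter = null_space(comp)
v = 2 * inter[0] / inter[0][0]  # rescale so the first component is 2
print(np.round(v, 6))           # proportional to (2, -1, 3, 2)
```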

8.6 Exercises

Problem 8.1. Use the Gram-Schmidt orthogonalization process to find an orthogonal base in the linear hull of the following system of vectors.

(a) e1 = (1, 1, 3), e2 = (−1, 1, 4), e3 = (−4, 2, 1)

(b) e1 = (3, 2, 1), e2 = (−1, 1, 3), e3 = (1, −2, −1)

(c) e1 = (3, 2, 2), e2 = (−1, 1, −3), e3 = (0, 3, −1)

(d) e1 = (2, 1, 0), e2 = (−4, 0, 1), e3 = (−3, −2, −1)

(e) e1 = (2, 2, 3, 1), e2 = (2, 3, 3, 2), e3 = (1, −2, −2, −2)

(f) e1 = (3, 4, 1, 2), e2 = (4, 1, 3, 3), e3 = (4, 2, 3, 1)

(g) e1 = (3, 1, 3, 1, 1), e2 = (0, −1, −4, 1, −3), e3 = (1, −1, −1, 0, 3), e4 = (2, −4, 3, −2, −4)

(h) e1 = (1, 0, 1, 3, 3), e2 = (4, −4, −2, 2, 4), e3 = (3, −2, −3, 1, 2), e4 = (−2, −3, −2, 1, 0)

Problem 8.2. Find the Gram matrix of the following vectors.

(a) e1 = (3, 1, 1), e2 = (1, 1, −2), e3 = (2, −1, 3)

(b) e1 = (2, 2, 2), e2 = (−1, 0, 1), e3 = (−2, 1, −1)

(c) e1 = (1, 3, 1), e2 = (−3, 2, 1), e3 = (0, 1, 0)

(d) e1 = (1, 1, 2), e2 = (3, −2, 2), e3 = (3, 1, −1)

(e) e1 = (3, 2, 2, 2), e2 = (−2, 2, −2, 1), e3 = (1, −1, −4, −1), e4 = (−3, −1, −3, 3)

(f) e1 = (1, 2, 4, 3), e2 = (−1, 3, −2, −1), e3 = (0, −2, 0, 1), e4 = (1, 2, −3, 3)

(g) e1 = (0, 1, 3, 2), e2 = (3, −1, 3, −3), e3 = (2, −1, 2, −1), e4 = (1, 1, −4, −4)

(h) e1 = (2, 4, 1, 1), e2 = (−4, 2, −3, 0), e3 = (0, −1, −2, −2), e4 = (3, 0, 1, −1)

Problem 8.3. For the vectors x, e1, e2, . . . , ek below, find the projection of the vector x onto the linear hull of e1, e2, . . . , ek. Evaluate the distance between the endpoint of x and the linear hull.

(a) x = (−1, −6, −2), e1 = (1, 0, 0), e2 = (2, −1, 1)

(b) x = (−2, −7, 4), e1 = (4, 2, −4), e2 = (−1, −1, −1)

(c) x = (3, 3, −3), e1 = (−3, 3, 0), e2 = (2, −1, −1)

(d) x = (−4, 2, −3), e1 = (−1, −5, −4), e2 = (0, 1, 1)

(e) x = (3, 4, 4, 2), e1 = (4, 0, 3, −3), e2 = (−1, 3, −3, 3), e3 = (−2, −5, 4, −4)

(f) x = (1, −6, −3, −3), e1 = (−5, −2, −1, 3), e2 = (0, 1, −1, −3), e3 = (3, −2, 0, 4)

(g) x = (−4, 2, 0, −2), e1 = (−4, 1, −5, 4), e2 = (−5, 4, −5, 5), e3 = (1, −4, 0, −1)

(h) x = (5, −4, −3, −4), e1 = (−2, 4, 3, 0), e2 = (2, −1, −3, 0), e3 = (1, −5, −3, 0)

Problem 8.4. Find the orthogonal complement to the linear vector space generated by the following vectors in R4.

(a) a1 = (3, 1, −5, 3), a2 = (1, 1, 1, 7)

(b) a1 = (1, 1, −6, −7), a2 = (1, 2, −12, −8)

(c) a1 = (1, −2, −5, 4), a2 = (1, 5, 2, 11)

(d) a1 = (4, −3, 18, −5), a2 = (2, 1, 4, 5)

(e) a1 = (10, 5, 4, 8), a2 = (2, 2, −1, 8), a3 = (2, −4, 5, 8)

(f) a1 = (7, −8, −4, −13), a2 = (7, −8, −5, −16), a3 = (3, −1, 5, 17)

Problem 8.5. Find the intersection of the linear vector spaces U1 = L(a1, a2, . . . , ak) and U2 = L(b1, b2, . . . , bl) for the vector sets listed below.

(a) a1 = (−9, 0, −9), a2 = (−7, −8, −3); b1 = (−4, −2, −3), b2 = (−4, 6, 11)

(b) a1 = (−1, −3, −3), a2 = (5, 5, 3); b1 = (0, 0, −1), b2 = (−1, −3, −2)

(c) a1 = (−3, 1, −3), a2 = (1, −3, 1); b1 = (−7, 9, −5), b2 = (10, 2, 10)

(d) a1 = (1, 9, 7), a2 = (2, 6, −1); b1 = (4, −4, 1), b2 = (0, −4, −5)

(e) a1 = (1, 4, 5, 4), a2 = (−1, 0, −3, −4), a3 = (4, 5, 9, 6); b1 = (10, 11, 6, −12), b2 = (6, 5, 3, −7), b3 = (0, −4, −1, 1)

(f) a1 = (−2, 4, 0, −4), a2 = (1, 6, −1, −3), a3 = (−4, 4, 2, −3); b1 = (5, −4, −6, −2), b2 = (4, 4, 6, 12), b3 = (−3, 4, −2, −8)

(g) a1 = (−8, −4, 1, −1), a2 = (−8, −7, 1, 5), a3 = (−4, −8, 4, 8); b1 = (0, 2, 1, 2), b2 = (−8, −7, 1, 5), b3 = (−10, −4, −5, −4)

(h) a1 = (2, 4, −2, −6), a2 = (−7, 4, 4, −4), a3 = (4, −1, 5, −1); b1 = (10, −1, −4, −6), b2 = (1, 1, 10, −10), b3 = (4, 2, −3, 1)


9 Answers to problems

[1.1] (a) x1 = 2, x2 = 2, x3 = 1, x4 = −5. (b) x1 = 2, x2 = −3, x3 = −1, x4 = −2. (c) x1 = −2, x2 = 0, x3 = 1, x4 = −4, x5 = −4. (d) x1 = 0, x2 = −4, x3 = 0, x4 = 3. (e) x1 = −1, x2 = 2, x3 = −2, x4 = −2. (f) x1 = 4, x2 = −2, x3 = 4, x4 = 3. (g) x1 = 0, x2 = 4, x3 = 1, x4 = −2. (h) x1 = 0, x2 = 0, x3 = −4, x4 = −3. (i) x1 = 3, x2 = 0, x3 = 1, x4 = −4. (j) x1 = −3, x2 = −3, x3 = 0, x4 = −1.

[2.1] No answer given.

[2.2] (a) λ1 = −16, λ2 = −1, λ3 = −14. (b) λ1 = 3, λ2 = −2, λ3 = 0. (c) x1 = 3, x2 = −1, x3 = −1, x4 = 1. (d) x1 = −1, x2 = 1, x3 = −2, x4 = 2. (e) x1 = 2, x2 = −5, x3 = −8, x4 = −6.

[3.1] (a) 1 (b) 1 (c) 4 (d) 2 (e) 1 (f) 2 (g) 2 (h) 1 (i) 2 (j) 2

[3.2]

(a) x = (0, −3, 0, 0) + λ1(−2, 3, 1, 0) + λ2(6, −7, 0, 1)

(b) x = (−5, −5, 0, 0) + λ1(−4, −3, 1, 0) + λ2(−1, 2, 0, 1)

(c) x = (−2, 3, 0, 0) + λ1(2, 2, 1, 0) + λ2(1, −3, 0, 1)

(d) x = (−2, 0, 0, 0, 0) + λ1(0, 4, −4, 1, 0) + λ2(−3, −5, 8, 0, 1)

(e) x = (0, −2, 3, 0, 0) + λ1(6, −6, −5, 1, 0) + λ2(6, −1, 2, 0, 1)

(f) x = (−7, −1, 0, 0, 0) + λ1(5, −4, 4, 1, 0) + λ2(5, −5, 6, 0, 1)

(g) x = (1, 7, 0, 0, 0) + λ1(−2, −5, 1, 0, 0) + λ2(6, 3, 0, 1, 0) + λ3(−1, 2, 0, 0, 1)

(h) x = (2, −2, 0, 0, 0) + λ1(2, −4, 1, 0, 0) + λ2(5, −7, 0, 1, 0) + λ3(−3, 5, 0, 0, 1)

(i) x = (4, 2, 0, 0, 0) + λ1(−1, 3, 1, 0, 0) + λ2(−7, −4, 0, 1, 0) + λ3(1, 0, 0, 0, 1)

(j) x = (1, 1, 0, 0, 0) + λ1(−2, −5, 1, 0, 0) + λ2(1, 1, 0, 1, 0) + λ3(−7, 0, 0, 0, 1)

[4.1] (a) −1152 (b) 4 (c) −96 (d) −288 (e) −48 (f) 6 (g) 240 (h) 9 (i) −4 (j) 48 (k) 10 (l) 144

[4.2] (a) 2 (b) 2 (c) 4 (d) 2 (e) 2 (f) 3

[4.3] (a) x1 = 1, x2 = −1, x3 = 4. (b) x1 = 3, x2 = −2, x3 = −3. (c) x1 = −4, x2 = 1, x3 = −5. (d) x1 = 2, x2 = 4, x3 = −2, x4 = −5. (e) x1 = −5, x2 = 0, x3 = 4, x4 = 2. (f) x1 = 1, x2 = 3, x3 = −2. (g) x1 = 1, x2 = −2, x3 = 0. (h) x1 = −1, x2 = −2, x3 = 1, x4 = 1.

[5.1] (a)

−2 −3 −11 −7 −12−9 −9 −12 5 −5−1 −1 −5 −3 −5−1 −1 −1 1 0

2 3 2 −2 1

(b)

−5 1 5 −3−1 8 2 −1

0 1 0 0−11 −5 10 −6

(c)

4 11 −2 122 6 −1 6−1 0 1 0−3 −4 2 −5

(d)

−1 1 2 −2−10 −1 7 −1

8 1 −5 00 −3 −4 6

83


(e) [−1 4 −2; −2 0 −1; 0 3 −1]

(f) [1 −2 1 −1; 10 −12 11 −11; 3 −1 4 −3; 7 −10 7 −8]

(g) [1 −2 0; 0 −5 3; −2 12 −5]

(h) [−3 −1 6 10 −10; −2 −1 3 5 −5; −5 −3 5 8 −8; 8 6 −5 −5 6; −7 −4 7 12 −12]

(i) [−3 −9 −6 10 −7; −2 9 5 −6 −2; 1 0 0 −1 2; −1 −5 −4 4 −2; 0 11 5 −11 3]

(j) [−6 −2 12 −1 −8; 2 2 1 7 6; 2 1 −2 2 3; −1 −1 0 −3 −3; −2 1 8 7 2]

[5.2]

(a) [−1 −12 −10; 12 −12 −28; −8 −1 10]

(b) [−3 −6 −4 −6; 1 23 36 57; 27 45 18 26; 2 17 23 37]

(c) [4 −17 −26 −22 −20; −1 −14 −14 −18 −14; −8 −39 −27 −37 −25; 9 −48 −70 −61 −55]

(d) [2 4 −3; 2 16 1; 2 9 −3; −1 7 6; 4 3 −8]

(e) [1 −1 −2; 0 2 2; 7 −7 −13]

(f) [−7 19 0 −20 −12; −7 15 −2 −15 −10; 26 −31 22 22 28; 20 −11 29 −2 18]

[6.1] (a) λ1 = 7, λ2 = 6 (b) λ1 = 1, λ2 = −1 (c) λ1 = 0, λ2 = −4 (d) λ1 = −6, λ2 = −5 (e) λ1 = 1, λ2 = −6, λ3 = 2 (f) λ1 = −2, λ2 = −7, λ3 = −6 (g) λ1 = −7, λ2 = 3, λ3 = −5 (h) λ1 = −4, λ2 = −2, λ3 = −2, λ4 = −1 (i) λ1 = −4, λ2 = −1, λ3 = −5, λ4 = −2 (j) λ1 = 6, λ2 = −2, λ3 = 2

[6.2] (Each answer lists C⁻¹ × A × C = D; rows are separated by semicolons.)

(a) [1 −1; −2 3] × [3 −6; 4 −7] × [3 1; 2 1] = [−1 0; 0 −3]

(b) [−1 2; 0 −1] × [5 4; 0 7] × [−1 −2; 0 −1] = [5 0; 0 7]

(c) [3 2; 1 1] × [−6 −2; 3 −1] × [1 −2; −1 3] = [−4 0; 0 −3]

(d) [1 0; 0 1] × [6 14; 0 −8] × [1 0; 0 1] = [6 0; 0 −8]


(e) [−1 0 1; 1 1 −1; 0 −1 −1] × [−2 −6 −2; 2 2 −2; −2 −6 −2] × [−2 −1 −1; 1 1 0; −1 −1 −1] = [0 0 0; 0 2 0; 0 0 −4]

(f) [1 1 −1; −1 1 0; −1 0 1] × [3 −8 3; −3 −2 3; 2 −8 4] × [1 −1 1; 1 0 1; 1 −1 2] = [−2 0 0; 0 6 0; 0 0 1]

(g) [1 0 0; 0 0 −1; 0 1 1] × [−7 0 −10; 0 7 4; 0 0 3] × [1 0 0; 0 1 1; 0 −1 0] = [−7 0 0; 0 3 0; 0 0 7]

(h) [−2 2 1; −2 1 2; 1 −1 0] × [−2 5 −8; 2 1 −8; −6 6 −1] × [2 −1 3; 2 −1 2; 1 0 2] = [−1 0 0; 0 3 0; 0 0 −4]

(i) [−71 −65 −6; −1 −2 2; 12 11 1] × [−4 2 −2; −1 −7 2; −1 −2 −3] × [−24 −1 −142; 25 1 148; 13 1 77] = [−5 0 0; 0 −4 0; 0 0 −5]

(j) [1 0 1; 1 1 0; 3 1 3] × [3 7 −3; 3 −1 3; −5 −7 1] × [3 1 −1; −3 0 1; −2 −1 1] = [−2 0 0; 0 6 0; 0 0 −1]

(k) [−1 1 0 1; 1 2 1 0; 0 1 1 0; −1 2 0 1] × [3 −2 3 −2; −2 6 0 2; 2 −8 −2 −2; 3 −6 3 −2] × [1 1 −1 −1; −1 0 0 1; 1 0 1 −1; 3 1 −1 −2] = [2 0 0 0; 0 1 0 0; 0 0 −2 0; 0 0 0 4]

(l) [−2 −3 3 2; 5 −1 −6 −13; 1 0 −1 −2; −1 −1 1 1] × [0 6 1 8; −8 −8 1 0; −6 0 −7 −2; 3 3 4 5] × [0 −1 7 1; 1 1 −8 −5; 2 1 −8 −7; −1 −1 7 4] = [−6 0 0 0; 0 1 0 0; 0 0 0 0; 0 0 0 −5]
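Each identity C⁻¹ × A × C = D above is easy to check numerically. A minimal sketch for item (a) of [6.2], using NumPy (an assumption of this note, not part of the course materials):

```python
import numpy as np

# Check answer 6.2(a): C^{-1} A C should equal diag(-1, -3).
C_inv = np.array([[1, -1], [-2, 3]])
A = np.array([[3, -6], [4, -7]])
C = np.array([[3, 1], [2, 1]])

assert np.allclose(C_inv @ C, np.eye(2))  # C_inv really is the inverse of C
D = C_inv @ A @ C
print(D)  # diagonal matrix with the eigenvalues -1 and -3
```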

[7.1]

(a) 2x1² + x2² − 5x3²

(b) −5x1² − 2x2² + x3²

(c) −5x1² − 3x2² + 3x3²

(d) −x1² − 2x2² + 2x3²

(e) 2x1² − 3x2² + 3x3² − 5x4²

(f) −5x1² − 4x2² + x3² + 3x4²

(g) −3x1² + 2x2² + 3x3² + 2x4²

(h) x1² − 3x2² + 4x3² + x4²

[7.2] (a) n+ = 2, n− = 0, positive semidefinite (b) n+ = 1, n− = 2, not definite (c) n+ = 2, n− = 1, not definite (d) n+ = 2, n− = 1, not definite (e) n+ = 1, n− = 3, not definite (f) n+ = 1, n− = 3, not definite (g) n+ = 0, n− = 4, negative definite (h) n+ = 0, n− = 3, negative semidefinite
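By the law of inertia, n+ and n− can also be read off as the numbers of positive and negative eigenvalues of the form's symmetric matrix. A sketch with an assumed example matrix Q (not one of the exercise forms):

```python
import numpy as np

# n+ / n- of a quadratic form = number of positive / negative
# eigenvalues of its symmetric matrix (law of inertia).
Q = np.array([[2., 1., 0.],
              [1., -3., 1.],
              [0., 1., 1.]])

eigvals = np.linalg.eigvalsh(Q)  # real spectrum of a symmetric matrix
n_plus = int((eigvals > 1e-12).sum())
n_minus = int((eigvals < -1e-12).sum())
print(n_plus, n_minus)  # 2 1 -> this form is "not definite"
```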

[8.1]

(a) f1 = (1, 1, 3)ᵀ, f2 = (−23, −1, 8)ᵀ, f3 = (−1, 7, −2)ᵀ

(b) f1 = (3, 2, 1)ᵀ, f2 = (−2, 1, 4)ᵀ, f3 = (1, −2, 1)ᵀ

(c) f1 = (3, 2, 2)ᵀ, f2 = (4, 31, −37)ᵀ, f3 = (−8, 7, 5)ᵀ

(d) f1 = (2, 1, 0)ᵀ, f2 = (−4, 8, 5)ᵀ, f3 = (−1, 2, −4)ᵀ

(e) f1 = (2, 2, 3, 1)ᵀ, f2 = (−2, 4, −3, 5)ᵀ, f3 = (43, 4, −30, −4)ᵀ

(f) f1 = (3, 4, 1, 2)ᵀ, f2 = (9, −14, 13, 8)ᵀ, f3 = (9, 3, 13, −26)ᵀ

(g) f1 = (3, 1, 3, 1, 1)ᵀ, f2 = (15, −2, −13, 12, −16)ᵀ, f3 = (12, −13, −18, 2, 29)ᵀ, f4 = (7, −26, 9, −14, −8)ᵀ

(h) f1 = (1, 0, 1, 3, 3)ᵀ, f2 = (3, −4, −3, −1, 1)ᵀ, f3 = (3, 10, −12, 4, −1)ᵀ, f4 = (−30, −16, −15, 17, −2)ᵀ
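A quick sanity check on any of these answers: the vectors produced by Gram-Schmidt must be pairwise orthogonal. A minimal sketch for item (b) of [8.1], using NumPy (an assumption, not part of the course):

```python
import numpy as np

# Answer 8.1(b): all pairwise dot products of f1, f2, f3 must vanish.
f1 = np.array([3, 2, 1])
f2 = np.array([-2, 1, 4])
f3 = np.array([1, -2, 1])

for u, v in [(f1, f2), (f1, f3), (f2, f3)]:
    assert u @ v == 0
print("pairwise orthogonal")
```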

[8.2]

(a) [11 2 8; 2 6 −5; 8 −5 14]

(b) [12 0 −4; 0 2 1; −4 1 6]

(c) [11 4 3; 4 14 2; 3 2 1]

(d) [6 5 2; 5 17 5; 2 5 11]

(e) [21 −4 −9 −11; −4 13 3 13; −9 3 19 7; −11 13 7 28]

(f) [30 −6 −1 2; −6 15 −7 8; −1 −7 5 −1; 2 8 −1 23]

(g) [14 2 3 −19; 2 28 16 2; 3 16 10 −3; −19 2 −3 34]

(h) [22 −3 −8 6; −3 29 4 −15; −8 4 9 0; 6 −15 0 11]
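Recall how such a Gram matrix is built: g_ij = ⟨e_i, e_j⟩, i.e. G = EᵀE with the vectors as the columns of E (so G is always symmetric). A sketch with sample vectors chosen here for illustration, not the exercise data:

```python
import numpy as np

# Gram matrix of e1 = (0,1,3), e2 = (-4,2,1), e3 = (1,0,2).
E = np.column_stack([[0, 1, 3], [-4, 2, 1], [1, 0, 2]])
G = E.T @ E
print(G)  # symmetric, with G[i, j] = <e_i, e_j>
```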

[8.3]

(a) p = (−1, −2, 2)ᵀ, d² = 32

(b) p = (−5, −3, 3)ᵀ, d² = 26

(c) p = (2, 2, −4)ᵀ, d² = 3

(d) p = (−1, −1, 0)ᵀ, d² = 27

(e) p = (3, 4, 1, −1)ᵀ, d² = 18

(f) p = (1, −2, −5, −1)ᵀ, d² = 24

(g) p = (−1, 2, 0, 1)ᵀ, d² = 18

(h) p = (5, −4, −3, 0)ᵀ, d² = 16
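Each projection above is a least-squares solution: p = Ec where c minimizes ||Ec − x|| over the span of the given vectors, and d² = ||x − p||². A sketch with sample data (assumed here, not one of the exercise sets):

```python
import numpy as np

# Project x onto span{e1, e2} and measure the squared distance.
x = np.array([-2., -7., -11.])
E = np.column_stack([[2., 3., 3.], [-2., -1., -1.]])  # e1, e2 as columns

c, *_ = np.linalg.lstsq(E, x, rcond=None)  # normal-equations solution
p = E @ c
d2 = float(np.sum((x - p) ** 2))
print(p, d2)  # p = (-2, -9, -9), d2 = 8
```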

[8.4]

(a) b1 = (3, −4, 1, 0)ᵀ, b2 = (2, −9, 0, 1)ᵀ

(b) b1 = (0, 6, 1, 0)ᵀ, b2 = (6, 1, 0, 1)ᵀ

(c) b1 = (3, −1, 1, 0)ᵀ, b2 = (−6, −1, 0, 1)ᵀ

(d) b1 = (−3, 2, 1, 0)ᵀ, b2 = (−1, −3, 0, 1)ᵀ

(e) b = (−8, 8, 8, 1)ᵀ

(f) b = (−1, −1, −3, 1)ᵀ

[8.5]

(a) c = (4, 2, 3)ᵀ

(b) c = (−1, −3, −3)ᵀ

(c) c = (−5, −1, −5)ᵀ

(d) c = (0, −4, −5)ᵀ

(e) c1 = (−2, −1, −1, 2)ᵀ, c2 = (−2, −3, −2, 2)ᵀ

(f) c1 = (4, −4, −2, 3)ᵀ, c2 = (−3, −2, 1, −1)ᵀ

(g) c1 = (2, 4, −2, −4)ᵀ, c2 = (4, −1, 3, 3)ᵀ

(h) c1 = (4, −1, −5, 1)ᵀ, c2 = (−5, −1, 1, 3)ᵀ


10 Sample Midterm Exam I

SECTION I: MULTIPLE CHOICE

Directions: Each of the following problems is followed by five choices. Select the best choice and put the corresponding mark in your answer sheet. Calculators may NOT be used in any part of the exam.

1. Which of the following statements are true for any pair of non-degenerate n × n matrices A and B?

I. (AB)T = ATBT

II. (AB)T = BTAT

III. (AT)−1 = (A−1)T

(A) I only
(B) II only
(C) I and III
(D) II and III
(E) None of the above options

2. Let A = (aij) be a 5×5 matrix such that a11 = 1, a22 = 2, and det A = 0. The largest (by absolute value) third-order minor of A is

(A) 0
(B) 1
(C) 2
(D) 3
(E) Could be any number

3. Let {a, b} and {c, d} be two sets of vectors, both linearly independent. Which of the following MUST be true?

I. {a, b, c, d} is also linearly independent

II. Both {a, c} and {b, d} are linearly independent

III. Either {a, c} or {b, d} is linearly independent

(A) I
(B) II
(C) III
(D) All are true
(E) None is true


4. Let A be a matrix such that A · B = 0 for any compatible matrix B. Which of the following MUST be true?

I. det A = 0

II. A = 0

III. Rows of A are linearly dependent

(A) I
(B) II
(C) III
(D) I & III
(E) II & III

5. Let (x1, x2, x3) be a solution to the following SLE

7x1 + 5x2 = 0
3x1 + x2 + x3 = −5
3x1 + 2x2 − x3 = 5

Then x3 is
(A) 3
(B) -3
(C) 5
(D) -5
(E) None of these

6. The determinant

| −1 −12 −8 |
|  0   4   8 |
|  0   8  −8 |

is equal to
(A) 0
(B) 4
(C) 52
(D) 64
(E) None of these

7. Let A be the matrix that is inverse to [6 −5 −2; −1 0 −1; −2 2 1]. Then a22 =

(A) 1
(B) 2
(C) 3
(D) 5
(E) None of these


8. Let A be a 3×4 matrix and B be a 4×5 matrix. rk(A · B) is not greater than

(A) 0
(B) 3
(C) 4
(D) 5
(E) Could be any number

9. Let A, B, and C be square matrices such that A · B = E and B · C = E, where E is the identity matrix. Which of the following MUST be true?

I. det(A) = det(C)

II. A = C

III. If A = B then |det(A)| = 1

(A) I
(B) II
(C) III
(D) I & II
(E) I, II, and III

10. The rank of the matrix [2 0 12 6; −3 0 2 3; −7 −4 −4 −1] is

(A) 0
(B) 1
(C) 2
(D) 3
(E) 4


SECTION II: FREE RESPONSE

Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanations. You may use the back of this sheet for writing.

1. Evaluate the following determinant using row or column decomposition.

| 12   4  12 −2 |
| −7  10   4  1 |
|  6  −3  16 −1 |
| 13 −18  16 −2 |

2. Solve the following matrix equation. Explain all steps in your calculations.

[−9 −11 10; 9 10 −9; 8 9 −8] X = [−3 4; −5 −4; 1 1]

3. Find the fundamental set of solutions to the following system of linear equations. Indicate the dimension of the solution space.

x1 + 2x2 − 2x3 − 17x4 − 3x5 = 0
3x1 − 5x2 + 2x3 − 11x4 − x5 = 14
x1 − 4x2 − 7x4 − x5 = 10


ANSWERS

Multiple Choice:

1. D

2. E

3. E

4. E

5. D

6. E

7. B

8. B

9. E

10. D

Free Response:

1. det = −60

2. X = [−14 −5; 49 41; 41 41]

3. For instance, x = (2, −2, −1, 0, 0)ᵀ + λ1(7, 0, −5, 1, 0)ᵀ + λ2(1, 0, −1, 0, 1)ᵀ, dim = 2.
Note: the answer is not unique.
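The free-response answers can be double-checked numerically; a short sketch against the Section II data, using NumPy (an assumption, not part of the course):

```python
import numpy as np

# Problem 1: the 4x4 determinant.
M = np.array([[12, 4, 12, -2],
              [-7, 10, 4, 1],
              [6, -3, 16, -1],
              [13, -18, 16, -2]])
print(round(np.linalg.det(M)))  # -60

# Problem 2: solve A X = B.
A = np.array([[-9, -11, 10], [9, 10, -9], [8, 9, -8]])
B = np.array([[-3, 4], [-5, -4], [1, 1]])
X = np.linalg.solve(A, B)
print(np.round(X))  # rows (-14, -5), (49, 41), (41, 41)
```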


11 Sample Midterm Exam II

SECTION I: MULTIPLE CHOICE

Directions: Each of the following problems is followed by five choices. Select the best choice and put the corresponding mark in your answer sheet. Calculators may NOT be used in any part of the exam.

1. Which of the following sets form linear vector spaces with respect to the function addition operation?

I. {y = f(x) | f′(0) = 1}

II. {y = f(x) | f′(1) = 0}

III. {y = f(x) | f′(0) = f′(1)}

(A) I
(B) II
(C) I & II
(D) II & III
(E) I, II, and III

2. Let A = (aij) be a 4×4 matrix such that a11 = 1, a22 = 2, and det A = 0. The smallest (by absolute value) third-order minor of A is

(A) 0
(B) 1
(C) 2
(D) 3
(E) Could be any number

3. Which of the following sets form linear vector spaces with respect to matrix addition?

I. The set of all 2×2 matrices that are equal to their transpose

II. The set of all 3×3 matrices with the property det A = 0

III. The set of all 4×4 matrices with the property det A = 1

(A) I
(B) II
(C) III
(D) I & II
(E) I and III


4. Let A = (aij) be a 3×4 matrix such that a11 = 0. The rank of A could be each of the following EXCEPT

(A) 0
(B) 1
(C) 2
(D) 3
(E) 4

5. Let a, b, and c be vectors in R3 that are linearly independent. Which of the following MUST be true?

I. Vectors a and b also form a linearly independent set of vectors

II. The set of vectors {a, b, c} is full in R3

III. The set {a, b, c, d} is linearly dependent for any vector d ∈ R3

(A) I
(B) II
(C) I & II
(D) I & III
(E) I, II and III

6. Let A and B be non-degenerate 6×6 matrices. Which of the following MUST be true?

I. (A · B)−1 = A−1 · B−1

II. (A · B)−1 = B−1 · A−1

III. The transpose of A−1 is the inverse of AT.

(A) I
(B) II
(C) I & III
(D) I & II
(E) II & III

7. Let (x1, x2, x3) be a solution to the following SLE

2x1 − 7x2 − 5x3 = 4
x1 + x2 + 2x3 = −7
6x1 − 5x2 + 5x3 = −12

Then x1 is
(A) -7
(B) -4
(C) 0
(D) 7
(E) None of these

8. The determinant

| −3   0 12 |
|  4   1  1 |
| −11 −2  1 |

is equal to
(A) 0
(B) 10
(C) 18
(D) 27
(E) None of these

9. If A is the matrix inverse to [−5 −2 2; 9 5 −3; 4 1 −2] then a13 =

(A) 4
(B) -4
(C) 11
(D) -11
(E) None of these

10. The rank of the matrix [−10 0 −6 −4; 5 −10 −1 −4; −5 10 1 4] is

(A) 0
(B) 1
(C) 2
(D) 3
(E) 4


SECTION II: FREE RESPONSE

Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanations. You may use the back of this sheet for writing.

1. Evaluate the following determinant using row or column decomposition.

| 10  5 −18 −4 |
| 11  5 −17 −2 |
| −2  0  13  3 |
| 11 10  10  1 |

2. Solve the following matrix equation. Explain all steps in your calculations.

X [3 11 −11; −1 −2 3; −1 −4 4] = [2 4 −3; −5 −3 −2]

3. Find the fundamental set of solutions to the following system of linear equations. Indicate the dimension of the solution space.

x1 + 5x2 + 5x3 + x4 − 5x5 = −1
x1 − 5x2 − 7x3 + 13x4 + x5 = 3
2x2 + 5x3 − 5x4 − 9x5 = −6


ANSWERS

Multiple Choice:

1. D

2. A

3. A

4. E

5. E

6. E

7. A

8. D

9. B

10. C

Free Response:

1. det = −100

2. X = [6 1 15; −27 −5 −71]

3. For instance, x = (−1, 2, −2, 0, 0)ᵀ + λ1(−6, 0, 1, 1, 0)ᵀ + λ2(5, −3, 3, 0, 1)ᵀ, dim = 2.
Note: the answer is not unique.
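As with the previous exam, these answers can be double-checked numerically; a sketch against the Section II data, using NumPy (an assumption, not part of the course):

```python
import numpy as np

# Problem 1: the 4x4 determinant.
M = np.array([[10, 5, -18, -4],
              [11, 5, -17, -2],
              [-2, 0, 13, 3],
              [11, 10, 10, 1]])
print(round(np.linalg.det(M)))  # -100

# Problem 2: X A = B, so X = B A^{-1}.
A = np.array([[3, 11, -11], [-1, -2, 3], [-1, -4, 4]])
B = np.array([[2, 4, -3], [-5, -3, -2]])
X = B @ np.linalg.inv(A)
print(np.round(X))  # rows (6, 1, 15), (-27, -5, -71)
```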


12 Sample Midterm Exam III

SECTION I: MULTIPLE CHOICE

Directions: Each of the following problems is followed by five choices. Select the best choice and put the corresponding mark in your answer sheet. Calculators may NOT be used in any part of the exam.

1. Let A be the inverse matrix for [−4 3 5; −1 1 2; −3 3 5]. Then a13 =

(A) 1
(B) 0
(C) -1
(D) 3
(E) None of these

2. Let A and B be two 7×7 matrices such that rk(A) = 2 and rk(B) = 3. Which of the following COULD be true?

I. rk(A+B) = 0

II. rk(A+B) = 1

III. rk(A+B) = 6

(A) I only
(B) II only
(C) I and II
(D) I, II, and III
(E) None

3. The determinant

|  7 1   4 |
|  6 0  −6 |
| 10 3  15 |

is equal to
(A) 6
(B) 12
(C) 36
(D) 48
(E) None of these

4. Let a, b, c, and d be four vectors such that {a, b, c} is linearly dependent and {b, c, d} is also linearly dependent. Which of the following MUST be true?

I. {a, b, d} is linearly dependent

II. dim L(a, b, c, d) ≤ 2

III. d − a can be expressed as a linear combination of b and c

(A) I only
(B) I and II
(C) I, II, and III
(D) III only
(E) None

5. The rank of the following matrix is equal to

[−3 2 2 1 −4; 9 −6 −6 −3 12; −15 10 10 5 −20]

(A) 1
(B) 2
(C) 3
(D) 4
(E) 5

6. Let a, b, c, and d be four vectors such that {b, c} is linearly independent, {a, b, c} is linearly dependent, and {b, c, d} is also linearly dependent. Which of the following MUST be true?

I. {a, b, d} is linearly dependent

II. dim L(a, b, c, d) ≤ 2

III. d − a can be expressed as a linear combination of b and c

(A) I
(B) I and II
(C) I, II, and III
(D) III only
(E) None

7. If A = [−1 4; 3 1], B = [2 3; 1 0], and C = BT × A, then c21 =

(A) 1
(B) 9
(C) -3
(D) 12
(E) -1

8. The dimension of the space of solutions of x1 + 3x2 + 15x3 + 12x4 + 15x5 = 0 is

(A) 1
(B) 2
(C) 3
(D) 4
(E) 5

9. Let (x1, x2, x3) be a solution to the following SLE

8x1 + 7x2 − 3x3 = −2
9x1 + 3x2 + 4x3 = −12
x1 − 3x2 − 5x3 = −8

Then x1 + x2 + x3 =
(A) -2
(B) 0
(C) 2
(D) 10
(E) None of these

10. Which of the following statements are true for any pair of non-degenerate n × n matrices A and B?

I. (B−1AB)−1 = B−1A−1B

II. (ATBT)−1 = (A−1B−1)T

III. (ATBT)−1 = (B−1A−1)T

(A) I
(B) I and II
(C) I and III
(D) III only
(E) None is true


SECTION II: FREE RESPONSE

Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanations. You may use the back of this sheet for writing.

1. Evaluate the following determinant.

| −3  0 −7 −1  7 |
| −6  0  0  0  0 |
| −7  0 −1  0  4 |
|  9 −1  0  5  4 |
|  2 −1  5  6 −2 |

2. Solve the following matrix equation. Explain each step in your solution.

[−3 1 −1; 0 1 1; −1 −2 −3] X = [−2 0; 2 −6; −7 1]

3. Find the fundamental set of solutions to the following system of linear equations. Indicate the dimension of the solution space.

5x1 − 7x2 − 8x3 − 7x4 − 21x5 + 7x6 = −10
3x1 − 4x2 − 4x3 + 2x4 − 10x5 + 6x6 = −3
4x1 − 5x2 − 3x3 + 22x4 − 5x5 + 14x6 = 7


ANSWERS

Multiple Choice:

1. A

2. B

3. D

4. E

5. A

6. C

7. C

8. D

9. B

10. B

Free Response:

1. det = −42

2. X = [−2 −28; −3 −45; 5 39]

3. x = (−5, −9, 6, 0, 0, 0)ᵀ + λ1(−6, 5, −9, 1, 0, 0)ᵀ + λ2(2, 3, −4, 0, 1, 0)ᵀ + λ3(−2, 3, −3, 0, 0, 1)ᵀ, dim = 3.
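These answers, too, can be double-checked numerically; a sketch against the Section II data, using NumPy (an assumption, not part of the course):

```python
import numpy as np

# Problem 1: the 5x5 determinant.
M = np.array([[-3, 0, -7, -1, 7],
              [-6, 0, 0, 0, 0],
              [-7, 0, -1, 0, 4],
              [9, -1, 0, 5, 4],
              [2, -1, 5, 6, -2]])
print(round(np.linalg.det(M)))  # -42

# Problem 2: solve A X = B.
A = np.array([[-3, 1, -1], [0, 1, 1], [-1, -2, -3]])
B = np.array([[-2, 0], [2, -6], [-7, 1]])
X = np.linalg.solve(A, B)
print(np.round(X))  # rows (-2, -28), (-3, -45), (5, 39)
```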


13 Sample Final Exam I

SECTION I: MULTIPLE CHOICE

Directions: Each of the following problems is followed by five choices. Select the best choice and put the corresponding mark in your answer sheet. Calculators may NOT be used in any part of the exam.

1. Let A be a 3×3 matrix and let B = [5 0 0; 0 1 0; 0 0 1]. If A is multiplied by B from the left then

(A) The 1st column of A is multiplied by 5
(B) The 1st row of A is multiplied by 5
(C) The 1st column of A is divided by 5
(D) The 1st row of A is divided by 5
(E) None of these

2. The signature of the quadratic form −10x1² − 6x1x2 + 6x1x3 − x2² + 2x2x3 − x3² is

(A) -2
(B) -1
(C) 0
(D) 1
(E) 2

3. Let A be a 3×3 matrix with eigenvalues λ1, λ2, and λ3 such that λ1 ≠ λ2, λ1 ≠ λ3, and λ2 ≠ λ3. Which of the following MUST be true?

I. If a and b are eigenvectors corresponding to λ1 then the set {a, b} is linearly dependent

II. If a1 and a2 are eigenvectors corresponding to λ1 and λ2, respectively, then {a1, a2} is linearly independent

III. det(A) ≠ 0

(A) I
(B) II
(C) III
(D) I and II
(E) II and III


4. Which of the following could be the set of eigenvalues of [−2 −4 −4; 2 −2 2; −2 4 0]?

(A) λ1 = 2, λ2 = −4, λ3 = −2
(B) λ1 = 2, λ2 = −4, λ3 = 2
(C) λ1 = 2, λ2 = 4, λ3 = −2
(D) λ1 = 2, λ2 = 4, λ3 = 2
(E) None of these

5. The quadratic form 8x1² + 16x1x2 + 16x1x3 + 4x2² + 8x2x3 + 8x3² is

(A) positive definite
(B) negative definite
(C) non-positive definite
(D) non-negative definite
(E) not definite

6. Let d be the distance between the endpoint of vector x and the linear space generated by vectors e1 and e2, where

x = (13, −1, 3)ᵀ, e1 = (−2, −2, −4)ᵀ, e2 = (4, −3, 1)ᵀ

Then, d² =
(A) 1
(B) 16
(C) 27
(D) 32
(E) None of these

7. Let A be a 3×3 matrix. Which of the following MIGHT be true?

I. A has no real eigenvalues

II. A is diagonalizable but A2 is not

III. A2 is diagonalizable but A is not

(A) I
(B) II
(C) III
(D) I and II
(E) I and III

8. Which of the following is NOT an eigenvalue of A = [−3 −3 1 −1; 0 −5 1 0; 0 0 −4 0; 0 −3 3 −2]?

(A) -2
(B) -3
(C) -4
(D) -5
(E) -6

9. The element g12 of the Gram matrix of e1 = (0, 1, 3)ᵀ, e2 = (−4, 2, 1)ᵀ, e3 = (1, 0, 2)ᵀ is

(A) 5
(B) 6
(C) -5
(D) 21
(E) None of these

10. Let A be a 3×3 matrix with the property A = A2. Which of the following MUST be true?

I. det(A) = 0

II. det(A) = 1

III. A is the identity matrix

(A) I
(B) II
(C) III
(D) II and III
(E) None is true


SECTION II: FREE RESPONSE

Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanations. You may use the back of this sheet for writing.

1. [6 pts] For the following matrix A find matrices C and C−1 such that C−1AC is diagonal and write a product expression for eA. Show your work.

A = [−1 3 −1; 2 −5 −2; 2 3 −4]

2. [4 pts] Use the Gram-Schmidt orthogonalization process to find an orthogonal base in the linear hull of the following system of vectors. Show your work.

e1 = (1, 4, 3, 3)ᵀ, e2 = (−3, 4, −1, −1)ᵀ, e3 = (1, 1, 1, 0)ᵀ

3. [4 pts] For the vectors x, e1, e2, and e3 below, find the projection of the vector x onto the linear hull of e1, e2, and e3. Evaluate the distance between the endpoint of x and the linear hull.

x = (−4, −14, 0, −6)ᵀ, e1 = (3, −4, 3, −5)ᵀ, e2 = (−2, 0, −1, 5)ᵀ, e3 = (0, −2, −1, 3)ᵀ


ANSWERS

Multiple Choice:

1. B

2. B

3. D

4. A

5. E

6. C

7. C

8. E

9. A

10. E

Free Response:

1. [−4 −1 3; −1 0 1; −1 1 1] × [−1 3 −1; 2 −5 −2; 2 3 −4] × [−1 4 −1; 0 −1 1; −1 5 −1] = [−2 0 0; 0 −3 0; 0 0 −5]

2. f1 = (1, 4, 3, 3)ᵀ, f2 = (−2, 2, −1, −1)ᵀ, f3 = (8, 4, 3, −11)ᵀ

3. p = (4, −10, 4, −2)ᵀ, d² = 112
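The diagonalization in free response 1 can be verified numerically, which also confirms the product expression for the matrix exponential. A sketch using NumPy (an assumption, not part of the course):

```python
import numpy as np

# Check that C^{-1} A C is diagonal; then
# e^A = C diag(e^{-2}, e^{-3}, e^{-5}) C^{-1}.
C_inv = np.array([[-4, -1, 3], [-1, 0, 1], [-1, 1, 1]])
A = np.array([[-1, 3, -1], [2, -5, -2], [2, 3, -4]])
C = np.array([[-1, 4, -1], [0, -1, 1], [-1, 5, -1]])

assert np.allclose(C_inv @ C, np.eye(3))
D = C_inv @ A @ C
print(D)  # diag(-2, -3, -5)
```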


14 Sample Final Exam II

SECTION I: MULTIPLE CHOICE

Directions: Each of the following problems is followed by five choices. Select the best choice and put the corresponding mark in your answer sheet. Calculators may NOT be used in any part of the exam.

1. Let A be a 3×3 matrix and let B = [1 0 0; 0 3 0; 0 0 1]. If A is multiplied by B from the left then

(A) The 2nd column of A is multiplied by 3
(B) The 2nd row of A is multiplied by 3
(C) The 2nd column of A is divided by 3
(D) The 2nd row of A is divided by 3
(E) None of these

2. The signature of the quadratic form 4x1² − 8x1x2 + 16x1x3 + 9x2² − 16x2x3 + 10x3² is

(A) -2
(B) -1
(C) 0
(D) 1
(E) 2

3. Let A be a 3×3 matrix with eigenvalues λ1, λ2, and λ3 such that λ1 ≠ λ2, λ1 ≠ λ3, and λ2 ≠ λ3. Which of the following MUST be true?

I. There exists a basis of R3 that consists of eigenvectors of A

II. A is diagonalizable

III. Rows of A are linearly independent

(A) I
(B) II
(C) I and II
(D) II and III
(E) I and III


4. Which of the following could be the set of eigenvalues of [5 −3 3; 3 −5 3; −3 −1 −1]?

(A) λ1 = 2, λ2 = 1, λ3 = 2
(B) λ1 = 2, λ2 = −1, λ3 = 2
(C) λ1 = 2, λ2 = 1, λ3 = −2
(D) λ1 = 2, λ2 = −1, λ3 = −2
(E) None of these

5. The quadratic form 8x1² − 2x1x2 + 2x1x3 + x2² − 2x2x3 − 7x3² is

(A) positive definite
(B) negative definite
(C) non-positive definite
(D) non-negative definite
(E) not definite

6. Let d be the distance between the endpoint of vector x and the linear space generated by vectors e1 and e2, where x = (−2, −7, −11)ᵀ, e1 = (2, 3, 3)ᵀ, e2 = (−2, −1, −1)ᵀ. Then, d² =

(A) 2
(B) 4
(C) 8
(D) 16
(E) None of these

7. Which of the following is NOT an eigenvalue of A = [3 0 0 0; 3 0 0 0; 5 −5 −5 −4; 0 0 0 −3]?

(A) -5
(B) -3
(C) 0
(D) 3
(E) 5

8. Let A be a 4×4 matrix with det(A) ≠ 0. Which of the following MIGHT be true?

I. A has no real eigenvalues

II. λ = 0 is an eigenvalue of A

III. A and A2 have the same set of eigenvalues

(A) I
(B) II
(C) III
(D) I and II
(E) I and III

9. The element g32 of the Gram matrix of e1 = (2, 3, 0)ᵀ, e2 = (2, −1, 1)ᵀ, e3 = (−3, 4, −1)ᵀ is

(A) -11
(B) 1
(C) 6
(D) 13
(E) None of these

10. Let A be a 3×3 matrix whose characteristic polynomial is f(λ) = 1 + λ − λ² − λ³. Which of the following MUST be true?

I. rk(A − λE) = 3 for some λ

II. A−1 exists

III. A is NOT diagonalizable

(A) I
(B) II
(C) I and II
(D) II and III
(E) I, II, and III


SECTION II: FREE RESPONSE

Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanations. You may use the back of this sheet for writing.

1. [6 pts] For the following matrix A find matrices C and C−1 such that C−1AC is diagonal and write a product expression for A100. Show your work.

A = [−1 5 3; −3 3 3; 3 1 −1]

2. [4 pts] Use the Gram-Schmidt orthogonalization process to find an orthogonal base in the linear hull of the following system of vectors. Show your work.

e1 = (1, 2, 1, 3)ᵀ, e2 = (−3, 0, −3, 1)ᵀ, e3 = (0, 3, −3, 1)ᵀ

3. [4 pts] For the vectors x, e1, e2, and e3 below, find the projection of the vector x onto the linear hull of e1, e2, and e3. Evaluate the distance between the endpoint of x and the linear hull.

x = (3, 2, 2, 1)ᵀ, e1 = (−4, 4, 4, 0)ᵀ, e2 = (2, 2, 1, −4)ᵀ, e3 = (−3, 2, 3, 1)ᵀ


ANSWERS

Multiple Choice:

1. B

2. D

3. C

4. D

5. E

6. C

7. E

8. E

9. A

10. C

Free Response:

1. [−3 4 3; 1 −3 −2; −1 1 1] × [−1 5 3; −3 3 3; 3 1 −1] × [−1 −1 1; 1 0 −3; −2 −1 5] = [0 0 0; 0 2 0; 0 0 −1]

2. f1 = (1, 2, 1, 3)ᵀ, f2 = (−7, 1, −7, 4)ᵀ, f3 = (10, 15, −13, −9)ᵀ

3. p = (1, 0, 2, −1)ᵀ, d² = 12
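As before, the diagonalization in free response 1 can be verified numerically, which also confirms the product expression for A100. A sketch using NumPy (an assumption, not part of the course):

```python
import numpy as np

# Check that C^{-1} A C is diagonal; then
# A^100 = C diag(0, 2^100, (-1)^100) C^{-1}.
C_inv = np.array([[-3, 4, 3], [1, -3, -2], [-1, 1, 1]])
A = np.array([[-1, 5, 3], [-3, 3, 3], [3, 1, -1]])
C = np.array([[-1, -1, 1], [1, 0, -3], [-2, -1, 5]])

assert np.allclose(C_inv @ C, np.eye(3))
D = C_inv @ A @ C
print(D)  # diag(0, 2, -1)
```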


15 Sample Final Exam III

SECTION I: MULTIPLE CHOICE

Directions: Each of the following problems is followed by five choices. Select the best choice and put the corresponding mark in your answer sheet. Calculators may NOT be used in any part of the exam.

1. The quadratic form x1² − 2x1x2 − 4x1x3 + 4x2² + 4x2x3 + 4x3² is

(A) positive semi-definite
(B) positive definite
(C) negative semi-definite
(D) negative definite
(E) not definite

2. Let A and B be n×n matrices such that B · A · B = E. Which of the following COULD be true?

I. det(A) ≤ 0

II. rk A < rk B

III. A · B ≠ B · A

(A) I only
(B) II only
(C) III only
(D) II and III
(E) None could be true

3. If d is the distance between the endpoint of vector x = (−2, 7, −1) and the linear hull of vectors e1 = (1, 0, 0) and e2 = (1, −3, −3), then d² =

(A) 4
(B) 16
(C) 32
(D) 48
(E) None of these

4. Let A : R3 → R3 be a linear operator and let e1, e2, e3 be some of its eigenvectors. Given that e1, e2, e3 are linearly independent, which of the following MUST be true?

I. A is diagonalizable

II. The eigenvalues corresponding to e1, e2, and e3 are different

III. A−1 is diagonalizable, if it exists

(A) I only
(B) I and II
(C) I and III
(D) I, II, and III
(E) None

5. Let A be an n×n matrix and let B be the matrix defined by B = AT · A. Which of the following are true statements?

I. Regardless of A, matrix B is always symmetric

II. B is the matrix of a positive semi-definite quadratic form

III. B is the matrix of a positive definite quadratic form if and only if det(A) ≠ 0

(A) I only
(B) I and II
(C) I and III
(D) II and III
(E) I, II, and III

6. Let A be the linear operator acting on infinitely differentiable real functions according to the rule A(f) = ∂²f/∂x² + f. Which of the following is an eigenvector of A?

(A) f(x) = cos 2x
(B) f(x) = sin 3x
(C) f(x) = e^(−4x)
(D) All of the above
(E) None of the above

7. Which of the following matrix equations, where X and Y are the unknown matrices, MUST have no solutions?

I. X2 = −E

II. XT · X = −E

III. X · Y − Y · X = E

(A) I only
(B) I and III
(C) III only
(D) II and III
(E) I, II, and III

8. Which of the following numbers is NOT an eigenvalue of the matrix A = [3 −7 −7; 7 −5 −7; 1 −7 −5]?

(A) -5
(B) 2
(C) 7
(D) -4
(E) All are eigenvalues

9. The SLE

ax1 − x2 + 2x3 = 11
2x1 + 9x2 − 4x3 = 0
2x1 − 5x2 + 3x3 = −7

has no solutions when
(A) a = −5
(B) a = 6
(C) a = 0
(D) a = 5
(E) There is no such a

10. Let e1, e2, and e3 be three distinct eigenvectors of a linear operator A : R3 → R3. If e1, e2, and e3 are linearly dependent, which of the following MUST be true?

I. det(A) = 0

II. At least two of the eigenvalues corresponding to e1, e2, e3 must be equal to each other

III. A is not diagonalizable

(A) I only
(B) II only
(C) II and III
(D) I, II, and III
(E) None


SECTION II: FREE RESPONSE

Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanations. You may use the back of this sheet for writing.

1. [3 pts] Find the fundamental set of solutions to the following system of linear equations. Indicate the dimension of the solution space.

x1 + 2x2 + 5x3 + 5x4 + 8x5 = −2
x1 + x2 + 6x3 + 4x4 = −5

2. [4 pts] For the following matrix A find matrices C and C−1 such that C−1AC is diagonal and write a product expression for An. Show your work.

A = [5 1 −2; −2 2 4; −1 −1 6]

3. [3 pts] Use the Gram-Schmidt orthogonalization process to find an orthogonal base in the linear hull of the following system of vectors. Show your work.

e1 = (0, 1, 1, 1)ᵀ, e2 = (1, 1, 4, 2)ᵀ, e3 = (0, 0, 1, −3)ᵀ


ANSWERS

Multiple Choice:

1. A

2. E

3. C

4. C

5. E

6. D

7. D

8. C

9. B

10. B

Free Response:

1. x =

−8

3000

+ λ1

−7

1100

+ λ2

−3−1

010

+ λ3

8−8

001

,

dim = 3.

2.

2 3 −41 1 −1−1 −1 2

× 5 1 −2−2 2 4−1 −1 6

× −1 2 −1

1 0 20 1 1

=

4 0 00 4 00 0 5

,

which is also correct.

3. f1 = (0, 1, 1, 1)^T,   f2 = (−3, 4, −5, 1)^T,   f3 = (−8, 22, 15, −37)^T.

Note: some answers are not unique.
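In particular, any nonzero scalar multiples of f2 and f3 are also correct, since rescaling preserves orthogonality. The computation can be reproduced with exact fractions; the sketch below (my own, not part of the notes) runs Gram-Schmidt on e1, e2, e3 and confirms that the result matches the answer up to scaling:

```python
# Gram-Schmidt on e1 = (0,1,1,1), e2 = (1,1,4,2), e3 = (0,0,1,-3),
# done with exact rational arithmetic.
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

e = [[F(0), F(1), F(1), F(1)],
     [F(1), F(1), F(4), F(2)],
     [F(0), F(0), F(1), F(-3)]]

f = []
for v in e:
    w = list(v)
    for u in f:
        c = dot(v, u) / dot(u, u)          # projection coefficient onto u
        w = [wi - c * ui for wi, ui in zip(w, u)]
    f.append(w)

# pairwise orthogonality of the result
for i in range(3):
    for j in range(i):
        assert dot(f[i], f[j]) == 0

# up to scaling, this is the answer above
assert [x * -3 for x in f[1]] == [-3, 4, -5, 1]
assert [x * 17 for x in f[2]] == [-8, 22, 15, -37]
```

The raw Gram-Schmidt output has fractional entries (f2 = (1, −4/3, 5/3, −1/3), f3 = (−8/17, 22/17, 15/17, −37/17)); the answer above simply rescales them to integers, which is why "some answers are not unique".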



Index

Addition, 15
Anti-symmetric matrix, 60
Area, 27
Base, orthogonal, 69
Base, orthonormal, 69
Cartesian product, 13
Characteristic polynomial, 52
Cofactor matrix, 41
Commutative operation, 37
Compatible matrices, 37
Complementation of bases, 20
Completion of bases, 20
Completion of full squares, 64
Conjugate matrices, 51
Cuboid, 27
Determinant, 27, 29
Diagonal matrix, 9
Differentiation, 48
Direct sum of vector spaces, 75
Echelon form, 12
Eigenvalue, 55
Eigenvector, 55
Elementary transformation matrices, 43
Euclidean space, 70
Expected value, 48
Extended matrix, 8
Factorial of an integer, 29
Factors, 16
Function, 14
Function of several variables, 14
Fundamental set of solutions, 24
Gauss elimination, backward steps, 10
Gram matrix, 69
Gram-Schmidt orthogonalization, 69
Ground set, 15
Homogeneous dilation, 57
Homogeneous polynomial, 59
Identity matrix, 28, 38
Inhomogeneous dilation, 57
Integration, 48
Inverse matrix, 38
Inversion, 29
Kronecker’s symbol, 69
Laplace’s formula, 29
Leading element, 10
Linear combination, 16
Linear dependence, 16
Linear expression, 16
Linear function, 48
Linear independence, 16, 23, 55
Linear map, 48
Linear mapping, 48
Linear operator, 48
Linear space, 13
Linear subspace, 70
Linear transformation, 48
Linear vector space, 13, 15
Linear vector space over the field of real numbers, 15
Linear vector subspace, 16
Lower-triangular matrix, 9
Mapping, 14
Matrix, 8
Matrix conjugation, 51
Matrix multiplication, 37
Matrix of a quadratic form, 60
Matrix of the linear operator, 49
Matrix, anti-symmetric, 60
Matrix, degenerate, 34
Matrix, diagonal, 9
Matrix, lower-triangular, 9
Matrix, non-degenerate, 34
Matrix, regular, 34
Matrix, singular, 34
Matrix, symmetric, 60
Matrix, transpose, 22
Matrix, upper-triangular, 9
Minkowski space, 63
Monomial, 59
Multiplication by scalars, 15
Multiplicity, algebraic, 56
Multiplicity, geometric, 56
Multiplicity, of a root, 56
Negative definite, 60
Negative semidefinite, 60
Neutral element, 38
Normal vector, 70
Not definite, 60
Optimization problem, 70
Order, 29
Ordered pairs, 13
Orthogonal base, 69
Orthogonal complement, 72
Orthogonalization, 69
Orthonormal base, 69
Parallelogram, 27
Permutation, 29
Pivot element, 10
Pivoting element, 24
Polynomial, 59
Positive definite, 60
Positive semidefinite, 60
Projection, 48, 70
Quadratic form, 59
Quadratic form, negative definite, 60
Quadratic form, negative semidefinite, 60
Quadratic form, not definite, 60
Quadratic form, positive definite, 60
Quadratic form, positive semidefinite, 60
Rank of a matrix, 23
Relation, 14
Rotation, 48
Saddle point, 61
Signature of a quadratic form, 64
SLE, 8
Subspace, 15
Sum of vector spaces, 74
Sylvester’s criterion, 64, 65
Symmetric matrix, 60
Symmetry, 48
System of the linear equations, 8
Trace of a matrix, 52
Transformations of columns, 28
Transformations of rows, 8, 28
Transition matrix, 51
Transpose matrix, 22
Transposition of a matrix, 22
Trivial linear combination, 16
Upper-left corner minor, 64
Upper-triangular matrix, 9
Vector space, 13, 15
Volume, 27
Volume, n-dimensional, 27
