Write your solutions to the following problems by hand, either on paper or on a tablet, and submit your answers as a scanned or exported PDF. Note that you are not allowed to use LaTeX, Google Docs, or any other digital document creation software to type your answers. Homework is due on Gradescope by 11:59PM on the due date. See the syllabus for details on the slip day policy.
Homework will be evaluated not only on the correctness of your answers, but on your ability to present your ideas clearly and logically. You should always explain and justify your conclusions, using sound reasoning. Your goal should be to convince the reader of your assertions. If a question does not require explanation, it will be explicitly stated.
Note: In some of the problems in this homework, we’ll explicitly mention that you can use Python and numpy to perform some of the relevant calculations. For a reference on how to do so, consult Chapter 5.1. In other problems, we’ll explicitly state that you must execute all calculations by hand.
Problem 1: Midterm 1 Solutions Review (10 pts)
Review the solutions to Midterm 1. Pick two problem parts (for example, Problem 3b and Problem 7a) from Midterm 1 in which your solutions have the most room for improvement, i.e., where they have unsound reasoning, could be significantly more efficient or clearer, etc. Include a screenshot of your solution to each problem part, and in a few sentences, explain what was deficient and how it could be fixed.
Alternatively, if you think one of your solutions is significantly better than the posted one, copy it here and explain why you think it is better. If you didn’t do Midterm 1, choose two problem parts from it that look challenging to you, and in a few sentences, explain the key ideas behind their solutions in your own words.
(4 pts) In each subpart, state whether the resulting object is a matrix, vector, or scalar. If the result is a matrix or vector, state its dimensions. If the result is not defined, state why. You don’t need to actually compute the resulting objects.
\(A^T\)
\(A^TA\)
\(AA^T\)
\(A^TA + AA^T\)
\(A^T \vec x\), where \(\vec x \in \mathbb{R}^3\)
\(A^T \vec x\), where \(\vec x \in \mathbb{R}^5\)
\(\vec x^T A^T A \vec x\), where \(\vec x \in \mathbb{R}^3\)
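As a rough illustration of how shapes combine under transposes and products, here is a small numpy sketch. The matrix `M` and its \(4 \times 2\) shape are assumptions chosen only for this example; it is not the matrix \(A\) from this problem.

```python
import numpy as np

# Hypothetical 4 x 2 matrix; the shape is an assumption for illustration only.
M = np.arange(8).reshape(4, 2)

shape_Mt = M.T.shape            # transposing swaps the two dimensions
shape_MtM = (M.T @ M).shape     # (2 x 4)(4 x 2)
shape_MMt = (M @ M.T).shape     # (4 x 2)(2 x 4)

# Adding a 2 x 2 matrix to a 4 x 4 matrix is undefined; numpy raises a ValueError.
try:
    M.T @ M + M @ M.T
    sum_defined = True
except ValueError:
    sum_defined = False

x = np.ones(2)
quad_form = x @ M.T @ M @ x     # collapses to a single number
print(shape_Mt, shape_MtM, shape_MMt, sum_defined, np.ndim(quad_form))
```

Keep in mind that numpy's broadcasting rules can sometimes make mismatched shapes appear to work, so checking dimensions by hand is still the reliable test of whether an expression is defined.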
There are two interpretations of the resulting vector, based on what we’ve seen in Chapter 5.1 — what are they?
c)
(5 pts) In both subparts, try to find a vector \(\vec x \in \mathbb{R}^3\) such that \(A \vec x = \vec b\). If it’s not possible to do so, explain why.
(3 pts) Explain why it’s the case that — for this particular matrix \(A\) — if \(A \vec x_1 = \vec b\) and \(A \vec x_2 = \vec b\), then \(\vec x_1 = \vec x_2\).
Problem 3: Correlation, Revisited (11 pts)
In this problem, we’ll see how the correlation coefficient between two variables, \(r\), can be expressed as a matrix multiplication.
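Consider a dataset of \(n\) points, \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\), and let
$$ D = \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ x_2 - \bar{x} & y_2 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix} $$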
where \(\bar{x}\) and \(\bar{y}\) are the means of \(x\) and \(y\), respectively. Note that \(D\) is an \(n \times 2\) matrix, and it is mean-centered, meaning that the mean of each column is 0.
Define the matrix \(\Sigma\) as follows.
$$ \Sigma = \frac{1}{n} D^TD $$
\(\Sigma\) is a \(2 \times 2\) matrix. Its name is pronounced “sigma”, just like in summation notation and standard deviation. Don’t confuse it with summation notation; \(\Sigma\) is just a single matrix.
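To make the definition concrete, the sketch below builds \(D\) and \(\Sigma\) numerically. The data values are made up, and it assumes the columns of \(D\) are \(x_i - \bar{x}\) and \(y_i - \bar{y}\), as the description above suggests. The comparison with `np.cov` is only a sanity check, not part of the problem.

```python
import numpy as np

# Made-up data, purely for illustration.
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([3.0, 1.0, 6.0, 8.0])
n = len(x)

# Mean-center each variable and stack the results as the two columns of D.
D = np.column_stack([x - x.mean(), y - y.mean()])

Sigma = (D.T @ D) / n

# np.cov with bias=True also divides by n, so the two results should agree.
Sigma_np = np.cov(x, y, bias=True)
print(Sigma)
print(np.allclose(Sigma, Sigma_np))
```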
a)
(4 pts) For this particular matrix \(D\), find \(\Sigma\). All four components of \(\Sigma\) should be expressions involving the points \(x_1, x_2, \ldots, x_n\) and/or \(y_1, y_2, \ldots, y_n\). Feel free to use summation notation in your answers.
b)
(2 pts) In English, what do the two elements on the diagonal (top-left and bottom-right) of \(\Sigma\) represent?
c)
(3 pts) You should notice that \(\Sigma\) is a symmetric matrix, meaning \(\Sigma^T = \Sigma\). (See Chapter 5.2 for more on symmetric matrices.) The two elements off the diagonal (top-right and bottom-left) are equal to each other, and their common value is called the covariance of \(x\) and \(y\). For that reason, \(\Sigma\) is often called the covariance matrix.
Find an expression for the off-diagonal elements of \(\Sigma\) in terms of the correlation coefficient, \(r\), \(\sigma_x\), and \(\sigma_y\), but with no summation notation or other variables.
Hint: This only requires 1-2 lines of work. Remember the definition of \(r\) from Chapter 2.4.
d)
(2 pts) In general, suppose \(X \in \mathbb{R}^{n \times d}\) is a matrix containing \(n\) observations for each of \(d\) variables/features, and that, like \(D\), each column of \(X\) has been mean-centered. The covariance matrix of \(X\) is defined similarly.
$$ \Sigma = \frac{1}{n} X^TX $$
In English, explain what the element in row 3 and column 5 of this \(\Sigma\) represents.
Problem 4: Projections, Revisited (14 pts)
As we first saw in Chapter 3.4, the projection of \(\vec u\) onto \(\vec v\) is the vector
$$ \vec p = \left( \frac{\vec u \cdot \vec v}{\vec v \cdot \vec v} \right) \vec v $$
If we assume that \(\vec v\) is a unit vector, meaning \(\lVert \vec v \rVert = 1\), then the projection of \(\vec u\) onto \(\vec v\) has a simpler form,
$$ \vec p = (\vec u \cdot \vec v) \vec v $$
For simplicity, assume that \(\vec u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\) is some arbitrary (not necessarily unit) vector in \(\mathbb{R}^2\), and \(\vec v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}\) is a unit vector in \(\mathbb{R}^2\).
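The two projection formulas above can be compared numerically. The following is a minimal sketch using made-up vectors (not the ones from part b); the helper name `project` is hypothetical.

```python
import numpy as np

def project(u, v):
    """Projection of u onto v, for any nonzero v (general formula)."""
    return (u @ v) / (v @ v) * v

u = np.array([4.0, 2.0])
w = np.array([4.0, 3.0])          # not a unit vector
v = w / np.linalg.norm(w)         # unit vector in the same direction: [0.8, 0.6]

p_general = project(u, w)         # general formula with a non-unit vector
p_unit = (u @ v) * v              # simplified formula, valid because ||v|| = 1
print(p_general, p_unit)
```

Both formulas should give the same vector, since \(\vec w\) and \(\vec v\) point in the same direction, and the error \(\vec u - \vec p\) should be orthogonal to \(\vec v\).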
a)
(6 pts) Find a \(2 \times 2\) matrix \(P\), called a projection matrix, such that
$$ P \vec u = \vec p = (\vec u \cdot \vec v) \vec v $$
Think of \(P\) as a matrix that transforms \(\vec u\) into an approximation of it, in the direction of \(\vec v\) (or “projects” \(\vec u\) onto \(\vec v\)).
Hint: Start by writing \(P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\) and solve for \(a, b, c, d\) in terms of \(v_1\) and \(v_2\); \(P\) should not involve \(u_1\) or \(u_2\). Don’t forget that \(\vec v\) is a unit vector, and both \(\vec u, \vec v \in \mathbb{R}^2\).
b)
(4 pts) Find the projection of \(\vec u = \begin{bmatrix} 9 \\ -3 \end{bmatrix}\) onto the unit vector \(\vec v = \begin{bmatrix} 3 / 5 \\ 4 / 5 \end{bmatrix}\) using:
The formula for the projection of \(\vec u\) onto \(\vec v\)
The projection matrix \(P\) you found in part a)
Feel free to use Python and numpy to compute the relevant products as we do in Chapter 5.2, but if you do so, include screenshots of your code and results, and also write out the final result by hand. If you just write the final result with no work shown, you will not receive any credit.
c)
(4 pts) Show that \(P\) satisfies the following property:
$$ P^2 = P $$
A matrix with this property is called idempotent: applying \(P\) twice (or three times, or four times, etc.) to a vector gives the same result as applying it once.
Hint: You’ll likely end up with terms of the form \(v_1^4\). Remember that \(\vec v\) is a unit vector; use this to help you simplify.
(6 pts) For each of the following matrices, compute the product of the matrix with its transpose in both orders (\(A^TA\) and \(AA^T\), \(B^TB\) and \(BB^T\), and so on), and use the results to determine whether the matrix is orthogonal. If a matrix is not orthogonal, explain which of the conditions for being orthogonal it does and does not satisfy.
Note: Feel free to use Python and numpy to compute the relevant products as we do in Chapter 5.2, but if you do so, include screenshots of your code and results, and also write out the final result by hand. If you just write the final result with no work shown, you will not receive any credit.
b)
(3 pts) Explain why the following statement is true: If \(Q\) is an orthogonal matrix, then the rows of \(Q\) form an orthonormal set, in addition to the columns.
Hint: Think about what \(Q^TQ\) and \(QQ^T\) each are.
c)
(3 pts) Orthogonal matrices have many useful properties. One is that they preserve the norm of vectors. In other words, if \(Q \in \mathbb{R}^{n \times n}\) is orthogonal and \(\vec x \in \mathbb{R}^n\), then:
$$ \lVert Q \vec x \rVert = \lVert \vec x \rVert $$
Prove the statement above.
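A numerical check is not a proof, but it can help build intuition for the claim. The sketch below assumes an orthogonal matrix obtained from the QR factorization of a random square matrix; the random seed and the dimension 4 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# The Q factor from the QR factorization of a square matrix is orthogonal.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
x = rng.normal(size=4)

is_orthogonal = np.allclose(Q.T @ Q, np.eye(4))
norm_before = np.linalg.norm(x)
norm_after = np.linalg.norm(Q @ x)
print(is_orthogonal, norm_before, norm_after)
```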
d)
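(3 pts) At the end of Chapter 5.2, we presented the matrix
$$ A = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} $$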
and visualized three vectors, \(\vec u\), \(\vec v\), and \(\vec w\), and the result of multiplying each one by \(A\). We defined \(A\) as a rotation matrix; specifically, one that rotates vectors by \(\theta = 30^\circ\) counterclockwise. (Go and look at the picture there for context; we’re intentionally not providing it here so that you have to look at the notes!)
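In general, the \(2 \times 2\) matrix that rotates vectors counterclockwise by an angle \(\theta\) is given by
$$ R = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} $$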
To understand where the numbers in \(R\) came from, read the solutions to Homework 4, linked above.
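For experimenting with rotations, here is a small sketch. It assumes the standard counterclockwise rotation matrix, with \(\cos \theta\) on the diagonal, \(-\sin \theta\) in the top right, and \(\sin \theta\) in the bottom left; the helper name `rotation_matrix` is hypothetical.

```python
import numpy as np

def rotation_matrix(theta_degrees):
    """2 x 2 matrix that rotates vectors counterclockwise by theta_degrees."""
    t = np.deg2rad(theta_degrees)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

R30 = rotation_matrix(30)
e1 = np.array([1.0, 0.0])
rotated = R30 @ e1        # approximately [sqrt(3)/2, 1/2]
print(rotated)
```

Rotating \(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\) by \(30^\circ\) should land on the point of the unit circle at that angle, which is a quick way to sanity-check the signs in the matrix.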
By hand, find a CR decomposition of the matrices below, by placing the linearly independent columns (reading from left to right) in \(C\) and the values needed to “mix” the linearly independent columns in \(C\) to get back the original matrix in \(R\).
Hint: Most of these can be done quickly by eyeballing the relationships between columns.
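The problem asks for the decomposition by hand, but a numerical version can be useful for checking your work afterwards. The sketch below is one possible approach, not necessarily the method used in the notes: it scans columns from left to right, keeps each column that raises the rank, and then solves for the mixing coefficients with least squares. The function name `cr_decomposition` and the example matrix are made up, and the function assumes the input matrix is nonzero.

```python
import numpy as np

def cr_decomposition(A):
    """Return (C, R) with A = C @ R, where C holds the linearly
    independent columns of A, read from left to right."""
    A = np.asarray(A, dtype=float)
    kept = []
    for j in range(A.shape[1]):
        candidate = kept + [j]
        # Keep column j only if it is independent of the columns kept so far.
        if np.linalg.matrix_rank(A[:, candidate]) == len(candidate):
            kept.append(j)
    C = A[:, kept]
    # Each column of R holds the coefficients that mix the columns of C
    # into the corresponding column of A.
    R, *_ = np.linalg.lstsq(C, A, rcond=None)
    return C, R

# Made-up example: column 2 is twice column 1; column 3 is independent.
A = np.array([[1, 2, 3],
              [2, 4, 5],
              [3, 6, 8]])
C, R = cr_decomposition(A)
print(C)
print(np.round(R, 10))
```

In this example, \(C\) should contain the first and third columns of the matrix, and the second column of \(R\) should record that the second column is twice the first.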