Write your solutions to the following problems either by writing them on a piece of paper or on a tablet and scanning your answers as a PDF. Note that you are not allowed to use LaTeX, Google Docs, or any other digital document creation software to type your answers. Homeworks are due to Gradescope by 11:59PM on the due date. See the syllabus for details on the slip day policy.
Homework will be evaluated not only on the correctness of your answers, but on your ability to present your ideas clearly and logically. You should always explain and justify your conclusions, using sound reasoning. Your goal should be to convince the reader of your assertions. If a question does not require explanation, it will be explicitly stated.
Review the solutions to Homework 6. Pick two problem parts (for example, Problem 2a and Problem 5b) from Homework 6 in which your solutions have the most room for improvement, i.e., where they have unsound reasoning, could be significantly more efficient or clearer, etc. Include a screenshot of your solution to each problem part, and in a few sentences, explain what was deficient and how it could be fixed.
Alternatively, if you think one of your solutions is significantly better than the posted one, copy it here and explain why you think it is better. If you didn’t do Homework 6, choose two problem parts from it that look challenging to you, and in a few sentences, explain the key ideas behind their solutions in your own words.
Problem 2: Anonymous Feedback (6 pts)
We’d like to get your feedback on how the course has been going so far, now that we’re past the halfway point and Midterm 2 is fast approaching.
You can find the survey at this link, which you should complete after you’ve finished the rest of Homework 7. Unlike the Homework 3 survey, it is anonymous, so feel free to provide candid feedback.
In order to earn the 6 points for Homework 7, Problem 2, include a screenshot of the confirmation message you see after submitting the form. (We consider it an honor code violation to include a screenshot if you didn’t actually submit the form!)
Thank you for your feedback once again!
Problem 3: The Complete Solution (18 pts)
Before beginning this problem, make sure you’ve read both Chapter 6.3 and Chapter 6.4.
a)
(4 pts) Consider the matrix \(A\) and vector \(\vec b\) defined below.
Find the vector \(\vec x^{\ast}\) that minimizes \(\lVert \vec b - A \vec x \rVert^2\). Show your work, but feel free to use numpy to compute some of the matrix operations; here is a video that walks through how to do this. (We’re intentionally using different variables here than in the notes to have you think about the problem in general terms.)
For the remainder of the problem, we will use the same vector \(\vec b\), but instead use the following matrix \(A\).
(2 pts) Now, find one vector \(\vec x^{\ast}\) that minimizes \(\lVert \vec b - A \vec x \rVert^2\) for this new matrix \(A\). Again, show your work. If you’ve read Chapter 6.4 closely, this should not require much calculation.
c)
(4 pts) Find a basis for \(\text{nullsp}(A)\). Hint: Try and do so efficiently, since this is the type of problem we’ll see on Midterm 2.
d)
(2 pts) Show that if \(\vec x’\) satisfies the normal equation, \(A^TA \vec x’ = A^T \vec b\), and \(\vec x_0 \in \text{nullsp}(A)\), then \(\vec x’ + \vec x_0\) also satisfies the normal equation. Hint: This is two-line solution; we’re mostly asking it so that you interalize what this means and why it’s true.
e)
(3 pts) Describe, using set notation, the complete set of vectors \(\vec x^{\ast}\) that minimize \(\lVert \vec b - A \vec x \rVert^2\). Is this set a subspace?
f)
(3 pts) There are infinitely many vectors \(\vec x^{\ast}\) that minimize \(\lVert \vec b - A \vec x \rVert^2\). If we try and use code to find a solution, it can’t return all of them — it’ll pick a particular one.
In Python, use np.linalg.lstsq to find a vector \(\vec x^{\ast}\) that minimizes \(\lVert \vec b - A \vec x \rVert^2\). Include a screenshot of your code and the vector \(\vec x^{\ast}\) it returns, and in your PDF, write out the coefficients of \(\vec x^{\ast}\) as a vector (in addition to the screenshot). Then, provide an educated guess of why you think it picked the \(\vec x^{\ast}\) that it did.
Problem 4: Orthogonalization (27 pts)
Before starting, refer to Chapter 6.5, written just for this problem. It won’t be possible to do this problem without referencing it.
In parts a) through d), we’ll refer to the vectors \(\vec v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}\), \(\vec v_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}\), and \(\vec v_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}\).
a)
(6 pts) By hand, apply the Gram-Schmidt process to the vectors \(\vec v_1, \vec v_2, \vec v_3\) to find an orthonormal set of vectors \(\vec q_1, \vec q_2, \vec q_3\). Show your work; you cannot use numpy for this.
Then, create the matrix \(Q = \begin{bmatrix} | & | & | \\ \vec q_1 & \vec q_2 & \vec q_3 \\ | & | & | \end{bmatrix}\) and confirm that \(Q^TQ = I\), but that \(QQ^T \neq I\).
b)
(3 pts) Suppose \(\vec v_4 = \begin{bmatrix} 2 \\ 2 \\ 3 \\ 3 \end{bmatrix}\). If you were to apply the Gram-Schmidt process to the vectors \(\vec v_1, \vec v_2, \vec v_3, \vec v_4\), what would the vector \(\vec Q_4\) be? Why?
Find scalars \(a\), \(b\), and \(c\) such that \(a\vec q_1 + b\vec q_2 + c\vec q_3 = \vec y\), without solving a system of 3 equations and 3 unknowns. Instead, use the fact that \(\vec q_1, \vec q_2, \vec q_3\) are orthonormal. Hint: There’s a relevant problem from Lab 5.
d)
(4 pts) Consider \(\vec y = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\). Unlike in c), \(\vec y\)is not in \(\text{span}(\lbrace \vec v_1, \vec v_2, \vec v_3 \rbrace)\).
Find the vector in \(\text{span}(\lbrace \vec v_1, \vec v_2, \vec v_3 \rbrace)\) that is closest to \(\vec y\). Do not stack the \(\vec v_i\)’s into a matrix \(X\) and then use \(X(X^TX)^{-1}X^T\vec y\). Instead, use the fact that \(\vec q_1, \vec q_2, \vec q_3\) are orthonormal and have the same span as \(\vec v_1, \vec v_2, \vec v_3\). How does this simplify the problem?
e)
(5 pts) Open the the supplemental Jupyter Notebook we’ve created for Homework 7, which can either be found here in the course GitHub repository, or here on DataHub.
There, you’re asked to implement the function orthogonalize, which takes in an \(n \times d\) matrix \(V\) whose columns are linearly independent, and returns a matrix \(Q\) whose columns are orthonormal and have the same span as \(V\). This problem is not autograded. Rather, in your submission to this part, include a screenshot of your implementation and sample output in your PDF for Homework 7.
f)
(5 pts) A QR decomposition of a matrix \(A\) is a factorization of the form
$$ A = QR $$
where \(Q\) is an \(n \times d\) matrix with orthonormal columns and \(R\) is an \(d \times d\)upper triangular matrix (a matrix that has 0s below the diagonal).
For example, if \(A = \begin{bmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \\ 1 & 1 & 2 \end{bmatrix}\), a \(QR\) decomposition of \(A\) is
Finding the \(Q\) in a \(QR\) decomposition is straightforward: apply Gram-Schmidt to the columns of \(A\), assuming \(A\)’s columns are linearly independent. The question is how to find \(R\).
In the supplemental Jupyter Notebook, we’ve defined an arbitrary matrix \(A\) and call your orthogonalize function on it, and give you hints as to how to find \(R\). Using the experimentation there, and what you know about \(Q\), explain how to find \(R\).
Find a \(QR\) decomposition of \(A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}\). Note that the columns of \(A\) are made up of the same three vectors you worked with in parts a) through d) of this problem.
Given a\(QR\) decomposition of \(A\), explain how to find another\(QR\) decomposition of \(A\) with a (slightly) different \(Q\) and/or \(R\).
Problem 5: Same, but Different (13 pts)
In Chapter 2.4, we were introduced to one of many formulas for the optimal slope, \(w_1^{\ast}\), and optimal intercept, \(w_0^{\ast}\), for the simple linear regression model \(h(x_i) = w_0 + w_1 x_i\) when using squared loss:
$$ w_1^* = r \frac{\sigma_{y}}{\sigma_{x}} \qquad w_0^* = \bar y - w_1^* \bar x $$
The end goal of Chapters 3 through 6 has been to give us the tools to revisit the simple linear regression model in terms of linear algebra, so that we can extend our model to allow for multiple input variables. As we see in Chapter 7.1, the solution is to define the \(n \times 2\) “design matrix” \(X\) and observation vector \(\vec y \in \mathbb{R}^n\) as follows:
It’s not immediately obvious why the components of \(\vec w^{\ast}\) should have anything to do with the correlation, means of \(x\) and \(y\), and standard deviations of \(x\) and \(y\). In this problem, we will prove that both of these formulations are equivalent, for any dataset \((x_1, y_1)\), \((x_2, y_2)\), ..., \((x_n, y_n)\).
a)
(5 pts) Express the matrix \((X^TX)^{-1}\) using constants and/or summations involving \(x_i\) and/or \(y_i\).
Hint: Start by proving \(\sum_{i = 1}^n x_i^2 = n \sigma_x^2 + n \bar{x}^2\).
c)
(5 pts) Finally, prove that
$$ (X^TX)^{-1}X^T \vec{y} = \begin{bmatrix} \bar{y} - r \frac{\sigma_y}{\sigma_x} \bar{x} \\\\ r \frac{\sigma_y}{\sigma_x} \end{bmatrix} $$
Hint: Start by proving that \(\sum_{i=1}^n x_i y_i = nr \sigma_x \sigma_y + n \bar{x}\bar{y}\).
Note that the second component of the vector above is \(w_1^{\ast} = r \frac{\sigma_y}{\sigma_x}\) and the first component of the vector above is \(w_0^{\ast} = \bar{y} - r \frac{\sigma_y}{\sigma_x} \bar{x} = \bar{y} - w_1^{\ast} \bar{x}\), as we first saw in Chapter 2.4! I think this is beautiful.
Problem 6: Putting it into Practice (8 pts)
This problem asks you to apply the concepts in Chapter 7.2, and follows the last problem.
Suppose we’d like to fit a hypothesis function of the form \(h(x_i) = w_0 + w_1 x_i^2\). Notice the squared term; this is not a simple linear regression line.
To do so, we’ll find the optimal parameter vector \(\vec w^{\ast}\) that satisfies the normal equations. The first 5 rows of our dataset are as follows, though note that our dataset has \(n\) rows in total.
Suppose \(x_1, x_2, …, x_n\) have a mean of \(\bar{x} = 5\) and a variance of \(\sigma_x^2 = 8\).
a)
(2 pts) Write the first five rows of the design matrix, \(X\).
b)
(3 pts) Suppose that after solving the normal equations, we find \(\vec w^{\ast} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}\). Find the augmented feature vector \(\text{Aug}(\vec x_3)\) and squared loss for row 3 of the dataset.
c)
(3 pts) Let \(X_\text{tri} = 3X\), where \(X\) is the full design matrix for our dataset with \(n\) rows. Determine the bottom-left value in the matrix \(X_\text{tri}^TX_\text{tri}\), i.e. the value in the second row and first column. Your answer should be an expression involving \(n\). Hint: You can use any of the hints or results from Problem 2 without needing to re-prove them.
Problem 7: Billy the Waiter (14 pts)
This problem involves writing code and submitting it to the Gradescope autograder. The goal of this problem is to give you a taste of how linear algebra can be used to implement linear regression in code, and show you how to build models that involve multiple features (including categorical variables).
Open the the supplemental Jupyter Notebook we’ve created for Homework 7, which can either be found here in the course GitHub repository, or here on DataHub.
This problem is entirely autograded; to receive credit for Problem 7 of this homework, you’ll need to submit your completed notebook to the autograder on Gradescope. Your submission time for Homework 7 is the latter of your PDF and code submission times.