1. In building a linear regression model for a particular data set, you observe that the coefficient of one of the features has a relatively large negative value. This suggests that:

  1. This feature has a strong effect on the model (should be retained)
  2. This feature does not have a strong effect on the model (should be ignored)
  3. It is not possible to comment on the importance of this feature without additional information

2. We have seen methods such as ridge and lasso that reduce the variance of the coefficient estimates. These methods can also be used for feature selection. Which of them is more appropriate?

  1. Ridge
  2. Lasso

3. Given a set of n data points, $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, the best least squares fit $f(x)$ is obtained by minimization of:

  1. $\sum_{i=1}^{n} [y_i - f(x_i)]$
  2. $\min\,(y_i - f(x_i))$
  3. $\sum_{i=1}^{n} [y_i - f(x_i)]^2$
  4. $\max\,(y_i - f(x_i))^2$
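
To make the four candidate expressions concrete, the following minimal sketch evaluates each of them for a given fit. The function name `fit_summaries`, the toy data, and the candidate fit f(x) = 2x are hypothetical and chosen only for illustration; option 4 is read here as the maximum of the squared residuals, which is an assumption about the intended notation.

```python
def fit_summaries(xs, ys, f):
    """Evaluate the four candidate quantities for a fit f on the data."""
    residuals = [y - f(x) for x, y in zip(xs, ys)]
    return {
        "sum_of_residuals": sum(residuals),
        "min_residual": min(residuals),
        "sum_of_squared_residuals": sum(r ** 2 for r in residuals),
        # Assumption: option 4 means the largest squared residual.
        "max_squared_residual": max(r ** 2 for r in residuals),
    }


# Hypothetical toy data and a hypothetical candidate fit f(x) = 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 1.0, 5.0, 5.0]
summaries = fit_summaries(xs, ys, lambda x: 2.0 * x)

for name, value in summaries.items():
    print(name, value)
```

Comparing the printed values for a few different candidate fits shows how each expression responds to residuals of opposite sign.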

4. In linear regression, with regard to residuals, which of the following is true?

  1. Lower is better
  2. Higher is better
  3. Depends upon the data
  4. None of the above

5. In the lecture on Multivariate Regression, you learned about using orthogonalization iteratively to obtain regression coefficients. This method is generally referred to as Multiple Regression using Successive Orthogonalization.

In the formulation of the method, we observe that in iteration k, we regress the entire dataset on $z_0, z_1, \dots, z_{k-1}$. It seems like a waste of computation to recompute the coefficients for $z_0$ a total of p times, $z_1$ a total of p - 1 times, and so on. Can we reuse the coefficients computed in iteration j for iteration j+1 for $z_{j-1}$?

  1. No. Doing so will result in the wrong $\gamma$ matrix and hence the wrong $\beta_i$'s.
  2. Yes. Since $z_{j-1}$ is orthogonal to $z_{j-l}$ for all $2 \le l \le j$, the multiple regression in each iteration is essentially a univariate regression on each of the previous residuals. Since the regression coefficients for the previous residuals don't change over iterations, we can reuse the coefficients for further iterations.
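
For reference, here is a minimal sketch of regression by successive orthogonalization in its common textbook form, where each feature column is regressed on the residuals built so far. It may differ in detail from the lecture's formulation. The helper names `dot` and `successive_orthogonalization` and the toy data are hypothetical, and the sketch assumes no residual column is identically zero.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def successive_orthogonalization(columns, y):
    """columns: list of p feature columns, each a list of N floats.

    Returns (z, gamma, beta_last): the residual columns z_0..z_p, the
    coefficients gamma[(l, j)] of x_j on z_l, and the coefficient of y
    on the last residual z_p.
    """
    n = len(y)
    z = [[1.0] * n]  # z_0 is the all-ones (intercept) column
    gamma = {}
    for j, x_j in enumerate(columns, start=1):
        residual = list(x_j)
        for l, z_l in enumerate(z):
            # Univariate regression coefficient of x_j on z_l.
            gamma[(l, j)] = dot(z_l, x_j) / dot(z_l, z_l)
            residual = [r - gamma[(l, j)] * v for r, v in zip(residual, z_l)]
        z.append(residual)
    beta_last = dot(z[-1], y) / dot(z[-1], z[-1])
    return z, gamma, beta_last


# Hypothetical toy data generated as y = 1 + 2*x1 + 3*x2 (no noise).
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]
z, gamma, beta_last = successive_orthogonalization([x1, x2], y)
print(beta_last)
```

In this form, `beta_last` is the multiple-regression coefficient of the last feature only; obtaining the other coefficients requires reordering the columns or back-substitution.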

6. You decide to reduce the dimensionality of your data ($N \times p$) using Best Subset Selection. The library you're using has a function regress(X, Y) that takes in X and Y and regresses Y on X. What is the expected number of times regress(·, ·) will be called during your dimensionality reduction?

  1. $O(2^N)$
  2. $O(2^p)$
  3. $O(Np)$
  4. $O(p^2)$
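
One way to explore this question empirically is to count the fits directly. The sketch below assumes that best subset selection fits one model per non-empty subset of the p features; whether the empty (intercept-only) model is also fitted is an assumption that changes the count by one. The function `count_regress_calls` is hypothetical, and the counter stands in for a real call to regress.

```python
from itertools import combinations


def count_regress_calls(p):
    """Count model fits when every non-empty subset of p features is tried."""
    calls = 0
    for k in range(1, p + 1):
        for _subset in combinations(range(p), k):
            calls += 1  # stands in for one regress(X[:, subset], Y) call
    return calls


counts = {p: count_regress_calls(p) for p in range(1, 6)}
for p, calls in counts.items():
    print(p, calls)
```

Note that the count depends only on p, not on the number of rows N.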

7. If the number of features is larger than the number of training data points, to identify a suitable subset of the features for use with linear regression, we would prefer

  1. Forward stepwise selection
  2. Backward stepwise selection

8. Assume you have five-dimensional input data for a three-class classification problem. Further assume that all five dimensions of the input are independent of each other. In this scenario, is it possible for linear regression using lasso to result in one or more coefficients becoming zero?

  1. Yes
  2. No

9. You are given the following five three-dimensional training data instances (along with a one-dimensional output):

  • X1 = 5, X2 = 7, X3 = 3, y = 4
  • X1 = 2, X2 = 4, X3 = 9, y = 8
  • X1 = 3, X2 = 8, X3 = 1, y = 2
  • X1 = 7, X2 = 7, X3 = 2, y = 3
  • X1 = 1, X2 = 9, X3 = 7, y = 8

Using the K-nearest neighbour technique for performing regression, what will be the predicted y value corresponding to the query point (X1 = 5, X2 = 3, X3 = 4), for K = 2?

  1. 3
  2. 2.5
  3. 3.5
  4. 2

10. For the dataset given in the previous question, what will be the predicted y value corresponding to the query point (X1 = 5, X2 = 3, X3 = 4), for K = 3?

  1. 4.66
  2. 5
  3. 3
  4. 3.5
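
The last two questions can be checked with a short K-nearest-neighbour regression sketch. The questions do not state a distance metric or a weighting scheme, so this sketch assumes Euclidean distance and an unweighted mean of the K nearest outputs; the function name `knn_regress` is hypothetical.

```python
import math


def knn_regress(train, query, k):
    """Predict y for query as the mean y of its k nearest training points.

    train: list of (features, y) pairs. Distance is Euclidean (an assumption).
    """
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# The five training instances from question 9.
train = [
    ((5, 7, 3), 4),
    ((2, 4, 9), 8),
    ((3, 8, 1), 2),
    ((7, 7, 2), 3),
    ((1, 9, 7), 8),
]
query = (5, 3, 4)

prediction_k2 = knn_regress(train, query, k=2)
prediction_k3 = knn_regress(train, query, k=3)
print(prediction_k2, prediction_k3)
```

A different metric (for example Manhattan distance) could change which neighbours are selected, so the choice of distance is worth stating when reporting an answer.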
