Multiple Choice Questions:
Question 1
If you have a dataframe d, then the expression d[1] will return:
A dataframe with one row (which is going to be the first row of d)
A dataframe with a single column (and that column will contain the values from the first column of d)
The values from the first row of d (returned as a vector)
The values from the first column of d (returned as a vector)
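
A minimal sketch of the indexing forms the options above describe, using a small made-up data frame (the column names a and b and their values are assumptions for illustration, not part of the question). It contrasts single-bracket, double-bracket, and row indexing:

```r
# Hypothetical data frame used only for illustration
d <- data.frame(a = 1:3, b = c("x", "y", "z"))

one_col   <- d[1]     # single bracket with one index: selects columns, stays a data frame
as_vector <- d[[1]]   # double bracket: extracts the first column as a vector
first_row <- d[1, ]   # row index followed by a comma: data frame with one row

print(class(one_col))    # "data.frame"
print(dim(one_col))      # 3 rows, 1 column
print(class(as_vector))  # "integer"
print(dim(first_row))    # 1 row, 2 columns
```

The single-bracket form treats the data frame as a list of columns, which is why it keeps the data frame structure.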

Question 2
Suppose you run the following piece of code:
f = function(x = sqrt(z)) {
  print(x)
}
Then somewhere later in your code you are going to use that function (assume z was not touched in between):
What will be the results of the three function calls shown above?
2,4, Error

Question 3
Suppose you have a vector x of some values (observations). Which of the following commands will calculate the 83% quantile of the data (approximately; do not worry about ties, smoothing between discrete data points, or rounding up vs. rounding down)?
sort(x)[round(0.83*length(x))]
x[round(0.83*length(x))]
x[round(0.83*sum(x))]
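
As a rough check of the sort-and-index idea, the sketch below compares it with R's built-in quantile() function. The simulated data, the seed, and the variable names are assumptions made for this example only:

```r
set.seed(42)
x <- rnorm(1000)                    # hypothetical observations

k <- round(0.83 * length(x))        # position of the 83% point in the ordered data
approx_q  <- sort(x)[k]             # value at that position after sorting
builtin_q <- unname(quantile(x, 0.83))  # built-in estimate (interpolates by default)

print(c(approx = approx_q, builtin = builtin_q))
```

The two values should be close but need not be identical, because quantile() interpolates between neighbouring order statistics. Indexing the unsorted vector at position k would simply return whichever observation happens to sit there.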

Question 4
Bias of a statistical model is
- An error introduced by training the model on a specific training set and thus fitting the particular realization of the noise in that dataset
- The error introduced by approximating real-life (usually unknown) dependence of outcome Y on the explanatory variables X with a simpler model
- The error due to unmeasured/unmeasurable variation in the data
- The average sum of squared differences between observed and predicted values of the outcome Y
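
One way to see a systematic error that does not go away with more training sets is a small simulation. The sketch below is only an illustration under stated assumptions: the "true" dependence true_f, the noise level, the sample sizes, and the evaluation point x0 are all made up for the example. It fits a straight line to data generated from a curved relationship, many times over, and measures how far the average prediction is from the truth:

```r
set.seed(7)
true_f <- function(x) x^2   # hypothetical true dependence of Y on X
x0 <- 0                     # point at which predictions are compared

# Refit a straight-line model on 500 independent training sets
preds <- replicate(500, {
  x <- runif(50, -1, 1)
  y <- true_f(x) + rnorm(50, sd = 0.1)
  fit <- lm(y ~ x)                          # deliberately too simple for a curve
  predict(fit, newdata = data.frame(x = x0))
})

bias_at_x0     <- mean(preds) - true_f(x0)  # systematic error of the average fit
variance_at_x0 <- var(preds)                # spread of the fits across training sets

print(c(bias = bias_at_x0, variance = variance_at_x0))
```

Averaging over training sets removes the effect of any one noise realization, so whatever gap remains comes from the mismatch between the fitted model family and true_f.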

Question 5
In the prediction problem, we are trying to learn from existing data with the primary goal of predicting the outcome in new cases. The simplicity/interpretability of the model is valuable, but may be of less importance than the accuracy of the predictions.

Question 6
In the inference problem we are more interested in understanding the data at hand: finding out which predictors are important, discovering the relationships between the outcome and the predictors, etc. Simplicity/interpretability of the model has higher importance in this setting.

Question 7
In order to perform initial data exploration/summarization, it is often useful to:
Look at the empirical distributions (histograms) of the variables in the dataset
All of the suggested options
Look at pairwise scatterplots between continuous variables
Look at summary statistics/boxplots of continuous variables, stratified by categorical variables (if any)

Question 8
We say that the model overfits the data when it achieves very small MSE on the training set, but fails to predict well (results in a large MSE) on the test set.
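
The comparison of training and test MSE described above can be sketched as follows. Everything here is an assumption for illustration: the linear data-generating process, the sample sizes, the polynomial degree, and the helper names make_data and mse are hypothetical:

```r
set.seed(1)

# Hypothetical data: a linear signal plus noise
make_data <- function(n) {
  x <- runif(n, 0, 1)
  data.frame(x = x, y = 2 * x + rnorm(n, sd = 0.5))
}
train <- make_data(30)
test  <- make_data(1000)

# Mean squared error of a fitted model on a given data set
mse <- function(fit, data) mean((data$y - predict(fit, newdata = data))^2)

simple   <- lm(y ~ x, data = train)             # matches the true structure
flexible <- lm(y ~ poly(x, 15), data = train)   # far more flexible than needed

results <- data.frame(
  model     = c("degree 1", "degree 15"),
  train_mse = c(mse(simple, train), mse(flexible, train)),
  test_mse  = c(mse(simple, test), mse(flexible, test))
)
print(results)
```

The flexible model cannot have a higher training MSE than the simple one, because the straight line is a special case of the degree-15 polynomial. Whether its test MSE is larger depends on the data, but with this few training points it typically is.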

Solution Preview

Q1) A dataframe with a single column (second choice)
Q2) 2,4, Error...
