## Question

You are attempting to model the net hourly electrical energy output (MW) of the plant based on one or more of the other variables. We will assume that the relationships are linear (you will comment on the appropriateness of this assumption). You will produce the following, using either Excel (suggested for parts 1 and 2) or the SAS add-in for Excel (suggested for the remainder of the problem):

1. Individual scatter plots for each of the independent variables (temperature, pressure, relative humidity, and exhaust vacuum) versus the net hourly electrical energy output. Comment on the linearity assumption and on which independent variables you believe most affect the net hourly electrical energy output.

2. Compute the parameters of the individual linear regression models and, for each model, produce a graph of the data points together with the fitted regression line. Compute the value of R² for each variable. Comment on which independent variable(s) you believe most affect the net hourly electrical energy output and justify your response numerically, if possible.

3. Compute the parameters of a multiple linear regression model using all of the independent variables to predict the net hourly electrical energy output of the plant. Compute the value of R². Produce a graph of the residuals versus each of the independent variables. Comment on the fit of your model and its prediction of the net hourly electrical energy output, and justify your response numerically, if possible. Comment on the heteroscedasticity or homoscedasticity of the residuals with respect to each of the independent variables.

4. Remove the independent variable(s) that you believe contribute least to the linear regression model of the net hourly electrical energy output of the plant. Compute the parameters of a multiple linear regression model using the revised data set. Compute the value of R² and adjusted R² of the new model.

Justify your selection of the independent variable(s) you removed. Compare the R² and adjusted R² of your revised model with the R² and adjusted R² of the model using all independent variables. Comment on the difference. Which model do you believe is better: the model with all variables or the simplified model? Explain.
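The problem asks for Excel or the SAS add-in, but the quantities in parts 2 to 4 can be sketched in a few lines of Python as a cross-check. This is only an illustrative sketch: the data below is randomly generated and made up (it is not the power plant data set), and the names `simple_fit`, `solve`, and `multiple_fit` are hypothetical helpers. Adjusted R² uses the usual definition 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors.

```python
import random

def simple_fit(x, y):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def multiple_fit(columns, y):
    """OLS with an intercept via the normal equations.

    columns is a list of predictor columns; returns (coefficients, R^2, adjusted R^2),
    with the intercept as the first coefficient.
    """
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, predicted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    r2 = 1 - ss_res / ss_tot
    n, p = len(y), k - 1
    return beta, r2, 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Made-up data: output depends on temperature only; humidity is irrelevant by construction.
random.seed(0)
n_obs = 200
temperature = [random.uniform(5, 35) for _ in range(n_obs)]
humidity = [random.uniform(30, 100) for _ in range(n_obs)]
output = [500 - 2.0 * t + random.gauss(0, 3) for t in temperature]

slope, intercept, r2_simple = simple_fit(temperature, output)
beta_full, r2_full, adj_full = multiple_fit([temperature, humidity], output)
beta_red, r2_red, adj_red = multiple_fit([temperature], output)

print(f"simple fit:    slope={slope:.3f}, R^2={r2_simple:.4f}")
print(f"full model:    R^2={r2_full:.4f}, adjusted R^2={adj_full:.4f}")
print(f"reduced model: R^2={r2_red:.4f}, adjusted R^2={adj_red:.4f}")
```

Dropping a predictor can never raise R², so the comparison in part 4 rests mainly on adjusted R², which penalizes each extra predictor and can therefore favor the simplified model.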

Problem 2:

This problem will revisit the power plant data set that you just analyzed so that you can compare the results obtained using multiple regression with principal component regression.

a. Using the Excel SAS plug-in, conduct a principal component analysis of the independent variables only (i.e., do not include the power generated).

b. Based on the scree plot, how many principal components do you believe should be included?

c. What percentage of the variance do the principal components you chose to include explain?

d. Based on the principal component analysis, which of the independent variables are most influential?

e. How does your answer to part (d) compare to the p-values of the independent variables that you calculated previously during the multiple regression?
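Although the problem specifies the Excel SAS plug-in, the numbers behind parts (b) through (d) can be sketched in Python: the eigenvalues of the correlation matrix are what a scree plot displays, their cumulative share is the variance explained, and the eigenvector entries (loadings) indicate which variables dominate each component. The sketch below is an assumption-laden illustration: it uses made-up random data with three variables rather than the power plant data set, and it swaps in a hand-written cyclic Jacobi eigenvalue routine (`jacobi_eigen`, a hypothetical helper) in place of a statistics package.

```python
import math
import random

def correlation_matrix(columns):
    """Correlation matrix of the given columns (each standardized to mean 0, variance 1)."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        z.append([(x - mean) / sd for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors as columns of v
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    pairs = [(a[i][i], [v[k][i] for k in range(n)]) for i in range(n)]
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return [val for val, _ in pairs], [vec for _, vec in pairs]

# Made-up data: vacuum tracks temperature closely; humidity is independent of both.
random.seed(1)
n_obs = 300
temperature = [random.uniform(5, 35) for _ in range(n_obs)]
vacuum = [30 + 1.2 * t + random.gauss(0, 3) for t in temperature]
humidity = [random.uniform(30, 100) for _ in range(n_obs)]
names = ["temperature", "vacuum", "humidity"]

eigenvalues, components = jacobi_eigen(correlation_matrix([temperature, vacuum, humidity]))
explained = [val / sum(eigenvalues) for val in eigenvalues]
cumulative = [sum(explained[: i + 1]) for i in range(len(explained))]

for i, (val, vec) in enumerate(zip(eigenvalues, components), start=1):
    loadings = ", ".join(f"{name}={w:+.2f}" for name, w in zip(names, vec))
    print(f"PC{i}: eigenvalue={val:.3f}, cumulative={cumulative[i - 1]:.1%}, loadings: {loadings}")
```

The sign of an eigenvector is arbitrary, so only the relative magnitudes of the loadings within a component carry meaning when judging which variables are influential.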
