For n = 27 healthy females between 8 and 25 years old, the variables are y = StrdLvl, the level of a steroid, and x = Age, age in years.

a) Draw a scatterplot of the two variables. Briefly discuss the important features of the plot.

b) Create a new variable that is the squared value of Age. In Minitab, use Calc > Calculator, enter a new variable name (such as Agesqrd) in the Store result in variable box, and in the Expression box enter Age*Age. Then, carry out a regression analysis in order to estimate the quadratic response function relating StrdLvl to Age. In other words, use Age and Agesqrd as predictors of StrdLvl.

i. Write the estimated sample equation.

ii. What is the evidence in the output that it was useful to include the quadratic term (the squared term)?
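The exercise is written for Minitab, but the same quadratic fit can be cross-checked in plain Python. This is a minimal sketch. It assumes the 27 observations are available as two equal-length lists; the data themselves are not reproduced here. The names `solve` and `fit_quadratic` are hypothetical helpers, not part of any library. The sketch solves the normal equations $(X^\top X)b = X^\top y$ for a design matrix with columns 1, Age, and Agesqrd. It then reports the quantity relevant to part (b)(ii): the t-statistic for the squared term, which is its coefficient divided by its standard error, with $n - 3$ error degrees of freedom.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest entry in this column to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(age, strd_lvl):
    """OLS fit of StrdLvl = b0 + b1*Age + b2*Agesqrd, with Agesqrd = Age*Age."""
    rows = [[1.0, a, a * a] for a in age]  # design matrix columns: 1, Age, Agesqrd
    y = [float(v) for v in strd_lvl]
    n, p = len(rows), 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fits = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))
    ybar = sum(y) / n
    ssto = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - p)  # error df = n - 3
    # se(b_j) = sqrt(MSE * [(X'X)^-1]_jj); column j of the inverse solves (X'X) v = e_j
    se = []
    for j in range(p):
        e_j = [1.0 if k == j else 0.0 for k in range(p)]
        se.append(math.sqrt(mse * solve(xtx, e_j)[j]))
    t = [bj / sj for bj, sj in zip(b, se)]
    return {"coef": b, "se": se, "t": t, "mse": mse,
            "r2": 1 - sse / ssto, "df_error": n - p}
```

Part (b)(ii) asks for evidence that the squared term is useful. That evidence is a large |t| for the Agesqrd coefficient, or equivalently a small p-value from a t distribution with n - 3 = 24 degrees of freedom. Solving the normal equations directly is adequate for three predictors like these. A QR-based least-squares routine is numerically safer in general.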

c) For the regression model that you fit in part (b), create and include a plot of residuals versus fits. Briefly discuss whether any difficulties with either the data or the model are indicated.
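A residuals-versus-fits plot places each fitted value on the horizontal axis and its residual $y_i - \hat{y}_i$ on the vertical axis. The sketch below uses hypothetical helpers in plain Python and assumes the three quadratic coefficients from part (b) are already known. `residuals_vs_fits` returns the (fit, residual) pairs sorted by fitted value. `sign_runs` counts runs of same-sign residuals as a rough text-only check: a few long runs hint at curvature the model missed. The run count is a heuristic added here, not something the exercise requires.

```python
def residuals_vs_fits(coef, age, strd_lvl):
    """Return (fitted value, residual) pairs for the quadratic model, sorted by fit."""
    b0, b1, b2 = coef
    pairs = []
    for a, y in zip(age, strd_lvl):
        fit = b0 + b1 * a + b2 * a * a  # Agesqrd = Age*Age
        pairs.append((fit, y - fit))    # residual = observed - fitted
    return sorted(pairs)


def sign_runs(residuals):
    """Count runs of consecutive same-sign residuals, skipping exact zeros."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
```

To count runs in fitted-value order, pass `[r for _, r in pairs]` to `sign_runs`. To draw the actual plot, the two coordinates can be handed to matplotlib, for example `plt.scatter(fits, resids)` with `plt.axhline(0)` as a reference line.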

d) One possible interpretation of the scatterplot in part (a) is that the mean steroid level more or less approaches an asymptote. (The slight hint of a decrease at the oldest age might simply be a random drop in those last two observed data points.) One class of models that approach asymptotes involves straight-line relationships between y and powers of the reciprocal of x. By some trial and error, I found that the model $E(Y) = \beta_0 + \beta_1 \frac{1}{x^2}$ fits the sample data fairly well. In this part and the next, you'll verify that. Create a new variable 1/Agesqrd. Graph StrdLvl versus this new variable. Discuss any important features of the plot.
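The asymptote claim for the reciprocal-square model $E(Y) = \beta_0 + \beta_1/x^2$, which is the model fitted in part (e), can be checked directly. As age grows, the $1/x^2$ term vanishes, and the derivative shows how quickly the curve levels off:

$$
\begin{aligned}
E(Y) &= \beta_0 + \beta_1 \frac{1}{x^2}, \\
\lim_{x \to \infty} E(Y) &= \beta_0, \\
\frac{d\,E(Y)}{dx} &= -\frac{2\beta_1}{x^3}.
\end{aligned}
$$

Suppose $\beta_1 < 0$. This is an assumption consistent with a curve that rises with age, not a reported estimate. Then the derivative is positive but shrinks like $1/x^3$, so the mean rises toward the ceiling $\beta_0$ without ever exceeding it. A quadratic $\gamma_0 + \gamma_1 x + \gamma_2 x^2$ with $\gamma_2 < 0$ behaves differently: it peaks at $x = -\gamma_1/(2\gamma_2)$ and then declines. That difference is why the two models can diverge when extrapolated.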

e) Carry out a simple linear regression in which the y-variable is StrdLvl and the only x-variable is the reciprocal of squared age (the variable you created in the last part). Note that there is no need to also include 1/Age, because the hierarchy principle applies only to positive integer powers. Create a plot of residuals versus fits for this regression.

i. Write the estimated sample regression equation.

ii. Using values of MSE and R², compare the model estimated in this part with the quadratic model estimated in part (b).

iii. Interpret the plot of residuals versus fits. Briefly discuss whether any difficulties with either the data or the model are indicated.
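Part (e) is a simple linear regression on the single predictor $x = 1/\text{Age}^2$, so its estimates have a closed form: $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. The sketch below computes these estimates in plain Python. It also computes the two summary measures that part (e)(ii) compares: $\text{MSE} = \text{SSE}/(n-2)$ and $R^2 = 1 - \text{SSE}/\text{SSTO}$. The name `fit_reciprocal_square` is a hypothetical helper, and the data must be supplied by the reader.

```python
def fit_reciprocal_square(age, strd_lvl):
    """OLS fit of StrdLvl = b0 + b1 * (1/Age^2), returning coefficients, MSE and R^2."""
    x = [1.0 / (a * a) for a in age]  # the 1/Agesqrd variable from part (d)
    y = [float(v) for v in strd_lvl]
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssto = sum((yi - ybar) ** 2 for yi in y)
    return {"coef": (b0, b1), "mse": sse / (n - 2),  # error df = n - 2
            "r2": 1 - sse / ssto, "df_error": n - 2}
```

The two fits use different numbers of parameters. MSE accounts for this, because the error df is n - 3 for the quadratic and n - 2 here. R² does not account for it.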

f) The two models estimated in part (b) and part (e) fit the sample data about equally well, but they extrapolate very differently. This part will demonstrate that.

i. Use each of the two models to calculate a prediction of the mean steroid level for women who are 20 years old.

ii. Use each of the two models to calculate a prediction of the mean steroid level for women who are 32 years old. [Note: With no data for women older than 25, and with no physiological theory for this situation, we should not really be extrapolating outside the range of the observed data. In the observed age range (8 to 25 years old), both models seem to be working well. This example simply illustrates the additional hazards of extrapolating with nonlinear functions.]
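Once both sets of coefficients are estimated, the predictions in part (f) are the two fitted equations evaluated at Age = 20 and Age = 32. The sketch below uses hypothetical helper names. It also flags any age outside the observed range of 8 to 25 years, so extrapolated values are labeled as such.

```python
OBSERVED_AGE_RANGE = (8, 25)  # from the exercise: sampled ages run from 8 to 25


def predict_quadratic(coef, age):
    """Mean StrdLvl from the part (b) model: b0 + b1*Age + b2*Age^2."""
    b0, b1, b2 = coef
    return b0 + b1 * age + b2 * age ** 2


def predict_reciprocal_square(coef, age):
    """Mean StrdLvl from the part (e) model: b0 + b1 / Age^2."""
    b0, b1 = coef
    return b0 + b1 / age ** 2


def predict_both(quad_coef, recip_coef, age, observed=OBSERVED_AGE_RANGE):
    """Evaluate both fitted models at one age and flag extrapolation."""
    lo, hi = observed
    return {
        "age": age,
        "quadratic": predict_quadratic(quad_coef, age),
        "reciprocal_square": predict_reciprocal_square(recip_coef, age),
        "extrapolated": not (lo <= age <= hi),
    }
```

Call `predict_both` once with age 20 and once with age 32, passing the coefficient tuples estimated in parts (b) and (e). Only the second call should come back flagged as an extrapolation.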
