1. Recall the mutual fund example from lecture. The data set Mutual.RData contains two objects, mu and V, that are the mean vector and covariance matrix of annual rates of return of five investment funds. Here is mu:

   SP500  HighTech  SmallCap  USTreas  CorpBond
    0.06      0.10      0.08     0.02      0.04

This says, for example, that the annual mean return on investment for the US Treasuries fund is 2 percent. Assume that the rates of return are jointly distributed as multivariate normal.

Suppose that Tom and Ellen each invest $1000 at the beginning of the year. Tom puts $500 in the High Tech fund and $500 in the Small Cap fund. Ellen puts $200 in each of the five funds. What is the probability that Ellen will have made more money than Tom at the end of the year?
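One way to set up this calculation in R is sketched below. Ellen's gain minus Tom's gain is a linear combination d'R of the five returns, so under the multivariate normal assumption it is normal with mean d'mu and variance d'Vd. The covariance matrix in the sketch is a PLACEHOLDER (a diagonal matrix built from made-up standard deviations), because the real V is only available in Mutual.RData; the object names tom, ellen, d, and sds are illustrative and do not come from the data set.

```r
# Mean vector as printed in the problem statement.
mu <- c(SP500 = 0.06, HighTech = 0.10, SmallCap = 0.08,
        USTreas = 0.02, CorpBond = 0.04)

# PLACEHOLDER covariance matrix (assumption, NOT the real data).
# With the real data you would run load("Mutual.RData") and use V from there.
sds <- c(0.15, 0.30, 0.20, 0.03, 0.06)   # hypothetical standard deviations
V <- diag(sds^2)
dimnames(V) <- list(names(mu), names(mu))

# Dollar allocations across the five funds, in the same order as mu.
tom   <- c(0, 500, 500, 0, 0)
ellen <- rep(200, 5)

# Ellen's gain minus Tom's gain is d'R, with d = ellen - tom.
d <- ellen - tom

mean_diff <- sum(d * mu)                    # d' mu
sd_diff   <- sqrt(drop(t(d) %*% V %*% d))   # sqrt(d' V d)

# P(d'R > 0): upper tail of a normal with the mean and sd above.
prob_ellen_wins <- pnorm(0, mean = mean_diff, sd = sd_diff,
                         lower.tail = FALSE)
prob_ellen_wins
```

The mean difference depends only on mu, so it is the same with the real V; the probability itself changes with the covariance matrix and should be recomputed from the loaded data.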

2. The data frame BCMort88 found in the file BCMort88.RData gives breast cancer mortality rates for 217 counties in nine states (six New England states plus NY, NJ, and PA) for the year 1988. The rates are adjusted to account for differing demographic characteristics across the counties. Here is a data description:

Variables:
Pop = Population of county
AdjRate = Adjusted mortality rate (per 100,000)
SE = Estimated standard error (per 100,000)

We want to identify counties whose mortality rates are significantly greater than 18 per 100,000 population, in order to devote resources to them. For this problem, assume that the adjusted rates are unbiased and normally distributed, although normality clearly does not hold for the smaller counties.

Limit your analysis to those counties that have a population of at least 20,000. Use a multiple testing method to identify those counties that should receive resources according to the criterion stated above, and explain why you chose it over alternative methods.
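The mechanics of one possible analysis are sketched below in R: a one-sided z-test per county, followed by a multiplicity adjustment with p.adjust(). The data frame here is SIMULATED with the same column names as BCMort88, so the flagged counties are meaningless. The 0.05 level is an assumption, and the sketch shows both a familywise-error method (Holm) and a false-discovery-rate method (Benjamini-Hochberg) without settling which is more appropriate, since that choice is part of the question.

```r
set.seed(1)

# SIMULATED stand-in for BCMort88 (assumption, NOT the real data).
# With the real data you would run load("BCMort88.RData") instead.
n <- 217
BCMort88 <- data.frame(
  Pop     = round(exp(runif(n, log(5000), log(2e6)))),
  AdjRate = rnorm(n, mean = 20, sd = 4),
  SE      = runif(n, 0.5, 5)
)

# Keep only counties with a population of at least 20,000.
big <- subset(BCMort88, Pop >= 20000)

# One-sided z-test of H0: rate <= 18 against H1: rate > 18, per county.
z <- (big$AdjRate - 18) / big$SE
p <- pnorm(z, lower.tail = FALSE)

# Adjusted p-values under two common corrections.
p_holm <- p.adjust(p, method = "holm")   # controls familywise error rate
p_bh   <- p.adjust(p, method = "BH")     # controls false discovery rate

alpha <- 0.05                            # assumed level, not from the problem
flag_holm <- big[p_holm <= alpha, ]
flag_bh   <- big[p_bh   <= alpha, ]

c(tested = nrow(big), holm = nrow(flag_holm), bh = nrow(flag_bh))
```

Holm-adjusted p-values are never smaller than the BH-adjusted ones, so the BH list always contains the Holm list; which error rate matters more depends on the cost of sending resources to a county that does not need them.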

3. The data frame MTVR found in the file MTVR.RData gives information on n = 112 USMC Medium Terrain Vehicle Replacements (MTVRs) at Camp Lejeune, NC. Here is a data description:

Data on n = 112 MTVRs taken from internal Caterpillar engine diagnostic readings. Variables are as follows:

PTO = Percent of time in Power Takeoff (PTO) mode
Idle = Percent of time in idle mode
Miles = Number of miles driven
Load.factor = Percent of max. available power used by the engine
MPG = Fuel efficiency (miles per gallon)

Source: Penn State Applied Research Laboratory, 2013

(a) Use the pairs() command to identify two outliers: describe what they are and whether you believe it would be justified to delete these observations.

(b) Fit a least-squares regression model to predict MPG from the other variables, both with and without the two outliers included. Do the outliers have a large effect on the fitted model?

(c) Refer to the regression with the two outliers removed. Plot the residuals (y-axis) versus the fitted values (x-axis) and comment on the pattern that you see. Is it consistent with what you know the correlation is between the residuals and fitted values?
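The workflow for parts (a) through (c) might look like the R sketch below. The data frame is SIMULATED with the same column names as MTVR, and the row indices in out are hypothetical stand-ins for whichever two observations the pairs() plot identifies in the real data. The last line computes the correlation between residuals and fitted values, which is zero (up to rounding) for any least-squares fit that includes an intercept.

```r
set.seed(2)

# SIMULATED stand-in for MTVR (assumption, NOT the real data).
# With the real data you would run load("MTVR.RData") instead.
n <- 112
MTVR <- data.frame(
  PTO         = runif(n, 0, 10),
  Idle        = runif(n, 20, 70),
  Miles       = runif(n, 500, 20000),
  Load.factor = runif(n, 10, 40)
)
MTVR$MPG <- 6 - 0.03 * MTVR$Idle - 0.02 * MTVR$Load.factor +
  rnorm(n, sd = 0.3)

# (a) Scatterplot matrix of all variables; look for isolated points.
pairs(MTVR)

# Hypothetical row indices of the two outliers (assumption).
out <- c(5, 60)

# (b) Fit MPG on all other variables, with and without those rows.
fit_all  <- lm(MPG ~ ., data = MTVR)
fit_trim <- lm(MPG ~ ., data = MTVR[-out, ])
print(cbind(all = coef(fit_all), trimmed = coef(fit_trim)))

# (c) Residuals versus fitted values for the trimmed fit.
plot(fitted(fit_trim), resid(fit_trim),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Sample correlation between residuals and fitted values.
r_fit_res <- cor(fitted(fit_trim), resid(fit_trim))
r_fit_res
```

A zero correlation does not rule out structure in the plot: curvature or a funnel shape can still appear, which is what the comment in part (c) should address.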
