For the following 3 problems, use the computer science data that we have been discussing in class. You can get a copy of the data set cadata.dat from the class website. The variables are: id, a numerical identifier for each student; GPA, the grade point average after three semesters; HSM, HSS, HSE, SATM, and SATV, which were all explained in class; and GENDER, coded as 1 for men and 2 for women.

1. In a data step, create a new variable GENDERW that has values 1 for women and 0 for men (use arithmetic on the original variable GENDER). Run a regression to predict GPA using the explanatory variables HSM, HSS, HSE, SATM, SATV, and GENDERW. (Do not include any interaction terms.)
(a) Give the equation of the fitted regression line using all six explanatory variables.
(b) Give the fitted regression line for women (use part a).
(c) Give the fitted regression line for men (use part a).
DO NOT attempt to run proc reg on a subset of the data to answer this question.
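The recoding and the reasoning behind parts (b) and (c) can be sketched outside SAS. The following is a minimal Python illustration, not the course's SAS code: the function names are hypothetical, and the coefficient values are made up for illustration rather than estimated from the class data. It shows that subtracting 1 from GENDER gives the 0/1 coding, and that with no interaction terms the two fitted lines share all slopes and differ only in the intercept.

```python
# Hypothetical helper names; the coefficient values used below are made up
# for illustration and are NOT estimates from the class data set.

def gender_to_genderw(gender):
    """Recode GENDER (1 = men, 2 = women) to GENDERW (0 = men, 1 = women)."""
    return gender - 1


def group_intercepts(b0, b_genderw):
    """Intercepts of the fitted lines for men and women.

    With no interaction terms the slopes are shared, so the two groups
    differ only by the GENDERW coefficient added to the intercept.
    """
    return {
        "men": b0 + b_genderw * 0,    # GENDERW = 0
        "women": b0 + b_genderw * 1,  # GENDERW = 1
    }


print([gender_to_genderw(g) for g in (1, 2)])
print(group_intercepts(b0=0.5, b_genderw=-0.25))
```

This is why a single fit on the full data set is enough: substituting GENDERW = 1 or GENDERW = 0 into the equation from part (a) yields each group's line.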
2. Use the Cp criterion to select the best subset of variables for this problem (i.e., use the options "/ selection = cp b;"). Use only the original six explanatory variables, not HS or SAT, and use either GENDER or GENDERW, not both. Summarize the results and explain your choice of the best model.
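As a reminder of what the criterion measures, here is a small Python sketch of the usual textbook definition of Mallows' Cp, assuming p counts the parameters in the subset model including the intercept and that the error variance is estimated by the MSE of the full model. The function name and the input numbers are hypothetical and are not taken from the class data.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C_p = SSE_p / MSE(full) - (n - 2p).

    sse_p    : error sum of squares of the subset model
    mse_full : mean squared error of the model with all candidate variables
    n        : number of observations
    p        : number of parameters in the subset model (intercept included)
    """
    return sse_p / mse_full - (n - 2 * p)


# Made-up numbers, for illustration only.
print(mallows_cp(sse_p=100.0, mse_full=2.0, n=50, p=3))  # 6.0
```

Subset models with a small Cp that is also close to p are generally the ones worth considering.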
3. Check the assumptions of this "best" model using all the usual plots (you know what they are by now). Explain in detail whether or not each assumption appears to be substantially violated.
4. Plot the data for the two populations as a symbolic scatter plot on the same graph, using different symbols (v=) and lines. Does the relationship between valuation and selling price appear to be the same for the two lot locations?
5. Examine the question of whether or not the two lines are the same. Write a model that allows the two lot locations to have different intercepts and slopes. Then perform the general linear test to determine whether the two lines are equal. State the null and alternative hypotheses, the test statistic with degrees of freedom, the p-value, and your conclusion.
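One common way to set this up is sketched below in LaTeX. The notation is an assumption, not something given in the problem: Y is the response, X_1 the quantitative predictor, X_2 a 0/1 indicator for lot location, and n the number of observations. SSE(F) and SSE(R) denote the error sums of squares of the full and reduced models.

```latex
\begin{align*}
\text{Full model:}\quad
  & Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2}
        + \beta_3 X_{i1} X_{i2} + \varepsilon_i \\
\text{Reduced model:}\quad
  & Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i \\
H_0:\ & \beta_2 = \beta_3 = 0
  \qquad H_a:\ \text{not both } \beta_2 \text{ and } \beta_3 \text{ equal } 0 \\
F^* &= \frac{\bigl[SSE(R) - SSE(F)\bigr] / 2}{SSE(F) / (n - 4)},
  \qquad F^* \sim F(2,\, n - 4) \text{ under } H_0
\end{align*}
```

Under this coding, \beta_2 is the difference in intercepts between the two locations, so the reduced model corresponds to a single common line.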
6. Using the model that fits two different lines, give a 90% confidence interval for the difference in slopes. (Hint: what parameter represents the difference between the slopes?)
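The general form of such an interval can be written as follows, assuming (as a labeling choice, not something stated in the problem) that the two-line model uses a 0/1 indicator X_2 for lot location and that \beta_3 is the coefficient of the product term X_1 X_2, with estimate b_3 and estimated standard error s\{b_3\} taken from the regression output.

```latex
\[
  b_3 \;\pm\; t(0.95;\, n - 4)\, s\{b_3\}
\]
```

Here t(0.95; n - 4) is the 95th percentile of the t distribution with n - 4 degrees of freedom, which gives two-sided 90% coverage.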
