Question

Transcribed Text

Problem Set 5

Problem 1.

This problem uses data from the Prevention of REnal and Vascular END-stage Disease (PREVEND) study, which took place between 2003 and 2006 in the Netherlands. Clinical and demographic data for 4,095 individuals are stored in the prevend dataset in the oibiostat package.

Body mass index (BMI) is a measure of body fat that is based on both height and weight. The World Health Organization and National Institutes of Health define a BMI of over 25.0 as overweight; this guideline is typically applied to adults in all age groups. However, a recent study has reported that individuals of ages 65 or older with the greatest mortality risk were those with BMI lower than 23.0, while those with BMI between 24.0 and 30.9 were at lower risk of mortality. These findings suggest that the ideal weight-for-height in older adults may not be the same as in younger adults.

Explore the relationship between BMI (BMI) and age (age), using the same sample of 500 individuals from the prevend data as used in the Unit 6 Labs. The code to create prevend.sample is provided in the problem set template.

a) Create a plot that shows the association between BMI and age. Based on the plot, comment briefly on the nature of the association.

b) Fit a linear regression model to relate BMI and age.
   i. Write the equation of the linear model.
   ii. Interpret the slope and intercept values in the context of the data. Comment on whether the intercept value has any interpretive meaning in this setting.
   iii. Is it valid to use the linear model to estimate BMI for an individual who is 30 years old? Explain your answer.
   iv. According to the linear model, estimate the average BMI for an individual who is 60 years old.
   v. Based on the linear model, how much does BMI differ, on average, between an individual who is 70 years old versus an individual who is 50 years old?

c) Create residual plots to assess the model assumptions of linearity, constant variability, and normally distributed residuals. In your assessment of whether an assumption is reasonable, be sure to clearly reference and interpret relevant features of the appropriate plot.
   i. Assess linearity.
   ii. Assess constant variance.
   iii. Assess normality of residuals.
   iv. Suppose that a point is located in the uppermost right corner on a Q-Q plot of residuals (from a linear model). In one sentence, describe where that point would necessarily be located on a scatterplot of the data.

d) Conduct a formal hypothesis test of no association between BMI and age, at the α = 0.05 significance level. Summarize your conclusions.

e) Report the R² of the linear model relating BMI and age. Based on the R² value, briefly comment on whether you think the estimated average BMI values calculated in part b) are accurate.

Problem 2.

This problem uses data from the National Health and Nutrition Examination Survey (NHANES), a survey conducted annually by the US Centers for Disease Control (CDC). The data can be treated as if it were a simple random sample from the American population. The dataset nhanes.samp.adult.500 in the oibiostat package contains data for 500 participants ages 21 years or older that were randomly sampled from the complete NHANES dataset that contains 10,000 observations.

Regular physical activity is important for maintaining a healthy weight, boosting mood, and reducing risk for diabetes, heart attack, and stroke. In this problem, you will be exploring the relationship between weight (Weight) and physical activity (PhysActive) using the data in nhanes.samp.adult.500. Weight is measured in kilograms. The variable PhysActive is coded Yes if the participant does moderate or vigorous-intensity sports, fitness, or recreational activities, and No if otherwise.

a) Explore the data.
   i. Identify how many individuals are physically active.
   ii. Create a plot that shows the association between weight and physical activity. Describe what you see.

b) Fit a linear regression model to relate weight and physical activity. Report the estimated coefficients from the model and interpret them in the context of the data.

c) Report a 95% confidence interval for the slope parameter and interpret the interval in the context of the data. Based on the interval, is there sufficient evidence at α = 0.05 to reject the null hypothesis of no association between weight and physical activity?

d) Suppose that upon seeing the results from part c), your friend claims that these data represent evidence that being physically active promotes weight loss. Do you agree with your friend? Explain your answer.

e) In the context of these data, would you prefer to conduct inference using the linear regression approach or the two-sample t-test approach? Explain your answer.

f) Suppose that the estimated slope coefficient from the model were positive (and statistically significant). Propose at least two possible explanations for such a trend.

Problem 3.

The file low_bwt.Rdata contains information for a random sample of 100 low birth weight infants born in two teaching hospitals in Boston, Massachusetts. The data appear in Table B7 in Principles of Biostatistics, 2nd ed., Pagano and Gauvreau. (Mother's age is present in the dataset but not documented in the table.)

The dataset contains the following variables:

- birthwt: the weight of the infant at birth, measured in grams
- gestage: the gestational age of the infant at birth, measured in weeks
- momage: the mother's age at the birth of the child, measured in years
- toxemia: recorded as Yes if the mother was diagnosed with toxemia during pregnancy, and No otherwise
- length: length of the infant at birth, measured in centimeters
- headcirc: head circumference of the infant at birth, measured in centimeters

The condition toxemia, also known as preeclampsia, is characterized by high blood pressure and protein in urine by the 20th week of pregnancy; left untreated, toxemia can be life-threatening.

a) Fit a linear model estimating the association between birth weight and toxemia status.
   i. Write the model equation.
   ii. Report a 95% confidence interval for the slope and interpret the interval.

b) Using graphical summaries, explore the relationship between birth weight and toxemia status, birth weight and gestational age, and gestational age and toxemia. Summarize your findings.

c) Fit a multiple regression model with toxemia and gestational age as predictors of birth weight.
   i. Evaluate whether the assumptions for linear regression are reasonably satisfied.
   ii. Interpret the coefficients of the model, and comment on whether the intercept has a meaningful interpretation.
   iii. Write the model equation and predict the average birth weight for an infant born to a mother diagnosed with toxemia with gestational age 31 weeks.
   iv. The simple regression model and multiple regression model disagree regarding the nature of the association between birth weight and toxemia. Briefly explain the reason behind the discrepancy. Which model do you prefer for understanding the relationship between birth weight and toxemia, and why?

Problem 4.

The National Health and Nutrition Examination Survey (NHANES) is a yearly survey conducted by the US Centers for Disease Control. This question uses the nhanes.samp.adult.500 dataset in the oibiostat package, which consists of information on a subset of 500 individuals ages 21 years and older from the larger NHANES dataset.

Poverty (Poverty) is measured as a ratio of family income to poverty guidelines. Smaller numbers indicate more poverty, and ratios of 5 or larger were recorded as 5. Education (Education) is reported for individuals ages 20 years or older and indicates the highest level of education achieved: either 8th Grade, 9 - 11th Grade, High School, Some College, or College Grad. The variable HomeOwn records whether a participant rents or owns their home; the levels of the variable are Own, Rent, and Other.

a) Create a plot showing the association between poverty and educational level. Describe what you see.

b) Fit a linear model to predict poverty from educational level.
   i. Interpret the model coefficients and associated p-values.
   ii. Assess whether educational level, overall, is associated with poverty. Be sure to include any relevant numerical evidence as part of your answer.

c) Create a plot showing the association between poverty and home ownership. Based on what you see, speculate briefly about the home ownership status of individuals who responded with Other.

d) Fit a linear model to predict poverty from educational level and home ownership. Comment on whether this model is an improvement from the model in part b).

Problem 5.

Do men and women think differently about their body weight? To address this question, you will be using data from the Behavioral Risk Factor Surveillance System (BRFSS).

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey of 350,000 people in the United States collected by the Centers for Disease Control and Prevention (CDC). As its name implies, the BRFSS is designed to identify risk factors in the adult population and report emerging health trends. For example, respondents are asked about diet and weekly physical activity, HIV/AIDS status, possible tobacco use, and level of healthcare coverage.

The cdc.sample dataset contains data on 500 individuals from a random sample of 20,000 respondents to the BRFSS survey conducted in 2000, on the following nine variables:

- genhlth: general health status, with categories excellent, very good, good, fair, and poor
- exerany: recorded as 1 if the respondent exercised in the past month and 0 otherwise
- hlthplan: recorded as 1 if the respondent has some form of health coverage and 0 otherwise
- smoke100: recorded as 1 if the respondent has smoked at least 100 cigarettes in their entire life and 0 otherwise
- height: height in inches
- weight: weight in pounds
- wtdesire: desired weight in pounds
- age: age in years
- gender: gender, recorded as m for male and f for female

a) Create a variable called wt.discr that is a measure of the discrepancy between an individual's desired weight and their actual weight, expressed as a proportion of their actual weight:

   $$\text{weight discrepancy} = \frac{\text{actual weight} - \text{desired weight}}{\text{actual weight}}$$

b) Fit a linear model to predict weight discrepancy from age and gender. Interpret the slope coefficients in the model.

c) Investigate whether the association between weight discrepancy and age is different for males versus females.
   i. Fit a linear model to predict weight discrepancy from age, gender, and the interaction between age and gender. Write the model equation.
   ii. Write the prediction equation for males and the prediction equation for females.
   iii. Is there statistically significant evidence of an interaction between age and gender? Explain your answer.

d) Comment on whether the results from part c) suggest that men and women think differently about their body weight. Do you find the results surprising; why or why not? Limit your response to at most five sentences.

Problem 6.

The American Psychological Association defines resilience as "the process of adapting well in the face of adversity, trauma, tragedy, threats or even significant sources of stress". Resilience refers to a person's capacity to resist adversity and is closely related to qualities such as self-confidence and persistence. Studies have suggested that resilience is an important factor in contributing to how medical students perceive their quality of life and educational environment.

Survey data were collected from 1,350 students across 25 medical schools in the United States as part of a study examining the life of students and residents in healthcare professions. At each school, 54 students were randomly selected to participate in the study. Participants completed assessments measuring resilience, quality of life, perception of educational environment, depression symptoms, and anxiety symptoms.

- Resilience. Higher scores on the resilience assessment are indicative of greater resilience; possible scores range from 14 to 98. The scores are reported according to a standardized scale: very low (14 to 56 points), low (57 to 64 points), moderately low (65 to 73 points), moderately high (74 to 81 points), high (82 to 90 points), and very high (91 to 98 points).

- Quality of Life. Quality of life was assessed via three measures: overall quality of life (overall QoL), medical school quality of life (MSQoL), and a questionnaire from the World Health Organization (WHOQOL). For the overall QoL and MSQoL, students were asked to rate, on a scale from 0 to 10 with a higher score indicating better QoL, their overall quality of life and their quality of life in medical school. The WHOQOL is a 26-question survey measuring quality of life in four domains (environment, psychological health, social relationships, and physical health); participant responses to questions such as "Do you have enough energy for everyday life?" and "How well are you able to concentrate?" are converted to a 0 to 100 point score for each domain, with higher scores representing better quality of life. [1]

- Educational Environment. Perception of educational environment was assessed via the DREEM questionnaire; possible scores range from 0 to 200, with higher scores representing a more positive perception about educational environment. Questions include "I feel I am being well prepared for my profession" and "The atmosphere motivates me as a learner". [2]

- Depression Symptoms. The BDI questionnaire was used to assess depressive symptoms. Possible scores vary from 0 to 63, with higher scores indicating either more numerous or more severe depressive symptoms: no depressive symptoms (0 to 9 points), mild depressive symptoms (10 to 17 points), moderate depressive symptoms (18 to 29 points), severe depressive symptoms (30 to 63 points).

- Anxiety Symptoms. Anxiety symptoms were assessed based on two dimensions of anxiety: state anxiety (feelings of anxiety arising specifically when faced with a stressful event) and trait anxiety (feelings of anxiety on a daily basis). Possible scores range from 20 to 80 points. A score of 50 or higher for either state anxiety or trait anxiety (or both) is considered indicative of an anxiety disorder.

[1] Participants chose from: "Not at all", "A little", "A moderate amount", "Very much", "An extreme amount".
[2] Participants chose from: "Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree".

Information was also collected on participant age and sex. Year in medical school was recorded as current level of training. The first two years of medical school are focused on basic science education (pre-clinical curriculum) and the last two years consist of rotations in clinical settings (clinical curriculum). After medical school, students undergo residency training in which they work as practicing physicians under the supervision of a senior clinician.

Data from the study are in the file resilience.Rdata. The following table provides a list of the variables in the dataset and their descriptions.

Variable         Description
age              age in years
sex              sex, coded female for female and male for male
train            level of training, either pre-clinical, clinical, or residency
res              resilience level, either VeryHigh, High, ModHigh, ModLow, Low, or VeryLow
qol.overall      overall quality of life score, on 0-10 point scale
qol.medical      quality of life in medical school score, on 0-10 point scale
whoqol.phys      WHOQOL score for physical health domain, on 0-100 point scale
whoqol.psych     WHOQOL score for psychological health domain, on 0-100 point scale
whoqol.soc       WHOQOL score for social relationships domain, on 0-100 point scale
whoqol.env       WHOQOL score for environmental domain, on 0-100 point scale
dreem            DREEM score, on 0-200 point scale
bdi              BDI score, on 0-63 point scale
anxiety.state    state anxiety score, on 20-80 point scale
anxiety.trait    trait anxiety score, on 20-80 point scale

Use the data to answer the following questions.

a) Briefly summarize features of the study participants with respect to the variables age, sex, and train. Reference appropriate graphical and numerical summaries as needed.

b) Participants were asked to rate their overall quality of life and their medical school quality of life, each on a 0-10 point scale.
   i. Create a plot illustrating the difference between perception of overall QoL and perception of MSQoL. Describe what you see.
   ii. Conduct a formal statistical comparison of overall QoL score and MSQoL score. Summarize your findings.

c) Investigate the relationship between resilience and level of training.
   i. Prior to conducting any analysis, comment briefly on whether you think there may or may not be an association between resilience and level of training, and explain your reasoning. Limit your answer to at most five sentences.
   ii. Formally assess whether there is evidence of an association between resilience and level of training. Summarize your findings.

d) Investigate the relationship between resilience and severity of depressive symptoms.
   i. Create a plot illustrating the relationship between resilience and depressive symptoms. Describe what you see.
   ii. Conduct a formal analysis of the relationship between resilience and depressive symptoms. Summarize your findings. You may proceed with the analysis method you choose even if the assumptions do not seem to be reasonably satisfied; i.e., it is not necessary to check assumptions for this sub-question.

e) Investigate the association between resilience and quality of life as measured by the psychological health domain of the WHOQOL.
   i. Without adjusting for any potential confounders, fit a model estimating the association between resilience and WHOQOL score in the psychological health domain. Describe the nature of the association.
   ii. Is there evidence that resilience overall is a useful variable for predicting WHOQOL score in the psychological health domain? Explain your answer.
   iii. Report and interpret the model R² for the model fit in part i.
   iv. After adjusting for the potential confounders of age, sex, training level, and BDI score, would you describe the apparent association from part i. any differently? Explain your answer.
   v. Evaluate the assumptions behind the analysis from part iv.
   vi. Calculate the predicted mean psychological health WHOQOL score for a 21-year-old female with moderately high resilience who scored 4.00 points on the BDI and is in her third year of medical school.

f) Anxiety scores were reported separately for state anxiety and trait anxiety.
   i. Create a new variable, anxiety.disorder, that records whether an individual qualifies as having an anxiety disorder, based on the values of anxiety.state and anxiety.trait. Briefly explain the logic behind the code you use to create anxiety.disorder.
   ii. Report the number of individuals that qualify as having an anxiety disorder.

g) Fit a model estimating the association between resilience and perception of educational environment, after adjusting for age, sex, training level, BDI score, and presence of an anxiety disorder. In no more than three sentences, summarize the main finding(s).

h) A New York Times reporter is potentially interested in writing a piece about the research you have conducted. They have requested that you prepare a short statement, no more than ten sentences long. Selectively drawing from the results of your analyses, summarize the main conclusions about the relationship between resilience and perceptions of quality of life and educational environment for medical students and residents. Be sure to use language accessible to a general audience. You do not need to reference specific numerical results/models from the analysis, but you may choose to do so if you like.
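
The two derived variables defined in the problem set, the weight discrepancy in Problem 5 a) and the anxiety-disorder indicator in Problem 6 f), are simple enough to sketch in a few lines. The assignment itself is meant to be done in R on the provided datasets; the Python below is only a language-neutral illustration of the two definitions. The function names and the example values are hypothetical and are not taken from cdc.sample or resilience.Rdata.

```python
def weight_discrepancy(actual_weight, desired_weight):
    """Discrepancy as a proportion of actual weight.

    Implements (actual weight - desired weight) / actual weight.
    Positive values mean the respondent wants to weigh less than they do.
    """
    if actual_weight <= 0:
        raise ValueError("actual weight must be positive")
    return (actual_weight - desired_weight) / actual_weight


def has_anxiety_disorder(state_score, trait_score, cutoff=50):
    """True when either anxiety score is at or above the cutoff.

    The problem set states that a score of 50 or higher on state anxiety,
    trait anxiety, or both is considered indicative of an anxiety disorder,
    so the two comparisons are combined with a logical OR.
    """
    return state_score >= cutoff or trait_score >= cutoff


# Hypothetical respondents (actual weight, desired weight), in pounds.
for actual, desired in [(200, 180), (150, 150), (120, 130)]:
    print(actual, desired, round(weight_discrepancy(actual, desired), 3))

# Hypothetical (state, trait) score pairs on the 20-80 point scale.
for state, trait in [(50, 30), (49, 49), (20, 80)]:
    print(state, trait, has_anxiety_disorder(state, trait))
```

In R the same logic would typically be written as a vectorized expression over the data frame columns; the key design point in either language is that the indicator uses OR rather than AND, since meeting the cutoff on a single dimension is sufficient.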

