 # Statistics Problems Using R Programming

## Transcribed Text

Exercise 1. Consider the Hidalgo data set. Load the data, and calculate the mean and median.

Part a. Calculate a jackknife estimate of the mean, and the jackknife standard error of the mean, of these data. Are these values what you expect?

Part b. Calculate a jackknife estimate of the median, and the jackknife standard error of the median, of these data. Are these values what you expect?

Part c. Calculate a bootstrap estimate of the mean, and of the median, of the Hidalgo data. Use B = 1000 samples.

Part d. When data are normally distributed, and for large samples, the standard error of the median can be approximated by

$$\mathrm{s.e.}_{\text{med}} = 1.253 \times \mathrm{s.e.}_{\text{mean}}, \qquad \text{where } \mathrm{s.e.}_{\text{mean}} = \sigma / \sqrt{n}.$$

How do the jackknife and bootstrap estimates of standard error compare to the parametric estimates?

Exercise 2. Consider the following data:

`NATR332.DAT <- data.frame(Y1 = c(146,141,135,142,140,143,138,137,142,136), Y2 = c(141,143,139,139,140,141,138,140,142,138))`

Let $\theta$ be the ratio of the two population means:

$$\theta = \frac{\mu_{Y_1}}{\mu_{Y_2}}$$

Calculate jackknife and bootstrap estimates for $\hat{\theta}$, and for the standard error of $\hat{\theta}$.

Part a. Jackknife.

Part b. Bootstrap.

Exercise 3.

Part a. Consider the ELO data. Subset the data to exclude non-qualifiers (NQ), then create a factor AA. This will indicate whether the wrestler was an All-American (top 8 places) or did not place in the tournament. Use the condition that `ActualFinish` equals `AA`. Next, calculate an effect size d for the difference in ELO scores between All-American and non-All-American wrestlers; you will need to calculate means and standard deviations as necessary. Since the populations are unbalanced, you will need to use a pooled sd of the form

$$s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Part b. Calculate jackknife and bootstrap estimates of the error of d. Since ELO is determined by a wrestler's success within a weight class, you will need to honor this grouping (or sampling) of the data. Calculate the jackknife by excluding one Weight at a time from the data and recalculating d; since there are 10 weight classes, there should be 10 jackknife replicates. For the bootstrap, sample from the 10 weight classes (use `unique` or `levels`). Note that you will not be able to simply subset the data on something like `Weight %in% samples`, since the bootstrap will require duplicate samples. Instead, iterate over the weight class samples and merge subsets of the original data.

Part c. Compare your estimates of standard error to the parametric estimate, approximated by

$$\mathrm{s.e.}_d \approx \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}}$$

Exercise 4. Consider the data for U.S. wholesale prices for pumpkins in 2018, in pumpkins.csv.

Part a. Load the data, and calculate the F test and the parametric P(> F) using the code below (set `eval=TRUE`).

`summary(aov(Price ~ Class, data=pumpkins.dat))`

Part b. Permute Price over Class; that is, create a new data set on the assumption that Class has no influence on Price. Do this 1000 times, and calculate the F ratio for each. Plot the distribution of F, and calculate how many F are greater than the F from part a. How does this compare with the parametric estimate for P(> F)? Do you need to increase the number of permutations?

Part c. Repeat part b, but this time honor the Week grouping. That is, permute Price over Class only within observations grouped by Week. Compare this to

`summary(aov(Price ~ Class + as.factor(Week), data=pumpkins.dat))`

Which are more appropriate for these data?
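Exercise 2 is the only exercise whose data appear in full in the text, so its jackknife and bootstrap steps can be sketched directly. The following is one possible approach, not the original solution. It assumes each row is a paired (Y1, Y2) observation, so rows are dropped or resampled together; if the two samples are independent, each column would be resampled separately instead. The helper name `ratio.of.means`, the seed, and B = 1000 are illustrative choices.

```r
# Data exactly as given in Exercise 2
NATR332.DAT <- data.frame(
  Y1 = c(146, 141, 135, 142, 140, 143, 138, 137, 142, 136),
  Y2 = c(141, 143, 139, 139, 140, 141, 138, 140, 142, 138)
)

# Plug-in estimate of theta: ratio of the two sample means
# (ratio.of.means is a hypothetical helper name)
ratio.of.means <- function(d) mean(d$Y1) / mean(d$Y2)
theta.hat <- ratio.of.means(NATR332.DAT)
n <- nrow(NATR332.DAT)

# Part a. Jackknife: drop one row (an assumed Y1, Y2 pair) at a time
theta.jack <- sapply(seq_len(n), function(i) ratio.of.means(NATR332.DAT[-i, ]))
# Bias-corrected jackknife estimate and jackknife standard error
theta.jack.est <- n * theta.hat - (n - 1) * mean(theta.jack)
se.jack <- sqrt((n - 1) / n * sum((theta.jack - mean(theta.jack))^2))

# Part b. Bootstrap: resample rows with replacement (seed and B are assumptions)
set.seed(1)
B <- 1000
theta.boot <- replicate(
  B, ratio.of.means(NATR332.DAT[sample(n, replace = TRUE), ])
)
theta.boot.est <- mean(theta.boot)
se.boot <- sd(theta.boot)

c(theta.hat = theta.hat, jackknife = theta.jack.est, se.jack = se.jack,
  bootstrap = theta.boot.est, se.boot = se.boot)
```

The sample means are 140 and 140.1, so the plug-in estimate is about 0.9993. The bootstrap values depend on the seed.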

## Solution Preview

# Exercise 1.
# Read every value in the file into one plain numeric vector
hidalgo.dat <- unlist(read.table("hidalgo.dat", header = FALSE), use.names = FALSE)
mean(hidalgo.dat)
# 0.08602474
median(hidalgo.dat)
# 0.08

# part a.
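n <- length(hidalgo.dat)
# Leave-one-out means: drop observation i and recompute the mean
jackknifed.means <- sapply(seq_len(n), function(i) {
  mean(hidalgo.dat[-i])
})
mean.jackknifed.mean <- mean(jackknifed.means)
# 0.08602474
# Jackknife standard error: sqrt((n - 1)/n * sum of squared deviations)
se.jackknifed <- sqrt((n - 1) / n * sum((jackknifed.means - mean.jackknifed.mean)^2))
# 0.0006794796
# Parametric standard error of the mean, for comparison
sd(hidalgo.dat) / sqrt(n)
# 0.0006794796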

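# The sample mean is an unbiased, linear statistic, so the jackknife reproduces
# it exactly: the jackknife estimate equals the sample mean, and the jackknife
# standard error equals sd(x)/sqrt(n). These are the values we expect.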
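
The agreement in part a is an algebraic identity, not a coincidence of this data set. A short derivation follows, writing $\bar{x}$ for the sample mean, $\bar{x}_{(i)}$ for the mean with observation $i$ removed, and $s$ for the sample standard deviation (this notation is introduced here for illustration and does not appear in the original solution):

```latex
\begin{align*}
\bar{x}_{(i)} &= \frac{n\bar{x} - x_i}{n-1}, \qquad
\frac{1}{n}\sum_{i=1}^{n} \bar{x}_{(i)} = \bar{x}, \\
\bar{x}_{(i)} - \bar{x} &= \frac{\bar{x} - x_i}{n-1}, \\
\widehat{\mathrm{se}}_{\text{jack}}^{2}
  &= \frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}_{(i)} - \bar{x}\right)^{2}
   = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}
   = \frac{s^{2}}{n}.
\end{align*}
```

Taking square roots gives a jackknife standard error of $s/\sqrt{n}$, which matches the two identical values printed above.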

# part b.
jackknifed.medians <- sapply(1:n, function(i) {
  median(hidalgo.dat[-i])
})...
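
The preview is cut off at this point. For parts c and d of Exercise 1, a bootstrap could be sketched roughly as follows. This is not the original solution: the vector `x` is simulated stand-in data with arbitrary parameters (the Hidalgo measurements are not reproduced here), and the helper `boot.stat` and the seed are hypothetical.

```r
# Stand-in data: a simulated sample, NOT the Hidalgo measurements.
# With the real data, replace this with the vector loaded from hidalgo.dat.
set.seed(42)
x <- rnorm(200, mean = 10, sd = 2)

# Hypothetical helper: bootstrap estimate and standard error of a statistic.
# Each replicate resamples x with replacement and recomputes the statistic.
boot.stat <- function(x, stat, B = 1000) {
  reps <- replicate(B, stat(sample(x, replace = TRUE)))
  c(estimate = mean(reps), se = sd(reps))
}

# Part c. Bootstrap estimates of the mean and of the median, B = 1000
boot.mean <- boot.stat(x, mean)
boot.median <- boot.stat(x, median)

# Part d. Parametric standard errors under normality, for comparison
se.mean.param <- sd(x) / sqrt(length(x))
se.median.param <- 1.253 * se.mean.param

rbind(boot.mean, boot.median)
c(se.mean.param = se.mean.param, se.median.param = se.median.param)
```

For simulated normal data the bootstrap and parametric standard errors should be close. With the real data, any gap between them is what Part d asks you to interpret.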