## Transcribed Text

Exercise 1.
Consider the Hidalgo data set. Load the data, and calculate mean and median.
Part a.
Calculate a jackknife estimate of the mean, and jackknife standard error of the mean, of these data. Are
these values what you expect?
Part b.
Calculate a jackknife estimate of the median, and jackknife standard error of the median, of these data. Are
these values what you expect?
Part c.
Calculate a bootstrap estimate of the mean, and of the median, of the Hidalgo data. Use B = 1000 samples.
Part d.
When data are normally distributed, and for large samples, the standard error of the median can be approximated by

s.e.med = 1.253 × s.e.mean

where s.e.mean = σ/√n.
How do the jackknife and bootstrap estimates of standard error compare to the parametric estimates?
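One way to set up the part-d comparison is sketched below. Since the Hidalgo file itself is not reproduced here, simulated normal data stand in for the stamp measurements; the sample size and parameters are illustrative assumptions.

```r
# Simulated stand-in for the Hidalgo data (assumption: roughly normal values)
set.seed(1)
x <- rnorm(200, mean = 0.086, sd = 0.01)
n <- length(x)

# Jackknife SE of the median: leave one observation out at a time
jack.med <- sapply(1:n, function(i) median(x[-i]))
se.jack.med <- sqrt((n - 1) / n * sum((jack.med - mean(jack.med))^2))

# Bootstrap SE of the median, B = 1000 resamples
B <- 1000
boot.med <- replicate(B, median(sample(x, n, replace = TRUE)))
se.boot.med <- sd(boot.med)

# Parametric approximation: s.e.med = 1.253 * s / sqrt(n)
se.par.med <- 1.253 * sd(x) / sqrt(n)

c(jackknife = se.jack.med, bootstrap = se.boot.med, parametric = se.par.med)
```

Note that the jackknife is known to behave poorly for the median (a non-smooth statistic), so some disagreement between the three estimates is expected.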
Exercise 2.
Consider the following data:
NATR332.DAT <- data.frame(
Y1 = c(146,141,135,142,140,143,138,137,142,136),
Y2 = c(141,143,139,139,140,141,138,140,142,138)
)
Let θ be the ratio of the two population means:

θ = µY1 / µY2

Calculate jackknife and bootstrap estimates for θ̂, and for the standard error of θ̂.
Part a. Jackknife.
Part b. Bootstrap.
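Both parts can be set up as below, using the NATR332.DAT values given above. One assumption worth flagging: the rows are treated as pairs, so the jackknife deletes one row at a time and the bootstrap resamples whole rows.

```r
NATR332.DAT <- data.frame(
  Y1 = c(146,141,135,142,140,143,138,137,142,136),
  Y2 = c(141,143,139,139,140,141,138,140,142,138)
)
n <- nrow(NATR332.DAT)
theta.hat <- mean(NATR332.DAT$Y1) / mean(NATR332.DAT$Y2)

# Jackknife: delete one row (one pair) at a time and recompute the ratio
theta.jack <- sapply(1:n, function(i) {
  mean(NATR332.DAT$Y1[-i]) / mean(NATR332.DAT$Y2[-i])
})
se.jack <- sqrt((n - 1) / n * sum((theta.jack - mean(theta.jack))^2))

# Bootstrap: resample rows with replacement, B = 1000
set.seed(1)
theta.boot <- replicate(1000, {
  idx <- sample(1:n, n, replace = TRUE)
  mean(NATR332.DAT$Y1[idx]) / mean(NATR332.DAT$Y2[idx])
})
se.boot <- sd(theta.boot)
```

Resampling independently within Y1 and Y2 would be an alternative design if the columns are treated as independent samples rather than pairs.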
Exercise 3.
Part a.
Consider the ELO data. Subset the data to exclude non-qualifiers (NQ), then create a factor AA. This
will indicate whether the wrestler was All-American (top 8 places) or did not place in the tournament. Use
ActualFinish equal to AA. Next, calculate an effect size d for the difference in ELO scores between All-American
and non-All-American wrestlers; you will need to calculate means and standard deviations as necessary. Since
the populations are unbalanced, you will need to use a pooled sd of the form
s_pooled = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]
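The pooled sd and effect size translate directly into R. The vectors below are hypothetical ELO values invented for illustration; the real computation would subset the ELO data by the AA factor.

```r
# Hypothetical ELO values, NOT the actual tournament data
elo.aa  <- c(1550, 1620, 1590, 1700, 1640)              # All-American group
elo.non <- c(1480, 1510, 1450, 1530, 1470, 1500, 1460)  # did not place

n1 <- length(elo.aa)
n2 <- length(elo.non)

# Pooled sd for unbalanced groups
s.pooled <- sqrt(((n1 - 1) * sd(elo.aa)^2 + (n2 - 1) * sd(elo.non)^2) /
                 (n1 + n2 - 2))

# Effect size d (standardized mean difference)
d <- (mean(elo.aa) - mean(elo.non)) / s.pooled
```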
Part b.
Calculate jackknife and bootstrap estimates of the error of d. Since ELO is determined by a wrestler's success
within a weight class, you will need to honor this grouping (or sampling) of the data. Calculate the jackknife
by excluding one Weight at a time from the data, and recalculating d; since there are 10 weight classes there
should be 10 jackknife replicates.
For the bootstrap, sample from the 10 weight classes (use unique or levels). Note that you will not be able
to simply subset the data on something like Weight %in% samples, since the bootstrap will require duplicate
samples. Instead, iterate over the weight-class samples and merge subsets of the original data.
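The grouped bootstrap can be sketched as follows. The data frame, with Weight, ELO, and AA columns, is simulated here as a stand-in for the real ELO data, so the column names and values are assumptions.

```r
# Simulated stand-in for the ELO data (Weight, ELO, AA are assumed names)
set.seed(1)
weights <- seq(125, 285, length.out = 10)   # 10 hypothetical weight classes
elo.dat <- data.frame(
  Weight = rep(weights, each = 8),
  ELO    = rnorm(80, mean = 1500, sd = 60),
  AA     = rep(c(TRUE, FALSE), 40)
)

effect.size <- function(dat) {
  x <- dat$ELO[dat$AA]
  y <- dat$ELO[!dat$AA]
  n1 <- length(x); n2 <- length(y)
  sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / sp
}

# Resample whole weight classes with replacement; rbind the matching
# subsets so a class drawn twice appears twice (Weight %in% sampled
# would silently drop the duplicates)
d.boot <- replicate(1000, {
  sampled <- sample(unique(elo.dat$Weight), replace = TRUE)
  boot.dat <- do.call(rbind, lapply(sampled, function(w) {
    elo.dat[elo.dat$Weight == w, ]
  }))
  effect.size(boot.dat)
})
se.boot.d <- sd(d.boot)
```

The matching jackknife simply drops one weight class at a time with `elo.dat[elo.dat$Weight != w, ]`, giving 10 replicates.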
Part c.
Compare your estimates of standard error to the parametric estimate, approximated by

s.e.d ≈ √[ (n1 + n2)/(n1n2) + d² / (2(n1 + n2)) ]
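The parametric approximation is a one-liner; the n1, n2, and d values below are illustrative placeholders, not results from the ELO data.

```r
# Illustrative group sizes and effect size (not from the real data)
n1 <- 5
n2 <- 7
d  <- 1.8

# Parametric approximation to the standard error of d
se.d <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
```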
Exercise 4.
Consider the data for U.S. Wholesale price for pumpkins 2018 in pumpkins.csv.
Part a.
Load the data, and calculate the F test and the parametric P(> F) using the code below. (set eval=TRUE).
summary(aov(Price ~ Class, data=pumpkins.dat))
Part b.
Permute Price over Class; that is, create a new data set on the assumption that Class has no
influence on Price. Do this 1000 times, and calculate the F ratio for each. Plot the distribution of F, and
calculate how many F are greater than the F from part a. How does this compare with the parametric
estimate for P(> F)? Do you need to increase the number of permutations?
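One way to sketch the permutation test, simulating pumpkins.dat since the CSV is not included here (the Price, Class, and Week column names come from the exercise; the values are made up):

```r
# Simulated stand-in for pumpkins.dat (values are illustrative)
set.seed(1)
pumpkins.dat <- data.frame(
  Class = rep(c("A", "B", "C"), each = 20),
  Week  = rep(1:5, times = 12),
  Price = rnorm(60, mean = 15, sd = 3)
)

# Extract the F ratio for Class from the one-way ANOVA table
f.ratio <- function(dat) {
  summary(aov(Price ~ Class, data = dat))[[1]][["F value"]][1]
}
F.obs <- f.ratio(pumpkins.dat)

# Permute Price over Class: shuffle Price, leave Class fixed
F.perm <- replicate(1000, {
  perm <- pumpkins.dat
  perm$Price <- sample(perm$Price)
  f.ratio(perm)
})

# Permutation estimate of P(> F)
p.perm <- mean(F.perm >= F.obs)
hist(F.perm)
```

With only 1000 permutations the smallest attainable non-zero p-value is 0.001, which is one reason the exercise asks whether more permutations are needed.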
Part c.
Repeat part b, but this time honor the Week grouping. That is, permute Price over Class only within
observations grouped by Week. Compare this to
summary(aov(Price ~ Class + as.factor(Week), data=pumpkins.dat))
Which are more appropriate for these data?
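Restricting the shuffle to within-Week groups is conveniently done with ave(). As in part b, pumpkins.dat is simulated here because the CSV is not included; only the Price, Class, and Week names come from the exercise.

```r
# Simulated stand-in for pumpkins.dat (values are illustrative)
set.seed(1)
pumpkins.dat <- data.frame(
  Class = rep(c("A", "B", "C"), each = 20),
  Week  = rep(1:5, times = 12),
  Price = rnorm(60, mean = 15, sd = 3)
)

# ave() applies sample() separately within each Week, so Price values
# are only rearranged against observations from the same week
F.perm.wk <- replicate(1000, {
  perm <- pumpkins.dat
  perm$Price <- ave(perm$Price, perm$Week, FUN = sample)
  summary(aov(Price ~ Class, data = perm))[[1]][["F value"]][1]
})
```

The resulting null distribution can then be compared against the F for Class from `aov(Price ~ Class + as.factor(Week), data = pumpkins.dat)`, which removes week-to-week price variation in the same spirit.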


# Exercise 1.

hidalgo.dat <- matrix(unlist(read.table("hidalgo.dat", header=F)))

mean(hidalgo.dat)

# 0.08602474

median(hidalgo.dat)

# 0.08

# part a.

n <- length(hidalgo.dat)

jackknifed.means <- sapply(1:n, function(i) {
  mean(hidalgo.dat[-i])
})

mean.jackknifed.mean <- mean(jackknifed.means)

# [1] 0.08602474

se.jackknifed <- sqrt(sum((jackknifed.means-mean(jackknifed.means))^2)*(n-1)/n)

# [1] 0.0006794796

sd(hidalgo.dat)/sqrt(n)

# [1] 0.0006794796

# For the mean, each jackknife replicate is itself a sample mean, so the

# jackknife estimate equals the sample mean (an unbiased estimator of the

# population mean) and the jackknife SE reproduces s/sqrt(n) exactly.

# part b.

jackknifed.medians <- sapply(1:n, function(i) {
  median(hidalgo.dat[-i])
})...