## Question 1

Is the type of fish consumed ("fishpart") independent of the "fisherman" classification?

```{r}
# Load the data and compare the fishpart codes across fisherman groups
dat <- read.csv("fishermen_mercury.csv")
with(dat, boxplot(fishpart ~ fisherman))
```

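From the boxplot, the distribution of fishpart appears to differ between the fisherman categories, which suggests the two variables are not independent. Because fishpart is a categorical code, the boxplot is only a rough visual check; a formal test of independence would be needed to confirm this.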
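Since both variables are categorical, a contingency table with a chi-squared test is one more direct way to examine independence. The following is a minimal sketch, not the assignment's prescribed method. It assumes `fisherman` and `fishpart` are stored as category codes; the helper name `independence_check` is hypothetical, and the toy data frame is made up only so the block runs on its own.

```r
# Hypothetical helper: cross-tabulate two categorical columns and test independence.
independence_check <- function(df, row_var, col_var) {
  tab <- table(df[[row_var]], df[[col_var]], dnn = c(row_var, col_var))
  # chisq.test() warns when expected counts are small;
  # fisher.test(tab) is a possible alternative in that case.
  test <- suppressWarnings(chisq.test(tab))
  list(table = tab, test = test)
}

# Made-up toy data for illustration only (not values from fishermen_mercury.csv).
toy <- data.frame(
  fisherman = rep(c(0, 1), each = 6),
  fishpart  = c(0, 0, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3)
)
out <- independence_check(toy, "fisherman", "fishpart")
print(out$table)
print(out$test$p.value)
```

With the real data, the call would be `independence_check(dat, "fisherman", "fishpart")`; a small p-value would count as evidence against independence.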

## Question 2

Are the data consistent with the null hypothesis that the mean of total mercury is equal in the fisherman and non-fisherman population?

```{r}
# The question concerns mean total mercury, so TotHg is the response
res <- aov(TotHg ~ factor(fisherman), data = dat)
summary(res)
```

The p-value is less than the 0.05 significance level, which provides evidence to reject the null hypothesis that mean total mercury is equal in the fisherman and non-fisherman populations.

## Question 3

* Please fit a regression model of "TotHg" on the numeric variables "fishmlwk" and "weight" and the categorical variable "fishpart".

* Please display a pairs-type plot for these variables.

* Also assess the diagnostic plots for the model.

* Would you consider removing any data points?

```{r}
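# Wrap fishpart in factor() so it enters the model as a categorical variable
model <- lm(TotHg ~ fishmlwk + weight + factor(fishpart), data = dat)
# Pairwise scatterplots of the response and the predictors
pairs(dat[, c("TotHg", "fishmlwk", "weight", "fishpart")])
# Show the four standard diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model, ask = FALSE)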
```

The residuals vs. fitted plot shows roughly constant variance. The Q-Q plot shows that the residuals are approximately normal, except for a few points that deviate from the reference line. Observation 7 has a much larger Cook's distance than the other points, so it is a possible outlier and a candidate for removal.
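The Cook's distance reading can also be checked numerically rather than by eye. The sketch below assumes a common rule-of-thumb cutoff of 4/n, which is not part of the assignment; the helper name `flag_influential` is hypothetical, and the data are simulated so the block runs without the CSV file.

```r
# Hypothetical helper: list observations whose Cook's distance exceeds a cutoff.
# The 4/n default is a common rule of thumb, assumed here for illustration.
flag_influential <- function(fit, cutoff = 4 / nobs(fit)) {
  d <- cooks.distance(fit)
  d[d > cutoff]
}

# Simulated data for illustration only; with the real data, pass `model` instead.
set.seed(1)
sim <- data.frame(x = rnorm(30))
sim$y <- 2 * sim$x + rnorm(30)
sim$y[7] <- sim$y[7] + 15  # plant one artificial outlier at row 7
fit <- lm(y ~ x, data = sim)
print(flag_influential(fit))
```

Calling `flag_influential(model)` on the fitted model would return the flagged observations by row name. Refitting without a flagged point and comparing coefficients is one way to judge whether removal changes the conclusions.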

## Question 4

* Please fit a regression model of "log(TotHg)" on the numeric variables "fishmlwk" and "weight" and the categorical variable "fishpart".
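One possible way to set up this model is sketched below. It assumes `TotHg` is strictly positive so that `log()` is defined, and it treats `fishpart` as a factor. The helper name `fit_log_model` is hypothetical, and the stand-in data are simulated purely so the block is self-contained.

```r
# Hypothetical helper: fit the log-response model described in Question 4.
# Assumes TotHg is strictly positive so that log() is defined.
fit_log_model <- function(df) {
  lm(log(TotHg) ~ fishmlwk + weight + factor(fishpart), data = df)
}

# Simulated stand-in data for illustration only (not the real measurements).
set.seed(42)
n <- 40
toy <- data.frame(
  fishmlwk = sample(0:21, n, replace = TRUE),
  weight   = round(rnorm(n, mean = 70, sd = 8)),
  fishpart = sample(0:3, n, replace = TRUE)
)
toy$TotHg <- exp(0.5 + 0.05 * toy$fishmlwk + rnorm(n, sd = 0.3))
log_model <- fit_log_model(toy)
print(coef(log_model))
```

With the real data, `fit_log_model(dat)` followed by `par(mfrow = c(2, 2))` and `plot(...)` on the result would give diagnostics to compare against the untransformed model from Question 3.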
