 Hypothesis Tests on Data Frames

Subject Mathematics Statistics-R Programming

Question

Part 1
Merge the most relevant data found in the three tables (golub.gnames, golub, and golub.cl) that make up the golub data in library(multtest) into one data.frame with the following properties:

Name: golub.df
Dimensions: patient rows and named gene columns, and an additional named column for the cancer classifications
Column Names: use the gene name (column 2) from golub.gnames and “classification”

Classification Column: use a factor column in golub.df that uses "ALL" and "AML" as the classifications

Part 2
Answer Chapter 3 Exercise 9 and the new Exercise 8 below using your new golub.df data.frame. Try not to cheat by reformulating the answers in the book – unless you get really stuck.

Exercise (Re-worded so as to use your golub.df dataframe)
a) Perform a hypothesis test to see whether the expression values for the Zyxin gene for the ALL patients are normally distributed.
b) Plot the PDF of N(0.3, 0.75²) on top of the histogram of the expression values for the Zyxin gene for the ALL patients, for the range -2 < x < 2. Does it look like N(0.3, 0.75²) models the data well?
c) Extra credit: Perform a hypothesis test to see whether the expression values for the Zyxin gene for the ALL patients are distributed according to N(0.3, 0.75²).
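
Parts a) to c) could be approached along the following lines. This is a minimal sketch, not the book's answer: it assumes golub.df already exists as described in Part 1 and has a column named exactly "Zyxin", it takes the normal model to have mean 0.3 and standard deviation 0.75 (an assumption about how the second parameter is meant), and it chooses the Shapiro-Wilk test for a) and a one-sample Kolmogorov-Smirnov test for c). The helper names normality_tests and plot_normal_fit are hypothetical.

```r
# Hypothetical helper for (a) and (c): returns both test results.
# mu and sigma are ASSUMED parameters (mean 0.3, standard deviation 0.75).
normality_tests <- function(values, mu = 0.3, sigma = 0.75) {
  list(
    # (a) Shapiro-Wilk: H0 is that the values come from some normal distribution
    shapiro = shapiro.test(values),
    # (c) Kolmogorov-Smirnov: H0 is that the values come from N(mu, sigma^2)
    ks = ks.test(values, "pnorm", mean = mu, sd = sigma)
  )
}

# Hypothetical helper for (b): density-scaled histogram with the PDF overlaid.
plot_normal_fit <- function(values, mu = 0.3, sigma = 0.75) {
  hist(values, freq = FALSE, xlim = c(-2, 2),
       main = "Zyxin expression, ALL patients", xlab = "Expression value")
  grid <- seq(-2, 2, length.out = 200)
  lines(grid, dnorm(grid, mean = mu, sd = sigma))
}

# Runs only if golub.df from Part 1 is available with a "Zyxin" column
# (the column name is an assumption).
if (exists("golub.df") && "Zyxin" %in% names(golub.df)) {
  zyxin_all <- golub.df[golub.df$classification == "ALL", "Zyxin"]
  print(normality_tests(zyxin_all))
  plot_normal_fit(zyxin_all)
}
```

In both tests a p-value above the chosen significance level means the data give no evidence against the null hypothesis; it does not prove that the model is correct. The histogram uses freq = FALSE so that its bars are on the same density scale as the overlaid PDF.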

Part 3
Answer the Chapter 4 Exercises 1, 3, 6, 8, and 10 using your new golub.df data.frame. Try not to cheat by reformulating the answers in the book – unless you get really stuck.

Notes and Hints:
1. In Parts 2 and 3, use only your golub.df data.frame from Part 1 to answer the questions. (Do not use the original golub, golub.cl, and golub.gnames matrices and vector.)
2. Don't use the gene index or gene ID (columns 1 and 3 in golub.gnames) in your new data.frame. Just have named gene columns and one extra named column for the cancer classification.

Solution Preview

# load the golub expression matrix (this also loads golub.cl and golub.gnames)
data(golub, package = "multtest")

dim(golub)
# this returns 3051 by 38 (genes by patients)

# transpose the matrix so that patients are rows and genes are columns
golubt <- t(golub)

dim(golubt)
# this returns 38 by 3051

# create a data frame filled with NAs, with one extra column
# reserved for the classification
golub.df <- as.data.frame(matrix(nrow = 38, ncol = 3052))

# fill the first 3051 columns of the data frame with the
# transposed golub data
golub.df[1:38, 1:3051] <- golubt

# build the classification factor from the class vector golub.cl
# (0 = ALL, 1 = AML)
gol.fac <- factor(golub.cl, levels = 0:1, labels = c("ALL", "AML"))
# ...

This is only a preview of the solution.
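
The preview stops before the gene names and the classification column are attached. The following is not the remainder of the original solution; it is a minimal sketch of one way the Part 1 requirements could be met. The helper name build_golub_df is hypothetical, and the sketch assumes that column 2 of golub.gnames holds the gene names, as the question states.

```r
# Hypothetical helper: combine the expression matrix, the gene-name table
# and the class vector into one data frame (patients in rows).
build_golub_df <- function(expr, gnames, cl) {
  df <- as.data.frame(t(expr))   # transpose: patients become rows
  names(df) <- gnames[, 2]       # column 2 is assumed to hold the gene names
  df$classification <- factor(cl, levels = 0:1, labels = c("ALL", "AML"))
  df
}

# Runs only if the multtest package is installed.
if (requireNamespace("multtest", quietly = TRUE)) {
  data(golub, package = "multtest")   # loads golub, golub.cl, golub.gnames
  golub.df <- build_golub_df(golub, golub.gnames, golub.cl)
  print(dim(golub.df))                # 38 rows, 3051 gene columns + 1
  print(table(golub.df$classification))
}
```

Converting the transposed matrix directly avoids preallocating a data frame of NAs. Note that names<- does not check for duplicates or alter the names, so a gene column is best selected as golub.df[["Zyxin"]] or by matching against names(golub.df).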
