 Golub Data In R-Programming


Question

Problem 1
On the Golub et al. (1999) data set, we consider the correlation between the Zyxin gene expression values and those of each gene in the data set.
(a) How many of the genes have correlation values less than -0.5? (Those genes are highly negatively correlated with the Zyxin gene.)
(b) Find the gene names for the top five genes that are most negatively correlated with the Zyxin gene.
(c) Using the t-test, how many genes are negatively correlated with the Zyxin gene? Use a false discovery rate of 0.05. (Hint: use cor.test() to get the p-values, then adjust for FDR. Notice that we want a one-sided test here.)
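
The hint in part (c) can be sketched as follows. This is a minimal illustration, not the purchased solution: it runs on a synthetic matrix (`expr`, a hypothetical stand-in for the 3051 x 38 `golub` matrix) with a hypothetical reference row `ref_row`, so every number it produces is artificial. With the real data one would presumably pass `golub` and row 2124 (Zyxin) instead.

```r
# Sketch only: synthetic stand-in for the golub expression matrix.
set.seed(1)
n_genes   <- 200
n_samples <- 38
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

ref_row <- 1                 # hypothetical reference gene (Zyxin is row 2124 in golub)
ref     <- expr[ref_row, ]

# Plant ten genes that are strongly negatively correlated with the reference,
# so the procedure has something to find in this artificial example.
for (i in 2:11) {
  expr[i, ] <- -ref + rnorm(n_samples, sd = 0.3)
}

# One-sided test per gene: H0: rho = 0 against H1: rho < 0.
p_values <- apply(expr, 1, function(x) {
  cor.test(x, ref, alternative = "less")$p.value
})

# Benjamini-Hochberg adjustment controls the false discovery rate.
p_fdr <- p.adjust(p_values, method = "BH")

# Number of genes declared negatively correlated at FDR 0.05.
n_negative <- sum(p_fdr < 0.05, na.rm = TRUE)
n_negative
```

The reference gene itself has correlation 1 with its own values, so its p-value under the "less" alternative is close to 1 and it is not counted.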

Problem 2
On the Golub et al. (1999) data set, regress the expression values for the GRO3 GRO3 oncogene on the expression values of the GRO2 GRO2 oncogene.
(a) Is there a statistically significant linear relationship between the two genes’ expression? Use appropriate statistical analysis to make the conclusion. What proportion of the GRO3 GRO3 oncogene expression’s variation can be explained by the regression on GRO2 GRO2 oncogene expression?
(b) Test if the slope parameter is less than 0.5 at the α = 0.05 level.
(c) Find an 80% prediction interval for the GRO3 GRO3 oncogene expression when GRO2 GRO2 oncogene is not expressed (zero expression value).
(d) Check the regression model assumptions. Can we trust the statistical inferences from the regression fit?
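
One possible way to approach parts (a) to (d) is sketched below. The vectors `gro2` and `gro3` are simulated placeholders (assumptions, not the Golub values), so the estimates are meaningless in themselves. With the real data, the two rows could presumably be located with something like `grep("GRO2", golub.gnames[, 2])` and `grep("GRO3", golub.gnames[, 2])`.

```r
# Sketch only: simulated stand-ins for the two genes' expression values.
set.seed(2)
gro2 <- rnorm(38)
gro3 <- -0.8 + 0.35 * gro2 + rnorm(38, sd = 0.3)

fit <- lm(gro3 ~ gro2)
s   <- summary(fit)

# (a) The t-test on the slope and R-squared (proportion of variation explained).
s$coefficients
s$r.squared

# (b) One-sided test of H0: slope = 0.5 against H1: slope < 0.5.
b1     <- s$coefficients["gro2", "Estimate"]
se_b1  <- s$coefficients["gro2", "Std. Error"]
t_stat <- (b1 - 0.5) / se_b1
p_less <- pt(t_stat, df = fit$df.residual)   # reject H0 if p_less < 0.05

# (c) 80% prediction interval at zero GRO2 expression.
pred <- predict(fit, newdata = data.frame(gro2 = 0),
                interval = "prediction", level = 0.80)
pred

# (d) One check of the normality assumption on the residuals.
shapiro.test(resid(fit))
```

For part (d), `plot(fit, which = 1:2)` would additionally draw the residuals-versus-fitted and normal Q-Q plots, which help judge linearity and constant variance.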

Solution Preview


# Problem 1
# On the Golub et al. (1999) data set, we consider the correlation between the Zyxin gene
# expression values and those of each gene in the data set.

library("multtest")
data(golub)
dim(golub)
#  3051   38
golub <- data.frame(golub)
gol.fac <- factor( golub.cl, levels=0:1, labels=c("ALL","AML"))
# (a) How many of the genes have correlation values less than -0.5? (Those genes are
# highly negatively correlated with the Zyxin gene.)
golub.gnames[2124,]
#  "4847"      "Zyxin"    "X95735_at"
zyxin <- as.numeric(golub[2124,])
# Correlation of every gene (row) with Zyxin
correlations <- apply(golub, 1, cor, zyxin)
# Compare against -0.5, not 0.5 (the looser cutoff 0.5 would count 2941 of the 3051 genes)
correlations.less.than.neg.05 <- correlations < -0.5
sum(correlations.less.than.neg.05)

# (b) Find the gene names for the top five genes that are most negatively correlated with
# the Zyxin gene.
o <- order(correlations)  # ascending, so the most negative correlations come first
golub.gnames[o,][1:5,2]
#  "Macmarcks"
#  "Inducible protein mRNA"
#  "C-myb gene extracted from Human (c-myb) gene, complete primary cds, and five complete alternatively spliced cds"
#  "Oncoprotein 18 (Op18) gene"
#  "54 kDa protein mRNA"...

