 Statistics Questions

Subject: Mathematics, Statistics, R Programming

Question

1 - Cluster analysis on the "Zyxin" expression values of the Golub et al. (1999) data.
(a) Produce a scatter plot of the gene expression values, using different symbols for the two groups.
(b) Use single linkage cluster analysis to see whether the tree indicates two different groups.
(c) Use k-means cluster analysis. Do the two clusters correspond to the diagnosis of the patient groups?
(d) Perform a bootstrap on the cluster means. You will have to modify the code here and there. Do the confidence intervals for the cluster means overlap?
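
One possible way to approach question 1 in R is sketched below. The Zyxin values are simulated so that the snippet runs without extra packages. The group sizes (27 ALL, 11 AML) follow the Golub data, but the means, the standard deviations, and the names `zyxin`, `group`, and `boot_means` are assumptions made for illustration. The commented lines show how the real values would typically be loaded from the `multtest` package.

```r
set.seed(1)

# ASSUMPTION: simulated stand-in for the Zyxin expression values.
# With multtest installed, the real values would typically come from:
#   data(golub, package = "multtest")
#   zyxin <- golub[grep("Zyxin", golub.gnames[, 2]), ]
#   group <- factor(golub.cl, levels = 0:1, labels = c("ALL", "AML"))
group <- factor(rep(c("ALL", "AML"), times = c(27, 11)))
zyxin <- c(rnorm(27, mean = -0.5, sd = 0.5), rnorm(11, mean = 1.5, sd = 0.5))

# (a) Scatter plot with a different plotting symbol per group
plot(zyxin, pch = as.integer(group), xlab = "Patient index", ylab = "Expression")
legend("topleft", legend = levels(group), pch = seq_along(levels(group)))

# (b) Single linkage tree on Euclidean distances
hc <- hclust(dist(zyxin, method = "euclidean"), method = "single")
plot(hc, labels = as.character(group))

# (c) k-means with two clusters, cross-tabulated against the diagnosis
km <- kmeans(zyxin, centers = 2, nstart = 10)
print(table(cluster = km$cluster, diagnosis = group))

# (d) Percentile bootstrap of the two cluster means.
# Sorting the means in each replicate avoids mixing up cluster labels.
n_boot <- 500
boot_means <- matrix(NA_real_, nrow = n_boot, ncol = 2)
for (b in seq_len(n_boot)) {
  resample <- sample(zyxin, replace = TRUE)
  fit <- kmeans(resample, centers = 2, nstart = 5)
  boot_means[b, ] <- sort(fit$centers[, 1])
}
ci <- apply(boot_means, 2, quantile, probs = c(0.025, 0.975))
colnames(ci) <- c("lower cluster", "upper cluster")
print(ci)
```

The intervals overlap if the upper limit of the lower cluster exceeds the lower limit of the upper cluster, that is, if `ci[2, 1] > ci[1, 2]`.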

2 - Close to CCND3 Cyclin D3. Recall that we did various analyses on the expression data of the CCND3 Cyclin D3 gene of the Golub (1999) data.
(a) Use the genefilter package to find the ten genes closest to the expression values of CCND3 Cyclin D3. Give their probe IDs as well as their biological names.
(b) Produce a combined boxplot separately for the ALL and the AML expression values. Compare it with that on the basis of CCND3 Cyclin D3 and comment on the similarities.
(c) Compare the smallest distances with those among the Cyclin genes computed above. What is your conclusion?
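
The idea behind part (a) is a nearest-neighbour search over the rows of the expression matrix. The question asks for the `genefilter` package (its `genefinder` function does this search); the sketch below swaps in a plain base-R Euclidean distance instead so that it runs without Bioconductor. The matrix `expr`, the gene names, and the target row are simulated or hypothetical placeholders.

```r
set.seed(2)

# ASSUMPTION: simulated matrix standing in for golub
# (genes in rows, patients in columns) with hypothetical gene names.
expr <- matrix(rnorm(200 * 38), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))

# Hypothetical target row. For the real data it could be located with
#   grep("CCND3", golub.gnames[, 2])
target <- 1

# Euclidean distance from the target row to every row of the matrix
d <- sqrt(rowSums(sweep(expr, 2, expr[target, ])^2))
d[target] <- NA                 # exclude the target gene itself

# order() puts NA last, so the first ten entries are the closest genes
closest <- order(d)[1:10]
result <- data.frame(row = closest,
                     name = rownames(expr)[closest],
                     distance = d[closest])
print(result)
```

With the real data, the `name` column would be replaced by a lookup of the probe and biological names for the rows in `closest`.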

3 - MCM3. In the example on MCM3, a plot shows that there is an outlier.
(a) Plot the data and devise a way to find the row number of the outlier.
(b) Remove the outlier and test the correlation coefficient. Compare the results to those above.
(c) Perform the bootstrap to construct a confidence interval.
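
A minimal sketch of these three steps follows, using a simulated pair of expression vectors with one planted outlier in place of the two MCM3 probes. The rule used to flag the outlier (largest absolute residual from a least-squares line) is only one possible choice, and all variable names and simulation settings are assumptions.

```r
set.seed(3)

# ASSUMPTION: simulated stand-in for the two MCM3 expression vectors.
n <- 38
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.4)
x[21] <- 3                      # planted outlier at a hypothetical position
y[21] <- -3

# (a) Plot, then flag the point with the largest absolute residual
plot(x, y, xlab = "Probe 1", ylab = "Probe 2")
outlier <- which.max(abs(resid(lm(y ~ x))))
print(outlier)

# (b) Correlation test with and without the flagged point
print(cor.test(x, y)$estimate)
x2 <- x[-outlier]
y2 <- y[-outlier]
print(cor.test(x2, y2))

# (c) Percentile bootstrap of the correlation: resample patients (pairs)
n_boot <- 1000
boot_cor <- replicate(n_boot, {
  idx <- sample(length(x2), replace = TRUE)
  cor(x2[idx], y2[idx])
})
ci <- quantile(boot_cor, probs = c(0.025, 0.975))
print(ci)
```

Resampling the index rather than `x2` and `y2` separately keeps each patient's two measurements paired, which is what the correlation depends on.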

4 - Cluster analysis on part of Golub data.
(a) Select the oncogenes from the Golub data and plot the tree from a single linkage cluster analysis.
(b) Do you observe meaningful clusters?
(c) Select the antigen genes and answer the same questions.
(d) Select the receptor genes and answer the same questions.
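
Because parts (a), (c), and (d) repeat the same steps for different gene families, one way to organise the work is a small helper that selects rows by name pattern and plots the single linkage tree. The helper `select_and_cluster`, the simulated matrix, and the gene names are hypothetical; with the real data, `gene_names` would be the biological names column and `expr` the Golub matrix.

```r
set.seed(4)

# ASSUMPTION: simulated matrix and hypothetical gene names.
gene_names <- c(paste("oncogene", 1:6), paste("antigen", 1:6),
                paste("receptor", 1:6), paste("other", 1:20))
expr <- matrix(rnorm(length(gene_names) * 38), ncol = 38)

# Select the genes whose name matches `pattern` and plot the
# single linkage tree computed on Euclidean distances.
select_and_cluster <- function(pattern) {
  rows <- grep(pattern, gene_names, ignore.case = TRUE)
  sub <- expr[rows, , drop = FALSE]
  rownames(sub) <- gene_names[rows]
  hc <- hclust(dist(sub), method = "single")
  plot(hc, main = paste("Single linkage:", pattern))
  invisible(hc)
}

hc_onco <- select_and_cluster("oncogene")
hc_anti <- select_and_cluster("antigen")
hc_rec  <- select_and_cluster("receptor")
```

On purely random data such as this, the trees should show no meaningful structure, which gives a baseline for judging the trees obtained from the real expression values.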

5 - Principal Components Analysis on part of the ALL data.
(a) Construct an expression set with the B-cell patients in stages B1, B2, and B3. Compute the corresponding ANOVA p-values of all gene expressions. Construct the expression set of genes with p-values smaller than 0.001. Report the dimensionality of the data matrix with gene expressions.
(b) Are the correlations between the patients positive?
(c) Compute the eigenvalues of the correlation matrix. Report the largest five. Are the first three larger than one?
(d) Program a bootstrap of the largest five eigenvalues. Report the bootstrap 95% confidence intervals and draw relevant conclusions.
(e) Plot the genes in a plot of the first two principal components.
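
The steps of question 5 can be sketched as follows on simulated data. The matrix sizes, the stage effect, and the names `expr`, `stage`, and `sel` are assumptions chosen so that the snippet runs with base R only; the real analysis would start from the ALL expression set instead.

```r
set.seed(5)

# ASSUMPTION: simulated data, 300 genes x 30 patients in three stages.
stage <- factor(rep(c("B1", "B2", "B3"), each = 10))
expr <- matrix(rnorm(300 * 30), nrow = 300)
expr <- expr + rnorm(300, sd = 2)          # gene-specific baseline level
shift <- c(-2, 0, 2)[as.integer(stage)]    # stage effect for 60 genes
expr[1:60, ] <- sweep(expr[1:60, ], 2, shift, "+")

# (a) One-way ANOVA p-value per gene, then filter at 0.001
pvals <- apply(expr, 1, function(v) anova(lm(v ~ stage))[["Pr(>F)"]][1])
sel <- expr[pvals < 0.001, , drop = FALSE]
print(dim(sel))

# (b) Patient-by-patient correlations
R <- cor(sel)
print(all(R > 0))

# (c) Eigenvalues of the correlation matrix
ev <- eigen(R, symmetric = TRUE)$values
print(ev[1:5])

# (d) Bootstrap the five largest eigenvalues by resampling genes (rows)
n_boot <- 500
boot_ev <- t(replicate(n_boot, {
  star <- sel[sample(nrow(sel), replace = TRUE), , drop = FALSE]
  eigen(cor(star), symmetric = TRUE)$values[1:5]
}))
ci <- apply(boot_ev, 2, quantile, probs = c(0.025, 0.975))
print(ci)

# (e) Genes plotted on the first two principal components
pca <- prcomp(sel, scale. = TRUE)
plot(pca$x[, 1:2], xlab = "PC1", ylab = "PC2")
```

Setting `scale. = TRUE` in `prcomp` makes the principal components correspond to the correlation matrix used in parts (b) to (d) rather than to the covariance matrix.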
