 # Statistics Report

Subject: Mathematics, Statistics, R Programming

## Question

4 K-Nearest Neighbors
- Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Use V-fold cross-validation to choose a value of k.
- Plot the cross-validation error against k.
- State the value of k chosen and the cross-validation error for this k.
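
The cross-validation steps above can be sketched as follows. The assignment itself expects R and the `pid.dat` data; this is only a minimal illustration of the logic in Python (standard library only) on synthetic two-class data. The helper names `knn_predict`, `make_folds` and `cv_error`, the choice V = 5, and the grid of odd k values are all assumptions made for the example, not part of the original solution.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def make_folds(n, V, seed=0):
    """Shuffle the indices 0..n-1 and deal them into V folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[v::V] for v in range(V)]


def cv_error(X, y, k, V=10, seed=0):
    """V-fold cross-validation misclassification rate for k-nearest neighbors."""
    wrong = 0
    for fold in make_folds(len(X), V, seed):
        held_out = set(fold)
        tr_X = [X[i] for i in range(len(X)) if i not in held_out]
        tr_y = [y[i] for i in range(len(X)) if i not in held_out]
        wrong += sum(knn_predict(tr_X, tr_y, X[i], k) != y[i] for i in fold)
    return wrong / len(X)


# Synthetic stand-in data (NOT pid.dat): two Gaussian clouds in two dimensions.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(60)]
X += [[rng.gauss(2.5, 1.0), rng.gauss(2.5, 1.0)] for _ in range(60)]
y = ["neg"] * 60 + ["pos"] * 60

# Odd k values avoid tied votes when there are two classes.
errors = {k: cv_error(X, y, k, V=5) for k in range(1, 16, 2)}
best_k = min(errors, key=errors.get)

for k, err in errors.items():
    print(f"k={k:2d}  CV error={err:.3f}")
print("chosen k:", best_k)
```

The same folds are reused for every k (fixed `seed`), so the errors are comparable across the grid. On real data with variables on different scales, the predictors would normally be standardized before computing distances.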

5 QDA, LDA and FDA
- Briefly describe the methods in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Find the V-fold cross-validation error for QDA, LDA and FDA.
- Use the LDA and FDA coefficients to assess which variables increase the chances of diabetes.
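
For the "relevant formula" part, the LDA and QDA discriminant functions are usually stated as below. This is the standard textbook form under Gaussian class densities, not something taken from the original solution. The symbols $\pi_k$ (class prior), $\mu_k$ (class mean), $\Sigma$ (covariance shared by all classes) and $\Sigma_k$ (class-specific covariance) are introduced here for illustration.

$$
\delta_k^{\mathrm{LDA}}(x) = x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k
$$

$$
\delta_k^{\mathrm{QDA}}(x) = -\tfrac{1}{2} \log |\Sigma_k| - \tfrac{1}{2}\, (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \log \pi_k
$$

An observation $x$ is assigned to the class with the largest $\delta_k(x)$. Because LDA assumes a common covariance matrix, its decision boundary is linear in $x$; QDA allows a separate covariance matrix per class, which makes the boundary quadratic.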

6 Classification Trees
- Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Find the V-fold cross-validation error for the classification tree fit using the default settings in rpart.
- Interpret the fitted tree using all of the data. Interpret the importance of each variable and the role each variable plays in predicting diabetes.
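
As an illustration of how a classification tree scores a candidate split, the sketch below computes the Gini impurity, which is the default splitting criterion for classification in rpart. It is written in Python (standard library only) rather than R. The function names `gini`, `split_impurity` and `best_split` are hypothetical, and the six glucose values are made-up numbers, not rows of `pid.dat`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two children of the split `value < threshold`."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, impurity)."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    impurity, threshold = min((split_impurity(values, labels, t), t) for t in midpoints)
    return threshold, impurity


# Made-up illustrative values for one predictor and the class label.
glucose = [90, 100, 110, 150, 160, 170]
diabetes = ["neg", "neg", "neg", "pos", "pos", "pos"]

print("root impurity:", gini(diabetes))
print("impurity at threshold 105:", split_impurity(glucose, diabetes, 105))
print("best split:", best_split(glucose, diabetes))
```

A tree is grown by repeating this search over all predictors at each node and taking the split with the largest impurity reduction. Pruning and rpart's other defaults are not covered by this sketch.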

7 Logistic Regression
- Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Use appropriate methods discussed in class to select a model.
- Find the V-fold cross-validation error for this model.
- Use the selected logistic regression model to interpret the role each variable plays in predicting diabetes.
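
For the interpretation step, the logistic regression model is usually written on the log-odds scale as below. The notation is an assumption made for illustration: $x_1, \dots, x_p$ stand for whichever predictors are selected and $\beta_0, \dots, \beta_p$ for the fitted coefficients.

$$
\log \frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p,
\qquad p(x) = \Pr(\text{diabetes} = \text{pos} \mid x)
$$

Holding the other variables fixed, a one-unit increase in $x_j$ multiplies the odds of diabetes by $e^{\beta_j}$. A positive $\beta_j$ therefore raises the odds, and a negative $\beta_j$ lowers them.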

## Solution Preview


Data Description:
1. Title: Applying different statistical models to hospital data
2. Source of data: This dataset (pid.dat) was extracted from a combined dataset from several United States (US) hospitals. The aim of the collection was to determine the risk factors associated with diabetes.
3. Attribute Information: Various measurements were taken from a total of 392 patients at these US hospitals. The variables collected are:
   1) Pregnant: the number of times the patient has been pregnant.
   2) Glucose: the patient's plasma glucose concentration.
   3) Pressure: the patient's blood pressure (B.P.) (mm Hg).
   4) Triceps: the patient's triceps thickness (mm).
   5) Insulin: the patient's serum insulin (mu U/ml).
   6) Mass: body mass index, the patient's weight (kg) divided by the square of the height (m).
   7) Pedigree: the patient's diabetes pedigree function.
   8) Age: the patient's age in years.
   9) Diabetes: class variable ("pos" or "neg").
4. Missing Attribute Values: None...

