Subject: Mathematics, Statistics, R Programming

Question

4 K-Nearest Neighbors
 Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
 Use V-fold cross-validation to choose a value of k.
 Plot the cross-validation error against k.
 State the value of k chosen and the cross-validation error for this k.
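
The cross-validation steps listed above can be sketched as follows. This is only a minimal illustration in plain Python (the assignment itself is in R). The toy data, the helper names knn_predict and cv_error, the choice V = 5 and the candidate values of k are all assumptions made for the example; none of them come from the assignment or from pid.dat.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def cv_error(X, y, k, V=5, seed=0):
    """V-fold cross-validation misclassification rate for a k-NN classifier."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[v::V] for v in range(V)]  # every index lands in exactly one fold
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        # Count misclassified points in the held-out fold.
        errors += sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in fold)
    return errors / len(X)

# Hypothetical toy data: two well-separated Gaussian clusters (NOT the pid.dat data).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)] + \
    [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(30)]
y = ["neg"] * 30 + ["pos"] * 30

# Cross-validation error for each candidate k; keep the k with the smallest error.
errors_by_k = {k: cv_error(X, y, k) for k in (1, 3, 5, 7, 9)}
best_k = min(errors_by_k, key=errors_by_k.get)
print(errors_by_k, best_k)
```

The dictionary errors_by_k holds the points needed for a plot of cross-validation error against k. On real data the predictors would usually be standardized first, since k-NN depends on the distance scale of each variable.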

5 QDA, LDA and FDA
 Briefly describe the methods in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
 Find the V-fold cross-validation error for QDA, LDA and FDA.
 Use the LDA and FDA coefficients to assess which variables increase the chances of diabetes.
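
For reference, the standard textbook discriminant functions for QDA and LDA are sketched below. The notation is an assumption of this sketch, not taken from the assignment: \pi_c is the prior probability of class c, \mu_c its mean vector, and \Sigma_c (QDA) or \Sigma (LDA) the covariance matrix. Both methods assume the predictors are multivariate normal within each class, and an observation x is assigned to the class with the largest score. FDA is left out because the source does not expand the abbreviation.

```latex
% QDA: each class c has its own mean \mu_c and covariance \Sigma_c
\[
\delta_c^{\mathrm{QDA}}(x) = -\tfrac{1}{2}\log|\Sigma_c|
  - \tfrac{1}{2}(x-\mu_c)^{\top}\Sigma_c^{-1}(x-\mu_c) + \log\pi_c
\]

% LDA: all classes share one covariance \Sigma, so the score is linear in x
\[
\delta_c^{\mathrm{LDA}}(x) = x^{\top}\Sigma^{-1}\mu_c
  - \tfrac{1}{2}\mu_c^{\top}\Sigma^{-1}\mu_c + \log\pi_c
\]

% Two classes ("pos", "neg"): the difference of the LDA scores is linear in x
% with coefficient vector
\[
\beta = \Sigma^{-1}\left(\mu_{\mathrm{pos}} - \mu_{\mathrm{neg}}\right)
\]
```

Under this formulation, a positive component of \beta means that larger values of the corresponding variable move the prediction toward "pos" when the other variables are held fixed.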

6 Classification Trees
 Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
 Find the V-fold cross-validation error for the classification tree fit using the default settings in rpart.
 Interpret the fitted tree using all of the data. Interpret the importance of each variable and the role each variable plays in predicting diabetes.
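
To illustrate how a classification tree chooses a split, here is a minimal sketch in plain Python of the Gini impurity, which is the default splitting criterion for classification in rpart. The helper names gini and best_split and the six toy glucose values are hypothetical; they are made up for the example and are not from pid.dat. The sketch searches a single variable, whereas a real tree repeats this search over every variable at every node.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value < threshold'.

    The threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Impurity of the two child nodes, weighted by their sizes.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical toy values for one predictor (made up, not from the data set).
glucose = [85, 90, 100, 150, 160, 170]
diabetes = ["neg", "neg", "neg", "pos", "pos", "pos"]
threshold, impurity = best_split(glucose, diabetes)
print(threshold, impurity)
```

A variable that yields large impurity reductions near the top of the tree is one the tree treats as important for separating "pos" from "neg".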

7 Logistic Regression
 Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
 Use appropriate methods discussed in class to select a model.
 Find the V-fold cross-validation error for this model.
 Use the selected logistic regression model to interpret the role each variable plays in predicting diabetes.
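
The interpretation step above rests on the logistic model log(p / (1 - p)) = b0 + b1*x1 + ... + bp*xp, where p is the probability of "pos". A one-unit increase in x_j multiplies the odds p / (1 - p) by exp(b_j). The sketch below checks this numerically in plain Python. The intercept, the two coefficients and the patient values are hypothetical numbers chosen for illustration, not estimates from pid.dat.

```python
import math

def predicted_probability(intercept, coefs, x):
    """P(diabetes = "pos" | x) under a logistic regression model."""
    eta = intercept + sum(b * v for b, v in zip(coefs, x))  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical fitted values (assumed for illustration only).
intercept = -5.0
coefs = [0.03, 0.08]  # e.g. one coefficient for glucose, one for body mass index

# Same hypothetical patient, with the first variable raised by one unit.
p_before = predicted_probability(intercept, coefs, [120.0, 30.0])
p_after = predicted_probability(intercept, coefs, [121.0, 30.0])

# The ratio of the odds equals exp(coefficient of the variable that changed).
odds_ratio = odds(p_after) / odds(p_before)
print(odds_ratio, math.exp(coefs[0]))
```

A coefficient above zero therefore corresponds to an odds ratio above one, meaning higher values of that variable are associated with higher odds of diabetes when the other variables are held fixed.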

Solution Preview

Data Description:
1. Title: Applying different statistical models to hospital data
2. Source of data: This dataset (pid.dat) was extracted from a combined dataset from several United States (US) hospitals. The data were collected to determine the risk factors associated with diabetes.
3. Attribute Information: Measurements were taken from a total of 392 patients at these hospitals. The variables collected are:
   1) pregnant: the number of times the patient has been pregnant.
   2) Glucose: the patient's plasma glucose concentration.
   3) Pressure: the patient's blood pressure (mm Hg).
   4) Triceps: the patient's triceps skin fold thickness (mm).
   5) Insulin: the patient's serum insulin (mu U/ml).
   6) Mass: the patient's body mass index, i.e. weight (kg) divided by the square of height (m).
   7) Pedigree: the patient's diabetes pedigree function.
   8) Age: the patient's age in years.
   9) Diabetes: class variable ("pos" or "neg").
4. Missing Attribute Values: None...
