 # Statistics Report

Subject: Mathematics, Statistics, R Programming

## Question

4 K-Nearest Neighbors
- Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Use V-fold cross-validation to choose a value of k.
- Plot the cross-validation error against k.
- State the value of k chosen and the cross-validation error for this k.
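
The cross-validation steps asked for in this question could be sketched in R roughly as follows. This is a minimal illustration, not the solution itself: the data are a synthetic stand-in for pid.dat (an assumption), and the names `x_scaled`, `folds`, `k_grid` and `cv_error` are hypothetical. It uses `knn()` from the `class` package, with V = 10 chosen arbitrarily.

```r
library(class)  # knn(); ships with standard R distributions

set.seed(1)
# Synthetic stand-in for pid.dat (assumption): two numeric predictors, binary class
n <- 200
x <- data.frame(glucose = rnorm(n, 120, 30), mass = rnorm(n, 32, 6))
y <- factor(ifelse(x$glucose + 2 * x$mass + rnorm(n, 0, 25) > 185, "pos", "neg"))

# KNN is distance based, so put the predictors on a common scale first
x_scaled <- scale(x)

V <- 10
folds  <- sample(rep(1:V, length.out = n))  # random fold label for each row
k_grid <- seq(1, 25, by = 2)                # odd k avoids voting ties with two classes

# For each k, average the misclassification rate over the V held-out folds
cv_error <- sapply(k_grid, function(k) {
  fold_err <- sapply(1:V, function(v) {
    test <- folds == v
    pred <- knn(train = x_scaled[!test, , drop = FALSE],
                test  = x_scaled[test, , drop = FALSE],
                cl    = y[!test], k = k)
    mean(pred != y[test])
  })
  mean(fold_err)
})

best_k <- k_grid[which.min(cv_error)]
plot(k_grid, cv_error, type = "b", xlab = "k",
     ylab = "CV misclassification rate")
```

Here the predictors are standardised once on the full data for brevity; a stricter version would compute the scaling inside each training fold so that the held-out fold plays no part in it.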

5 QDA, LDA and FDA
- Briefly describe the methods in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Find the V-fold cross-validation error for QDA, LDA and FDA.
- Use the LDA and FDA coefficients to assess which variables increase the chances of diabetes.
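
One way the V-fold error for LDA and QDA might be computed is sketched below, using `lda()` and `qda()` from the `MASS` package. The data are again a synthetic stand-in for pid.dat (an assumption), and `dat`, `folds`, `wrong` and `full_lda` are hypothetical names. FDA is left out so that the sketch depends only on packages shipped with R; it could be added to the same loop with `fda()` from the `mda` package if that package is installed.

```r
library(MASS)  # lda() and qda(); ships with standard R distributions

set.seed(2)
# Synthetic stand-in for pid.dat (assumption)
n <- 300
dat <- data.frame(glucose = rnorm(n, 120, 30),
                  mass    = rnorm(n, 32, 6),
                  age     = rnorm(n, 35, 10))
dat$diabetes <- factor(ifelse(dat$glucose + 2 * dat$mass + rnorm(n, 0, 25) > 185,
                              "pos", "neg"))

V <- 10
folds <- sample(rep(1:V, length.out = n))
wrong <- c(LDA = 0, QDA = 0)  # running count of misclassified held-out rows

for (v in 1:V) {
  test  <- folds == v
  train <- dat[!test, ]
  lda_fit <- lda(diabetes ~ ., data = train)
  qda_fit <- qda(diabetes ~ ., data = train)
  truth <- dat$diabetes[test]
  wrong["LDA"] <- wrong["LDA"] + sum(predict(lda_fit, dat[test, ])$class != truth)
  wrong["QDA"] <- wrong["QDA"] + sum(predict(qda_fit, dat[test, ])$class != truth)
}

cv_error <- wrong / n
print(cv_error)

# Refit on all data to inspect the discriminant coefficients and class means
full_lda <- lda(diabetes ~ ., data = dat)
print(full_lda$scaling)
print(full_lda$means)
```

The sign of the discriminant direction only has meaning relative to where the classes fall along it, so the coefficients in `full_lda$scaling` are best read alongside the class means (or the predicted discriminant scores per class) before concluding that a variable raises the chance of a "pos" result.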

6 Classification Trees
- Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Find the V-fold cross-validation error for the classification tree fit using the default settings in rpart.
- Interpret the fitted tree using all of the data. Interpret the importance of each variable and the role each variable plays in predicting diabetes.
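
A rough sketch of the tree steps, assuming the `rpart` package with its default control settings and a synthetic stand-in for pid.dat (both the data and the names `dat`, `folds`, `tree_cv_error` and `full_tree` are assumptions made for illustration):

```r
library(rpart)  # rpart(); ships with standard R distributions

set.seed(3)
# Synthetic stand-in for pid.dat (assumption)
n <- 300
dat <- data.frame(glucose = rnorm(n, 120, 30),
                  mass    = rnorm(n, 32, 6),
                  age     = rnorm(n, 35, 10))
dat$diabetes <- factor(ifelse(dat$glucose + 2 * dat$mass + rnorm(n, 0, 25) > 185,
                              "pos", "neg"))

V <- 10
folds <- sample(rep(1:V, length.out = n))
wrong <- 0  # running count of misclassified held-out rows

for (v in 1:V) {
  test <- folds == v
  fit  <- rpart(diabetes ~ ., data = dat[!test, ], method = "class")
  pred <- predict(fit, newdata = dat[test, ], type = "class")
  wrong <- wrong + sum(pred != dat$diabetes[test])
}
tree_cv_error <- wrong / n
print(tree_cv_error)

# Tree fitted to all of the data, for interpretation
full_tree <- rpart(diabetes ~ ., data = dat, method = "class")
print(full_tree)                      # split rules, node by node
print(full_tree$variable.importance)  # importance scores stored by rpart
```

The printed split rules show which variables and thresholds route a patient towards "pos" or "neg", while `variable.importance` also credits variables that act as surrogate splits, so the two views need not rank variables identically.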

7 Logistic Regression
- Briefly describe the method in your own words. State any assumptions used and whether you think these assumptions are violated. Include relevant formulas.
- Use appropriate methods discussed in class to select a model.
- Find the V-fold cross-validation error for this model.
- Use the selected logistic regression model to interpret the role each variable plays in predicting diabetes.
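
The question does not say which selection methods were covered in class, so the sketch below assumes backward selection by AIC with `step()` purely for illustration. The data are a synthetic stand-in for pid.dat, and `dat`, `selected`, `sel_formula` and `logit_cv_error` are hypothetical names. Predictions use a 0.5 probability cut-off, which is also an assumption.

```r
set.seed(4)
# Synthetic stand-in for pid.dat (assumption)
n <- 300
dat <- data.frame(glucose = rnorm(n, 120, 30),
                  mass    = rnorm(n, 32, 6),
                  age     = rnorm(n, 35, 10))
dat$diabetes <- factor(ifelse(dat$glucose + 2 * dat$mass + rnorm(n, 0, 25) > 185,
                              "pos", "neg"))

# With a factor response, glm() models the probability of the second level ("pos")
full_model  <- glm(diabetes ~ ., data = dat, family = binomial)
selected    <- step(full_model, trace = 0)  # backward selection by AIC (assumed method)
sel_formula <- formula(selected)

V <- 10
folds <- sample(rep(1:V, length.out = n))
wrong <- 0  # running count of misclassified held-out rows

for (v in 1:V) {
  test <- folds == v
  fit  <- glm(sel_formula, data = dat[!test, ], family = binomial)
  p    <- predict(fit, newdata = dat[test, ], type = "response")
  pred <- ifelse(p > 0.5, "pos", "neg")
  wrong <- wrong + sum(pred != as.character(dat$diabetes[test]))
}
logit_cv_error <- wrong / n
print(logit_cv_error)

# Odds ratios: multiplicative change in the odds of "pos" per unit increase
print(exp(coef(selected)))
```

An odds ratio above 1 indicates that larger values of the variable go with higher odds of diabetes, holding the other selected variables fixed. Because the model is selected once on all of the data and only refitted within folds, the cross-validation error here is likely to be somewhat optimistic; repeating the selection inside each fold would give a more honest estimate.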

## Solution Preview


Data Description:
1. Title: Applying different statistical models to hospital data
2. Source of data: This dataset (pid.dat) was extracted from a combined dataset of several United States (US) hospitals. The data were collected to determine the risk factors associated with diabetes.
3. Attribute Information: Several measurements were taken from a total of 392 patients at these hospitals. The variables collected are:
   1) pregnant: the number of times the patient has been pregnant.
   2) Glucose: the patient's plasma glucose concentration.
   3) Pressure: the patient's blood pressure (B.P.) (mm Hg).
   4) Triceps: the patient's triceps thickness (mm).
   5) Insulin: the patient's serum insulin (mu U/ml).
   6) Mass: body mass index, the patient's weight (kg) divided by the square of the height (m).
   7) Pedigree: the patient's diabetes pedigree function.
   8) Age: the patient's age in years.
   9) Diabetes: class variable ("pos" or "neg").
4. Missing Attribute Values: None...

