Question

In this homework assignment, you will explore the Auto-MPG dataset.
The dataset contains the following attributes:
1. mpg (miles per gallon)
2. cylinders (number of cylinders, power unit of an engine)
3. displacement (total volume of all the cylinders in an engine, measured in cubic centimeters [cc])
4. horsepower (the amount of power an engine develops)
5. weight (weight of the car)
6. acceleration (acceleration of the car)
7. year (model year of the car, two digits representing the year from 19**)
8. origin (the origin of the car: 1 for American, 2 for European, and 3 for Asian)
9. car name (unique name for each car)
You will explore the data types and scales, cardinalities, and number of missing values; detect outliers; handle missing values and outliers; and create a data quality report for the original and cleaned datasets.
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

Read the dataset
adf = pd.read_csv('auto-mpg.csv')
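Before working through the questions, it can help to check how missing values are encoded in the file. The following is a minimal sketch, assuming the CSV marks missing entries with `?` (some distributions of Auto-MPG do this for horsepower; whether `auto-mpg.csv` does is not stated here). The in-memory `sample_csv` is made-up data standing in for the real file.

```python
import io

import pandas as pd

# Hypothetical three-column excerpt; the real file has more columns and rows.
sample_csv = io.StringIO(
    "mpg,cylinders,horsepower\n"
    "18.0,8,130\n"
    "25.0,4,?\n"
)

# na_values tells pandas to treat "?" as missing instead of as a string,
# so the column is parsed as numeric rather than as object.
sample = pd.read_csv(sample_csv, na_values="?")
print(sample.dtypes)
print(sample.isna().sum())
```

If the placeholder is not declared this way, a column containing `?` is read as text, and it will not show up in later missing-value counts.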

Q1 (10 points)
Identify the data types (numerical [int, float], categorical) and data scales for all the attributes.
# Answer to Q1
Attribute       Data Type    Data Scale
mpg             ?            ?
displacement    ?            ?
horsepower      ?            ?
weight          ?            ?
year            ?            ?
origin          ?            ?
carname         ?            ?
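One way to start on the table is to look at the storage type pandas assigns to each column. This is a sketch on a made-up `toy` frame, not the real data. The data scale still has to be judged from what the attribute means: for example, the attribute list describes origin as integer codes for regions, so its storage type alone does not settle its scale.

```python
import pandas as pd

# Hypothetical rows that only mimic the column names of the dataset.
toy = pd.DataFrame({
    "mpg": [18.0, 25.5, 31.0],
    "cylinders": [8, 4, 4],
    "origin": [1, 2, 3],
    "carname": ["car a", "car b", "car c"],
})

# Storage type of each column as pandas sees it.
print(toy.dtypes)

# Re-declare an integer-coded attribute as categorical so that later
# summaries and plots treat the codes as labels rather than quantities.
toy["origin"] = toy["origin"].astype("category")
print(toy.dtypes)
```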

Q2
Identify the cardinalities (number of unique values) and number of missing values for each attribute
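Both quantities are available directly from pandas. A minimal sketch on a made-up `toy` frame (the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing horsepower value.
toy = pd.DataFrame({
    "cylinders": [4, 4, 8, 6],
    "horsepower": [70.0, np.nan, 150.0, 70.0],
    "origin": [1, 3, 1, 2],
})

summary = pd.DataFrame({
    # nunique() ignores NaN by default, so missing values are not counted
    # as an extra category.
    "cardinality": toy.nunique(),
    "missing": toy.isna().sum(),
})
print(summary)
```

Note that `isna()` only sees values pandas already parsed as missing; placeholder strings in the raw file need to be converted first.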

Q3
Visualize the distribution of each attribute (other than carname, since it is unique). Note that for nomi[…] bar plots. For ratio and interval scale attributes, use histograms.
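The two plot types can be sketched as follows, using a made-up `toy` frame. The choice of `origin` for the bar plot and `mpg` for the histogram is an assumption for illustration. The `Agg` backend line is only there so the sketch also runs outside a notebook; with `%matplotlib inline` it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data.
toy = pd.DataFrame({
    "origin": [1, 1, 1, 2, 3, 3],
    "mpg": [15.0, 18.0, 21.0, 26.0, 31.0, 34.0],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Bar plot: one bar per category, height = number of rows in that category.
toy["origin"].value_counts().sort_index().plot.bar(ax=axes[0], title="origin")

# Histogram: numeric values grouped into equal-width bins.
toy["mpg"].plot.hist(ax=axes[1], bins=5, title="mpg")

fig.tight_layout()
```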

Q4
Using your favorite outlier detection method, identify the outliers for each attribute (other than year, origin, […] remove the outlier or replace with a default value.
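The question leaves the method open. As one possible choice, the sketch below uses the interquartile range (IQR) rule with the conventional multiplier of 1.5 and replaces flagged values with the median of the remaining values. The helper name `iqr_outlier_mask`, the multiplier, and the replacement value are all assumptions, and the numbers are made up.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies more than k * IQR outside [Q1, Q3]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    # Comparisons with NaN are False, so missing values are never flagged.
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical weights with one implausible entry.
weight = pd.Series(
    [2000.0, 2200.0, 2100.0, 2300.0, 2200.0, 2100.0, 2000.0, 9000.0],
    name="weight",
)

mask = iqr_outlier_mask(weight)

# Option 1: drop the flagged rows.
dropped = weight[~mask]

# Option 2: replace flagged values with a default, here the inlier median.
default_value = weight[~mask].median()
cleaned = weight.where(~mask, default_value)
```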

Q5
Handle the missing values you found in Q2 using kNN imputation. Use KNNImputer from sklearn.impute […] neighbors to 3 and use the column subset of ['cylinders', 'displacement', 'weight'] for imputation.
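A minimal sketch of this step on made-up data, assuming the missing values sit in `horsepower` (the question does not say which column is affected). The imputer is given the three subset columns plus the column to fill. Because `KNNImputer` measures distance only over coordinates present in both rows, the neighbors of a row with missing horsepower are chosen by cylinders, displacement, and weight.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical rows; the last one is missing horsepower.
toy = pd.DataFrame({
    "cylinders": [8, 8, 8, 4, 4, 4, 8],
    "displacement": [350.0, 340.0, 360.0, 100.0, 110.0, 90.0, 345.0],
    "weight": [4000.0, 3900.0, 4100.0, 2000.0, 2100.0, 1900.0, 3950.0],
    "horsepower": [160.0, 150.0, 170.0, 70.0, 75.0, 65.0, np.nan],
})

neighbor_cols = ["cylinders", "displacement", "weight"]
target_col = "horsepower"  # assumed to be the column with missing values

imputer = KNNImputer(n_neighbors=3)
filled = imputer.fit_transform(toy[neighbor_cols + [target_col]])

# Write back only the imputed column; the subset columns are unchanged.
imputed = toy.copy()
imputed[target_col] = filled[:, -1]
print(imputed.tail(1))
```

The three nearest rows to the last one are the other eight-cylinder cars, so the filled value is the mean of their horsepower. Since the columns are on very different scales, weight dominates the distance here; whether to scale the columns first is a design choice the question does not address.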

Q6 (20 points)
Create a scatter plot matrix (a pair plot) of the attributes. Use origin to map plot aspects to different colors.
Q6.a - What can you say about the relationship between cylinders and mpg values?
Q6.b - What can you say about the cylinders of Asian cars (origin = 3)?
Q6.c - Is there a correlation between weight and displacement?
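A sketch of the plot and of a numeric check for Q6.c, on made-up data. It uses `pandas.plotting.scatter_matrix` with a hand-built color list so that only pandas and matplotlib are needed; seaborn's `pairplot` with `hue="origin"` is a common alternative. The `palette` mapping and the selected columns are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data.
toy = pd.DataFrame({
    "mpg": [15.0, 16.0, 14.0, 30.0, 32.0, 34.0],
    "cylinders": [8, 8, 8, 4, 4, 4],
    "displacement": [350.0, 340.0, 360.0, 100.0, 110.0, 90.0],
    "weight": [4000.0, 3900.0, 4100.0, 2000.0, 2100.0, 1900.0],
    "origin": [1, 1, 1, 2, 3, 3],
})

# One color per origin code, passed through to the scatter panels.
palette = {1: "tab:blue", 2: "tab:orange", 3: "tab:green"}
colors = toy["origin"].map(palette).tolist()

plot_cols = ["mpg", "cylinders", "displacement", "weight"]
axes = scatter_matrix(toy[plot_cols], c=colors, diagonal="hist", figsize=(7, 7))

# Pearson correlation as a numeric companion to the weight/displacement panel.
r = toy["weight"].corr(toy["displacement"])
print(f"weight vs displacement: r = {r:.3f}")
```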

Question
Create a data quality report for the Auto-MPG dataset.
Provide the data quality tables, distributions of categorical and nominal variables.
Also provide your solutions for handling outliers and missing values.
Create the data quality tables after handling outliers and missing values.
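The layout of the data quality tables is not specified. The sketch below assumes one table for continuous attributes and one for categorical attributes, with typical summary columns. The function names, the chosen statistics, and the `toy` data are all hypothetical. Running the same functions on the original and on the cleaned frame gives the before and after tables.

```python
import numpy as np
import pandas as pd


def continuous_report(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """One row per continuous attribute with basic quality statistics."""
    sub = df[columns]
    return pd.DataFrame({
        "count": sub.count(),
        "pct_missing": sub.isna().mean() * 100,
        "cardinality": sub.nunique(),
        "min": sub.min(),
        "q1": sub.quantile(0.25),
        "mean": sub.mean(),
        "median": sub.median(),
        "q3": sub.quantile(0.75),
        "max": sub.max(),
        "std": sub.std(),
    })


def categorical_report(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """One row per categorical attribute with its most frequent level."""
    rows = {}
    for col in columns:
        counts = df[col].value_counts()  # sorted by frequency, NaN excluded
        rows[col] = {
            "count": int(df[col].count()),
            "pct_missing": df[col].isna().mean() * 100,
            "cardinality": df[col].nunique(),
            "mode": counts.index[0],
            "mode_freq": int(counts.iloc[0]),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Hypothetical data.
toy = pd.DataFrame({
    "horsepower": [70.0, np.nan, 150.0, 70.0],
    "origin": [1, 3, 1, 2],
})

cont = continuous_report(toy, ["horsepower"])
cat = categorical_report(toy, ["origin"])
print(cont)
print(cat)
```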

Solution Preview


{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
       "name": "ipython",
       "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.3"
    },
    "colab": {
      "name": "HW2_assignment.ipynb",
      "provenance": [],
      "collapsed_sections": []
    }
},
"cells": [
    {
      "cell_type": "markdown",
      "metadata": {
       "id": "_yRd_DqZuwFe",
       "colab_type": "text"
      },
      "source": [
       "# Homework 2\n",
       "In this homework assignment, you will explore the Auto-MPG dataset.\n",
       "\n",
       "The dataset contains the following attributes:\n",
       "1. mpg (miles per gallon)\n",
       "2. cylinders (number of cylinders, power unit of an engine)\n",
       "3. displacement (total volume of all the cylinders in an engine, measured in cubic centimeters [cc])\n",
       "4. horsepower (the amount of power an engine develops)\n",
       "5. weight (weight of the car)\n",
       "6. acceleration (acceleration of the car)\n",
       "7. year (model year of the car, two digits representing the year from 19**)\n",
       "8. origin (the origin of the car: 1 for American, 2 for European, and 3 for Asian)\n",
       "9. car name (unique name for each car)\n",
       "\n",
       "You will explore the data types and scales, cardinalities, and number of missing values; detect outliers; handle missing values and outliers; and create a data quality report for the original and cleaned datasets."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
       "id": "YmzAvjL4uwFh",
       "colab_type": "code",
       "colab": {}
      },
      "source": [
       "%matplotlib inline\n",
       "import pandas as pd\n",
       "import numpy as np\n",
       "import matplotlib\n",
       "import matplotlib.pyplot as plt"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
       "id": "d5QmcJxcuwFl",
       "colab_type": "text"
      },
      "source": [
       "### Read the dataset"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
       "id": "w3GlTCsmuwFm",
       "colab_type": "code",
       "colab": {}
      },
      "source": [
       "adf = pd.read_csv('auto-mpg.csv')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
       "id": "M19pfddVuwFp",
       "colab_type": "text"
      },
      "source": [
       "### Q1 (10 points)\n",
       "Identify the data types (numerical [int, float], categorical) and data scales for all the attributes."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
       "id": "Jvz89H-FuwFq",
       "colab_type": "code...