Question

Transcribed Text

Emissions.csv is a repository of organic gas and particulate matter (PM) speciation profiles of air pollution sources. The columns are:

PROFILE_CODE: code of the pollution profile
PROFILE_NAME: name of the pollution profile
PROFILE_TYPE: GAS, GAS-VBS, PM, PM-AE6, PM-Simplified, PM-VBS, OTHER
MASTER_POLLUTANT: PM, TOG, VOC, SVOC, NMOG, etc.
CATEGORY_LEVEL_1_Generation_Mechanism, CATEGORY_LEVEL_2_Sector_Equipment, CATEGORY_LEVEL_3_Fuel_Product: the three category levels of a profile
SPECIES_ID: ID of the detected chemical compound; each number represents a unique compound
WEIGHT_PERCENT: weight percentage of the chemical compound detected in the same row

[Screenshot of Emissions.csv; the rows for profile code 1 are outlined in a red box.]

For example, the information in the red box (profile code 1) means:

o The profile came from External Combustion Boiler - Residual Oil.
o The profile type is GAS.
o The master pollutant is not determined (most of the other master pollutants are determined).
o CATEGORY_LEVEL_1, the Generation Mechanism, is Combustion.
o CATEGORY_LEVEL_2, the Sector Equipment, is Boiler.
o CATEGORY_LEVEL_3, the Fuel Product, is Residual oil.

Profile 1 has 5 rows of data, which means 5 different chemical compounds were detected in profile 1. Their percentages are, respectively:

o 14% of species No. 592
o 28% of species No. 281
o 5% of species No. 601
o 11% of species No. 529
o 42% of species No. 465

You are going to implement any machine learning methods you consider suitable for the clustering or classification job, to find the relationship between profiles and chemical compound compositions.
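Because each profile is spread over several rows (one per detected compound), most clustering and classification algorithms need the data reshaped into one feature vector per profile first. The following is a minimal sketch of that step, not the method used in the solution. It assumes the column names given above; the rows for profile "0001" come from the example in the text, while profile "0002" and the helper name `to_profile_matrix` are invented purely for illustration.

```python
import pandas as pd

# Rows for profile "0001" are the five compounds listed in the example above.
# Profile "0002" is hypothetical and exists only to show a second row.
rows = [
    ("0001", 592, 14.0),
    ("0001", 281, 28.0),
    ("0001", 601, 5.0),
    ("0001", 529, 11.0),
    ("0001", 465, 42.0),
    ("0002", 592, 60.0),  # hypothetical
    ("0002", 100, 40.0),  # hypothetical
]
long_df = pd.DataFrame(
    rows, columns=["PROFILE_CODE", "SPECIES_ID", "WEIGHT_PERCENT"]
)


def to_profile_matrix(df):
    """Pivot long-format rows into a profile x species matrix.

    Species that were not detected in a profile get a weight of 0.
    """
    return df.pivot_table(
        index="PROFILE_CODE",
        columns="SPECIES_ID",
        values="WEIGHT_PERCENT",
        aggfunc="sum",
        fill_value=0.0,
    )


profile_matrix = to_profile_matrix(long_df)
print(profile_matrix)
```

Each row of `profile_matrix` is then a composition vector that can be passed to a clustering or classification model, with the category columns joined back on `PROFILE_CODE` as labels.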
First, you can do unsupervised clustering by the 3 category levels: divide the data points into a number of groups such that points in the same group are more similar to each other than to points in other groups.

Then, you can do supervised classification: create models that predict a class from input features. These algorithms learn from labeled data, then assign a label to new, unlabeled data by matching it against the patterns they have learned.

This is an open-ended task. The information in the category level 1, 2 and 3 columns will be used for machine learning. What is more, you can use any machine learning algorithm to get the best results. Other grouping methods can be applied as well. For example, you can group profile names into different categories by extracting keywords, such as agriculture, petroleum industry, other industry, household uses, traffic emissions, etc. You can use any machine learning methods to explore the data and show your data analysis ability. Explain your ideas, and visualize your results in whatever way you think shows them best. The work should be done with Python 3; any modules such as TensorFlow, Keras and PyTorch can be used.
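The two steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the purchased solution: the data is a synthetic stand-in for a profile-by-species matrix (three made-up categories, rows summing to 100%), and KMeans with a silhouette score plus a random forest are just one reasonable choice among the many algorithms the task allows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for real profiles: 3 invented categories, each dominated
# by a different block of 4 species. This is NOT data from Emissions.csv.
n_per_class, n_species = 40, 12
X_parts, y_parts = [], []
for label in range(3):
    alpha = np.full(n_species, 0.3)
    alpha[label * 4:(label + 1) * 4] = 5.0  # dominant species of this category
    X_parts.append(rng.dirichlet(alpha, size=n_per_class) * 100.0)  # weight %
    y_parts.append(np.full(n_per_class, label))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)

# Unsupervised step: try several cluster counts and score each partition
# with the silhouette coefficient (higher means better-separated clusters).
scores = {}
for k in range(2, 7):
    cluster_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, cluster_labels)
best_k = max(scores, key=scores.get)
print("silhouette by k:", scores, "best k:", best_k)

# Supervised step: learn the category label from the composition vector
# and evaluate on a held-out, stratified test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print("held-out accuracy:", test_accuracy)
```

On the real data, `y` would be one of the three category-level columns, and comparing the cluster assignments against those columns shows how well the chemical composition alone recovers each category level.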

Solution Preview


{
"cells": [
{
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing essential packages"
   ]
},
{
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "from sklearn.cluster import KMeans\n",
    "from sklearn.cluster import AgglomerativeClustering\n",
    "from sklearn.cluster import SpectralClustering\n",
    "from sklearn.mixture import GaussianMixture\n",
    "from sklearn.cluster import DBSCAN\n",
    "from sklearn.metrics import silhouette_score\n",
    "from sklearn import preprocessing\n",
    "from sklearn.decomposition import PCA"
   ]
},
{
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.metrics import confusion_matrix\n",
    "from sklearn.metrics import accuracy_score"
   ]
},
{
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Preparation"
   ]
},
{
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
    "name": "stdout",
    "output_type": "stream",
    "text": [
      " PROFILE_CODE                               PROFILE_NAME PROFILE_TYPE \\\n",
      "0         0001 External Combustion Boiler - Residual Oil          GAS   \n",
      "1         0001 External Combustion Boiler - Residual Oil          GAS   \n",
      "2         0001 External Combustion Boiler - Residual Oil          GAS   \n",
      "3         0001 External Combustion Boiler - Residual Oil          GAS   \n",
      "4         0001 External Combustion Boiler - Residual Oil          GAS   \n",
      "\n",
      " MASTER_POLLUTANT CATEGORY_LEVEL_1_Generation_Mechanism \\\n",
      "0             NaN                            Combustion   \n",
      "1             NaN                            Combustion   \n",
      "2             NaN                            Combustion   \n",
      "3             NaN                            Combustion   \n",
      "4             NaN                            Combustion   \n",
      "\n",
      " CATEGORY_LEVEL_2_Sector_Equipment CATEGORY_LEVEL_3_ Fuel_Product \\\n",
      "0                            Boiler                   Residual oil   \n",
      "1                            Boiler                   Residual oil   \n",
      "2                            Boiler                   Residual oil   \n",
      "3                            Boiler                   Residual oil   \n",
      "4                            Boiler                   Residual oil   \n",
      "\n",...

By purchasing this solution you'll be able to access the following files:
Solution.ipynb.
