{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Homework 1"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Description\n",
"\n",
"As the first homework of this course, you will apply what you learned in the first lecture on `linear models`, and their simple extensions using `kernel` methods, to some datasets. This is also a chance to learn to use `jupyter notebook` and basic `sklearn` functionality. A basic introduction to installing and using `jupyter notebooks` can be found [here](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook). You are strongly encouraged to use Python 3 instead of Python 2.\n",
"\n",
"\n",
"The goals of this homework are:\n",
" * Learn basic data processing and machine learning using python\n",
" * Reinforce knowledge on linear models and kernel methods\n",
" * Prepare for the project in this course\n",
" \n",
"Whenever you are stuck, you are encouraged to look at the jupyter notebook demonstration for the first lecture for guidance. You should also learn to read the documentation of the libraries you use. For example, if you want to perform *kernel ridge regression* using `sklearn`, a quick Google search will lead you to [this page](https://scikit-learn.org/stable/modules/kernel_ridge.html), which gives an extensive description and examples."
],
"metadata": {}
},
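{
"cell_type": "markdown",
"source": [
"As an illustration of the `sklearn` interface described on that page, the following cell is a minimal sketch of kernel ridge regression on synthetic 1-D data (not the concrete dataset used below). The sine-plus-noise data, the RBF kernel and the values of `alpha` and `gamma` are illustrative assumptions; they are not tuned and are not part of the assignment. `KernelRidge` follows the usual `sklearn` estimator pattern of calling `fit` on training data and then `predict` on new inputs."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"import numpy as np\n",
"from sklearn.kernel_ridge import KernelRidge\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import mean_squared_error\n",
"\n",
"# Synthetic 1-D regression problem (illustrative assumption, not the concrete data)\n",
"rng = np.random.default_rng(0)\n",
"X_demo = rng.uniform(-3, 3, size=(200, 1))\n",
"y_demo = np.sin(X_demo).ravel() + 0.1 * rng.normal(size=200)\n",
"\n",
"# Hold out 25% of the samples to estimate the generalization error\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)\n",
"\n",
"# RBF kernel ridge regression; alpha (regularization strength) and gamma (RBF kernel coefficient) are untuned example values\n",
"krr = KernelRidge(kernel='rbf', alpha=0.1, gamma=1.0)\n",
"krr.fit(X_tr, y_tr)\n",
"test_mse = mean_squared_error(y_te, krr.predict(X_te))\n",
"print(f'Test MSE: {test_mse:.4f}')"
],
"outputs": [],
"metadata": {}
},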
{
"cell_type": "markdown",
"source": [
"## Instructions\n",
"\n",
"Please complete each question below directly in this notebook. You can use the cells to describe what you are doing and any issues you face. If you are familiar with it, you can also format your text using `markdown`, which `jupyter` supports.\n",
"\n",
"Grading is based on:\n",
" 1. Scientific correctness of your approach\n",
" 2. Clear documentation of your approach"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Dataset\n",
"\n",
"We will work with a standard regression dataset for predicting concrete compressive strength from various composition and processing properties.\n",
"\n",
"We will download the dataset directly from the [UCI repository](https://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength) using `urllib`, which requires an internet connection. Alternatively, you can download `Concrete_Data.xls` manually from the link above and place it in the same folder as this notebook.\n",
"\n",
"Reading the Excel file requires the `xlrd` library in addition to `pandas`. If you get an error because `xlrd` is missing, install it via\n",
"```\n",
"$ pip install xlrd\n",
"```\n",
"\n",
"In general, if you see an error such as\n",
"```\n",
"ModuleNotFoundError: No module named 'some_module'\n",
"```\n",
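"install the missing module with\n",
"```\n",
"$ pip install some_module\n",
"```\n",
"\n",
"Note that the package name can differ from the module name; for example, the module `sklearn` is installed with `pip install scikit-learn`."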
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 1,
"source": [
"import pandas as pd\n",
"import urllib.request"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 2,
"source": [
"import os\n",
"url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls'\n",
"path = './Concrete_Data.xls'\n",
"if not os.path.exists(path):  # skip the download if the file was placed here manually\n",
"    urllib.request.urlretrieve(url, path)\n",
"data = pd.read_excel(path)"
],
"outputs": [],
"metadata": {}
},
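{
"cell_type": "markdown",
"source": [
"Before fitting any model, the input features and the target need to be separated. The following cell is a minimal sketch of a hypothetical helper, `split_features_target`, which assumes (as we understand the UCI description) that the last column of the spreadsheet is the compressive strength target and the remaining columns are the inputs. It is demonstrated on a tiny made-up `DataFrame` so that it runs without the dataset; in this notebook you would call `split_features_target(data)` on the `DataFrame` loaded above."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"import pandas as pd\n",
"\n",
"def split_features_target(df):\n",
"    # Hypothetical helper: the last column is the regression target, all other columns are inputs\n",
"    X = df.iloc[:, :-1].to_numpy(dtype=float)\n",
"    y = df.iloc[:, -1].to_numpy(dtype=float)\n",
"    return X, y\n",
"\n",
"# Tiny made-up frame (not real concrete data) just to show the output shapes\n",
"toy = pd.DataFrame({'x1': [1.0, 2.0, 3.0], 'x2': [4.0, 5.0, 6.0], 'target': [7.0, 8.0, 9.0]})\n",
"X_toy, y_toy = split_features_target(toy)\n",
"print(X_toy.shape, y_toy.shape)  # (3, 2) (3,)"
],
"outputs": [],
"metadata": {}
},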
