Question

Statistical Programming

Programming Assignment 5 – Data Preparations and Statistics

Introduction

The file cps.csv (attached) contains school profile information for public schools.
Your program will derive new columns from it and then compute summary statistics.

Requirements
You are to create a program in Python that performs the following:

1. Loads the cps.csv file (assume it's in the current directory) and creates a DataFrame object from it.
2. Based on the data contained in the cps.csv file, generates a DataFrame with the following information:

a. School_ID
b. Short_Name
c. Is_High_School
d. Zip
e. Student_Count_Total
f. College_Enrollment_Rate_School
g. Lowest Grade Offered (derived from Grades_Offered_All column)
h. Highest Grade Offered (derived from Grades_Offered_All column)
i. Starting Hour (derived from School_Hours column)

The values for a-f come directly from existing columns in the data. For g-i, you will need to generate new columns that derive information from existing ones.
Replace missing numeric values with the mean of their column.
Display the first 10 rows of this DataFrame.

3. Displays the following information:
a. Mean and standard deviation of College_Enrollment_Rate_School for high schools
b. Mean and standard deviation of Student_Count_Total for non-high schools
c. Distribution of starting hours for all schools
d. Number of schools outside of the Loop Neighborhood (i.e., outside of zip codes 60601, 60602, 60603, 60604, 60605, 60606, 60607, and 60616)
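The data-preparation step (requirement 2) can be sketched roughly as follows. This is a minimal sketch, not the author's solution. It assumes that Grades_Offered_All holds comma-separated grade labels such as "PK,K,1,2" and that School_Hours begins with the start time, as in "8:00 AM - 3:00 PM"; the assignment states neither format. The GRADE_ORDER list, the helpers grade_bounds, starting_hour and prepare_schools, and the new column names Lowest_Grade, Highest_Grade and Starting_Hour are all hypothetical.

```python
import re

import pandas as pd

# Hypothetical grade ordering: assumes Grades_Offered_All uses labels
# like "PK", "K", "1" ... "12" (the real format is not documented).
GRADE_ORDER = ["PK", "K"] + [str(grade) for grade in range(1, 13)]


def grade_bounds(grades_offered):
    """Return (lowest, highest) grade label, or (None, None) if unparseable."""
    if pd.isna(grades_offered):
        return None, None
    labels = [label.strip() for label in str(grades_offered).split(",")]
    # Keep only recognised labels and order them from youngest to oldest.
    known = sorted((label for label in labels if label in GRADE_ORDER),
                   key=GRADE_ORDER.index)
    if not known:
        return None, None
    return known[0], known[-1]


def starting_hour(school_hours):
    """Return the first number in School_Hours (8.0 for '8:00 AM - 3:00 PM')."""
    if pd.isna(school_hours):
        return float("nan")
    match = re.search(r"\d+", str(school_hours))
    return float(match.group()) if match else float("nan")


def prepare_schools(raw):
    """Build the requirement-2 DataFrame from the raw cps.csv data."""
    kept = ["School_ID", "Short_Name", "Is_High_School", "Zip",
            "Student_Count_Total", "College_Enrollment_Rate_School"]
    prepared = raw[kept].copy()

    # Split each (lowest, highest) tuple into two new columns.
    bounds = raw["Grades_Offered_All"].apply(grade_bounds)
    prepared["Lowest_Grade"] = [low for low, _ in bounds]
    prepared["Highest_Grade"] = [high for _, high in bounds]
    prepared["Starting_Hour"] = raw["School_Hours"].apply(starting_hour)

    # Replace missing numeric values with the mean of their column;
    # text columns are not in the mean Series, so they are left alone.
    return prepared.fillna(prepared.mean(numeric_only=True))
```

With the real file, prepare_schools(pd.read_csv("cps.csv")).head(10) gives the first 10 rows the assignment asks to display. Because the mean fill runs after the derived columns exist, a school with no School_Hours receives the (possibly fractional) mean starting hour; leaving Starting_Hour out of the fill is a reasonable alternative if the step 3c distribution should contain whole hours only.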
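The four statistics in requirement 3 can be computed along these lines. This minimal sketch assumes a DataFrame shaped like the one from requirement 2, with a numeric Starting_Hour column, Zip stored as integers, and Is_High_School holding True/False values rather than "Y"/"N" strings. The function name summarize_schools and its dictionary keys are hypothetical.

```python
import pandas as pd

# Loop-area zip codes listed in requirement 3d.
LOOP_ZIPS = [60601, 60602, 60603, 60604, 60605, 60606, 60607, 60616]


def summarize_schools(prepared):
    """Compute the four requirement-3 statistics from the prepared DataFrame."""
    # Assumes Is_High_School contains True/False values.
    is_high = prepared["Is_High_School"].astype(bool)

    high_rates = prepared.loc[is_high, "College_Enrollment_Rate_School"]
    other_counts = prepared.loc[~is_high, "Student_Count_Total"]

    return {
        # Series.std() is the sample standard deviation (ddof=1).
        "hs_enrollment_mean_std": (high_rates.mean(), high_rates.std()),
        "non_hs_count_mean_std": (other_counts.mean(), other_counts.std()),
        # Number of schools per starting hour, ordered by hour.
        "start_hour_counts": prepared["Starting_Hour"].value_counts().sort_index(),
        # Schools whose zip code is not one of the Loop zip codes.
        "outside_loop": int((~prepared["Zip"].isin(LOOP_ZIPS)).sum()),
    }
```

Pandas defaults to the sample standard deviation here; pass ddof=0 to .std() if the population value is wanted instead.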

Additional Requirements
1. The name of your source code file should be DataStats.py.
All your code should be within a single file.
2. You need to use the pandas DataFrame object for storing data.
3. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.
4. You need to use meaningful identifier names that conform to standard naming conventions.
5. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.

Solution Preview


import pandas as pd

# Reading data file
data = pd.read_csv("cps.csv")

# Setting row index
data = data.set_index("School_ID")


# Utility method to clean start hours: replaces every non-digit character with ':'.
def clean_start_hours(x):
    return ''.join(ch if ch.isdigit() else ':' for ch in x)


# Utility method to extract the starting hour.
def find_starting_hour(x):
    if not pd.isna(x):
        str_splitted = str(clean_start_hours(x)).split(":")
        str_splitted = [s for s in str_splitted i...
