QuestionQuestion

1 Import each of the csv files (3) into a pandas DataFrame.
(a) Provide the code you used to do this. (Just the code for reading one of the files will suffice to show how you did it. Correctly, of course.)
(b) Print out the column names of your item DataFrame and the first four (4) records in it.
(c) Describe the data types of the columns in each of your DataFrames. Hint: Provide a count of each data type in the columns.
Include your commented code for each of the above.

2 Write each of you pandas DataFrames to a local SQLite DB named xyz.db. Include only data for active buyers in these tables. Verify that you have written the tables to your SQLite DB correctly. (Commented code, of course.) Name your tables item, customer, and mail.

3 Now, using the same data that you imported in 2, above, create a new DataFrame called custSum that you also write as a table to xyz.db, and that has the following characteristics. This DataFrame should have one row per customer record.
(a) Include on each customer's record a binary, Y or N, indicator of whether the customer is a 'heavy buyer,' where the definition of a 'heavy buyer' is a customer whose YTD purchasing in 2009 is greater than 90% of the 2009 YTD purchasing of all customers who are active buyers. Verify your coding of this new variable by crosstabulating it with an indicator of whether their 2009 YTD purchasing is at at least the 90th percentile of all 2009 YTD purchasing.
(b) Add to each customer's record whether the customer has the following credit cards: AMEX, Discover, VISA, and Mastercard, with each credit card variable codes as a Y or a N for yes or no, respectively. Document your creation of these codes by showing how they are related to the code values in the data
(c) Add to each customer's record their estimated HH income, and the genders of adults "1" and "2."
(d) Add to each customer's record their ZIP code and ZIP+4 code.
(e) Be sure to include the account number on each customer's record.
(f) Provide a count of the number of records in this DataFrame.
(g) Write cumsum as a SQL table to xyz.db.
(g) Verify that you have written this DataFrame to xyz.db correctly.
(Don't forget to comment your code so that a reader can understand what it is supposed to do.)

4 Create a new pandas DataFrame of data that will be used for target marketing and write it out to a headered csv file.
(a) This DataFrame should have one row per customer. The customers included should be active buyers or lapsed buyers.
(b) The row for each customer should include the customer's account identifier, and an indicator variable (Y/N, or 1/0) for each product category the customer has made at least one purchase in.
(c) Include for each customer their buyer status, and the total dollar amount of the purchases they have made from XYZ using all data available for him or her.
(d) Write your DataFrame to a csv file that has a header record of the column names, and also store it in a shelve database.
(e) Verify that the files you wrote your customer DataFrame to were written correctly.
(Commented code, of course.)

5 Report the three (3) most frequently purchased product categories by the gender of "adult 1" using only the data for the active customers. Include for these categories the total spend in dollars on each category, the total number of products purchased in these categories, and the number of adults in each gender category.
(Be sure to comment your code.)

Solution PreviewSolution Preview

These solutions may offer step-by-step problem-solving explanations or good writing examples that include modern styles of formatting and construction of bibliographies out of text citations and references. Students may use these solutions for personal skill-building and practice. Unethical use is strictly forbidden.

# ## Import each of the csv files (3) into a pandas DataFrame

# In[134]:

mail_df = pd.read_csv('mail.csv')
item_df = pd.read_csv('item.csv')
customer_df = pd.read_csv('customer.csv')

# ### Print out the column names of your item DataFrame and the first four (4) records in it.

# In[11]:

mail_df.iloc[:5,:]

# In[12]:

item_df.iloc[:5,:]

# In[13]:

customer_df.iloc[:5,:]

# ### Describe the data types of the columns in each of your DataFrames

# In[136]:

list(item_df.columns.values)

# In[137]:

list(item_df.dtypes)

# In[139]:

# Helper function to convert pandas dtype to SQLite data type
def get_db_type(dtype):
   
    if dtype.__str__() == "object":
      
       return "TEXT"
   
    elif dtype.__str__() == "int64":
      
       return "INTEGER"
   
    elif dtype.__str__() == "float64":
      
       return "REAL"
   
    else:
   
       return "NULL"


# In[140]:

# Writing data to sqlite db helper functions

def create_db(db_name):

    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    return conn, cursor...

By purchasing this solution you'll be able to access the following files:
Solution.zip.

$15.00
for this solution

PayPal, G Pay, ApplePay, Amazon Pay, and all major credit cards accepted.

Find A Tutor

View available Python Programming Tutors

Get College Homework Help.

Are you sure you don't want to upload any files?

Fast tutor response requires as much info as possible.

Decision:
Upload a file
Continue without uploading

SUBMIT YOUR HOMEWORK
We couldn't find that subject.
Please select the best match from the list below.

We'll send you an email right away. If it's not in your inbox, check your spam folder.

  • 1
  • 2
  • 3
Live Chats