Accomplish the following tasks. The final output is two graphs – one each for step 5 and step 6.
• Steps 1 through 4 are sequential and each build on the prior step. Steps 5 and 6 each build on Step 4, but are separate from each other.
For some of the steps, you’ll have to do some thinking to figure out how to accomplish them. This assignment requires more thinking than the prior ones.
1. For each year, load and filter the data into a merged DataFrame with the following steps. (You can run a for loop for each of the five years, and do these operations inside the for-loop. You can even make a List of DataFrames in the loop using .append(), and then use that list for step 2.)
a. Read-in the base 5500 data.
b. Filter the data (these filters can be done in any order)
i. Remove non-unique filings from the base 5500 data. Specifically, group by SPONS_DFE_EIN and SPONS_DFE_PN and take the one with the last filing date. (SPONS_DFE_EIN and SPONS_DFE_PN are the ID number of the plan sponsor and then plan number for each plan that the given sponsor is responsible for. Together they uniquely identify each plan.)
ii. Keep only filings whose TYPE_PENSION_BNFT_CODE field contains the string "1A", "1B", or both. (You may want to use the .str.find(...) function to help. Also, make sure you have the "and/or" logic correct when thinking about 1A or 1B.)
c. Read in the schedule MB data.
d. Merge together the base 5500 data with its corresponding 5500 schedule MB data. Use the ACK_ID field as the merge key. (ACK_ID is unique to each filing. For a given plan, the base form and all the schedules are filed together.) Make sure to use the correct merge type.
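The "1A or 1B" filter and the merge in steps 1b–1d can be sketched on toy data. The column names (TYPE_PENSION_BNFT_CODE, ACK_ID) come from the assignment; the rows below are made up for illustration. Note that "or" here means a row qualifies if either substring appears, so the two conditions are combined with `|`, not `&`:

```python
import pandas as pd

# Toy base-5500 rows; TYPE_PENSION_BNFT_CODE holds comma-separated codes.
base = pd.DataFrame({
    "ACK_ID": [1, 2, 3, 4],
    "TYPE_PENSION_BNFT_CODE": ["1A", "1B,2C", "2C", "1A,1B"],
})
# Toy schedule MB rows (ACK_ID 4 has no MB filing).
mb = pd.DataFrame({"ACK_ID": [1, 2, 3], "MB_AST_FNDNG_STD_AMT": [100, 200, 300]})

# Keep a filing if EITHER "1A" or "1B" appears in the code field.
codes = base["TYPE_PENSION_BNFT_CODE"]
base = base[codes.str.contains("1A") | codes.str.contains("1B")]

# Inner merge keeps only plans present in both the base form and Schedule MB.
merged = base.merge(mb, on="ACK_ID", how="inner")
```

An inner merge is the "correct merge type" here because a plan missing either the base form or the schedule cannot be analyzed in later steps.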
2. Concatenate together all the years into one DataFrame. (Most plans that remain should have one filing in each of the five years of data.)
3. Filter the new DataFrame and keep only the rows where both MB_AST_FNDNG_STD_AMT (assets) and MB_ACCR_LIAB_GAIN_MTHD_AMT (liabilities) are positive. (The bad data here aren’t negative values, but zero or missing values.)
4. Create a new column, YEAR, based on the year in the FORM_PLAN_YEAR_BEGIN_DATE field. (One trick to get access to the “year” field of that column is to first convert that column to a DatetimeIndex. You could instead use the apply function.)
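Steps 2 through 4 can be sketched as follows, using two small stand-in DataFrames in place of the five per-year frames from step 1. The column names come from the assignment; the values are invented:

```python
import pandas as pd

# Stand-ins for the per-year DataFrames built in step 1.
dfs = [
    pd.DataFrame({"MB_AST_FNDNG_STD_AMT": [100, 0],
                  "MB_ACCR_LIAB_GAIN_MTHD_AMT": [80, 50],
                  "FORM_PLAN_YEAR_BEGIN_DATE": ["2015-01-01", "2015-01-01"]}),
    pd.DataFrame({"MB_AST_FNDNG_STD_AMT": [120],
                  "MB_ACCR_LIAB_GAIN_MTHD_AMT": [90],
                  "FORM_PLAN_YEAR_BEGIN_DATE": ["2016-01-01"]}),
]

# Step 2: stack the years into one DataFrame.
df = pd.concat(dfs, ignore_index=True)

# Step 3: keep rows where assets AND liabilities are both strictly positive
# (this drops the zero/missing "bad data" rows).
df = df[(df["MB_AST_FNDNG_STD_AMT"] > 0) & (df["MB_ACCR_LIAB_GAIN_MTHD_AMT"] > 0)]

# Step 4: the DatetimeIndex trick exposes a .year attribute.
df["YEAR"] = pd.DatetimeIndex(df["FORM_PLAN_YEAR_BEGIN_DATE"]).year
```

The row with zero assets is dropped in step 3, so only one row per toy year survives.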
5. Total Fundedness by year:
a. Create a new DataFrame that has total values by year (i.e. just five rows). To do that, by year sum the MB_AST_FNDNG_STD_AMT and MB_ACCR_LIAB_GAIN_MTHD_AMT fields to get the total assets and total liabilities for each year. (To be clear, each year should have a total for assets and a total for liabilities.)
b. Then divide the assets by liabilities for each year.
c. Create a figure with two-subfigures.
i. A plot with two lines (total assets and total liabilities), with YEAR on the x-axis.
ii. A plot with one line – total fundedness by year.
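One way step 5 could look, again on toy plan-level data (the real DataFrame comes from step 4). The `Agg` backend and the output filename `step5.png` are choices made here for a headless script, not requirements of the assignment:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy plan-level rows standing in for the step-4 DataFrame.
df = pd.DataFrame({
    "YEAR": [2015, 2015, 2016, 2016],
    "MB_AST_FNDNG_STD_AMT": [100, 50, 120, 60],
    "MB_ACCR_LIAB_GAIN_MTHD_AMT": [80, 50, 100, 70],
})

# Step 5a: one row per year, summing assets and liabilities.
totals = df.groupby("YEAR")[["MB_AST_FNDNG_STD_AMT",
                             "MB_ACCR_LIAB_GAIN_MTHD_AMT"]].sum()

# Step 5b: total fundedness = total assets / total liabilities.
totals["FUNDEDNESS"] = (totals["MB_AST_FNDNG_STD_AMT"]
                        / totals["MB_ACCR_LIAB_GAIN_MTHD_AMT"])

# Step 5c: one figure, two subplots.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(totals.index, totals["MB_AST_FNDNG_STD_AMT"], label="Total assets")
ax1.plot(totals.index, totals["MB_ACCR_LIAB_GAIN_MTHD_AMT"], label="Total liabilities")
ax1.set_xlabel("YEAR")
ax1.legend()
ax2.plot(totals.index, totals["FUNDEDNESS"])
ax2.set_xlabel("YEAR")
ax2.set_title("Total fundedness")
fig.savefig("step5.png")
```

Summing first and then dividing gives aggregate fundedness, which is not the same as averaging each plan's own ratio; the assignment asks for the former.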
6. Categories by year:
a. Compute fundedness of each plan each year by dividing MB_AST_FNDNG_STD_AMT by MB_ACCR_LIAB_GAIN_MTHD_AMT.
b. For each year, compute the number of plans that are:
i. less than 60% funded
ii. between 60% and 80% funded
iii. greater than 80% funded
c. Use these counts to create a stacked bar chart with year on the x-axis and the number of plans on the y-axis.
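Step 6 can be sketched with `pd.cut` on toy data. The assignment leaves the boundary cases (exactly 60% or 80%) ambiguous; the bins below put a boundary value in the lower bucket, which is one reasonable reading, not the required one:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Toy step-4 data: fundedness ratios of 0.5, 0.7, 0.9, 0.55, 0.95.
df = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2016, 2016],
    "MB_AST_FNDNG_STD_AMT": [50, 70, 90, 55, 95],
    "MB_ACCR_LIAB_GAIN_MTHD_AMT": [100, 100, 100, 100, 100],
})

# Step 6a: per-plan fundedness.
df["FUNDEDNESS"] = df["MB_AST_FNDNG_STD_AMT"] / df["MB_ACCR_LIAB_GAIN_MTHD_AMT"]

# Step 6b: bucket each plan; cut's bins are half-open intervals (low, high].
df["CATEGORY"] = pd.cut(df["FUNDEDNESS"],
                        bins=[0, 0.6, 0.8, float("inf")],
                        labels=["<60%", "60-80%", ">80%"])
counts = (df.groupby(["YEAR", "CATEGORY"], observed=False)
            .size()
            .unstack("CATEGORY"))

# Step 6c: pandas draws a stacked bar chart directly from the counts table.
ax = counts.plot(kind="bar", stacked=True)
ax.set_xlabel("YEAR")
ax.set_ylabel("Number of plans")
plt.savefig("step6.png")
```

`observed=False` keeps every category column even in a year where a bucket is empty, so the stacked bars always have all three segments.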
Submit your code as a Python file.
import os

import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
import pandas as pd
base_path = "5500_base"
mb_path = "5500_mb"
# list of base files
base_files = os.listdir(base_path)
# list of mb files
mb_files = os.listdir(mb_path)
""" Step 1 """
df = []  # list to collect each year's merged DataFrame
for base_file, mb_file in zip(base_files, mb_files):  # iterate over all 5 years
    # load base data
    base = pd.read_csv(os.path.join(base_path, base_file))
    # filter the data: drop non-unique filings by keeping, for each
    # (SPONS_DFE_EIN, SPONS_DFE_PN) pair, the filing with the latest DATE_RECEIVED
    base = (base.sort_values("DATE_RECEIVED")
                .drop_duplicates(subset=["SPONS_DFE_EIN", "SPONS_DFE_PN"],
                                 keep="last"))
    ...