Dictionaries give us an enriched way to store values by much more than just sequential indexes (as lists gave us); we identify key-value pairs, and treat keys like indexes of various other types. Though the keys are unordered, dictionaries help us simplify many tasks by keeping those key-value associations. Each key can only be paired with one value at a time in a dictionary.
When a file contains ASCII text in it, we can readily write programs to open the file and compute with its contents. It turns out that reading and writing text files gives our programs far more longevity than open-to-quit; we can store data and results for later, save user preferences, and all sorts of things. We will be reading text files that happen to be in the CSV format.
Scenario
We've got some data about registered baby names in certain years in a comma-separated-values (CSV) file. Only read_file needs to interact with files; all other functions will use our required structure to describe the names, genders, and counts per year.
CSV file: This is our name for a file containing ASCII text where each line in the file represents one record of information; each piece of info is surrounded by double quotes, and each of these quoted items is separated by a single comma. The very first line is the "header" row, which names the columns but is not part of the data.
Here is a very small sample file that can be used in our project. Note: a file's extension has no actual effect on its contents. These are ASCII files, so you can edit them with your code editor just as easily as a .txt or .py file.
Popularity: we refer to a dictionary whose keys are years, and whose values are tuples of (count, rank), as a popularity record. We use this representation to store information about a particular name for multiple years.
Note: though a record belongs to a particular name (the example below is the record for the name 'DANIEL'), notice that the name does not appear in the record itself. That's because it will be represented elsewhere in a database, and duplicated information is rarely a good idea in a database.
Database: a "database" of names can store multiple names from multiple years. Our database is a dictionary whose keys are tuples of (name, gender), and whose values are popularity records, i.e. dictionaries of the form defined above.
We either call a database ranked, where all ranks have been correctly filled in, or unranked, where ranks are either None or no longer correct due to an addition.
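The two structures defined above can be sketched as plain Python literals. The years, counts, the second name, and the gender spellings ("MALE", "FEMALE") below are invented for illustration; they are not taken from the sample file.

```python
# Illustrative values only: years, counts, and gender spellings are invented.

# A popularity record maps year -> (count, rank).
daniel_record = {
    2010: (120, None),  # rank stays None until the database is ranked
    2011: (95, None),
}

# A database maps (name, gender) -> popularity record.
# Note that the name lives in the key, not inside the record.
db = {
    ("DANIEL", "MALE"): daniel_record,
    ("EMMA", "FEMALE"): {2010: (150, None)},
}

# Looking up one entry: first by (name, gender), then by year.
count, rank = db[("DANIEL", "MALE")][2011]
print(count, rank)
```

Because every rank here is None, this database would be called unranked in the terminology above.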
Functions dealing with unranked databases
These functions either work on, or create, unranked databases.
• read_file(filename): This is the *only* function that needs to deal with reading a file. It will accept the file name as a string, and assumes it is a CSV file as described above (with our name data in the same format as the example, but with any number of rows after the header row). It will open the file, read all the name entries, and correctly create the unranked database.
    • Return the resulting database.
    • Set all rankings to None.
    • Hint: How can you break this task down into multiple phases, each one taking a pass over the data and making something slightly more useful towards getting the result? How can you use any functions that you have to write, or could write?
    • If you get stuck on this one, you can still attempt the others; just test with the tester or with manually built database values.
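One possible shape for read_file is sketched below. The column order (year, gender, name, count) is an assumption, since the sample file is not reproduced here; adjust the unpacking line to match the real header. The sketch relies on the standard csv module, which handles the double-quoted fields.

```python
import csv

def read_file(filename):
    """Sketch: build an unranked database from a quoted CSV file.

    ASSUMPTION: the columns are year, gender, name, count, in that order.
    """
    db = {}
    with open(filename, newline="") as f:
        reader = csv.reader(f)   # removes the surrounding double quotes
        next(reader, None)       # skip the header row
        for row in reader:
            if not row:          # tolerate blank lines
                continue
            year, gender, name, count = row  # assumed column order
            # Create the popularity record on first sight of (name, gender).
            record = db.setdefault((name, gender), {})
            record[int(year)] = (int(count), None)  # rank starts as None
    return db
```

If the same (name, gender, year) appears on more than one row, this sketch keeps the last row; the assignment text does not say how to treat that case.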
• add_name(db, name, gender, year, count): This function accepts an existing database db, along with a name, gender, year, and count, and then it updates the database to include that information.
    • Return None.
    • The rank for that entry is set to None.
    • The resulting database is considered unranked, as the addition means the rankings aren't up-to-date. It's acceptable that there are some None ranks and some numeric ranks.
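A minimal sketch of add_name follows. It assumes that adding an entry for a year that already exists simply overwrites the old entry; the description above does not cover that case.

```python
def add_name(db, name, gender, year, count):
    """Sketch: record `count` for (name, gender) in `year`, with no rank."""
    # Fetch the existing popularity record, or start a new empty one.
    record = db.setdefault((name, gender), {})
    # ASSUMPTION: an existing entry for this year is overwritten.
    record[year] = (count, None)
    return None

# Usage with invented values:
db = {}
add_name(db, "DANIEL", "MALE", 2010, 120)
add_name(db, "DANIEL", "MALE", 2011, 95)
print(db)
```

The database is modified in place, which is why the function has nothing useful to return.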
• find_all_years(db): This function accepts an existing database db and finds all years included anywhere in it. Return the list of years in ascending order.
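As a rough sketch, find_all_years can collect the keys of every popularity record into a set and sort the result. The sample database uses invented values.

```python
def find_all_years(db):
    """Sketch: return every year that appears in any record, ascending."""
    years = set()
    for record in db.values():
        years.update(record)  # iterating a dict yields its keys (the years)
    return sorted(years)

# Usage with invented values:
sample = {
    ("A", "F"): {2001: (5, None)},
    ("B", "M"): {2000: (3, None), 2001: (4, None)},
}
print(find_all_years(sample))
```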
• new_names(db, gender, old_year, new_year): accepts a database db, a gender, and a pair of years. It searches in the database for names of that gender that do not make the top list for old_year, but do show up in the list for new_year. Return the qualified names as a list of strings, alphabetically sorted.
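The sketch below shows one reading of new_names: a name qualifies when it has an entry for new_year but no entry at all for old_year. That interpretation of "do not make the top list" is an assumption; check it against the tester. Sample values are invented.

```python
def new_names(db, gender, old_year, new_year):
    """Sketch: names of `gender` present in new_year but absent in old_year."""
    result = []
    for (name, g), record in db.items():
        # ASSUMPTION: "not in the list" means no entry for old_year at all.
        if g == gender and new_year in record and old_year not in record:
            result.append(name)
    return sorted(result)

# Usage with invented values:
sample = {
    ("A", "F"): {2000: (5, None)},
    ("B", "F"): {2001: (3, None)},
    ("C", "F"): {2000: (1, None), 2001: (2, None)},
    ("D", "M"): {2001: (4, None)},
}
print(new_names(sample, "F", 2000, 2001))
```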
Functions dealing with ranked databases
These functions either make a database become ranked, or they rely upon ranked databases. If a function expects to receive a ranked database, you are not required to handle receiving an unranked database in any particular way; that behavior is undefined and won't be tested. You're welcome to still try to handle it if you want.
• rank_names_for_one_year(db, year): This function accepts an existing database and a year. It calculates the ranking of names according to their counts and updates that information in the database. Rank boys' and girls' names separately. The most popular name for each gender (with the highest count) gets a rank value of 1.
    • This function should return None.
• rank_names(db): This function accepts an existing database and ranks all names for all years of data present, making the database become ranked.
    • This function should return None.
    • Rank male and female names separately.
    • If there is a tie in counts, assign all tied names the same rank and make sure the next rank is adjusted accordingly. Given counts of A:10, B:5, C:5, D:5, E:1, they'd get rankings of A=1, B=2, C=2, D=2, E=5.
    • Hint: use previous functions!
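The two ranking functions might be sketched as follows. The genders are discovered from the database keys rather than hard-coded, because the exact gender spellings in the file are not shown here. Ties share a rank and the next distinct count takes its position in the sorted order, which reproduces the A..E example above.

```python
def rank_names_for_one_year(db, year):
    """Sketch: fill in ranks for `year`, separately for each gender."""
    genders = {gender for (_name, gender) in db}
    for gender in genders:
        # Collect (count, key) for every name of this gender in this year.
        entries = [(record[year][0], key)
                   for key, record in db.items()
                   if key[1] == gender and year in record]
        entries.sort(key=lambda e: e[0], reverse=True)  # highest count first
        prev_count, rank = None, 0
        for position, (count, key) in enumerate(entries, start=1):
            if count != prev_count:  # tied counts keep the previous rank
                rank = position
                prev_count = count
            db[key][year] = (count, rank)
    return None

def rank_names(db):
    """Sketch: rank every year present anywhere in the database."""
    years = set()
    for record in db.values():
        years.update(record)
    for year in years:
        rank_names_for_one_year(db, year)
    return None

# Usage with the tie example from the text (gender "F" and year 2000 invented):
counts = {"A": 10, "B": 5, "C": 5, "D": 5, "E": 1}
db = {(name, "F"): {2000: (c, None)} for name, c in counts.items()}
rank_names(db)
print({name: db[(name, "F")][2000][1] for name in counts})
```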
• popularity_by_name(rdb, name, gender): accepts a ranked database, a name, and a gender. It finds the popularity information for name in every year included in rdb, assembles it into a list of tuples (year, rank), and returns the list. If rdb has no records for name, return []. Sort the tuples by year.
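A compact sketch of popularity_by_name, using invented sample values, could look like this:

```python
def popularity_by_name(rdb, name, gender):
    """Sketch: list of (year, rank) for one (name, gender), sorted by year."""
    record = rdb.get((name, gender), {})  # empty record if the name is absent
    # Years are unique within a record, so sorting the tuples sorts by year.
    return sorted((year, rank) for year, (_count, rank) in record.items())

# Usage with invented values:
sample = {("A", "F"): {2001: (9, 1), 2000: (5, 2)}}
print(popularity_by_name(sample, "A", "F"))
print(popularity_by_name(sample, "B", "F"))
```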
• popularity_by_year(rdb, gender, year, top=10): accepts a ranked database, a gender, a year, and top. It finds the top popular names for the specified year and returns them in a list of tuples (rank, name). Sort the tuples in your return list by rank (most common first). You can assume top is always a positive integer. If top is not provided, use the default value and report the top 10 popular names. If top is larger than the number of stored names for that year, report all names in the right order.
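The sketch below takes "top" to mean "rank no greater than top", so tied names at the cutoff are all kept; it also breaks ties alphabetically. Both choices are assumptions that the text above does not settle. Sample values are invented.

```python
def popularity_by_year(rdb, gender, year, top=10):
    """Sketch: (rank, name) tuples for names ranked within `top` that year."""
    result = []
    for (name, g), record in rdb.items():
        if g == gender and year in record:
            _count, rank = record[year]
            if rank <= top:  # ASSUMPTION: cutoff applies to the rank value
                result.append((rank, name))
    # Sorting tuples orders by rank, then by name for tied ranks.
    return sorted(result)

# Usage with invented values:
sample = {
    ("A", "F"): {2000: (10, 1)},
    ("B", "F"): {2000: (5, 2)},
    ("C", "F"): {2000: (5, 2)},
    ("Z", "M"): {2000: (7, 1)},
}
print(popularity_by_year(sample, "F", 2000, top=1))
print(popularity_by_year(sample, "F", 2000))
```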
• always_popular_names(rdb, gender, years=None, top=10): accepts a ranked database, a gender, a list of years, and a top threshold. It searches in the database for (name, gender) records that are always ranked within (and including) top for all the years in the given list, and returns the names as a list of strings.
    • If years==None, use all years present anywhere in the database.
    • If top is not provided, default to 10.
    • In the answer, sort the names alphabetically.
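One way to sketch always_popular_names is shown below. It treats a name with no entry for one of the requested years as not popular in that year, which is an assumption. Sample values are invented.

```python
def always_popular_names(rdb, gender, years=None, top=10):
    """Sketch: names of `gender` ranked within `top` in every given year."""
    if years is None:
        # Default: every year that appears anywhere in the database.
        all_years = set()
        for record in rdb.values():
            all_years.update(record)
        years = sorted(all_years)
    result = []
    for (name, g), record in rdb.items():
        if g != gender:
            continue
        # ASSUMPTION: a missing year disqualifies the name.
        if all(y in record and record[y][1] <= top for y in years):
            result.append(name)
    return sorted(result)

# Usage with invented values:
sample = {
    ("A", "F"): {2000: (10, 1), 2001: (9, 1)},
    ("B", "F"): {2000: (5, 2), 2001: (1, 3)},
    ("C", "F"): {2001: (5, 2)},
}
print(always_popular_names(sample, "F", top=2))
print(always_popular_names(sample, "F", years=[2001], top=2))
```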
• increasing_rank_names(rdb, gender, old_year, new_year): accepts a ranked database, a gender, and two years. It searches in the database for names whose rank gets promoted from old_year to new_year, and returns them as a list of strings. If there are multiple names, sort them alphabetically. If a name does not get ranked for a year, do not include that name in the return list.
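A sketch of increasing_rank_names follows. It reads "promoted" as "the rank number is smaller in new_year than in old_year", since rank 1 is the most popular. Sample values are invented.

```python
def increasing_rank_names(rdb, gender, old_year, new_year):
    """Sketch: names of `gender` whose rank improved from old_year to new_year."""
    result = []
    for (name, g), record in rdb.items():
        # Names missing from either year are skipped, as the text requires.
        if g == gender and old_year in record and new_year in record:
            old_rank = record[old_year][1]
            new_rank = record[new_year][1]
            if new_rank < old_rank:  # a smaller number is a better rank
                result.append(name)
    return sorted(result)

# Usage with invented values:
sample = {
    ("A", "F"): {2000: (5, 2), 2001: (9, 1)},
    ("B", "F"): {2000: (10, 1), 2001: (5, 2)},
    ("C", "F"): {2001: (1, 3)},
}
print(increasing_rank_names(sample, "F", 2000, 2001))
```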
