QuestionQuestion

This assignment provides further insight into the implementation of a database engine by implementing a very small, simple database engine to use on top of CSVFiles. On a much smaller scale, we want you to build something similar to SQL. Think about what information SQL requires (Metadata, table names, column names, primary keys...) and how SQL leverages the data that it contains to make data processing faster and better (column type constraints, indexing, query optimization...)
There are two parts to thisproject. The first MUST be completed before the second.
1.Build a simple database catalog (a table that contains the metadata of all tables in a schema)
2.Build a query engine that uses the data of the catalog
BreakdownThe zipfolder contains thefollowing files:
•● Appearances.csv
•● Batting.csv
•● People.csv
•● CSVTable.py
•● CSVCatalog.py
•● Unit_test_catalog.py (empty)
•● Unit_test_csv_table.py (empty)
Part 1 The first step is to complete the catalog. Think of the catalog as defining and containing all of the relevant information about the table, columns, and indexes along the lines of: The tables we want you to use are the included csv files (appearances, batting, people)
ex_table_name    ex_column_name      ex_primary_key    ex_path

This information can be stored in a table(s) in sql and accessed as necessary or stored in a csv file(s) that you create.
The Catalog defines 3 types:
1.A TableDefinitionrepresents metadata information about aCSVDataTable. The database engine will maintain the data in a CSV file.The catalog contains information about:
1.The path/file name for the data.
2.Column names, types and whether or not NULL is an allowed value. The column names defined via the catalog API are a subset of column headers in the underlying CSV file.
3.Columns that comprise the primary key.
4.A set of one or more index definitions. An index definition has a name, type of index (PRIMARY, UNIQUE, INDEX) and columns that comprise the index value. 2.ColumnDefinition: A class defining a column.
3.IndexDefinition: A class defining an index.
The catalog supports defining a newtable, dropping an existing table definition, loading a previous defined table definition, adding and removing columns and indices from a table definition. These methods should be tested in the Unit_test_catalog.py file using the tables we included.
Part 2
The second step is to complete the CSVTable which leverages information from the catalog. There are only two methods you must write in this file, execute_smart_join() and dumb_join().
Dumb_join looks at each row in the left table and iteratively says “Does this row match this row in the right table?” If it does, let’s make a new entry appending the two dictionaries together. If it does not, keep looking! After aggregating the two tables, dumb_join applies the where_template to the data and returns the output. A dumb join of people.csv and batting.csv on playerID should take 30 minutes.
execute_smart_join() designates a scan table and a probe table (HINT: leverage a method I’ve included), finds the sub_where_template for each table (push down the where clause if applicable) and leverages the index to join the tables together. An optimized join of people.csv and batting.csv on playerID with a where clause should take under a minute.
Show testing and comment how long it took in the unit_test_csv_table.py file. Include screenshots if necessary

Solution PreviewSolution Preview

This material may consist of step-by-step explanations on how to solve a problem or examples of proper writing, including the use of citations, references, bibliographies, and formatting. This material is made available for the sole purpose of studying and learning - misuse is strictly forbidden.

'''
1. run q definition
2. Class Column Definition
3. Class Index Definition
4. Class Table Definition
5. CSV Catalog Definition
'''


def run_q(cnx, q, args, fetch=False):
    '''If you decide to store the metadata in sql this will be useful'''
    cursor = cnx.cursor()
    cursor.execute(q, args)
    if fetch:
       result = cursor.fetchall()
    else:
       result = None
    cnx.commit()
    return result


class ColumnDefinition:

    #example of what column types are acceptable
    column_types = ("text", "number")

    def __init__(self, column_name, column_type="text", not_null=False):
       """
       :param column_name: Cannot be None.
       :param column_type: Must be one of valid column_types.
       :param not_null: True or False
       """

       if column_name is None:
            raise ValueError('Column name necessary')

       else:
            self.column_name = column_name

       if column_type in self.column_types:
            self.column_type = column_type

       else:
            raise ValueError('That column type is not accepted. Please try again.')

       if not_null == False:
            self.not_null = not_null
       elif not_null == True:
            self.not_null = not_null
       else:
            raise ValueError('should be T/F')

    def __str__(self):
       # returns the table object in json format
       return json.dumps(self.to_json(), indent=2)

    def to_json(self):
       """
       :return: A JSON object, not a string, representing the column and it's properties.
       """
       result = {
            "column_name": self.column_name,
            "column_type": self.column_type,
            "not_null": self.not_null
       }
       return result


class IndexDefinition:
    """
    Represents the definition of an index.
    """
    index_types = ("PRIMARY", "UNIQUE", "INDEX")

    def __init__(self, index_name: object, index_type: object, column_names: object) -> object:
       """
       :param index_name: Name for index. Must be unique name for table.
       :param index_type: Valid index type.
       :param column_names: list of column names assc with that index
       """
       #YOUR CODE GOES HERE#
       if index_type in self.index_types:
            self.index_type = index_type
       else:
            raise ValueError('That index type is not accepted. Please try again.')

       self.index_name = index_name
       self.column_names = column_names...
$100.00 for this solution

PayPal, G Pay, ApplePay, Amazon Pay, and all major credit cards accepted.

Find A Tutor

View available Database Development Tutors

Get College Homework Help.

Are you sure you don't want to upload any files?

Fast tutor response requires as much info as possible.

Decision:
Upload a file
Continue without uploading

SUBMIT YOUR HOMEWORK
We couldn't find that subject.
Please select the best match from the list below.

We'll send you an email right away. If it's not in your inbox, check your spam folder.

  • 1
  • 2
  • 3
Live Chats