Implement a hashtable that uses either chaining or open addressing to resolve collisions, your choice, as is the programming language. This hashtable should simply store words as both key and data (essentially there is no data just keys). Duplicate words should not be added. Your task is to experiment with the performance of various hashing functions on the following operation: Counting the number of unique words in an input file.

The way this operation is done is as follows. Create an initially empty hashtable H. Open the input file. For every word w in the input file, if w is already in H (found via a look-up in the hashtable), do nothing. If it is not, add w to the hashtable (via an insertion). After processing the file, simply return the number of words in the hashtable.
A word in a file is defined as a sequence of continuous alphabetic characters (case insensitive - convert to upper-case). So the following line: "I emailed sk8er Joe about the meeting at 7:30 -notAT 8:30." has 10 unique words:


Note that run, runs, running are all different words. That is, you don’t need to compute the stem of a word.
You should grab text files of different lengths including sizeable ones with several thousand (or more) words. I suggest looking at news articles across the web as well as some data corpi.

Performance Testing
Testing your hashing performance is fairly easy. Just run the unique-words operation on different sized files (at least 20 of them) and time how long it takes for the different hashing code schemes. Then do an analysis comparing them in a manner similar to the last programming assignment.
To help aid in keeping everything else constant, be sure the only thing different is the hashing function used. In particular, keep the load factor fixed, at say 0.5 or 0.75, and the method of collision resolution constant. When using a probabilistic hashing function, that is, where a random value a is chosen at the start, I recommend testing the performance on the same file multiple times and taking the average so that one bad random value does not ruin the overall picture of the performance.

Hashing Functions
There are various functions you can choose. Each function should of course work on a given input string as the key. I want you to try (at least) five different functions including:
1. Bad: The sum of the ASCII values of each character in the string.
2. Bad: The product of the ASCII values of each character in the string.
3. Two other functions of your own choosing (and research).
Remember to do all of your calculations using mod N where N is the size of the table. This will prevent numbers from growing too large during your function evaluation!

Solution PreviewSolution Preview

This material may consist of step-by-step explanations on how to solve a problem or examples of proper writing, including the use of citations, references, bibliographies, and formatting. This material is made available for the sole purpose of studying and learning - misuse is strictly forbidden.

public class HashTable {
    private Node [] buckets;
    private int m;
    * constructor
    * @param size
    public HashTable(int size) {
       buckets = new Node[size];
       m = size;
    * insert into buckets
    * @param data
    public void insert(String data){
       // turn to upper case
       String uppercase = data.toUpperCase();
       // get hashed key
       int h = hash(data);
       // the bucket is null
       if (buckets[h] == null) {
            buckets[h] = new Node(h, data, null);
       } else {...
$30.00 for this solution

PayPal, G Pay, ApplePay, Amazon Pay, and all major credit cards accepted.

Find A Tutor

View available Java Programming Tutors

Get College Homework Help.

Are you sure you don't want to upload any files?

Fast tutor response requires as much info as possible.

Upload a file
Continue without uploading

We couldn't find that subject.
Please select the best match from the list below.

We'll send you an email right away. If it's not in your inbox, check your spam folder.

  • 1
  • 2
  • 3
Live Chats