代写数据结构中的Hash Table,通过不同的实现方式,对Hash碰撞进行分析。
Overview
For this assignment, you will implement a hash table and experiment using
various hashing methods to try and minimize collisions. To simplify
implementation, you will be using separate chaining to resolve collisions.
Additionally, you will generate statistics on the performance of your hash
table, which will be written to a file.
Part 1 - Hash Table
In this part, you will implement a basic Hash Table. We have provided you with
an interface and starter code that implements that interface. Fill in the
blanks to complete given methods.
Implementation Details
You will be implementing the IHashTable interface, as additional required
private helper methods. See method descriptions for details. The benefits of a
hash table are a result of the ability to hash a value and then immediately
locate its position if there are no collisions, or vastly narrow down the
search space if there are. As such, you shouldn’t need to search through the
entire table to find an element for your contains() and delete() methods.
Your hash table will be used exclusively to store strings, as this is the type
of hash table that would be used to store a dictionary of words for uses such
as spell check.
To resolve collisions, use will be using separate chaining. One way of
implementing this would be to have your backing storage be an array of
LinkedLists. Then you would simply add to the LinkedList at the specified
index when inserting elements into your hash table. The performance of hash
tables degrades if there are too many collisions. As such, you will keep track
of the load factor of the table (the ratio of the size of the table to the
number of elements) and once this value exceeds , you must double the size of
the table and rehash all of the values. You will be implementing a private
helper method rehash() to accomplish this.
You will be implementing three different hash functions. You must implement a
rolling hash using Horner’s Method, as well as two other hash functions of
your choosing. A list of possible hash functions are provided in the starter
files, but feel free to use other hash functions as well. Remember, at the
minimum your hash functions must be deterministic, although your goal is for
them to be uniform and fast. You will later compare the performance of the
hash functions and record your results in HW5.txt.
You must actually implement the algorithms in your HashTable class. If you use
String.hashCode(), you not receive any credit for this assignment.
Hash Table Methods
public boolean insert(String value)
Insert the string value into the hash table, returning true if the value was
inserted and false if the value was already present.
Throws:
NullPointerException if value is null
public boolean delete(String value)
Delete the given value from the hash table, returning true if the value was
deleted and false if the value could not be found.
Throws:
NullPointerException if value is null
public boolean contains(String value)
Check if the given value is present in the hash table, returning true if it is
and false otherwise.
Throws:
NullPointerException if value is null
public void printTable()
Print the contents of the hash table, printing nothing if the table is empty.
See appendix for example output.
public int getSize()
Accessor for the number of elements currently stored in the hash table.
private void rehash()
Expand the size of the hash table and rehash the elements. Called when the
load factor passes a threshold.
Keeping Statistics
You should keep track and report the following information for the hash table
- The number of times you had to expand the table.
- The load factor in the table (before resizing), trimmed to 2 decimal places.
- The number of insertions that encountered a collision (before resizing).
- The length of the longest known collision chain (before resizing).
Note that all but the first of these statistics will need to be reinitialized
(reset) whenever you expand the table.
Write it out to a text file in the following format r, c, n are integers,
alpha is floating point number), each time you do a resize and rehash. Append
a new set of values for each rehash.
r resizes, load factor alpha, c collisions, n longest chain
Notice the two constructors in HashTable class in the starter code provided.
One takes in the initial size(anyvalue > 0
), while the other constructor
takes in a filename as well as a size. The boolean ‘printStats’ is set to
false by default, and is only changed to true in the constructor that receives
a filename. Each time you expand and rehash your table, you must check if this
boolean is set to true, and if yes, you must write the current statistics to
the file before rehashing.
For example, after, say, 4 rehashes, the file will have the following lines: - 1 resizes, load factor 0.67, 1 collisions, 2 longest chain
- 2 resizes, load factor 0.67, 7 collisions, 2 longest chain
- 3 resizes, load factor 0.75, 55 collisions, 6 longest chain
- 4 resizes, load factor 0.68, 221 collisions, 14 longest chain
There is no ‘correct statistics’ and your numbers will change depending on
what hash function and initial table size you chose. These values will only
indicate how good your chosen hash function is.
HashTable Tester
Your tester must test the insert(), contains(), delete() and getSize() methods
thoroughly. Note that printTable() is only present for you to test your hash
table during implementation, and as such does not need to be tested for an
exact output string. However, this method must still be implemented. See the
appendix for sample formatting.
You can test the 2-arg constructor by passing a local file path and making
sure that the stats are printed to the file. However, when you submit your
tester, make sure you remove/comment this test as your submitted file must not
have any local file path.
Hash Analysis
Remember that a good hash function is fast, deterministic, and uniform. For
the final part of this assignment, you will be evaluating your hash functions
on these categories.
A good way to test the relative performance of your hashing methods is to
build a dictionary with the provided files. To do this, you would read the
provided longDictionary.txt or simpleDictionary.txt and treat each line read
as a single word to insert into your table. Make sure that you specify a file
to record your statistics to so that you have concrete data to reference with
your explanations.
For each Hash function, describe the performance that it displayed. To assess
whether or not it is uniform, consider the following questions:
- How effective was it at preventing collisions?
- When collisions occurred, did it have have many short chains or a few long ones?
Keep in mind that for a hash function to be useable, it must be deterministic.
Mention briefly why each function satisfies this property.
Finally, describe the runtime performance for calculating the hash. A good way
to do this would be to use the same dictionary file as before, but instead of
building the complete table, just record the time it takes to hash the values.
To do this, call the protected hash() function with each of the enum values.
Record your findings in HW5.txt
Extra Credit - BST in C
You will be implementing a Binary Search Tree in C. You have been provided
with started code in BinaryTree.c. Input/Output for this assignment will be
handled through the command line. The values inserted into your BST will be
strings of max length 100. Your BST will have the ability to insert, search,
and print the entire tree in order. See the appendix for specific help with C.
Function Descriptions
int compare(char a, char b)
Mimics the functionality of strcmp()
Returns:
0 if both strings are equal
Negative if the first non-matching character has a lower ascii value in a than
b
Positive if the first non-matching character has a higher ascii value in a
than b
DO NOT use strcmp() in the implementation of compare
void insert()
Adds a node with the input value to the tree. Smaller strings are inserted on
the left and larger on the right. If the element is a duplicate, do nothing.
You will need to use malloc to allocate memory for each new node that is
inserted. The parameter to malloc() is the size in bytes; just passing the
number of elements is not correct.
void find()
Search the tree for a node containing the string the user inputs as its value.
Prints “Found: “ node value value (using printNode()) if the node is present.
Otherwise prints “Not found: “ followed by user input.
See sample output for examples
void traverse()
Traverse will output an inorder traversal of the values in the tree nodes. Use
printNode to print the values of the nodes. If the tree is empty nothing will
be printed.
See sample output for examples.
You may choose to implement the functionalities of the binary tree in any
order but the following is recommended:
- compare():
You will need a strong understanding of how strings are represented in C to
complete this part of the assignment. Consult your notes/online resources. You
cannot use strcmp() in your implementation, however you can check your output
against strcmp(). - insert():
You should be familiar with how nodes are inserted into a BST. The tricky
thing about this portion is allocating memory for new nodes using malloc. Make
sure you understand how malloc works before you attempt to implement insert! - find():
Again something you all should be familiar with. No tricks here. - traverse():
This is a simple in order traversal of your BST. This can be accomplished
iteratively or recursively. You are free to use helper methods here.
Your output must match the Sample Output!
Important Information
Do not edit any code marked “DO NOT EDIT”
You may edit main for the purposes of testing but when you turn in your code
it must match the starter code. You may use helper methods if you see fit, but
be aware - in C methods must be declared before they are used.