代写数据结构中Hash的几种不同实现算法,并比较算法优劣。
Requirement
In this lab we will investigate how a hash function affects the structure of a
separate chaining hash table. You will use a Java program that produces an GUI
in which you can use various hash functions with different parameters, input
text to be hashed, and watch the hash table grow, and finally read off various
statistics about the table.
Exercise Zero
Download the code distribution here (ZIP) and get it to run in a folder of
your chosing. Please go to the following web site to get a text version of
Dicken’s A Tale of Two Cities:
http://www.gutenberg.org/cache/epub/98/pg98.txt
Now you should run the StringHash.java program, and cut and paste a couple of
thousand words from the novel into the Input Box in String Hash. Push Hash and
watch what happens. You will see a simulation of the hash function growing in
Hash Table, a histogram of the bucket lengths, and below these a printout of
some statistics.
There is nothing else to do for this problem except to make sure you know what
the program does and how to run it.
Exercise One
Now you will try a couple of the hash functions we have written into the
program, they are as follows, where M is the size of the table (number of
buckets); all are functions from strings to integers in the range [0 .. M-1]:
- Silly: Take the ASCII value of the last character and then do % M;
- Naive Add: Add all the ASCII values of the characters and then % M;
- Naive Mult: Multiply all the ASCII values of the characters and then % M;
- Add Lin Cong: Add all the ASCII values to get sum, then return (LCMultiplier*sum) % M
- Sequential LC: Define an array of prime numbers, P[…]; multiply each of the ASCII values by a different prime, sum and then return (LCMultiplier*sum) % M
- Folding, Java HashCode, Seq Java HashCode: These are all “industrial strength” hash functions for string which may be found on the web; all are supposed to be very high performance, and spread the keys out as randomly as possible.
Roughly, the order of functions here is from worst to best.
What to do for this problem: Cut and paste a couple of thousand words from A
Tale of Two Cities into the input box, and try each of the hash functions,
without changing any of the parameters (Table Size or LC Multiplier). Be sure
to examine the shape of the hash table in the Hash Table box, and the
histogram, and then look at the statistics. Focus on the the mean lookup cost
– this is the average number of comparisons to find each key in the table
(same as we have been thinking about all semester!); optimal lookup cost –
this is the average number of comparisons to find a key, in an optimal table
where all buckets were the same size (+/- 1); standard deviation – this is the
spread of the bucket lengths around the mean – closer to 0 means the table is
closer to optimal; histogram – just a picture of the distribution of the
bucket lengths – optimal is very few buckets lengths, centered around the
mean.
The most important thing to look at is how close the mean lookup cost comes to
the optimal lookup cost, and how small the standard deviation is. The table is
most balanced when the mean lookup is closest to the optimal lookup, when the
standard deviation is smallest, and the histogram shows this graphicallly by
clustering the bucket lengths more closely around the mean.
Now answer the following question: - a. Which hash functions performed better?
- b. Did you need the “industrial-strength” hash functions near the end of the list?
- c. Which was the simplest (i.e., earliest in the list) hash function to do reasonably well?
Exercise Two
Now set the hash function to Add Lin Cong, a method we discussed in lecture.
Here is the code:
// do naiveAdd, then linear congruential
// M and LCM… will be extracted from the appropriate text boxes
static int AddLinCong(String s, int M, int LCMultiplier) {
int sum = 0;
for (int i = 0; i < s.length(); i++)
sum += (s.charAt(i) * LCMultiplier) % M;
return sum % M;
}
—|—
Now we want you to play with the table size M and the LCMultiplier and see how
sensitive the input data is to these choices.
- (i) Try M = 144 and LCMultiplier = 2.
- (ii) Now try M = 144 and LCMultiplier = 12.
- (iii) Finally, try two prime numbers from the list at the bottom of this lab.
For each of these, answer the following questions: - a. What do you observe?
- b. Why did this happen?
Exercise Three
Now we are going to try to “break” these hash functions. Set M = 11,
LCMultiplier = 3, and clear the Input Box.
- (i) First, find a sequence of words which perform badly, i.e., create an imbalanced table (Hint: try various permutations of the same letters, e.g., “wayne” and “enyaw”). Think about what statistics would indicate that this is a worst case…..
- (ii) Now try this same sequence of words with a different M and LCMultiplier from the list of primes, but still use Add Lin Cong.
- (iii) Now try this same sequence of words with the other, “industrial strength” hash functions.
For each of these, answer the following questions: - a. What do you observe?
- b. Why did this happen? (Just answer what? for part C.)
What to Submit
Please submit your answers to the italicized questions as a file Lab12.txt as
part of your submission for HW 11.