代写数据库中的动态Hash的实现,为后续的数据库设计做准备。
Overview
In our first programming assignment, you created a binary file of uniformly-
sized records. Querying the content of such files can be made more efficient
by using an index.
In class we talked about Dynamic Hashing indices under the assumption of a
binary (2-way) tree. If your situation permits - and ours does - the use of an
n-way tree can produce a shallower in-memory tree index and thus faster
searches. Further, we can create indices by reading the content of the field
being indexed in reverse order. Doing so can further improve querying
performance.
Assignment
Write a complete, well-documented Java program that creates a Dynamic Hash
index for the binary file of taxi trips that your PartA.java program created,
and uses it to satisfy a simple type of query. In particular, you are to index
on the Trip Total field, using the digits in reversed (right to left) order,
ignoring the decimal points and the dollar signs.
Recall that a dynamic hash structure consists of a bucket directory tree and a
collection of buckets. In this assignment, the key field of the data consists
of just numeric characters. Your directory tree should be structured with up
to one level per digit in the key, and 10 tree pointers and 10 hash file
“pointers” (really just byte offsets into the binary file of hash buckets) per
node. The following picture should help:
At the beginning, your tree should consist of just the root node with pointers
to 10 empty buckets in the hash bucket file. As buckets are filled based on
the right-most digit of the key of the record being inserted, new leaf nodes
will be added to the tree and buckets will need to be split. To add new
buckets to the hash file, all you need to do is append the new buckets to the
end of the file (and, optionally, reuse the old bucket, to save space). For
this program, use hash buckets that can hold 1,000 index records each. I
recommend that each bucket be based on a data structure that consists of (a) a
count of how many of the 1,000 slots are filled, and (b) an array of slots.
The hash file will be just a binary file of these bucket structures.
Once the dynamic hashing structures (the in-memory tree and the on-disk buck
file) are created, the program is to process a simple variety of query using
your dynamic hashing index. Prompt the user to enter a sequence of a Trip
Total suffixes (e.g., a Trip Total of $6.70 has potential suffixes of 0, 70,
670, 0670, etc.). Your program should display the Trip ID and Trip Total
fields associated with the given suffix and the total number of records that
were printed. Sequentially scanning the binary file to find the matching
records is not acceptable the querying must use your index. Display the Trip
ID and Trip Total fields using the same output format you used for Program #1.
Data
The binary file to be indexed is to be the one your PartA.java program from
Program #1 creates. If you need to make changes to PartA.java to create a
correct binary file for this program to index, be sure to submit your updated
PartA.java in addition to this program.
The queries will consist of suffixes as described above. Suffixes of no digits
and of a ridiculously large quantity of digits should all be handled
reasonably.
As for dreaming up queries for testing, that’s up to you. We’ll test with a
variety of query strings, of course, to make sure that your code is operating
correctly and robustly. Thus, so should you.
Output
Apart from the various prompts to the user, the only displayed output from
your program should be the query results. After the list of records, display a
count of the number of records that match each query: “# records matched your
query.”
Hand In
You are required to submit your completed program file(s) using the turnin
facility on lectura.
The submission folder is cs460p2. Instructions are available from the document
of submission instructions linked to the class web page. Because we will be
grading your program on lectura, it needs to run on lectura. This means that
you need to test it on lectura. Name your main program source file Prog2.java
so that we don’t have to guess which files to compile, but feel free to split
up your code over additional files if you feel that doing so is appropriate.
Submit each file as-is; that is, do not ‘package’ them into ZIP or TAR files.
Want to Learn More?
- There aren’t many examples of our version of Dynamic Hashing available. I refer to this type of external hashing as “Dynamic Hashing” to distinguish it from Extendible Hashing, which is more well-known. But, really, Extendible Hashing is an example of the general concept of dynamic hashing, too.
Other Requirements and Hints:
- It is possible that a bucket size of 1,000 records might prove to be insufficient, given that the data is far from uniformly distributed (many Trip Totals end in “5”, for example). If so, use Piazza to work with your classmates to determine an adequate (but not wastefully large!) bucket capacity, and use it.
- Work on just a part of the program at a time; don’t try to code it all before you test any of it. Don’t be afraid to do things a little bit backwards; for example, it’s nice to have a crude query algorithm in place to help you test the construction of your index.
- You can make debugging easier by using only a small amount of carefully selected data - and very small buckets - as you develop the code. Switch to the complete data file and full-size buckets when everything seems to be working.
- Repeating one of my standard pieces of advice: Comment your code according to the style guidelines as you write the code (not in the last few hours before the due date and time!). Doing this makes writing documentation far less tedious.
- As always: Start early! There’s plenty to do here.