Introduction
作业分四部分,需要实现 Huffman coding ,当然也就是构造Huffman Tree了。
通常的Huffman作业不会这么复杂,这次这么复杂是因为增加了许多工程性的功能,数据格式也是五花八门。
这次主要是搭建环境,熟悉整个项目框架结构。
This is a programming assignment. Get started early! Unanticipated problems
will arise in your work, and you will need time to sort them out.
Clarifying questions can be posted on the CMPT 115 Moodle forums. Help with
debugging can be obtained by working in the lab during Help Desk hours of your
instructors and TAs. Expect high demand for this kind of help on Fridays.
All C++ source code files must contain your name, student number, and NSID;
and the course, assignment number, and question number as comments at the top
of the file. All C++ programs must compile with g++ on the command line
without errors or warnings, using the standard flags described in tutorial,
namely -Wall -pedantic.
You may, of course, work on your assignment using Eclipse, or any other C++
IDE, on your personal notebooks or desktops, or on any computer in the Spinks
labs. We cannot always help you with problems arising specifically from the
IDE, but C++ should have the same bahaviour on all these systems. There are
good reasons to use an IDE like Eclipse or MSVS, but there are good reasons to
learn to use the command-line too. Learn both!
Use of String class or string or other object-oriented code (except for cin or
cout and any others explicitly allowed in an assignment specification) will
result in a flat deduction of 25%. (That means: don’t use advanced techniques
to avoid learning the concepts we are attempting to teach you!)
The Big Picture
For this assignment, you will be implementing the Huffman Tree encoding
algorithm for file compression. We’re giving you 2 weeks for all of this work.
It’s crucial that you do not leave it to the last minute. (All this talk about
code is confusing; sometimes the word code refers to C++ code. Sometimes it
refers to the Huffman code. And, unfortunately, we will have to refer to ASCII
code as well. We’ll try to be clear.)
The main application here to read a file, say message.txt, containing some
text, and then to translate (“encode”) the text using the Huffman code. Your
program will write the Huffman coded text to a new file, say coded-
message.txt. Your program will also be required to read the very same Huffman
coded text from coded-message.txt, and decode it, resulting in an exact
duplicate of original message. This will demonstrate that your application
uses the Huffman code correctly.
This is almost a real application. It’s artificial in the sense that your main
program will encode and decode the same message. A real file-compression
application will either encode a file, or decode it, but usually not both.
Students who want a challenge can go beyond the requirements to write a file
compression utility. Another unrealistic aspect is that we’re using a text-
file as the compressed file, which actually causes the “compressed file” to be
larger than the original. Don’t worry about that; the Huffman code, and the
use of lists and trees is what we’re interested in.
Files and more files
We’re giving you a lot of material to start with, by providing partially
implemented ADTs:
- List.cc, List.h: These files have List and Iterator ADT defined.
- TreeNode.cc, TreeNode.h: These files define the TreeNode ADT, which is similar to the material on Trees in Lecture Topic 10.
- Frequency.cc, Frequency.h: These files define an ADT to store a character and an integer.
- FrequencyList.cc, FrequencyList.h: These files define an ADT wrapper for Lists, with two special operations that you have to define.
- Huffman.cc, Huffman.h: These files define two new ADTs: The HUffmanTree, and the HuffmanCodec. These are explained in more detail below.
For consistency, all of the ADT implementation files have the extension “.cc”
so you can recognize them.
You should find List.cc and TreeNode.cc familiar, as you’ve seen variations of
it in previous lectures and assignments. We have made a few simple exercises
using Lists and TreeNodes to get started, but most of the List and TreeNode
ADT is given. You should review the ADT definitions, to become familiar with
the operation names, and to know the details of how the operations are called
(i.e., parameters and return values).
There are some utility files that we provided:
Element.h, TreeElement.h: These define the data that get stored in our Lists
and Trees. You won’t have to change them for this assignment. - copy.cc, copy.h: These define a useful funciton for making safe copies of C-strings. We’ve used this function several times.
- LOCALE.h: A file with 2 symbols defined. We’ve used #include “LOCALE.h” in all of the other files.
- message.txt: You can use any message file you like, but we’ve provided one that is roughly at the size limit of what the application should be able to do. It’s also topical.
There are two application files for your use: - testADTs.cpp: a program that tests almost everything about the ADTs above. There are more than 50 tests, and if they all come back as “Passed” you are almost finished!
- a9app.cpp: This is the application program. Once you have passed all the tests, build this, and see if your work is finished! YOu may make alterations to this file, but you can leave it as given, and the application will run just fine if your ADT operations are working properly.
Steps (briefly)
Each of the following steps is explained more carefully below!
- Download the files, and make sure everything runs right.
- Follow the Exercises in this assignment. The first few are warm-ups, so that you become familiar with the process you should follow. For each exercise:
a. Read the exercise.
b. Add some code to one or more of the files.
c. Build the testADT program, and check if the output shows that your work was
successful. The testing in testADT is pretty thorough. - The final exercise tests your work using the a9app.cc program, which should prove to you that everything is working well.
Download all the files and build
Moodle has the complete set of files. It will help a lot if do this work in a
new folder. The complete list of files to download is here:
Element.h
Frequency.cc
Frequency.h
FrequencyList.cc
FrequencyList.h
Huffman.cc
Huffman.h
LOCALE.h
List.cc
List.h
TreeElement.h
TreeNode.cc
TreeNode.h
a9app.cpp
copy.cc
copy.h
message.txt
testADTs.cpp
These files are also zipped up in a single archive called AllA9Files.zip. You
can download them all at once, move AllA9Files.zip into an empty folder, and
double-click on the archive. This should put all the files into that folder
automatically.
Once downloaded, you can build the two applications as follows:
g++ -Wall -pedantic *.cc testADTs.cpp -o testADTs
and
g++ -Wall -pedantic *.cc a9app.cpp -o a9app
Notice that the command-line uses .cc ; the is a wildcard character. Using a
wildcard like this is a way to say “put every file that has a .cc extension in
this command right here.” That’s why it’s important to start with an empty
directory. We want the wildcard to match the C++ files for A9, but not any
others.
Both application programs will build without error. If there’s a compilation
error, check that you have all the files!
Both applications will run, but until you finish the exercises, they will halt
almost immediately due to errors (because most of the interesting parts of the
program are blank for you to fill in).