用C语言模拟实现操作系统中的字符串使用的堆内存管理功能。
Overview
- Your code must successfully compile and run on Submitty, which uses Ubuntu v14.04.5 LTS.
- You must use C for this homework assignment, and your code must successfully compile via gcc with absolutely no warning messages when the -Wall (i.e., warn all) compiler option is used. We will also use -Werror, which will treat all warnings as critical errors.
- Note that the gcc compiler is version 4.8.5 (Ubuntu 4.8.5-2ubuntu1~14.04.1).
Homework Specifications
In this first homework, you will use C to implement a relatively simple text
file manipulation program. The goal is to become more comfortable programming
in C on Linux, in particular handling strings and dynamically allocating (and
re-allocating) memory. Overall, your program will open and read a text file,
parse all words from the file, store each parsed word into a dynamically
allocated array, then display the words and their respective number of
occurrences. You will use a second array to keep track of the word occurrence
counts.
Required Input
The input file is specified on the command-line as the first argument. The
second command-line argument, which is optional, is an integer specifying the
number of words to display in the output. Note that words are displayed in the
order that they are first encountered in the given text file.
As an example, you could execute your program and have it read hw1-input01.txt
and display word counts for all of the words in the file as follows:
bash$ ./a.out hw1-input01.txt
As another example, you could execute your program and have it read the same
input file, but display word counts for only the first ten words in the file
as follows:
bash$ ./a.out hw1-input01.txt 10
Words are defined as containing one or more alphanumeric characters. Further,
words are case sensitive (e.g., “Lion” is different than “lion”). Consider
using fopen(), fscanf(), fgetc(), isalnum(), and other such string and
character functions. Check out the details of each function by reviewing the
corresponding man pages from the terminal.
Error Handling
If improper command-line arguments are given, report an error message to
stderr and abort further program execution. In general, if an error is
encountered, display a meaningful error message on stderr and abort further
program execution.
Error messages must have the following format:
ERROR:
Required Data Structures
For your data structures, you must use two parallel arrays, meaning that the
elements stored in each array at common index j correspond to one another. In
this case, each word is tied to its corresponding count. More specifically,
the first array contains words encountered in the given input file. The second
array contains integers that represent the number of occurrences of the
corresponding word in the file.
Your program must store words in the first array as you discover them in the
file, initially setting the corresponding integer in the second array to 1. If
a word already exists in the data structure (use linear search here), do not
add it again; instead, incremement its corresponding count in the second
array.
Note that you must store all words in memory, not just the words to be
displayed. Further, both arrays must be dynamically allocated on the runtime
heap. To do so, dynamically create the first array as an array of character
pointers (char*) using calloc(). Next, dynamically create the second array as
an array of integers, also via calloc(). Start with an array size of 8. If the
size of the arrays needs to be increased, use realloc() to do so, doubling the
size each time.
You must also dynamically allocate the memory to store each word (since a
char* is simply a pointer). To do so, use malloc() or calloc(); be sure to
calculate the number of bytes to allocate as the length of the given word plus
one, since strings in C are implemented as char arrays that end with a ‘\0’
character.
To read in each word from the file, you may use a statically allocated
character array of size 80. In other words, you can assume that each word is
no more than 79 characters long.
Finally, be sure to use free() to ensure all dynamically allocated memory is
properly deallocated. Use valgrind to verify that there are no memory leaks.
Required Output
When you execute your program, you must display a line of output each time you
allocate or re-allocate memory for the two parallel arrays.
As an example, if you are given the following hw1-input01.txt file:
Once when a Lion was asleep a little Mouse began running up and down upon him;
this soon wakened the Lion, who placed his huge paw upon him, and opened his
big jaws to swallow him. “Pardon, O King,” cried the little Mouse: “forgive me
this time, I shall never forget it: who knows but what I may be able to do you
a turn some of these days?” The Lion was so tickled at the idea of the Mouse
being able to help him, that he lifted up his paw and let him go. Some time
after the Lion was caught in a trap, and the hunters, who desired to carry him
alive to the King, tied him to a tree while they went in search of a wagon
to carry him on. Just then the little Mouse happened to pass by, and seeing
the sad plight in which the Lion was, sent up to him and soon gnawed away the
ropes that bound the King of the Beasts. “Was I not right?” said the little Mouse.
“LITTLE FRIENDS MAY PROVE GREAT FRIENDS.”
Executing the code as follows should display the output shown below:
bash$ ./a.out hw1-input01.txt 10
Allocated initial parallel arrays of size 8.
Re-allocated parallel arrays to be size 16.
Re-allocated parallel arrays to be size 32.
Re-allocated parallel arrays to be size 64.
Re-allocated parallel arrays to be size 128.
All done (successfully read 184 words; 107 unique words).
First 10 words (and corresponding counts) are:
Once – 1
when – 1
a – 6
Lion – 5
was – 4
asleep – 1
little – 4
Mouse – 5
began – 1
running – 1
Match the above output format exactly as shown above.
If the second command-line argument is omitted, display the output shown
below:
bash$ ./a.out hw1-input01.txt
Allocated initial parallel arrays of size 8.
Re-allocated parallel arrays to be size 16.
Re-allocated parallel arrays to be size 32.
Re-allocated parallel arrays to be size 64.
Re-allocated parallel arrays to be size 128.
All done (successfully read 184 words; 107 unique words).
All words (and corresponding counts) are:
Once – 1
when – 1
a – 6
…
As with the previous example, match the above output format exactly as shown
above.
Submission Instructions
To submit your assignment (and also perform final testing of your code),
please use Submitty, the homework submission server. The URL is on the course
website.
To make sure that your program executes properly on Submitty, use the
techniques below.
First, as discussed in class, output to standard output (stdout) is buffered.
To ensure buffered output is properly flushed to a file for grading on
Submitty, use fflush() after every set of printf() statements as follows:
printf( … ); /* print something out to stdout /
fflush( stdout ); / make sure that the output is sent to the /
/ terminal or redirected output file */
—|—
Second, also discussed in class, use the DEBUG_MODE technique to make sure you
do not submit any debugging code. Here is an example:
#ifdef DEBUG_MODE
printf(“the value of x is %d\n”, x);
printf(“the value of q is %d\n”, q);
fflush(stdout);
#endif
—|—
And to compile this code in “debug” mode, use the -D flag as follows:
bash$ gcc -Wall -Werror -D DEBUG_MODE homework1.c