完成C语言基础练习,根据规则实现 [ Encode ](https://en.wikipedia.org/wiki/Binary-to-
text_encoding “Encode”) 和Decode方法。
![Encode](https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Original_source_code_bitcoin-
version-0.1.0_file_base58.h.png/400px-Original_source_code_bitcoin-
version-0.1.0_file_base58.h.png)
Requirement
- Your coding style will NOT be graded for this question.
- Your testing methodology will NOT be graded for this question.
- All assignment rules and policies are in place for this question.
- There is no public test. Marmoset only ensures that your code “runs”.
- For this and EVERY question, you may define helper functions
For your convenience, the same README.TXT file appears in q1a and q1b.
When we transmit information we often want to try and minimise the size of the
data that we have to send using data compression.
A popular method of data compression is to identify common patterns and then
replace (substitute) those patterns with shorter codewords.
For this question, we compress a string that contains only lower case letters
and whitespace (no punctuation, upper case, numbers, etc.).
We replace a pattern of lower case letters with a single non-lower case
character.
For example, we could replace the pattern “the” with the char ‘T’.
If the original message was “the cat in the hat”, the compressed (or
“encoded”) message would be “T cat in T hat”.
There could be multiple patterns, so we could also encode “at” as ‘@’ and then
the message would be “T c@ in T h@”.
A “mapping”struct map
contains: - the single character [small] and
- the pattern [big].
struct map {
char small;
char *big;
};
—|—
For the above example, we would have:
struct map example[2] = { {‘@’, “at”}, {‘T’, “the”two big parantheses;
—|—
There are a few rules for a [struct map] that must be followed:
- [small] must be in the ASCII range ‘!’ … ‘_‘ (inclusive)
- [big] must be a valid and non-empty string that contains only lower case letters (‘a’ … ‘z’) and no other characters (e.g., no whitespace)
When working with an array of maps, there are additional rules: - the [small] fields must all be unique (no duplicates)
- the [big] fields must not only be unique, but no [big] field can be a prefix of another [big] field (see below for an example)
- the elements of the array must be sorted so that the [big] strings are in lexicographical order (see above: “at” before “the”)
Rule 5 exists so you can use binary search on an array of maps to find a
pattern match, so if the longest [big] in the array is “m” and there are “k”
entries, then it would take O(m*logk) to search the array.
Rule 4 exists to make this assignment easier by removing any ambiguity.
No [big] field can be the prefix of another big field, so you could not have
both “the” and “theatre” as [big] fields because “the” is a prefix of
“theatre” (“the” appears at the start of “theatre”).
Even despite rule 4, there could be some ambiguity because some [big] fields
could overlap. Consider this overlap example:
struct map overlap_example[3] = { {‘@’, “at”}, {‘E’, “eat”}, {‘T’, “the”two big parantheses;
—|—
This is a valid array because no [big] field is the prefix of any other.
However, for the string “theatre” it could be encoded as “T@re” or “thEre”.
To avoid this, we have our final rule:
- the “first” match is always selected, moving from the start of the message string to end (in other words, moving left to right)
So in our previous example, the encoding would be “T@re” because the first
match moving left-to-right would be “the”.
In part A, you use an array ofstruct map
to encode (compress) a string
In part B, you use an array ofstruct map
to decode (decompress) a string
For both part A and B we have provided a test client that uses the same file
format.
The file format should be straightforward for you to understand: - make sure you provide your mappings in alphabetical order by [big] field
- a tilde (~) followed by a newline is used to separate the mappings and the message
Additional Notes
- In both parts, you must provide a new, dynamically allocated string that has been realloc’d to the correct length.
- If you were to encode a string and then decode it, the result should be the same as the string you had prior to encoding.
- In Part A, the encoded string must be as short as possible following the above rules, meaning that if a substitution is possible it must be done.