这次是Java基础课作业,没有学太多的Java高级语法,但是作业却是要求编写一个古老的 Enigma 加密机,逻辑非常复杂。
Introduction
This programming assignment is intended to exercise a few useful data
structures and an object-based view of a programming problem. There is some
background reading, but the necessary program is not (or rather need not be)
terribly big.
We will be grading largely on whether you manage to get your program to work
(according to our tests). In addition, we will be looking at your own tests
(which you should be sure to turn in as well). While we have supplied a few
unit tests and some simple integration tests and testing utilities, the tests
in skeleton are entirely inadequate for testing your program. There is also a
stylistic component: the submission and grading machinery require that your
program pass a mechanized style check (style61b), which mainly checks for
formatting and the presence of comments in the proper places. See the course
website for a brief description of the style rules.
Background
You may have heard of the Enigma machines that Germany used during World War
II to encrypt its military communications. If you have not, I recommend you
read the wikipedia page on them, or similar resource, especially the part
about design and operation. This project involves building a simulator for a
generalized version of this machine (which itself had several different
versions.) Your program will take descriptions of possible initial
configurations of the machine and messages to encode or decode (the Enigma
algorithms were reciprocal, meaning that encryption is its own inverse
operation.)
The Enigmas effect a substitution cipher, on the letters of a message. That
is, at any given time, the machine performs a permutation—a one-to-one
mapping—of the alphabet onto itself. The alphabet consists solely of the 26
letters in one case (there were various conventions for spaces and
punctuation).
Plain substitution ciphers are easy to break (you’ve probably seen puzzles in
newspapers that consist of breaking such ciphers). The Enigma, however,
implements a progressive substitution, different for each subsequent letter of
the message. This made decryption considerably more difficult.
The device consists of a simple mechanical system of (partially)
interchangeable rotors (Walzen) that sit side-by-side on a shaft and make
electrical contact with each other. Most of these rotors have 26 contacts on
both sides, which are wired together internally so as to effect a permutation
of signals coming in from one side onto the contacts on the other (and the
inverse permutation when going in the reverse direction). To the left of the
rotors, one could select one of a set of reflectors (Umkehrwalzen), with
contacts on their right sides only, and wired to connect half of those
contacts to the other half. A signal starting from the right through one of
the 26 possible contacts will flow through wires in the rotors, “bounce” off
the reflector, and then come back through the same rotors (in reverse) by a
different route, always ending up permuted to a letter position different from
where it started. (This was a significant cryptographic weakness, as it turned
out. It doesn’t really do a would-be code-breaker any good to know that some
letters in an encrypted message might be the same as the those in the
plaintext if he doesn’t know which ones. But it does a great deal of good to
be able to eliminate possible decryptions because some of their letters are
the same as in the plaintext.)
Each rotor and each reflector implements a different permutation, and the
overall effect depends on their configuration: which rotors and reflector are
used, what order they are placed in the machine, and which rotational position
they are initially set to. This configuration is the first part of the secret
key used to encrypt or decrypt a message. In what follows, we’ll refer to the
selected rotors in a machine’s configuration as 1–N, with 1 being the
reflector, and N the rightmost rotor. In our simulator, N will be a
configuration parameter. In actual Enigma machines, it was fixed for any given
model (the Navy used four and the Wehrmacht used three.)
The overall permutation changes with each successive letter because some of
the rotors rotate after encrypting a letter. Each rotor has a circular ratchet
on its right side and a an “alphabet ring” (Ringstellung) on its left side
that fits over the ratchet of the rotor to its left. Before a letter of a
message is translated, a spring-loaded pawl (lever), one to the right of each
rotating rotor, tries to engage the ratchet on the right side of its rotor and
thus rotate that rotor by one position, changing the permutation performed by
the rotor. The lever on the rightmost rotor (N) always succeeds, so that rotor
N (the “fast” rotor) rotates one position before each character. The pawls
pushing the other rotors, however, are normally blocked from engaging their
rotors by the ring on the left side of the rotor to their right.
This ring usually holds the pawl away from its ratchet, preventing the rotor
wheel to its left from moving. However, the rings have notches in them (either
one or two in the original Enigma machines), and when the pawl is positioned
over a notch in the ring for the rotor to its right, it slips through to its
own rotor and pushes it forward. A “feature” of the design called “double
stepping” (corrected in other versions of the Enigma, since it reduced the
period of the cipher) is that when a pawl is in a notch, it also moves the
notch itself and the rotor the notch is connected to, so that the rotors on
both sides of the pawl move.
Let’s illustrate with a much simplified version. Suppose our alphabet has only
the letters A-C and we have four rotors (numbered 1-4) each of which has one
notch on its ring at the C position. Suppose also that there are 3 pawls, one
for each of rotors 2-4. There is no pawl for rotor 1, which will therefore not
rotate. We’ll start with the rotors set at AAAA. The next 19 positions are as
follows:
AAAB AAAC AABA AABB AABC AACA ABAB ABAC
ABBA ABBB ABBC ABCA ACAB ACAC ACBA ACBB
ACBC ACCA AAAB
As you can see,
- Rotor 4, the fast rotor, advances each time, pushed by pawl 3.
- Rotor 3 advances whenever rotor 4 is at C, because then pawl 3 pushes on its ratchet.
- Rotor 2 advances whenever rotor 3 is at C, pushed by pawl 2. Rotor 3 also advances when it is at C, because when pawl 2 slips into rotor 3’s notch it will push against that notch when it moves.
- There is no pawl 1, so rotor 2 (unlike rotor 3) does not advance just because it is at C.
- Rotor 1 never changes, since there is no pawl on either side of it.
So the advancement of the rotors, while similar to that of the wheels of an
odometer, is not quite the same. If it were, then the next position after AACA
would be AACB, rather than ABAB.
The effect of advancing a wheel is to change where on the wheel any given
signal enters or leaves. When a wheel is in its ‘A’ setting in the machine,
then a signal that arrives from the right at, say, the ‘C’ position, goes into
the ‘C’ contact on the wheel. Likewise, a signal that leaves the wheel from
its left ‘C’ contact exits at the ‘C’ position. When the wheel is rotated by
one to its ‘B’ setting, a signal that arrives at the ‘C’ position goes instead
into the ‘D’ contact on the wheel, and a signal that leaves through the ‘D’
contact does so at the ‘C’ position. It’s easier to calculate if we use
numbers 0—25 rather than letters (‘A’ is 0, ‘B’ is 1, etc.). Then, when the
wheel is in its k setting, a signal entering at the p position enters the ` p
- k mod 26
contact on the wheel, and a signal exiting through the c contact does so at the
c−k mod 26 ` position. For example, Figure 1 shows one of the
rotors from the real Enigma machines (called rotor “I”) and the effect of
moving from its ‘A’ to its ‘B’ setting.
The contacts on the rightmost rotor’s right side connect with stationary input
and output contacts, which run to keys that, when pressed, direct current to
the contact from a battery or, when not pressed, direct current back from the
contact to a light bulb indicating a letter of the alphabet. Since a letter
never encrypts or decrypts to itself after going back and forth through the
rotors, the to and from directions never conflict.
The German Navy used a machine with 12 rotors and five slots for them:- Eight rotors labeled with roman numerals I—VIII, of which three will be used in any given configuration as the rightmost rotors,
- Two additional non-moving rotors (Zusatzwalzen) labeled “Beta” and “Gamma”, of which one will be used in any configuration, as the fourth-from-right rotor, and
- Two reflectors (Umkehrwalzen), labeled ‘B’ and ‘C’, of which one will be used in any given configuration as the leftmost rotor.
Given just this equipment, there are 614,175,744 possible configurations (or
keys): - Two possible reflectors, times
- Two possible rotors in the fourth position, times
- 8!/(8−3)!=336 choices for the rightmost three rotors and their ordering, times
- 26^4 possible initial rotational settings for the rightmost four rotors (each reflector had only one possible position.).
Especially by today’s standards, this is not a large key size (less than 30
bits). To make things more difficult for code-breakers, therefore, the Enigma
incorporated a plugboard (Steckerbrett) between the keyboard and the rightmost
wheel. It acted as a non-moving, configurable rotor. The operator could choose
any set of disjoint pairs of letters by means cables placed between them on
the plugboard. Each selected pair would then be swapped going into the machine
from the keyboard and coming out into the indicator lights. Thus, if the
operator connected (“steckered”) the letters A and P, then P would be
substituted for each A typed and vice versa. Likewise, if an ingoing letter
was encrypted to P by the other rotors, it would display as A, and letters
decrypted as A would display as P.
Describing Permutations
Since the rotors and the plugboard implement permutations, we’ll need a
standard way to describe them. We could simply have a table showing each
letter and what it maps to, but we’ll use a more compact notation known as
cycle representation. The idea is that any permutation of a set may be
described as a set of cyclic permutations. For example, the notation
describes the permutation in Figure 1. It describes seven cycles:
- A maps to E, E to L, L to T, …, R to U, and U back to A.
- B maps to K, K to N, N to W, and W back to B.
- C maps to M, M to O, O to Y, and Y back to C.
- D maps to F, F to G, and G back to D.
- I maps to V and V back to I.
- J maps to Z and Z back to J.
- S maps to itself.
The inverse permutation just reverses these cycles: - U maps to R, R to X, …, E to A, and A back to U.
- …
- S maps to itself.
Each letter appears in one and only one cycle, so the mapping is unambiguous.
As a shorthand, we’ll say that if a letter is left out of all cycles, it maps
to itself (so that we could have left off “(S)” In the example above.)
Example
As an example of a translation, consider the set of rotors from Figure 2, and
suppose that
- The rotors in positions 1—5 are, respectively, B, Beta, III, IV, and I.
- The rotors in positions 2—5 are currently at positions A, X, L, E, respectively.
- In the plugboard, the letter pair ‘Y’ and ‘F’ and the letter pair ‘Z’ and ‘H’ are both interchanged.
Input and Output
To run your program, you can use the command
java enigma.Main [configuration file] [input file] [output file]
The configuration file contains descriptions of the machine and the available
rotors. The data are in free format. That is, they consist of strings of non-
whitespace characters separated by arbitrary whitespace (spaces, tabs, and
newlines), so that indentation, spacing, and line breaks are irrelevant. Each
file has the following contents:
- A string of non-blank characters, giving the alphabet. Unless you do the extra credit, you may assume this is the upper-case alphabet.
- Two integer numerals, S>P≥0, where SS is the number of rotor slots (including the reflector) and PP is the number of pawls—that is, the number of rotors that move. The moving rotors and their pawls are all to the right of any non-moving ones.
- Any number of rotor descriptors. Each has the following components (separated by whitespace):
The input file to your program will consist of a sequence of messages to
decode, each preceded by a line giving the initial settings. Given the
configuration file above, a settings line looks like this:- B BETA III IV I AXLE (YF) (ZH)
(all upper case.) This particular example means that the rotors used are
reflector B, and rotors Beta, III, IV, and I, with rotor I in the rightmost,
or fast, slot. The remaining parenthesized items indicate that the letter pair
Y and F and the pair Z and M are steckered (swapped going in from the keyboard
and going out to the lights).
In general for this particular configuration, rotor 1 is always the reflector;
rotor 2 is Beta or Gamma, and each of 3—5 is one of rotors I—VIII. A rotor may
not be repeated. The four letters of the following word (AXLE in the example)
give the initial positions of rotors 2—5, respectively (i.e., not including
the reflector). Any number of steckered pairs may follow (including none).
After each settings line comes a message on any number of lines. Each line of
a message consists only of letters, blanks, and tabs (0 or more). The program
should ignore the blanks and tabs and convert all letters to upper case. The
end of message is indicated either by the end of the input or by a new
configuration line (distinguished by its leading asterisk). The machine is not
reset between lines, but continues stepping from where it left off on the
previous message line. Because the Enigma is a reciprocal cipher, a given
translation may either be a decryption or encryption; you don’t have to worry
about which, since the process is the same in any case.
Output the translation for each message line in groups of five upper-case
letters, separated by blanks (the last group may have fewer characters,
depending on the message length). Figure 3 contains an example that shows an
encryption followed by a decryption of the encrypted message. Since we have
yet to cover the details of File I/O, you will be provided the File IO
machinery for this project.
- B BETA III IV I AXLE (YF) (ZH)
Handling Errors
You can see a number of opportunities for input errors:
- The configuration file may have the wrong format.
- The input might not start with a setting.
- The setting line can contain the wrong number of arguments.
- The rotors might be misnamed.
- A rotor might be repeated in the setting line.
- The first rotor might not be a reflector.
- The initial positions string might be the wrong length or contain characters not in the alphabet.
A significant amount of a program will typically be devoted to detecting such
errors, and taking corrective action. In our case, the only corrective action
needed is to throw an EnigmaException with an explanatory message.
Testing
The directory testing contains the scripts test-correct and test-error for
testing the execution of enigma.Main.
- bash test-correct F1.inp F2.inp … will run the program for each of the message files F1.inp, F2.inp …, comparing the results to the corresponding output files F1.out, F2.out, …. The configuration files used are F1.conf, F2.conf, …. However, if any of these is missing, the file default.conf (from the same directory) is used instead.
- bash test-error F1.inp F2.inp … will run the program for each of the message files F1.inp, F2.inp …, checking that the program reports at least one error in each case. The configuration files are as for test-correct. The tests we’ve supplied are nowhere near adequate to test your program, so you will need to generate your own integration tests as well (we will check to see that you make an effort to test).
Extra Credit
If you feel up to it, consider extending your program to work on more general
alphabets (which will be specified by the first string in the configuration
file). The effect of specifying a new alphabet is to change the size and
contents of the rotors. Continue to convert lower-case letters in messages to
upper case. Alphabets should not contain whitespace, lower-case letters, or
any of the special characters “(“, “)”, or “*”.