Introduction
Vigenère Cipher加解密算法的第二部分。完成解密的GPU逻辑运算。
Part 2: Cracking the Cipher
Index of Coincidence
Starting in the mid 19th century some cryptologists such as Charles Babbage
were able to break Vigenère ciphers by realizing that once the length of the
key (N) was known, the problem was reduced to solving N separate Caesar shift
ciphers. Solving each of these ciphers individually is more difficult than
solving one large Caesar cipher, because there are fewer symbols in each
alphabet making frequency analysis more difficult. Furthermore, because
letters from each alphabet are not consecutive, using larger context
information such as bigrams and words is also more challenging.
Despite these difficulties, the cracking process was usually possible, just
tedious and required a fair bit of trial and error. In this homework, we avoid
the tedious part by giving you a very long cipher text such that a pure
frequency analysis based attack on each alphabet, once the key length is
known, will work. That is, we guarantee that the text is long enough for the
letter ‘e’ to be the most common symbol in each alphabet (up to a cipher
period of about 200).
How do you figure out the key length? The most general method is essentially
by using the auto-correlation of the cipher text.
Here is an example of a cross-correlation between two unrelated texts:
MYTEACHERISAWESOME
ILOVETHRUSTCODING
==================
IOC = 1 / (17 / 26) = 1.53
With such short samples, the actual numerical values don’t work out as
precisely as we would expect with longer samples.
If in the auto-correlation we have a shift of 0, then clearly we will get an
IOC of 26, this isn’t particularly useful. If we shift the text by 1, then the
IOC goes down below 1 (around .6) because in English letters tend to not
follow themselves — there are less matches than would be expected by pure
chance.
ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMES
ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMES
========================================
IOC: 0
It turns out that after we shift a text by four or more (nearly) all
correlation is lost and the IOC returns to 1.73 — as if we were comparing two
completely distinct texts.
ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMES
ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMES
==========================================
IOC: 2 / (32 / 26) = 1.625
Using the IOC to crack the cipher
If we shift a cipher text by k and calculate the IOC with an unshifted version
of itself, and if the shift matches the key length, then the IOC will be close
to 1.73 because each pair of letters will have been encoded using the same
alphabet! If the shift doesn’t match the key length it will be as if we are
comparing two randomly generated sets of characters and the IOC will be close
to 1. This works as long as the period is greater than 3, otherwise the
results are muddled by the correlation of the plaintext. For this assignment
we won’t worry about handling the case where the period is less than 4.
Here is a visual demonstration of what is happening — each alphabet is shown
as a different color. This is for demonstration purposes, don’t worry that the
shifts are less than 4 here.
Implement the Solver
In this part, you should modify solve_cipher.cu and fill in all the places
marked with TODO. You will mainly use thrust functions and fill in functor
bodies. make solve_cipher will only make for this part.
The program solve cipher takes as input a cipher text, outputs the key length
and writes the decrypted text to a file named plain text.txt.
The implementation is decomposed in two steps:
- Determine the key length (which can be between 4 and 200) using the IOC.
- Decrypt the cipher text.
Question 6
(5 points) Fill in the apply_shift functor. Your implementation can be the
same as in Part 1.
Question 7
(10 points) In the main function, fill in the part at the beginning of the
while loop in order to get the value of the IOC. After having done this, you
will know the key length. Use thrust to compute numMatches.
Question 8
(20 points) Fill in the rest of the main function (places marked TODO) to
transform the cipher text back to the plain text. Remember that, to compute
the plain text from the cipher text, you will be solving keyLength independent
Caesar ciphers. Use thrust to calculate the shifts and to transform the cipher
text back.
To ensure that your implementation is correct you should check the following:
- The output of getLetterFrequencyGpu matches the output of getLetterFrequencyCpu (the test should pass).
- The ioc values are in a similar range to those suggested above.
- The key length in solve cipher is the same as the one you passed as input to create cipher.
- The encryption key computed by solve cipher is the same as the one used by create cipher.
- The plain text.txt file, which solve cipher generates contains readable English text (with the exception of spaces and punctuation).
Running CUDA program
Machines
We will be using the icme-gpu1 cluster, which you can access with ssh.
Compiling
To use nvcc on the icme cluster, you must load the cuda module first.
module add shared
module add cuda75
module add gcc/4.8.5
You can copy and paste these three lines to your ~/.bashrc file so they will
be loaded automatically when you connect on the cluster. (Please load
gcc/4.8.5 instead of gcc, because the nvcc does not support gcc version 4.9
and up.)
You should then be able to run make or nvcc to build your CUDA binary. You can
even run your CUDA program at this time, but only the host code can be
executed and you’ll get runtime error with any GPU access. You will have to
request the GPU resource from the SLURM system.
Runing CUDA program on cluster
You have two options to run your CUDA program:
- You can submit a script as a job to the queue with sbatch.
Note we have included SLURM configurations for GPU resources in the scripts
we provide, i.e., #SBATCH –gres=gpu:1 to request GPU resources for the job.
In this way all the output will be written to a file *.out. - Alternatively, you can use the srun command. srun offers a more convenient mode to write/debug your code because contrary to the queue, you do not need to open files to look at the results of your computation.
To require GPU resources you can do the following:
srun –gres=gpu:1 YOUR_COMMAND.
Or the scripts provided with:
srun –gres=gpu:1 ./hw4.sh
If running the above line gives you “Permission denied” error, then you can
give file hw4.sh execute permission with chmod +x hw4.sh.