根据截获的加密消息,了解有关如何进行加密的一些详细信息,根据 AES
算法,重建并解密出原始消息。
![AES](https://upload.wikimedia.org/wikipedia/commons/thumb/5/50/AES_%28Rijndael%29_Round_Function.png/250px-
AES_%28Rijndael%29_Round_Function.png)
Learning Outcomes
In this assignment you will demonstrate your abilities functions, pointers,
arrays, loops, and conditionals. The assignment covers the following
components of the textbook: Chapters 1 to 7 inclusive, along with material
from pages 230-235.
The assignment is intended to be very challenging and push your programming
skills significantly further than Grok, as well as testing your algorithmic
thinking. However, it is also intended to be fun, and a very close analogue to
a real world task. The first time I coded the full version of the attack for
the research paper took me a long time! You can anticipate that this
assignment might also take quite a while, but the payoff will be in your new
skills and learning.
THE FOLLOWING IS FOR INFORMATIONAL PURPOSES, YOU DO NOT HAVE TO READ THE
RESEARCH PAPER:
The inspiration for this assignment is the research paper “Practical state
recovery attacks against legacy RNG implementations” by Shaanan N. Cohney,
Matthew D. Green, and Nadia Heninger, with more details at duhkattack.com.
Read all the instructions carefully and make sure you understand what is
expected of you
Full Story (just for fun)
Watch the video here and read the accompanying description!
The (Short) Story
You have intercepted an encrypted message, and know some details about how it
was generated. The message was produced by taking the xor of the original
message with some key k1 .
This key, k1 , was produced using a pseudorandom number generator at times T11
to T19 .
However, while you know all the times at which the generator was ever used (T0
to T19 ), you only have access to outputs from T9 and T10 (O9 and O10 ).
You reverse engineered the generator and found it used a second key, k2 .
Your goal is to reconstruct k1 , and use it to decrypt the original message.
Background Material
The following additional material provides context for the assignment itself.
You can also watch the videos uploaded to the LMS for a walkthrough of the
assignment and the background material.
Generating Random Numbers
Computers don’t know how to be truly random. While one can purchase a device
to measure very unpredictable phenomena (like radioactive decays) and convert
them into random numbers, most computers have no such hardware. Instead,
computers rely on algorithms that produce random looking output that is very
hard for an outsider to distinguish from output that is truly random. How
hard? We say that no algorithm that runs in polynomial time (Big O of any
polynomial) should be able to tell if the output is truly random or not with
anymore than an minute probability. We call these pseudorandom number
generators.
Importantly, the sequence of random numbers should also not be guessable. If
you see prior output from the algorithm, they should be so “random” that you
can’t guess what will come next. This is important because we typically use a
random generator to make random keys for encryption.
One well known algorithm to generate random numbers is the the following
pseudorandom number generator (known as the ‘ANSI X9.31 DRBG’).
The generator itself relies on having a cryptographic function, normally one
called AES that we discuss in a bit more detail in Section 4.5. Because of
this choice, our assignment will process 128-bits at a time, the same amount
as AES. We represent the encryption of message m under AES with k key as AESk
(m) = c.
The X9.31 generator operates as follows:
- Step 1: Initialize the state of the algorithm with some pre-generated random array V0 of size 128-bits (think of it as an array of 16 characters). This can be done by taking a value such as the time the computer was turned on, xor’d with other timing values from the startup process. Then loop starting with i = 1 as many times as you need to generate sufficient output
- Step 2: Generate an intermediate value
Ii = AESk(Ti)
where Ti is the current time - Step 3: Generate 128 bits of output by computing
Oi = AESk(Ii ⊕ Vi−1)
- Step 4: Calculate the value that will be used in the next iteration
Vi = AESk(Oi ⊕ Ii)
Eventually you’ll end up with an array O that has as many blocks of random
numbers as you need.
Note that the value Vi is the only variable from the previous iteration that
is used in the next iteration. We therefore call Vi the state of the
pseudorandom number generator. In this design, if an attacker can figure out
the state, they can predict all future outputs.
The values we provide already incorporate the value of V0 , so you will not
need to worry about it in the assignment.
The DUHK Attack
Unfortunately the ANSI X9.31 random number generator has a deadly flaw. If the
following conditions hold, an attacker can guess the sequence of random
numbers, and therefore know any key that’s made up of those numbers:
The attack works as follows for i = 1 without loss of generality, assuming the
following notation for AES decryption AES k-1 (c) = m
Now assume that our clever attacker:
- Has a small number of good guesses for the key k
- Has intercepted Oi and Oi+1
- Knows the values of Ti and Ti+1 the times when the two outputs were generated
The attacker now makes lots of guesses for the key k. When they guess the
right one, the left side and the right side of the equations will be equal!
They now know the key.2
Because the attacker now knows the key they can compute the secret state of
the generator using: an equation in which they know the value of all the
variables on the right hand side.
And, once the attacker knows the state of the generator, they can generate all
the subsequent outputs Oi+2 , Oi+3 etc., assuming they also know the times at
which those are generated.
Bits and Bytes
To carry out this attack you might want to learn a bit more about how we can
store and manipulate all the values used for this sort of cryptography.
Remember from lecture that all values are ultimately represented inside a
computer’s memory as numbers of some sort. Every number can be written in
different ‘bases’, like decimal (base-10) or hexadecimal (base-16) which we
use for addresses, or binary (base-2).
If we write a number in binary, it will consist of only zeroes and ones, with
the rightmost digit corresponding to the one-place, the second-rightmost digit
corresponding to the twos-place, then fours-place, eights-place, etc.
ascending in powers of two.
This is directly analogous to the places in a base-10 number: the ones place,
the tens place, the hundreds place.
Thinking back to what we know about the char type, we know it can store up to
256 values (including zero). To write all these numbers in binary we would
need eight ones and zeroes, each of which we call a ‘bit’, with eight bits
making up a ‘byte’. So, when we represent a char in binary, we will typically
add zeroes to the front until all eight bits are represented.
This assignment will not require you to manipulate the bits of any individual
number other than using the XOR operation on two bytes at a time.
We can do the same thing, just with base-16, instead of base-2, where along
with the digits 0-9 we add the letters A-F to our repertoire. We call this the
hexadecimal system, or hex for short.
This means after 9, we count A, B, C, D, E, F and then we’re out of symbols,
so we add a 1 to the next column (the 161 s), and put a zero in the original
column (the 160 s).
Likewise we could write3 204 as 12 161 + 12 160 = 0xC 161 + 0xC 160 = 0xCC.
Note that here I prefix hexadecimal numbers with 0x to indicate to the reader,
by convention, that the number is in base-16.
XOR
One of the operations that we care about more in cryptography is called
exclusive-or.
Given two eight-bit numbers represented as a collection of eight bits i0...7
and j0...7
, we can define operations that change the bits in different
ways. One of these operations is XORshort for exclusive-orwhich we write using
the symbol . We can define XOR in the following manner:
Let l0…7 be the output of the operation.
One-time pad and Basic Encryption
For the very last part of the assignment you’ll need to know just a tiny bit
more cryptography which we present here.
Symmetric encryption (where both sides start sharing a secret key) obeys the
following property: Dk (Ek (m)) = m
.
That is with a message m, encryption algorithm E and decryption algorithm D,
both of which use a key k, the decryptionof-the-encryption should be the
original message.
A one-time pad is a way of encrypting a message. Given a key that is as many
bits in length as the message to encipher, this provides the best theoretical
security guarantee of any cipher4 . To encrypt a message m into a ciphertext c
with key k the process is simple: for the i-th byte of the message, XOR it
with the i-th byte of the key.
The decryption process is similar: for the i-th byte of the ciphertext, XOR it
with the i-th byte of the key.
Why don’t we use this technique everywhere? For the size of the messages that
we transmit daily, we’d need to have already sent and stored keys just as long
as the messages! This quickly gets unwieldy and infeasible.
Instead we typically use ciphers that aren’t as strong but that can use a
small key to secure messages that are for all practical purposes, secure. The
most widely used (symmetric) cipher is called AES (the Advanced Encryption
System).5
Your Assignment
Your solutions must be contained within a C file, program1.c.
Your code must compile as submitted or you may receive a zero on the
assignment.
Your program is to read in input of the following format:
- Input Line 1: Length of Encrypted Message in bytes, an integer l with a maximum value of 1024.
- Input Line 2: Encrypted Message encrypted using a one time pad under key k1 , which is l characters in length. Each character is represented as a two-digit hexadecimal number in the file (see Lecture 10, minute l15). This corresponds to b = 16 groups of 16-characters (blocks), where x is the ceiling function. The numbers will not be prefixed with 0x.
- Input Line 3: Outputs O9 and O10 from the random number generator, in groups of 16 characters, where each character is represented as a two-digit hexadecimal number in the file.
- Input Line 4: Timesteps T0 through T19 used to generate O0 to O19 , in groups of 16-characters (blocks), where each character is represented as a two-digit hexadecimal number in the file.
- Input Line 5: The first 1284 characters in the text of Sherlock Holmes (or some other novel!), the cipher book.
There is no trailing newline at the end of your input.
Your job is to read in the text of the book, and find which 16-character block
has been used as the key k2 for the random number generator.
You will then generate enough output from the random number generator to
produce k1 , which is composed entirely of this random output.
Finally, you will decrypt the original message using the key k1 .
Along with this spec we are providing you with skeleton code to fill in, a set
of libraries to accompany the scaffold and files containing the input data.
See Section 6 for more details.
We now provide information about the stages in greater detail.
Stage 0: Reading the Input File
Your first task is to read in the input file (from stdin, the standard input),
and parse each line correctly, as described above.
This is typically done (outside Grok) with a command similar to this one:
./program1 < assignment1-input1.txt
When the program tries to read input, it receives the contents of the file.
You can always open up a file in your text editor to see what’s inside, and we
encourage you to do so for this assignment.6
Do not use the facilities in C to open the input file within your code. While
this is often good practice, we have not learned it yet in class. Therefore,
your submission must take in input as described above.
You must store your results in the following variables, available as arguments
in the function stage0:
- int *ciphertext_length which should be set by pointer to the length of the ciphertext (Line 1),
- msg_t ciphertext which should contain the ciphertext (Line 2),
- block_t outputs[] which should contains the two outputs from the random number we provide (Line 3),
- block_t timesteps[] which should contains all the timesteps (Line 4), and
- book_t cipherbook which should contain the text of the cipher book. (Line 5).
These values are passed to the submit_stage0 function called in main. You will
not be able to pass the tests on Grok if you leave extraneous printf
statements in your code. This will present you with debugging output in the
following format (truncated for brevity):
Stage 1: Stripping Punctuation
Produce an array that consists of only the alphanumeric values from the book’s
text on Line 5.
You may not use any functions from string.h or ctype.h.
You must store your results in the following variables, available as arguments
in the function stage1:
- book_t cipherbook[] which should (after your completed implementation) contain only the alphanumeric values from the book, and
- int *book_len which should be set by pointer to be the length of the cipher book after you have removed all non-alphanumeric values.
These values are passed to the submit_stage1 function called in main. This
will submit your result for stage 1 for grading, as well as giving you a debug
output with the format (truncated for brevity):
Stage 2: Guessing the key k2
You will need to implement a loop to iterate through the stripped book you
generated in stage 1, in blocks of 16-bytes. For each block, generate both
sides of eq. (1). above and check if both sides match. When both sides match,
you have discovered the correct set of 16 bytes that comprises k2
You must store your result in the following variable, available as arguments
in the function stage2:
- block_t key2 which should contain the 16 characters that make up k2 .
This value is passed to the submit_stage2 function called in main. This will
submit your result for stage 2 for grading, as well as giving you a debug
output with the format:
Stage 3: Generating the key k1
You will need to generate outputs O11 , O12 , …, Ok until you have generated n
bytes of output where n is the length in characters of the encrypted message
on Line 2. Remember each O is 16-bytes in length.
Write a function that implements the random number generator, and use it to
generate output of length n. This output is k1 . To do so you will need to
write code that performs Steps 1 to 4 listed in section 4.1.
This will require you to first calculate V10 , the initial state of the
generator, which you should be able to compute from
Stage 4: Decrypting the original message
Iterate over both k1 and c the encrypted message taking the xor to produce the
original unencrypted message and solve the caper!
You must store your result in the following variable, available as arguments
in the function stage4:
- byte_t plaintext[] which should contains the decrypted message.
These values are passed to the submit_stage4 function called in main. This
will submit your result for stage 4 for grading, as well as giving you a debug
output with the format:
Provided Code
We are providing you with a file that contains some pre-existing code to both
aid in grading, and also to help you structure your assignment.
You must use the scaffold code for the assignment, else you will have
differences in your output and will face severe mark deductions.
The scaffold begins with an authorship declaration. You must fill this in with
your full name, student number and date.
Type definitions
The scaffold provides a set of type definitions: book_t, byte_t, block_t and
msg_t. These are aliases for their defined types. For example, block_t,
representing a block of 16 bytes of data is defined with:
#define BLOCKSIZE 16 // The length of a block
typedef unsigned char byte_t; // A byte (8 bits)
typedef byte_t block_t[BLOCKSIZE]; // A cipher bitset (block) (16 bytes)
—|—
block_t can be used anywhere where you would use a unsigned char[16]. For the
rest of the implementations, see the scaffold code.
The aes.h library
The scaffold includes the library aes.h with the line #include “aes.h”. It
provides the simple AES implementation functions:
AES_encrypt(unsigned char message[16], unsigned char key[16], unsigned char output[16])
AES_decrypt(unsigned char ciphertext[16], unsigned char key[16], unsigned char output[16])
—|—
Clearly, a variable of type block_t could be used in the arguments of the AES
functions.
Do not modify any code in aes.h or aes.c. It will be replaced with a fresh
version in grading.
The a1grader.h library
The scaffold includes the library a1grader.h with the line #include
“a1grader.h”.
It provides access to the following functions:
void submit_stage0(int cipher_length, msg_t ciphertext, block_t outputs[N_OUTPUT_BLOCKS], block_t timesteps[N_TIMESTEPS], book_t cipherbook);
void submit_stage1(book_t stripped_book, int book_len);
void submit_stage2(block_t key2);
void submit_stage3(byte_t *key1);
void submit_stage4(msg_t plaintext);
—|—
These are called automatically in the scaffold code at the end of each stage
of the assignment, for printing the outputs you saw in 5, as well as
submitting those stages for assessment during grading. Make sure your values
for submission are stored in the variables that are passed to these functions.
The library also provides the helper function void hexdump(byte_t a[], int
len), a debug function that outputs a byte array in the format you saw in each
of the stages in Section 5.
On each line, the bit number of the first byte of each block is printed in
hexadecimal. Then follows two sets of eight bytes (a block of 16 bytes), with
each byte printed as pairs of hexadecimal characters (00 to FF represents all
values 0 to 255 that can be stored in a byte). Finally, an ASCII
representation of these bytes is printed, with all non-printable ASCII
characters (control characters, newlines, invalid ASCII) being printed as
(centerdot). In the example above, the byte 0f is at address 0x0010 and is not
a printable ASCII character, so is printed as a .
You are free to use this function in debugging your code, but we recommend
removing those debug printouts when you submit your final work.
We have provided you an implementation for this library, in a1grader.c for the
development version of these functions.
For grading, we will use the grading version of a1grader.c, which you will not
be able to see, specifically for grading your work.
Do not modify any code in a1grader.h or a1grader.c. It will be replaced with a
fresh, but modified version for grading.
The scaffold code includes the functions from these files by referencing their
header files:
// Provides the functions AES_encrypt and AES_decrypt (see the assignment spec)
#include “aes.h”
// Provides functions to submit your work for each stage.
#include “a1grader.h”
—|—
If you wish to see how any of these functions are implemented you may open and
view aes.c and a1grader.c.
The read_hex_line helper function
We provide the read_hex_line function to help you read the input data.
int read_hex_line(unsigned char output[], int max_count, char *last_char)
—|—
It reads input from the command line, repeatedly converting pairs of
hexadecimal characters (‘0’ to ‘F’), representing bytes of data to byte_t
values until either a newline ‘\n’ is read, or max_count bytes have been read
in. The function returns the number of bytes that have been successfully read
in. If you would like to know the last character that was read in by the
function after reading in values (for example, to see if it read in a
newline), pass a character by pointer in the last_char argument to retrieve
that value. You may also pass NULL to that argument if you do not care about
that value.
Developing and testing your program
If you are developing on Grok, all of the files including the scaffold (where
you will complete the assignment code), an implementation for aes.c and the
sample grader a1grader.c, and their header files (aes.h, aes.h) will be
automatically included.
You may compile and run as normal there.
If you plan to develop on your own machine, you will need to make sure you
have all of these files downloaded, with their correct names, including the
sample test files named assignment1-input1.txt and assignment1-input2.txt.
They should all be within the same directory. You can then compile your
program with the following command:
clang -Wall -std=c11 -pedantic program1.c aes.c a1grader.c -o program1
If you are using the gcc compiler, replace clang with gcc in the above line.
Your code will be compiled with the same command as above for grading, so make
sure it compiles without warnings and behaves as expected! Programs that do
not compile will receive serious mark deductions!
You must not remove or modify any of the provided code, as doing so will
confuse our grading tools.
Note: The test files we’re using on Grok do not contain the same secret
message contained in the provided input files. You can get full points either
way, but it’s more fun to actually decrypt the real messages. We do this
because Grok shows you the desired output, and we want to hide the actual
secrets!
Style
Style will be graded as part of each section. While there is no official style
guide, we require that your code be readable, appropriately documented, and
use abstraction where appropriate (functions and typedefs).
We also provide the following set of indicators that we will use when
evaluating your code for style, along with explanations for why we care about
some of them.
Submission Instructions
This assignment is worth 15% of your final mark for the subject.
You need to submit your code in Grok Learning ( https://groklearning.com
) for assessment.
Submission will NOT be done via Canvas. Instead, you will need to:
- Log in to Grok Learning using your student login details.
- Navigate to the Assignment 1 module of our subject COMP10002 2021 S1: Foundations of Algorithms.
- Place your code in the program1.c tab window.
- Compile your code by clicking on the Compile button.
- Once the compilation is successful, click on the Mark button to submit your code. (You can submit as many times as you want to. Only the last submission made before the deadline will be marked.)
- Two sample tests will be run automatically after you make a submission. Make sure that your submission passes these sample tests.
- Two hidden tests will be run for marking purpose. Results of these tests will not be available until after the submissions are marked.
You can (and should) submit both early and often - to check that your program
compiles correctly on our test system, which may have some different
characteristics to the lab machines and your own machines.
To receive any points for the coded stages your code must compile.
We will test your code with inputs (and novels) other than that provided as a
sample. Please ensure you don’t hard-code any values from the input file.
You will be given sample input files:
- assignment1-input.txt
- assignment2-input.txt
- grok-input1.txt
- grok-input2.txt
and sample output for the two Grok inputs: grok-output1.txt and grok-
output2.txt. You can test your code on your own machine with any one of the
files using following command and compare the output with the matching file:
./program1 < assignment1-input.txt /* < feeds data from the file into the program */
For the input files without matching output, you will know you have solved the
task when the result of Stage 4 is a message written in English. The is no
grade penalty for not testing on the two files without output, but doing so
will allow you to read the secret messages within!
Note that we are using the following command to compile your code on the
submission system (we name the source code file program.c). clang -Wall
-std=c11 -pedantic program1.c aes.c a1grader.c -o program1
The flag “-std=c11” enables the compiler to use a more modern standard of the
C languageC11. To ensure that your submission works properly on the submission
system, you should use this command to compile your code on your local machine
as well.
Additional Testing Features
We have added a feature to the scaffold allowing you to produce just the
output from a particular stage, and save the output to a file.
Grok automatically uses this feature to test each stage, and will show you
both the output from your program given the stage being tested, and compare it
to the correct output.
To do this on your own computer run the program as follows:
./program1 n < assignment1-input.txt /* < feeds data from the file into the program */
Where you must replace n with the number of the stage you wish to test.
The output is then stored in a file program1_output.txt which you can open
with a text editor.
To compare this file with the sample output we have provided a utility that
works on Linux/Mac or Linux on Windows, called compare.sh. To use it, first
run your program with the argument for the stage you wish to check. Then run
the following command where n is the stage number, and correctfile is the name
of the output file against which you wish to compare your own output.