Recognizing Handwritten Digits with a Multi-Layer Perceptron
![Hand-written digits](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Arabic_Numerals.svg/400px-Arabic_Numerals.svg.png)
Overview
In this programming homework, you will implement a multi-layer perceptron
(MLP) neural network and use it to classify the hand-written digits shown in
Figure 1. You may use numerical libraries such as NumPy/SciPy, but machine
learning libraries are NOT allowed. You must implement feedforward,
backpropagation, and the training process yourself.
Data Description
In this assignment you will use the MNIST dataset (
http://yann.lecun.com/exdb/mnist/ ). You
can read its description at the URL above. The dataset consists of four
files:
- Training set images, which contains 60,000 28 × 28 grayscale training images, each representing a single handwritten digit.
- Training set labels, which contains the associated 60,000 labels for the training images.
- Test set images, which contains 10,000 28 × 28 grayscale testing images, each representing a single handwritten digit.
- Test set labels, which contains the associated 10,000 labels for the testing images.
Files 1 and 2 form the training set; Files 3 and 4 form the test set. Each
training and test instance in the MNIST database consists of a 28 × 28
grayscale image of a handwritten digit and an associated integer label (0-9)
indicating the digit that the image represents. Each of the 28 × 28 = 784
pixels is stored in a single 8-bit channel, so a pixel value ranges from 0
(completely black) to 255 (2^8 − 1, completely white). If you are interested,
the raw MNIST format is described at http://yann.lecun.com/exdb/mnist/ .
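As a small illustration of the pixel layout described above, the sketch below flattens a batch of 28 × 28 images into 784-element vectors and rescales the 8-bit values to the range [0, 1]. The scaling step and the helper name `flatten_and_scale` are assumptions made for illustration; the assignment does not prescribe a preprocessing scheme.

```python
import numpy as np

def flatten_and_scale(images):
    """Flatten (n, 28, 28) 8-bit images to (n, 784) floats in [0, 1]."""
    images = np.asarray(images)
    flat = images.reshape(images.shape[0], -1)   # 28 * 28 = 784 features per image
    return flat.astype(np.float64) / 255.0       # 255 = 2**8 - 1 is the largest 8-bit value

# Random stand-in data; real MNIST images would come from the dataset files.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
x = flatten_and_scale(fake_images)
print(x.shape)
```

Scaling inputs to a small range tends to keep sigmoid units away from saturation early in training, which is why it is commonly done, but it is a choice rather than a requirement.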
For your convenience, we will use the .csv version of the dataset for
submission and grading. To access it, please download mnist.pkl.gz
and mnist_csv3.py from HW3->resource->asnlib->public to your local machine and
run the following command: python3 mnist_csv3.py
After that, you should see Files 1, 2, 3, and 4. The format of our csv
files will be described in Section 3 (Task description) below.
You can train and test your own networks locally on the whole dataset or on
part of it. When you submit, we provide a subset of MNIST for your
training/testing (not for grading). The training/testing set used for grading
is withheld, but it is also a subset of MNIST.
As an option, note that Files 1 and 3 can be combined into File1+3, and Files
2 and 4 can be combined into File2+4 (with the same index as File1+3). Viewed
this way, the whole dataset is contained in two files: File1+3 contains all
the images, and File2+4 contains all the labels of those images. One advantage
is that you can then partition the whole dataset into training and testing
sets in any way you like. You can easily modify mnist_csv3.py or simply merge
the .csv files to achieve this.
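The merge-and-repartition option can be sketched as follows. This is a minimal illustration on tiny stand-in arrays, not the behavior of mnist_csv3.py; the function name `combine_and_split` and its parameters are hypothetical.

```python
import numpy as np

def combine_and_split(train_x, train_y, test_x, test_y, n_train, seed=0):
    """Merge the four arrays, shuffle with one shared permutation, and re-split."""
    all_x = np.concatenate([train_x, test_x], axis=0)   # "File1+3": all images
    all_y = np.concatenate([train_y, test_y], axis=0)   # "File2+4": all labels, same index
    order = np.random.default_rng(seed).permutation(len(all_x))
    all_x, all_y = all_x[order], all_y[order]           # images and labels stay aligned
    return all_x[:n_train], all_y[:n_train], all_x[n_train:], all_y[n_train:]

# Tiny stand-in arrays (6 training rows, 4 test rows) instead of the real CSV contents.
tx, ty = np.arange(12).reshape(6, 2), np.arange(6)
sx, sy = np.arange(12, 20).reshape(4, 2), np.arange(6, 10)
new_tx, new_ty, new_sx, new_sy = combine_and_split(tx, ty, sx, sy, n_train=8)
print(new_tx.shape, new_sx.shape)
```

Applying the same permutation to images and labels is the important detail: shuffling the two arrays independently would break the image/label pairing.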
Task description
Your task is to implement a multi-hidden-layer neural network learner (see
the Model description section for details of the network you need to
implement) that will
- (1) Construct a neural network classifier from the given labeled training data,
- (2) Use the learned classifier to classify the unlabeled test data, and
- (3) Output the predictions of your classifier on the test data into a file in the same directory,
- (4) Finish in 30 minutes (for both training your model and making predictions).
In other words, your algorithm file NeuralNetwork. will take the training
data, training labels, and testing data as inputs and write its classification
predictions for the testing data as output. Do not call any existing machine
learning library in your implementation; you must implement the algorithm
yourself. Please develop your code on your own, and do not copy from other
students or from the Internet.
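The input/output flow described above might look roughly like the sketch below. The file names are the ones given in the Submission and Grading section, but the CSV layout is an assumption (one sample per row, comma-separated pixel values, one integer label per line), as is the output layout of one prediction per line. The helper names are hypothetical, and a trivial majority-label predictor stands in for the real network.

```python
import os
import tempfile
import numpy as np

def load_csv(path):
    """Load a comma-separated numeric file as a 2-D array (assumed: one sample per row)."""
    return np.loadtxt(path, delimiter=",", ndmin=2)

def write_predictions(path, predictions):
    """Write one integer prediction per line (assumed output layout)."""
    np.savetxt(path, np.asarray(predictions, dtype=int), fmt="%d")

# Demo in a temporary directory with tiny fake files instead of the real data.
workdir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
np.savetxt(os.path.join(workdir, "train_image.csv"),
           rng.integers(0, 256, (4, 784)), fmt="%d", delimiter=",")
np.savetxt(os.path.join(workdir, "train_label.csv"), [3, 3, 7, 3], fmt="%d")
np.savetxt(os.path.join(workdir, "test_image.csv"),
           rng.integers(0, 256, (2, 784)), fmt="%d", delimiter=",")

train_x = load_csv(os.path.join(workdir, "train_image.csv"))
train_y = load_csv(os.path.join(workdir, "train_label.csv")).astype(int).ravel()
test_x = load_csv(os.path.join(workdir, "test_image.csv"))

# Placeholder "classifier": always predict the most frequent training label.
majority = np.bincount(train_y).argmax()
write_predictions(os.path.join(workdir, "test_predictions.csv"),
                  np.full(len(test_x), majority))
```

In an actual submission the placeholder would be replaced by training the network on `train_x`/`train_y` and predicting on `test_x`.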
Model description
The basic structure of the neural network in this homework assignment is
shown in Figure 2 below, which depicts a neural network with 2 hidden layers.
The input layer is one-dimensional, so you need to reshape each input image to
1-D yourself. At each hidden layer, use a sigmoid activation function (see
references below). Since this is a multi-class classification problem, use the
softmax function (see references below) as the activation at the final output
layer to generate a probability distribution over the classes. For computing
the loss, use the cross-entropy loss function (see references below).
There is no specific requirement on the number of nodes in each hidden layer;
choose them so that your neural network reaches its best performance. The
number of nodes in the input layer should equal the number of features, and
the number of nodes in the output layer should equal the number of classes.
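A minimal NumPy sketch of the three building blocks named above (sigmoid, softmax, cross-entropy) and a forward pass through two hidden layers is given below. The hidden sizes 32 and 16, the parameter dictionary layout, and the small constant added inside the logarithm are illustrative assumptions, not requirements of the assignment.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def forward(x, params):
    """Two sigmoid hidden layers followed by a softmax output layer."""
    h1 = sigmoid(x @ params["W1"] + params["b1"])
    h2 = sigmoid(h1 @ params["W2"] + params["b2"])
    return softmax(h2 @ params["W3"] + params["b3"])

rng = np.random.default_rng(0)
sizes = [784, 32, 16, 10]   # 784 features in, 10 classes out; hidden sizes are arbitrary
params = {}
for i in range(3):
    params[f"W{i + 1}"] = rng.normal(0.0, 0.1, (sizes[i], sizes[i + 1]))
    params[f"b{i + 1}"] = np.zeros(sizes[i + 1])

x = rng.random((4, 784))            # four fake flattened images
labels = np.array([0, 3, 5, 9])
probs = forward(x, params)
loss = cross_entropy(probs, labels)
print(probs.shape, float(loss))
```

Each row of `probs` sums to 1, so it can be read as a probability distribution over the ten digit classes.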
There are some hyper-parameters you need to tune to get better performance.
You need to find the best hyper-parameters so that your neural network can get
good performance on the given test data as well as on the hidden grading data.
- Learning rate: the step size used when updating weights (e.g. weights = weights - learning_rate * grads). Different optimizers use the learning rate in different ways (see reference in 2.1).
- Batch size: the number of samples processed before the model is updated. The batch size must be at least one and at most the number of samples in the training dataset (e.g. if your dataset has 1000 samples and your batch size is 100, you have 10 batches; each update trains on one batch of 100 samples, and after 10 batches every sample in the dataset has been used once).
- Number of epochs: the number of complete passes through the training dataset (e.g. with 1000 samples and 20 epochs you loop over those 1000 samples 20 times; with a batch size of 100, each epoch trains 1000/100 = 10 batches to cover the entire dataset, and this process is repeated 20 times).
- Number of units in each hidden layer
Remember that the program has to finish within 30 minutes, so choose your
hyper-parameters wisely.
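The batch and epoch arithmetic from the examples above (1000 samples, batch size 100, 20 epochs) can be checked with a short sketch. The generator name `iterate_minibatches` is hypothetical, and reshuffling at every epoch is an assumption rather than a requirement.

```python
import math
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs that together cover the data once."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

n_samples, batch_size, n_epochs = 1000, 100, 20
x = np.zeros((n_samples, 784))
y = np.zeros(n_samples, dtype=int)
rng = np.random.default_rng(0)

batches_per_epoch = math.ceil(n_samples / batch_size)
total_updates = 0
for epoch in range(n_epochs):
    for xb, yb in iterate_minibatches(x, y, batch_size, rng):
        total_updates += 1   # a real loop would run forward/backward/update here

print(batches_per_epoch, total_updates)  # prints: 10 200
```

With these settings the model is updated 10 times per epoch and 200 times in total, which is a useful number to keep in mind when estimating whether training fits in the time limit.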
Learning Curve Graph (we will not grade it but it may help)
To make sure your neural network actually learns something, you may want to
plot its learning process. After every epoch (one pass through all the samples
in your training data), it is a good idea to record your accuracy on the
training set and on the validation set (which is simply the test set we give
you) and to plot those accuracies as shown in the figure on the right.
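One possible way to record the per-epoch accuracies is sketched below. The `history` dictionary and the helper names are illustrative assumptions; the hand-made probabilities stand in for real network outputs.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

def record_epoch(history, train_probs, train_labels, val_probs, val_labels):
    """Append this epoch's training and validation accuracy to the history."""
    history["train"].append(accuracy(train_probs, train_labels))
    history["val"].append(accuracy(val_probs, val_labels))

history = {"train": [], "val": []}

# Hand-made probabilities for three samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
record_epoch(history, probs, np.array([0, 1, 2]), probs, np.array([0, 2, 2]))
print(history)
```

After training, the two lists can be plotted against the epoch index with any plotting tool; a training curve that keeps rising while the validation curve flattens or drops is a common sign of overfitting.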
Implementation Guidance
Suggested Steps
- (1) Split the dataset into batches.
- (2) Initialize the weights and biases.
- (3) Select one batch of data and compute the forward pass - follow the structure of the neural network to compute the output of each layer. You may want to cache the output of each layer for use in backward propagation.
- (4) Compute the loss - use cross-entropy (logistic loss - see references above) as the loss function.
- (5) Backward propagation - use your own backward propagation implementation to compute the gradients of the loss with respect to the weights and biases.
- (6) Update the weights using an optimization algorithm - there are many ways to update weights; you can use plain SGD or advanced methods such as Momentum and Adam (but you can easily get full credit without any advanced methods).
- (7) Repeat steps 3 to 6 for all batches - finishing this process for all batches (that is, iterating over all data points of the dataset) is called ‘one epoch’.
- (8) Repeat step 7 for the chosen number of epochs - you might need to train for many epochs to get a good result. As an option, you may want to print the accuracy of your network at the end of each epoch.
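The suggested steps can be put together as in the sketch below. It is one possible arrangement under stated assumptions, not a reference solution: it uses plain SGD, an arbitrary 16-16 hidden layout, a 1/sqrt(fan_in) weight scale, and small synthetic 3-class data so that it runs without MNIST. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_params(sizes, rng):
    """Initialize small random weights and zero biases for each layer."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Forward pass; returns the activations of every layer (cached for backprop)."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(softmax(z) if i == len(params) - 1 else sigmoid(z))
    return acts

def backward(acts, labels, params):
    """Backward pass; gradients of the mean cross-entropy loss for every layer."""
    n = len(labels)
    delta = acts[-1].copy()
    delta[np.arange(n), labels] -= 1.0   # softmax + cross-entropy: dL/dz = p - onehot
    delta /= n
    grads = []
    for i in reversed(range(len(params))):
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            # Propagate through the weights, then the sigmoid derivative a * (1 - a).
            delta = (delta @ params[i][0].T) * acts[i] * (1.0 - acts[i])
    return grads[::-1]

def train(x, y, sizes, lr=0.5, batch_size=32, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(sizes, rng)
    epoch_losses = []
    for _ in range(epochs):                               # loop over epochs
        order = rng.permutation(len(x))                   # split into shuffled batches
        batch_losses = []
        for start in range(0, len(x), batch_size):        # loop over batches
            idx = order[start:start + batch_size]
            acts = forward(x[idx], params)
            p_true = acts[-1][np.arange(len(idx)), y[idx]]
            batch_losses.append(-np.mean(np.log(p_true + 1e-12)))   # cross-entropy loss
            grads = backward(acts, y[idx], params)
            params = [(w - lr * gw, b - lr * gb)          # plain SGD update
                      for (w, b), (gw, gb) in zip(params, grads)]
        epoch_losses.append(float(np.mean(batch_losses)))
    return params, epoch_losses

# Synthetic 3-class data, used only so the sketch runs without MNIST.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 2.0, (3, 20))
y = rng.integers(0, 3, 300)
x = centers[y] + rng.normal(0.0, 1.0, (300, 20))
params, losses = train(x, y, sizes=[20, 16, 16, 3])
train_acc = float(np.mean(forward(x, params)[-1].argmax(axis=1) == y))
print(f"first-epoch loss {losses[0]:.3f}, last-epoch loss {losses[-1]:.3f}, accuracy {train_acc:.2f}")
```

For MNIST the same functions would be called with `sizes=[784, ..., 10]`; the learning rate, batch size, and number of epochs shown here are placeholders that would need tuning.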
Tips
There are many techniques that can speed up the training of your neural
network. Feel free to use them. For example, we suggest using vectorized
NumPy operations instead of Python for loops. You can also
- Try advanced optimizers such as SGD with momentum or Adam.
- Try other weights initialization methods such as Xavier initialization.
- Try dropout or batchnorm.
And so on, but you DO NOT really need them to reach our accuracy goal. A
“vanilla” or naive implementation with a proper learning rate can work very
well by itself.
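For readers who want to try two of the optional techniques, the sketch below shows Xavier (Glorot) uniform initialization and a single SGD-with-momentum update. The function names, the learning rate 0.1, and the momentum coefficient 0.9 are illustrative assumptions; neither technique is required for the assignment.

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng):
    """Xavier/Glorot uniform init: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, (fan_in, fan_out))

def momentum_step(weights, grads, velocity, lr=0.1, beta=0.9):
    """SGD with momentum: the velocity accumulates a decaying sum of past gradients."""
    velocity = beta * velocity - lr * grads
    return weights + velocity, velocity

rng = np.random.default_rng(0)
w = xavier_init(784, 64, rng)
v = np.zeros_like(w)
g = np.ones_like(w)                  # stand-in gradient, constant for illustration
w1, v = momentum_step(w, g, v)       # first step: velocity = -0.1 * g
w2, v = momentum_step(w1, g, v)      # second step: velocity = 0.9 * (-0.1) - 0.1 = -0.19
```

With a constant gradient the second step is larger than the first, which is the acceleration effect momentum is meant to provide.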
DO NOT USE ANY existing machine learning library such as TensorFlow or
PyTorch.
Submission and Grading
As described previously, we will provide 3 input files (train_image.csv,
train_label.csv, test_image.csv) in your working path. Your program file
should be named NeuralNetwork. (if you are using python3 or C++11, name it
NeuralNetwork3.py/NeuralNetwork11.cpp) and must output a file named
test_predictions.csv. Make sure the output file name matches exactly.
The training/testing datasets used for submission and for grading are
different, but both are subsets of MNIST.
Grading is based on your prediction accuracy. We hope you can reach at least
90% accuracy; any result better than 90% will get full credit. Results between
50% and 90% will get 50% credit, and an accuracy below 50% will get no credit.
Notice: 90% is not a hard goal. If your implementation is correct, you will
find that little extra work is needed to reach it. In other words, if you
cannot get close to the goal, your code very likely has a problem.