Recognize Handwritten Digits with a Neural Network in TensorFlow
This guide teaches you how to create a neural network that recognizes handwritten digits from the MNIST dataset using the popular Python library TensorFlow. TensorFlow is an open source software library for machine learning developed by Google’s Brain Team. Read more about the TensorFlow library at the bottom of this page. Some knowledge of neural networks is required to get the most out of this guide, although anyone can run the code. Feel free to check out this Introduction to Neural Networks. Throughout the guide, comments in the code will help you understand what is going on. If anything remains unclear, you are very welcome to ask questions in the comment section below this guide.
How to Build a Neural Network in TensorFlow for the MNIST Dataset
To create a neural network in TensorFlow, we need to implement a few core pieces of functionality. One way to do this is to design and implement the following functions:
initialize_parameters: initializes the parameters to be trained by the neural network.
forward_propagation
calculate_cost
train_parameters
predict
First things first. We want to start out by installing TensorFlow. Follow this guide on how to install TensorFlow.
In this guide, we will be using the MNIST dataset, which can be obtained using the sklearn library. Check out this guide for more information on how to get test data using sklearn.
Here is a quick example of how you can load the MNIST dataset using sklearn:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Load the MNIST data from OpenML as plain arrays
data = fetch_openml('mnist_784', version=1, as_frame=False)

# Prepare the data for machine learning
X = data.data
y = data.target.astype(int).reshape(len(X), 1)
encoder = OneHotEncoder()
encoder.fit(y)
y = encoder.transform(y).toarray()

# Divide X and y into a train and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
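To make the one-hot step more concrete, here is a minimal NumPy sketch of what the encoder produces for a handful of labels. The `one_hot` helper is a hypothetical name used only for illustration; it is not part of sklearn.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes))
    # Put a 1 in the column that matches each label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9])
print(example.shape)   # (3, 10)
print(example[0])      # a single 1.0 at index 3, zeros elsewhere
```

Each row of `y` has this form after encoding, which is why the output layer of the network needs one node per digit.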
Now we are ready to get started with deep learning in TensorFlow!
Initialize Parameters
The first function we are going to write is initialize_parameters(layout). It takes the layout of the neural network, layout, as input and returns initial parameters for our model. The parameters are a weight matrix W and a bias vector b for each layer.
The neural network layout will be defined as a dictionary as follows. Feel free to try different network layouts.
layout = {"l0":len(X_train[0]), "l1":100, "l2":100, "l3":60, "l4":len(y_train[0])}
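As a rough illustration of what this layout implies, the sketch below lists the shape of every W and b and counts the trainable values. It assumes 784 input features (28 x 28 pixels) and 10 output classes, and it assumes each W has shape (current layer size, previous layer size) and each b has shape (current layer size, 1). `parameter_shapes` is a hypothetical helper, not part of the guide's code.

```python
# Assumed layer sizes: 784 input pixels (28 x 28) and 10 output classes
layout = {"l0": 784, "l1": 100, "l2": 100, "l3": 60, "l4": 10}

def parameter_shapes(layout):
    """Map each parameter name to its shape, one W and one b per layer."""
    shapes = {}
    # The input layer "l0" has no parameters, so start at layer 1
    for i in range(1, len(layout)):
        current = layout["l" + str(i)]
        previous = layout["l" + str(i - 1)]
        shapes["W" + str(i)] = (current, previous)
        shapes["b" + str(i)] = (current, 1)
    return shapes

shapes = parameter_shapes(layout)
total = sum(rows * cols for rows, cols in shapes.values())
print(shapes["W1"], shapes["b4"])   # (100, 784) (10, 1)
print(total)                        # 95270
```

Most of the parameters sit in the first weight matrix, so changing the size of the first hidden layer has the largest effect on model size.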
For initialize_parameters(), we want to loop over each layer in layout after the input layer and initialize the parameters W and b for each.
import tensorflow as tf

def initialize_parameters(layout):
    """
    Arguments:
        layout (dict) : Layer sizes of the neural network
    Returns:
        parameters (dict) : Initialized parameters
    """
    # Create parameters dictionary to store W and b
    parameters = {}

    # Loop over all the hidden layers and the output layer
    # (the input layer "l0" has no parameters of its own)
    # and save initialized values for W and b in parameters
    for i in range(1, len(layout)):

        # Define layer conditions
        W = "W" + str(i)
        b = "b" + str(i)
        current_layer_size = layout["l" + str(i)]
        previous_layer_size = layout["l" + str(i-1)]

        # Initialize parameters
        parameters[W] = tf.get_variable(W, [current_layer_size, previous_layer_size],
                                        initializer=tf.contrib.layers.xavier_initializer(seed=1))
        parameters[b] = tf.get_variable(b, [current_layer_size, 1],
                                        initializer=tf.zeros_initializer())

    return parameters
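Notice how we use tf.get_variable to initialize both W and b. b is simply initialized as a zero vector, but W must be initialized to random values. With all-zero weights, every node in a layer would compute the same value and receive the same update, and we would risk a situation where no nodes ever fire.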
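For intuition, here is a NumPy sketch of Xavier (Glorot) initialization in its uniform form, where weights are drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). It approximates the idea only and is not TensorFlow's implementation; `xavier_uniform` is a hypothetical helper name.

```python
import numpy as np

def xavier_uniform(fan_out, fan_in, seed=1):
    """Sample a (fan_out, fan_in) weight matrix from U(-limit, limit)."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

# First layer of the example layout: 784 inputs, 100 nodes (assumed sizes)
W1 = xavier_uniform(100, 784)
b1 = np.zeros((100, 1))
limit = np.sqrt(6.0 / (784 + 100))

print(W1.shape, b1.shape)          # (100, 784) (100, 1)
print(np.abs(W1).max() <= limit)   # True
```

The limit shrinks as layers get wider, which keeps the scale of the activations roughly stable from one layer to the next.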
Forward Propagation
The next step is to write forward_propagation(X, parameters), which takes the inputs X and parameters and returns the activation of the output layer, A_output. The input X will be our training data, while parameters are the parameters obtained from the initialization function above.
def forward_propagation(X, parameters):
    """
    Arguments:
        X (float array)   : Features (X_train)
        parameters (dict) : Parameters W and b
    Returns:
        A_output (tf.Tensor) : The activation of the output layer
    """
    # Define a dictionary with the input (activation) layer
    # to add more activations as we iterate over the layers
    temp_dict = {"A0": X}

    # Determine the number of layers to iterate over
    layer_count = int(len(parameters) / 2)

    for i in range(1, layer_count+1):

        # Define layer conditions
        W = "W" + str(i)
        b = "b" + str(i)
        Z = "Z" + str(i)
        A = "A" + str(i)
        A_previous = "A" + str(i-1)

        # Perform forward propagation
        temp_dict[Z] = tf.add(tf.matmul(tf.cast(temp_dict[A_previous], tf.float32),
                                        tf.transpose(parameters[W])),
                              tf.transpose(parameters[b]))

        # Use a relu activation function for the hidden layers
        # and a sigmoid activation function for the output layer
        if i == layer_count:
            temp_dict[A] = tf.nn.sigmoid(temp_dict[Z])
        else:
            temp_dict[A] = tf.nn.relu(temp_dict[Z])

    # Return only the activation of the output layer
    A_output = temp_dict["A" + str(layer_count)]

    return A_output
In this code, we have chosen to use a relu activation function, tf.nn.relu, for the hidden layers and a sigmoid activation function, tf.nn.sigmoid, for the output layer. You are free to play around with this and see what works best for you.
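The same computation can be sketched in plain NumPy, which makes it easy to check the shapes at each step. This is an illustrative re-implementation under assumed layer sizes and random weights, not the TensorFlow code above; `forward_numpy` and the demo data are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_numpy(X, parameters):
    """ReLU on the hidden layers, sigmoid on the output layer."""
    layer_count = len(parameters) // 2
    A = X
    for i in range(1, layer_count + 1):
        # Z has shape (samples, nodes in layer i)
        Z = A @ parameters["W" + str(i)].T + parameters["b" + str(i)].T
        A = sigmoid(Z) if i == layer_count else relu(Z)
    return A

rng = np.random.default_rng(0)
sizes = [784, 100, 100, 60, 10]      # assumed layer sizes
parameters = {}
for i in range(1, len(sizes)):
    parameters["W" + str(i)] = rng.normal(0.0, 0.05, (sizes[i], sizes[i - 1]))
    parameters["b" + str(i)] = np.zeros((sizes[i], 1))

X_demo = rng.random((5, 784))        # five fake "images"
A_output = forward_numpy(X_demo, parameters)
print(A_output.shape)                # (5, 10)
```

Every row of the result holds ten values between 0 and 1, one per digit class.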
Calculate Cost
Now let’s define a cost function. The optimizer in TensorFlow uses this function to determine how much the parameters should be corrected in each iteration. The calculate_cost() function takes A_output, y, and parameters as inputs and returns the cost variable.
def calculate_cost(A_output, y, parameters):
    """
    Arguments:
        A_output (tf.Tensor) : Activation of the output layer
        y (float array)      : Labels
        parameters (dict)    : Parameters W and b
    Returns:
        cost (tf.Tensor) : Mean squared difference between predictions and labels
    """
    # We will compare the calculated labels to the actual ones
    calculated_labels = tf.transpose(A_output)
    actual_labels = tf.transpose(y)

    # Determine the cost as the mean of the squared differences
    cost = tf.reduce_mean(tf.squared_difference(calculated_labels, actual_labels))

    return cost
Here we use the squared difference, tf.squared_difference, but it is also possible to calculate the cost in other ways. Another popular choice is tf.nn.softmax_cross_entropy_with_logits, which expects the raw output-layer values Z (the logits) rather than the sigmoid activations.
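To compare the two cost choices side by side, here is a small NumPy sketch that evaluates both on made-up logits. The helper names and the example values are hypothetical; the point is only that both costs are lower when the largest output sits on the correct class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_squared_error(predictions, labels):
    return float(np.mean((predictions - labels) ** 2))

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over rows; logits are pre-activation values."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean((labels * log_probs).sum(axis=1)))

labels = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
good_logits = np.array([[-2.0, 3.0, -1.0],
                        [4.0, -1.0, 0.0]])
bad_logits = -good_logits    # confidently wrong on every row

mse_good = mean_squared_error(sigmoid(good_logits), labels)
mse_bad = mean_squared_error(sigmoid(bad_logits), labels)
ce_good = softmax_cross_entropy(good_logits, labels)
ce_bad = softmax_cross_entropy(bad_logits, labels)

print(mse_good < mse_bad)   # True
print(ce_good < ce_bad)     # True
```

Cross-entropy penalizes confident mistakes much more strongly than the squared difference does, which is one reason it is a common choice for classification.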
Train Parameters (Backward Propagation)
Now it is time to train our model. The function train_parameters(X_train, y_train, X_test, y_test, layout, learning_rate=0.005, epochs=20, batch_size=1000) takes a rather long list of inputs that should be mostly self-explanatory. More information about the inputs is provided in the docstring below. The function returns the trained parameters.
We have already written all the functions we need in a neural network except for backward propagation. This is, however, done very easily in TensorFlow as you do not have to write the code for the back propagation yourself. You simply have to define an optimizer and run it. The code below shows this.
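Before looking at the TensorFlow version, it may help to see the kind of work an optimizer automates. The NumPy sketch below runs plain gradient descent, which is simpler than the Adam optimizer, on a single linear layer with a hand-derived gradient. All data and names here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([[1.5], [-2.0], [0.5]])
y_demo = X_demo @ true_w                    # noiseless targets

w = np.zeros((3, 1))
learning_rate = 0.1
costs = []
for step in range(100):
    error = X_demo @ w - y_demo             # shape (200, 1)
    costs.append(float(np.mean(error ** 2)))
    # Gradient of the mean squared error with respect to w
    gradient = 2.0 * X_demo.T @ error / len(X_demo)
    w -= learning_rate * gradient

print(costs[0] > costs[-1])                 # True
```

In TensorFlow, the gradient line is what you no longer write by hand: the optimizer derives it from the graph for every W and b.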
Let's put together everything we have done so far:
def train_parameters(X_train, y_train, X_test, y_test, layout,
                     learning_rate=0.005, epochs=20, batch_size=1000):
    """
    Arguments:
        X_train (float array) : Train features
        y_train (float array) : Train labels
        X_test (float array)  : Test features
        y_test (float array)  : Test labels
        layout (dict)         : Layer sizes of the neural network
        learning_rate (float) : The learning rate for training
        epochs (int)          : Epochs through the data
        batch_size (int)      : Size of each data batch
    Returns:
        parameters (dict) : The trained model parameters
    """
    # Reset the default graph to avoid overwriting tf variables
    tf.reset_default_graph()

    # Create placeholders for tensors X and y
    (m, features_count) = X_train.shape
    labels_count = y_train.shape[1]
    X = tf.placeholder(tf.float32, [None, features_count], name="X")
    y = tf.placeholder(tf.float32, [None, labels_count], name="y")

    # Initialize parameters
    parameters = initialize_parameters(layout)

    # Do the forward propagation
    A_output = forward_propagation(X, parameters)

    # Calculate the cost
    cost = calculate_cost(A_output, y, parameters)

    # Define the tensorflow optimizer to do the back propagation
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the global initializer
        sess.run(init)

        # Loop over all epochs while training the model
        for epoch in range(epochs):
            epoch_cost = 0

            # Split the data into batches of size batch_size
            for batch in range(int(m / batch_size)):

                # Get a batch of X and y
                X_batch = X_train[batch*batch_size:(1 + batch)*batch_size]
                y_batch = y_train[batch*batch_size:(1 + batch)*batch_size]

                # Run the optimizer
                _, batch_cost = sess.run([optimizer, cost],
                                         feed_dict={X: X_batch, y: y_batch})
                epoch_cost += batch_cost

            # Print the cost to check if it's decreasing
            print("Cost after epoch " + str(epoch) + ": " + str(epoch_cost))

        print("Training complete!")

        # Check how many of the predictions were correct
        check_predictions = tf.equal(tf.argmax(A_output, axis=1), tf.argmax(y, axis=1))

        # Check accuracy on train and test set
        accuracy = tf.reduce_mean(tf.cast(check_predictions, tf.float32))
        accuracy_train = accuracy.eval({X: X_train, y: y_train})
        accuracy_test = accuracy.eval({X: X_test, y: y_test})
        print("Accuracy on the training data: " + str(accuracy_train*100) + " %")
        print("Accuracy on the test data: " + str(accuracy_test*100) + " %")

        # Save trained parameters
        parameters = sess.run(parameters)

    return parameters
The output in the terminal after running train_parameters looks like this:
>>> train_parameters(X_train, y_train, X_test, y_test, layout)
Cost after epoch 0: 2.29775438644
Cost after epoch 1: 0.968977658078
Cost after epoch 2: 0.84340425767
Cost after epoch 3: 0.536806614138
Cost after epoch 4: 0.252858966356
Cost after epoch 5: 0.208994472632
Cost after epoch 6: 0.17735228478
Cost after epoch 7: 0.156209989451
Cost after epoch 8: 0.143354448723
Cost after epoch 9: 0.132179139298
Cost after epoch 10: 0.123276243918
Cost after epoch 11: 0.119895999436
Cost after epoch 12: 0.112337087048
Cost after epoch 13: 0.107231109519
Cost after epoch 14: 0.0964344530366
Cost after epoch 15: 0.0971446636831
Cost after epoch 16: 0.0877964202664
Cost after epoch 17: 0.077983592113
Cost after epoch 18: 0.0708390149521
Cost after epoch 19: 0.0632036953466
Training complete!
Accuracy on the training data: 99.2500007153 %
Accuracy on the test data: 96.9714283943 %
This is far from an optimal result on the MNIST dataset, but it is sufficient for this guide. You should be able to improve it by playing around with hyperparameters such as layout, learning_rate, epochs, and batch_size. Also, consider adding regularization to the cost!
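As one possible way to add regularization, the NumPy sketch below computes an L2 penalty over the weight matrices and adds it to a cost value. The `l2_penalty` helper, the `lambd` strength, and the numbers are all hypothetical. In the TensorFlow graph, a comparable term could be built from the W tensors with tf.reduce_sum and tf.square.

```python
import numpy as np

def l2_penalty(parameters, lambd=0.01):
    """Sum of squared weights (biases excluded), scaled by lambd."""
    return lambd * sum(float(np.sum(value ** 2))
                       for name, value in parameters.items()
                       if name.startswith("W"))

# Tiny made-up parameters: only W1 contributes to the penalty
parameters = {"W1": np.array([[1.0, -2.0], [0.5, 0.0]]),
              "b1": np.array([[3.0], [4.0]])}

base_cost = 0.25                                  # stand-in for the data cost
penalty = l2_penalty(parameters, lambd=0.01)      # 0.01 * 5.25
total_cost = base_cost + penalty
print(round(total_cost, 4))                       # 0.3025
```

A larger `lambd` pushes the weights toward zero more strongly, which usually narrows the gap between training and test accuracy at the expense of some training accuracy.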
Predict
For the trained parameters to be useful, we need a function that takes new data along with the trained parameters and predicts labels for that data. This function is called predict(X_predict, parameters) and returns the predicted label(s).
def predict(X_predict, parameters):
    """
    Arguments:
        X_predict (float array) : Data for which we want to predict a label
        parameters (dict)       : The trained model parameters
    Returns:
        prediction (int array) : Predicted label(s)
    """
    # Reshape a single sample into a 2D array with one row
    if len(X_predict.shape) == 1:
        X_predict = X_predict.reshape([1, len(X_predict)])

    # Create placeholder for X
    features_count = X_predict.shape[1]
    X = tf.placeholder(tf.float32, [None, features_count], name="X")

    # Do the forward propagation
    A_output = forward_propagation(X, parameters)

    # Get the prediction
    prediction = tf.argmax(A_output, axis=1)

    # Evaluate the result
    with tf.Session():
        prediction = prediction.eval({X: X_predict})

    return prediction
Let's see what we get when we test predict():
>>> y_predict = predict(X_test[2], parameters)[0]
>>> y_actual = list(y_test[2]).index(1)
>>> print("Predicted label: " + str(y_predict) + "\nActual label: " + str(y_actual))
Predicted label: 8
Actual label: 8
It seems like everything works as intended. Now try to predict all labels for X_test at once:
>>> predict(X_test, parameters)
array([1, 8, 8, ..., 0, 8, 4], dtype=int64)
Success!
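Once you have an array of predicted labels, you can compare it to the one-hot encoded labels to get the accuracy. The following NumPy sketch shows one way to do that. The `accuracy` helper and the example labels are made up for illustration and do not come from the run above.

```python
import numpy as np

def accuracy(predicted_labels, one_hot_labels):
    """Fraction of predictions that match the one-hot encoded labels."""
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted_labels == actual))

predicted = np.array([1, 8, 8, 0])
one_hot_actual = np.eye(10)[[1, 8, 3, 0]]   # true labels 1, 8, 3, 0

print(accuracy(predicted, one_hot_actual))  # 0.75
```

This mirrors the argmax comparison that train_parameters performs inside the TensorFlow graph.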
Summary
That was our guide on how to build a deep neural network in TensorFlow that can be used for image classification. You have learned how to write each of the main functions for a neural network in TensorFlow. We hope that you enjoyed the guide!