
The Best RNN for Image Classification: RNN, LSTM, or GRU?

Recurrent Neural Networks (RNNs) are neural networks designed for predicting sequence data. Images are not traditionally seen as sequence data, but they can be modeled as such. Today we’re going to test how well three different RNN architectures (simple RNNs, LSTMs, and GRUs) do on image classification via the MNIST digits dataset.

Overview of a Comparison of RNN Architectures for Image Classification

In this post we will go over the following topics:

  • What is the MNIST Digits Dataset?
  • What are Recurrent Neural Networks?
    • Simple Recurrent Neural Networks
    • Long Short-Term Memory (LSTM) Models
    • Gated Recurrent Unit (GRU) Models
  • Keras and Tensorflow for Building Neural Networks
  • Comparing RNN Models on Image Data Classification
    • Image Classification Accuracy with Simple RNNs
    • Image Classification Accuracy with LSTMs
    • Image Classification Accuracy with GRUs
  • RNN vs LSTM vs GRU on Image Classification Summary

What is the MNIST Digits Dataset?

MNIST Dataset, Image from InteliDig

The MNIST Digits Dataset is a set of 70,000 images of handwritten digits (60,000 for training and 10,000 for testing). Each image is 28×28 pixels and labeled with the correct digit. It’s a famous benchmark dataset for measuring how well a neural network is trained. You can find more information about it on the MNIST Datasets Homepage.
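If you’d like to poke at the data yourself, Keras ships with a copy of MNIST. Here’s a minimal sketch that loads it and prints the shapes:

from tensorflow import keras

# Load the pre-split MNIST training and test sets bundled with Keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape)  # (60000, 28, 28): 60,000 training images, 28x28 pixels each
print(x_test.shape)   # (10000, 28, 28): 10,000 test images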

What are Recurrent Neural Networks?

Unfolded RNN Image from Wikipedia

Recurrent Neural Networks are neural networks that contain one or more recurrent layers. Traditional neural networks contain feedforward layers, meaning each cell or node in a layer simply passes its output on to the next layer. A recurrent cell passes its output on to the next layer too, but it also feeds that output back into itself across time steps. Thus, just like a recursive function calls itself, a recurrent cell uses its own output.
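In pseudocode, a single recurrent step looks something like the sketch below. This is a minimal illustration of the idea, not any particular library’s internals; the weight matrices W_x and W_h and the bias b are placeholders:

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The new hidden state depends on the current input x_t AND the
    # previous hidden state h_prev -- this feedback loop is the recurrence
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)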

RNNs are perfect for predicting sequence data in which each output depends on more than just the current data point. This is why RNNs are best known for text data and Natural Language Processing. Learn more about RNNs through the Core Concepts of NLP.

Simple Recurrent Neural Networks

Simple Recurrent Neural Networks are the basic RNN architecture. The cells or nodes used in simple RNNs do not have gates in them. Each layer fully connects to the next layer, just like in a traditional neural network. To be classified as a simple recurrent neural network, a neural network must have at least one recurrent layer and must not contain LSTM or GRU layers. A simple recurrent layer can be added to Keras models via the layers.SimpleRNN class.
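As a quick sketch (the 64 units and 28-feature timesteps are just illustrative values), adding a simple recurrent layer looks like this:

from tensorflow.keras import layers

# A simple recurrent layer with 64 units; inputs are sequences of
# 28-feature vectors, and None allows any sequence length
simple_layer = layers.SimpleRNN(64, input_shape=(None, 28))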

Long Short-Term Memory (LSTM) Models

Long Short-Term Memory or LSTM models are a variation on the RNN architecture. To classify as an LSTM, a neural network must have at least one LSTM layer. At their core, LSTMs have the same recurrent behavior as simple RNNs, but LSTM nodes are not like regular RNN nodes: they have three extra gates, the input gate, the output gate, and the forget gate. These extra gates translate into an LSTM layer having roughly four times as many parameters as a simple RNN layer. LSTMs were designed to deal with the vanishing gradient problem that plagues simple RNNs. LSTMs can be implemented in Keras via the layers.LSTM or layers.LSTMCell classes.
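The call in Keras mirrors SimpleRNN; this sketch (again with illustrative sizes) adds a 64-unit LSTM layer:

from tensorflow.keras import layers

# An LSTM layer with 64 units; the input, output, and forget gates are
# handled internally, which is why it has ~4x the parameters of SimpleRNN
lstm_layer = layers.LSTM(64, input_shape=(None, 28))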

Gated Recurrent Unit (GRU) Models

Gated Recurrent Units (GRUs) are another variation on the recurrent neural network design. GRU cells are similar to Long Short-Term Memory cells. Unlike a cell or node in a traditional neural network, GRUs also contain gates: an update gate and a reset gate. Unlike LSTMs, GRUs do not have an output gate or a separate cell state. GRUs were initially introduced in 2014 as an alternative to LSTMs. They show similar performance most of the time but have fewer trainable parameters. GRUs have been shown to outperform LSTMs on certain datasets, such as smaller ones with lower-frequency data. GRU layers can be added to a neural network in Keras through the layers.GRU class.
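And the GRU version, once more with illustrative sizes:

from tensorflow.keras import layers

# A GRU layer with 64 units; its update and reset gates give it fewer
# parameters than an equivalently sized LSTM
gru_layer = layers.GRU(64, input_shape=(None, 28))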

Keras and Tensorflow for Building Neural Networks

We’re going to use Keras on top of Tensorflow to build, train, and test our neural networks. Keras is a high-level neural network API and Tensorflow is a low-level one. This means we can use Tensorflow’s backend while interacting with the Keras interface. What’s the advantage of this? Keras is much easier to interact with.

Comparing RNN Models on Image Data Classification

Now that we’ve learned a bit about the three best-known RNN types (simple RNNs, LSTMs, and GRUs), let’s see how each one performs on image classification. Each of the models is built in Keras with the Sequential model structure. They all have three layers: an input layer corresponding to the architecture type, a batch normalization layer, and an output layer; a sketch of this shared setup follows below. Each neural network will be trained for 10 epochs. To see how to train each model, see How to Build a Simple RNN in Keras, How to Build an LSTM in Keras, and How to Build a GRU in Keras.
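Here’s a sketch of that shared setup, looping over the three layer types and printing each model’s parameter count (the sizes match the experiments described below, but this is an outline, not the exact training script):

from tensorflow import keras
from tensorflow.keras import layers

# One model per architecture: each treats a 28x28 image as a
# sequence of 28 rows with 28 pixels each
for rnn_layer in (layers.SimpleRNN, layers.LSTM, layers.GRU):
    model = keras.Sequential([
        rnn_layer(64, input_shape=(None, 28)),
        layers.BatchNormalization(),
        layers.Dense(10),
    ])
    print(rnn_layer.__name__, model.count_params())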

Image Classification Accuracy with Simple RNNs

An RNN with a simple RNN layer of 64 units (an output size of 64), followed by a batch normalization layer and a dense output layer of 10 units, has 6858 parameters. It achieves roughly 96% accuracy on the MNIST dataset after 10 epochs.

Image Classification Accuracy with LSTM Models

An LSTM model set up like our simple RNN model, with a 64-unit LSTM layer, a batch normalization layer, and a fully connected output layer, has 24714 parameters. It achieves an accuracy of roughly 96% after being trained for 10 epochs.

Image Classification Accuracy with GRU Models

A GRU RNN with the same setup as the LSTM and the simple RNN has 18954 parameters. It reaches an accuracy of roughly 95% after being trained for 10 epochs.
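As a sanity check on those parameter counts, the arithmetic works out as follows (assuming Keras defaults, including reset_after=True for the GRU, which adds a second bias term per gate):

# Recurrent layer parameters for 28-feature inputs and 64 units:
simple_rnn = 64 * (28 + 64 + 1)       # 5952: one weight set
lstm       = 4 * 64 * (28 + 64 + 1)   # 23808: four weight sets (three gates + candidate)
gru        = 3 * 64 * (28 + 64 + 2)   # 18048: three weight sets, two bias terms each
batch_norm = 4 * 64                   # 256: gamma, beta, moving mean, moving variance
dense      = 64 * 10 + 10             # 650: weights plus biases for 10 output units
print(simple_rnn + batch_norm + dense)  # 6858
print(lstm + batch_norm + dense)        # 24714
print(gru + batch_norm + dense)         # 18954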

RNN vs LSTM vs GRU on Image Data Summary

Note that none of the three RNN architectures actually achieved their maximum validation accuracy in the 10th epoch. This tells us that we may not even need to train them for 10 epochs on the MNIST dataset. They all reach similar levels of accuracy, with the GRU model being slightly lower.


Long Short-Term Memory (LSTM) in Keras

In December of 2021, we went over How to Build a Recurrent Neural Network from Scratch, How to Build a Neural Network from Scratch in Python 3, and How to Build a Neural Network with Sci-Kit Learn. As a continuation of the Neural Network series, this post is going to go over how to build a simple LSTM model in Keras with Tensorflow.

In this post we’ll use Keras and Tensorflow to create a simple LSTM model, and train and test it on the MNIST dataset. Here are the steps we’ll go through:

  1. What is an LSTM?
  2. Creating a Simple LSTM Neural Network with Keras
    1. Importing the Right Modules
    2. Adding Layers to Your Keras LSTM Model
  3. Training and Testing our LSTM on the MNIST Dataset
    1. Load the MNIST dataset
    2. Compile the Keras LSTM model
    3. Train and Fit the Keras LSTM Model
    4. Test your Keras LSTM Model
  4. Summary of Building a Keras LSTM Module in Python

To follow along, you’ll need to install tensorflow, which you can do with the terminal command below.

pip install tensorflow 

What is an LSTM?

Long Short Term Memory (LSTM) Cell, Image from Stack Exchange

LSTM stands for “Long Short-Term Memory”. Confusing wording, right? An LSTM is actually a kind of RNN architecture. It is, theoretically, a more “sophisticated” Recurrent Neural Network. Instead of just having recurrence, it also has “gates” that regulate the flow of information through the unit, as shown in the image. LSTMs were initially introduced to solve the vanishing gradient problem of RNNs. They are often used over traditional, “simple” recurrent neural networks because they are much better at capturing long-range dependencies, even though each cell is more expensive to compute.
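For the curious, the computation inside a single LSTM step looks roughly like the numpy sketch below. These are the standard textbook equations, not Keras’s actual internals; W, U, and b are placeholder dicts of per-gate weights:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])  # forget gate: what to drop from the cell state
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])  # input gate: what new info to store
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])  # output gate: what to expose
    c_cand = np.tanh(x_t @ W["c"] + h_prev @ U["c"] + b["c"])  # candidate cell state
    c_t = f * c_prev + i * c_cand   # updated long-term memory
    h_t = o * np.tanh(c_t)          # updated hidden state
    return h_t, c_t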

Creating a Simple LSTM with Keras

Keras and Tensorflow make building neural networks much easier. It’s much simpler to build a neural network with these libraries than from scratch. The best reason to build one from scratch is to understand how neural networks work; in practical situations, using a library like Tensorflow is the better approach. Since it’s straightforward to build a neural network with Tensorflow and Keras, let’s take a look at how to use Keras to build our LSTM.

Importing the Right Modules

The first thing we need to do is import the right modules. For this example, we’re going to be working with tensorflow. We don’t technically need the last two imports, but they save us typing: when we add layers, we can write layers.LSTM instead of tf.keras.layers.LSTM.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Adding Layers to Your Keras LSTM Model

It’s quite easy to build an LSTM in Keras. All that’s really required for an LSTM neural network is at least one LSTM layer or some LSTM cells. If we add other types of layers and cells, we can still call our neural network an LSTM, though it would be more accurate to give it a mixed name.

To build an LSTM, the first thing we’re going to do is initialize a Sequential model. Afterwards, we’ll add an LSTM layer. This is what makes this an LSTM neural network. Then we’ll add a batch normalization layer and a dense (fully connected) output layer. Next, we’ll print it out to get an idea of what it looks like.

# Sequential models stack layers one after another
model = keras.Sequential()
# LSTM layer with 64 units; each 28x28 image is fed in as a sequence
# of 28 rows with 28 pixels each (None allows any sequence length)
model.add(layers.LSTM(64, input_shape=(None, 28)))
# Normalize the LSTM's outputs to stabilize and speed up training
model.add(layers.BatchNormalization())
# Fully connected output layer: one logit per digit class (0-9)
model.add(layers.Dense(10))
model.summary()

You’ll see that the LSTM actually has WAY more parameters than the Simple RNN we built with Keras. The LSTM layer has four times the number of parameters as a simple RNN layer. This is because of the gates we talked about earlier.

[model.summary() output showing the Keras LSTM’s parameter counts]

Training and Testing our Keras LSTM on the MNIST Dataset

Now that we’ve built our LSTM, let’s see how it does on the MNIST digit dataset. This is the same dataset we tested the Keras RNN and the built-from-scratch neural network on. The MNIST dataset is a classic set of handwritten digits used to train and test neural networks.

Load the MNIST dataset

The first thing we’ll do is load up the MNIST dataset from Keras. We’ll use the `load_data()` function to load the pre-separated training and testing sets. After loading them, we’ll normalize the image data by dividing by 255, since pixel values range from 0 to 255. Finally, we’ll split the test set, using most of it for validation and holding out the last 10 points as a sample to predict on.

# Load the pre-split MNIST training and test sets
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from 0-255 down to 0-1
x_train, x_test = x_train/255.0, x_test/255.0
# Use everything but the last 10 test points for validation...
x_validate, y_validate = x_test[:-10], y_test[:-10]
# ...and hold out the final 10 points for a quick manual test
x_test, y_test = x_test[-10:], y_test[-10:]

Compile the LSTM Neural Network

Now that we’ve created our LSTM and loaded up our data, let’s compile our model. We have to compile (or build) our model before we can train or test it. In the compilation we specify the loss function (Sparse Categorical Cross Entropy in this case), our optimizer (stochastic gradient descent), and our metric(s) (accuracy). We can specify multiple metrics, but we’ll just go with accuracy for this example.

model.compile(
    # from_logits=True because the Dense output layer returns raw logits
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)

Train and Fit the Keras LSTM Model

Now that the model is compiled, let’s train it. To train the model in Keras, we just call the fit function. We’ll need to pass in the training data for x and y, the validation data, the batch_size, and the epochs. For this example, we’ll train for 10 epochs.

# Train for 10 epochs, monitoring accuracy on the validation split
model.fit(
    x_train, y_train, validation_data=(x_validate, y_validate), batch_size=64, epochs=10
)

Test the Long Short Term Memory Keras Model

We’ve built, compiled, and trained the model. The last thing to do is test it. We’ll use the model to predict the sample we set aside earlier, then print out each prediction alongside the correct label.

for i in range(10):
    # expand_dims adds a batch dimension; argmax picks the class with the highest logit
    result = tf.argmax(model.predict(tf.expand_dims(x_test[i], 0)), axis=1)
    print(result.numpy(), y_test[i])

We can see that after training for 10 epochs, the model reaches a pretty good accuracy of about 96%.

[Training output: the Keras LSTM reaching about 96% accuracy after 10 epochs]

Build a Simple LSTM with Keras Summary

In this post we learned how to build, train, and test an LSTM model using Keras. We also learned that an LSTM is just a fancy RNN with gates. We built a simple sequential LSTM with three layers. Finally, we tested it on the MNIST digits dataset, a cornerstone dataset for testing neural networks.
