
Build a GRU RNN in Keras

In December of 2021, we went over How to Build a Recurrent Neural Network from Scratch, How to Build a Neural Network from Scratch in Python 3, and How to Build a Neural Network with Sci-Kit Learn. As a continuation of the Neural Network series, this post goes over how to build a simple GRU model in Keras with TensorFlow.

We’ll create the model with Keras, then train and test it on the MNIST dataset. Here are the steps we’ll go through:

  1. What is a Gated Recurrent Unit (GRU)?
  2. Creating a Simple GRU RNN with Keras
    1. Importing the Right Modules to Build a GRU in Keras
    2. Adding Layers to Your Gated Recurrent Unit Model
  3. Training and Testing our GRU RNN on the MNIST Dataset
    1. Load the MNIST dataset
    2. Compile the Gated Recurrent Unit (GRU) RNN model
    3. Train and Fit the Model
    4. Test your Gated Recurrent Unit RNN Model

To follow along, you’ll need to install TensorFlow, which you can do by running the line below in the terminal.

pip install tensorflow 

What is a Gated Recurrent Unit (GRU)?

GRU RNN Cell Image from Wikipedia

GRU stands for “Gated Recurrent Unit”. GRUs were introduced in 2014 as a simpler alternative to LSTMs. A GRU is basically an LSTM without an output gate: it uses an update gate and a reset gate, and keeps a single hidden state instead of a separate cell state. GRUs perform similarly to LSTMs on most tasks and can do better on certain tasks with smaller datasets and less frequent data.
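To make the gating concrete, here is a minimal sketch of a single GRU time step in plain Python. It assumes one input feature and one hidden unit so that every weight is a plain float, and it assumes the convention where the update gate `z` controls how much of the previous state is kept (some references swap the roles of `z` and `1 - z`). The function name `gru_step` and the parameter names are made up for illustration; this is not how Keras implements the layer internally.

```python
import math


def sigmoid(v):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One GRU time step for a single input feature and a single hidden unit.

    p is a dict of hypothetical scalar weights: w_* act on the input,
    u_* act on the previous hidden state, b_* are biases.
    """
    # update gate: how much of the previous state to keep
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])
    # reset gate: how much of the previous state feeds the candidate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])
    # candidate state, computed from the input and the reset-scaled state
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # blend the previous state and the candidate
    return z * h_prev + (1.0 - z) * h_cand


# arbitrary example weights, chosen only to show the call pattern
params = {
    "w_z": 0.5, "u_z": -0.3, "b_z": 0.0,
    "w_r": 0.8, "u_r": 0.1, "b_r": 0.0,
    "w_h": 1.2, "u_h": 0.7, "b_h": 0.0,
}

h = 0.0  # the hidden state starts at zero
for x in [0.5, -1.0, 0.25]:  # a toy three-step input sequence
    h = gru_step(x, h, params)
    print(h)
```

Because the new state is a blend of the previous state and a `tanh` output, it always stays between -1 and 1. A real GRU layer does the same thing with weight matrices instead of scalars.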

Creating a Simple GRU RNN with Keras

Using Keras and TensorFlow makes neural networks much easier to build than writing them from scratch. The best reason to build a neural network from scratch is to understand how neural networks work. In practical situations, using a library like TensorFlow is usually the better approach. Let’s take a look at how to use Keras to build our GRU.

Importing the Right Modules for Your Gated Recurrent Unit Model

The first thing we need to do is import the right modules. For this example, we’re going to be working with tensorflow. We don’t technically need the last two imports, but they save us typing: when we add layers, we can write layers. instead of tf.keras.layers.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Adding Layers to Your GRU RNN Model

The GRU RNN is a Sequential Keras model. After initializing our Sequential model, we’ll need to add in the layers. The first layer we’ll add is the Gated Recurrent Unit layer. Since we’re operating with the MNIST dataset, whose images are 28 by 28 pixels, we have to have an input shape of (28, 28). The GRU reads each image as a sequence of 28 rows with 28 pixel values each. We’ll give this layer 64 units. Adding this layer is what makes our model a Gated Recurrent Unit model.

After adding the GRU layer, we’ll add a Batch Normalization layer. Finally, we’ll add a dense layer as output. The dense layer will have 10 units. Like the 28s in the input shape, this number is dictated by the dataset: MNIST has 10 classes (the digits 0 through 9), so we need 10 output nodes.

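model = keras.Sequential()
# GRU layer with 64 units; each image is read as 28 timesteps of 28 features
model.add(layers.GRU(64, input_shape=(28, 28)))
model.add(layers.BatchNormalization())
# one output per digit class; these are raw logits (no softmax)
model.add(layers.Dense(10))
# summary() prints the table itself and returns None, so no print() is needed
model.summary()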

You’ll see that the GRU has more parameters than the Simple RNN we built with Keras, but fewer than the LSTM. Like the LSTM, the GRU has internal gates. Unlike the LSTM, which has four sets of weights (three gates plus a candidate state), the GRU has only three (an update gate, a reset gate, and a candidate state).

Weights for a GRU RNN
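The parameter counts can be checked by hand. The sketch below uses the standard weight-count formulas for each recurrent layer type, applied to 28 input features and 64 units. The helper names are made up for this example, and the `reset_after=True` default is an assumption about the TensorFlow 2 GRU layer, which keeps two bias vectors per weight set instead of one; compare the result against your own model summary.

```python
def simple_rnn_params(input_dim, units):
    """One set of weights: input kernel, recurrent kernel, and bias."""
    return units * (input_dim + units) + units


def lstm_params(input_dim, units):
    """Four sets of weights: three gates plus the candidate cell state."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units, reset_after=True):
    """Three sets of weights: update gate, reset gate, candidate state.

    With reset_after=True (assumed TensorFlow 2 default) each set has
    separate input and recurrent biases, so two bias vectors instead of one.
    """
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)


print(simple_rnn_params(28, 64))  # 5952
print(gru_params(28, 64))         # 18048
print(lstm_params(28, 64))        # 23808
```

The ordering matches the text: the GRU sits between the Simple RNN and the LSTM because it carries three sets of weights rather than one or four.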

Training and Testing our GRU Model on the MNIST Dataset

Now that we’ve built our GRU, let’s see how it does on the MNIST digit dataset. This is the same dataset we tested the Keras RNN and the built-from-scratch Neural Network on. MNIST is a set of handwritten digits and a classic dataset for training and testing neural networks.

Load the MNIST digits dataset

The first thing we need to do to work with the MNIST digits dataset is to load it. We’ll use Keras to load the dataset into a train and test set. Then we’ll normalize the data from its 0-255 scale to 0-1. We’ll also split the 10,000 test samples into a test set and a validation set. We will hold out the last 10 samples as the test set and use the other 9,990 as validation data.

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
x_validate, y_validate = x_test[:-10], y_test[:-10]
x_test, y_test = x_test[-10:], y_test[-10:]
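The negative slices are easy to misread, so here is a small sketch of the same split on a stand-in list. It assumes 10,000 test samples, as in MNIST, and uses a list of integer IDs in place of the image data; the variable names are made up for the example.

```python
# stand-in for the 10,000 MNIST test samples: just their positions
sample_ids = list(range(10_000))

# everything except the last 10 samples becomes validation data
validate_ids = sample_ids[:-10]
# the last 10 samples are held out as the test set
test_ids = sample_ids[-10:]

print(len(validate_ids), len(test_ids))  # 9990 10
print(test_ids[0], test_ids[-1])         # 9990 9999
```

The two slices do not overlap, so none of the held-out test samples are seen as validation data during training.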

Compile the Keras GRU RNN

Now that we’ve created our GRU and loaded up our data, let’s compile our model. We have to compile our model before we can train or test it. In our model compilation we will specify the loss function (in this case Sparse Categorical Cross Entropy), our optimizer (stochastic gradient descent), and our metric(s) (accuracy). We can specify multiple metrics, but we’ll just go with accuracy for this example.

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)
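We pass from_logits=True because the final Dense layer has no softmax, so the model outputs raw scores. As a rough illustration of what the loss computes for one sample, here is a plain-Python sketch of sparse categorical cross entropy on logits. The function name is made up for this example, and Keras’ own implementation differs in detail (it works on batches and averages the result).

```python
import math


def sparse_cross_entropy_from_logits(logits, label):
    """Cross entropy for one sample given raw scores and an integer label.

    Equivalent to -log(softmax(logits)[label]), computed with the
    log-sum-exp trick so large scores do not overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


# ten equal scores: the model has no preference, so the loss is log(10)
uniform = [0.0] * 10
print(sparse_cross_entropy_from_logits(uniform, 3))

# a confident, correct prediction gives a loss close to zero
confident = [0.0] * 10
confident[3] = 20.0
print(sparse_cross_entropy_from_logits(confident, 3))
```

“Sparse” here means the labels are integer class indices such as 3, rather than one-hot vectors, which is how MNIST labels come out of load_data().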

Train and Fit the GRU RNN Model

Now that the model is compiled, let’s train the model. To train the model in Keras, we just call the fit function. To use the fit function, we’ll need to pass in the training data for x and y, the validation data, the batch_size, and the epochs. For this example, we’ll just train for 10 epochs.

model.fit(
    x_train, y_train, validation_data=(x_validate, y_validate), batch_size=64, epochs=10
)

Test the Keras Gated Recurrent Unit Model

Now that we’ve built and trained our GRU RNN, let’s test it. We’ll loop through and test all 10 data points we set aside when we created the test dataset. We’ll print the output of the model vs the actual data.

for i in range(10):
    result = tf.argmax(model.predict(tf.expand_dims(x_test[i], 0)), axis=1)
    print(result.numpy(), y_test[i])

As you can see below, after 10 epochs, the model does quite well at roughly 95% accuracy for both the training and validation data. It predicts all 10 test data points correctly.

Keras GRU RNN Output after 10 Epochs
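The test loop takes the argmax of the model’s 10 output scores to get the predicted digit. Here is a minimal sketch of that step and of how accuracy follows from it, in plain Python. The scores and labels are made-up values with three classes for brevity, not output from the trained model, and the helper names are hypothetical.

```python
def predict_class(logits):
    """Return the index of the largest score (the argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def accuracy(batch_logits, labels):
    """Fraction of samples whose argmax matches the true label."""
    correct = sum(predict_class(l) == y for l, y in zip(batch_logits, labels))
    return correct / len(labels)


# made-up scores for three samples and three classes
batch_logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 4.0, 1.0],
]
labels = [0, 2, 0]

predictions = [predict_class(l) for l in batch_logits]
print(predictions)                     # [0, 2, 1]
print(accuracy(batch_logits, labels))  # two of three correct
```

With only 10 held-out samples, each test prediction moves the test accuracy by 10 percentage points, so the validation accuracy is the more reliable number.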

Build a Simple GRU RNN with Keras Summary

In this post we learned how to build, train, and test a GRU model built using Keras. We also learned that a GRU is just a fancy RNN with gates. We built a simple sequential GRU with three layers. Finally, we tested the GRU we built on the MNIST digits dataset, a cornerstone dataset to test neural networks on.
