In December of 2021, we went over How to Build a Recurrent Neural Network from Scratch, How to Build a Neural Network from Scratch in Python 3, and How to Build a Neural Network with Sci-Kit Learn. As a continuation of the Neural Network series, this post goes over how to build a simple GRU model in Keras with TensorFlow.
In this post we’ll use Keras and TensorFlow to create a simple GRU model, then train and test it on the MNIST dataset. Here are the steps we’ll go through:
- What is a Gated Recurrent Unit (GRU)?
- Creating a Simple GRU RNN with Keras
  - Importing the Right Modules to Build a GRU in Keras
  - Adding Layers to Your Gated Recurrent Unit Model
- Training and Testing our GRU RNN on the MNIST Dataset
  - Load the MNIST dataset
  - Compile the Gated Recurrent Unit (GRU) RNN model
  - Train and Fit the Model
  - Test your Gated Recurrent Unit RNN Model
To follow along, you’ll need to install tensorflow, which you can do by running the line below in a terminal.
pip install tensorflow
What is a Gated Recurrent Unit (GRU)?
GRU stands for “Gated Recurrent Unit”. GRUs were introduced in 2014. They’re similar to LSTMs, but simpler: a GRU merges the LSTM’s cell state and hidden state into a single state and has no output gate, which leaves it with two gates, an update gate and a reset gate. GRUs perform similarly to LSTMs on many tasks, and have been reported to do better on certain smaller datasets.
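To make the two gates concrete, here is a minimal NumPy sketch of a single GRU time step. It is an illustration under stated assumptions, not the Keras implementation: the names gru_step, init_params, and the weight labels W, U, and b are hypothetical, the weights are random rather than trained, and the Keras layer uses a slightly different default variant with fused weight matrices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step (illustrative; names are hypothetical, not Keras API)."""
    W_z, U_z, b_z = params["z"]
    W_r, U_r, b_r = params["r"]
    W_h, U_h, b_h = params["h"]
    # Update gate: how much of the previous state to keep
    z = sigmoid(x_t @ W_z + h_prev @ U_z + b_z)
    # Reset gate: how much of the previous state feeds the candidate
    r = sigmoid(x_t @ W_r + h_prev @ U_r + b_r)
    # Candidate state, computed from the input and the reset-scaled previous state
    h_cand = np.tanh(x_t @ W_h + (r * h_prev) @ U_h + b_h)
    # New state: a gate-weighted blend of the old state and the candidate
    return z * h_prev + (1.0 - z) * h_cand

def init_params(input_dim, units, seed=0):
    """Random, untrained weights for the three weight sets (z, r, h)."""
    rng = np.random.default_rng(seed)
    def block():
        return (rng.normal(0.0, 0.1, (input_dim, units)),
                rng.normal(0.0, 0.1, (units, units)),
                np.zeros(units))
    return {"z": block(), "r": block(), "h": block()}

# Feed one 28-step sequence of 28 features through a 64-unit cell,
# the same shape as one MNIST image read row by row.
params = init_params(28, 64)
h = np.zeros(64)
sequence = np.random.default_rng(1).random((28, 28))
for x_t in sequence:
    h = gru_step(x_t, h, params)
print(h.shape)  # (64,)
```

The final state h is what a GRU layer hands to the next layer when it only returns the last output, which is why a 28x28 image ends up as a 64-value vector.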
Creating a Simple GRU RNN with Keras
Using Keras and TensorFlow makes building neural networks much easier than building them from scratch. The best reason to build a neural network from scratch is to understand how neural networks work. In practical situations, using a library like TensorFlow is usually the better approach. Let’s take a look at how to use Keras to build our GRU.
Importing the Right Modules for Your Gated Recurrent Unit Model
The first thing we need to do is import the right modules. For this example, we’re going to be working with tensorflow. We don’t technically need the bottom two imports, but they save us time: when we add layers, we can write layers instead of typing out tf.keras.layers.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
Adding Layers to Your GRU RNN Model
The GRU RNN is a Sequential Keras model. After initializing our Sequential model, we’ll need to add in the layers. The first layer we’ll add is the Gated Recurrent Unit layer. Since we’re working with the MNIST dataset, where each image is 28 by 28 pixels, we need an input shape of (28, 28). The layer reads each image as a sequence of 28 rows with 28 values each. We’ll make this a 64-unit layer. Adding this layer is what makes our model a Gated Recurrent Unit model.
After adding the GRU layer, we’ll add a Batch Normalization layer. Finally, we’ll add a dense layer as output. The dense layer will have 10 units. Just as the input shape follows from the size of the images, the output size follows from the labels: the MNIST dataset has 10 classes, so we need 10 output nodes.
model = keras.Sequential()
model.add(layers.GRU(64, input_shape=(28, 28)))
model.add(layers.BatchNormalization())
model.add(layers.Dense(10))
model.summary()
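You’ll see that the GRU has more parameters than the Simple RNN we built with Keras, but fewer than the LSTM. Like the LSTM, the GRU has internal gates. The LSTM has three gates (input, forget, and output) plus a candidate state, while the GRU has only two gates (update and reset) plus a candidate state. That gives the GRU three sets of weights where the LSTM has four.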
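The parameter counts for the recurrent layers can be worked out by hand. The sketch below uses hypothetical helper functions, not Keras calls, and assumes the TensorFlow 2 default of reset_after=True for the GRU, which keeps separate input and recurrent bias vectors.

```python
def simple_rnn_params(input_dim, units):
    # One weight set: input kernel, recurrent kernel, bias
    return units * (input_dim + units) + units

def lstm_params(input_dim, units):
    # Four weight sets: input, forget, and output gates plus the candidate
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units, reset_after=True):
    # Three weight sets: update and reset gates plus the candidate.
    # Assumption: reset_after=True doubles the bias terms.
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)

print(simple_rnn_params(28, 64))  # 5952
print(gru_params(28, 64))         # 18048
print(lstm_params(28, 64))        # 23808
```

With 28 input features and 64 units, the GRU lands between the other two, which matches the ordering described above.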
Training and Testing our GRU Model on the MNIST Dataset
Now that we’ve built our GRU, let’s see how it does on the MNIST digit dataset. This is the same dataset we tested the Keras RNN and the built-from-scratch neural network on. MNIST is a set of handwritten digits and a classic dataset for training and testing neural networks.
Load the MNIST digits dataset
The first thing we need to do to work with the MNIST digits dataset is to load it. We’ll use Keras to load the dataset into a train set and a test set. Then we’ll normalize the data from its 0-255 scale to 0-1. We’ll also split the 10,000 test samples into a test set and a validation set: the last 10 samples become the test set and the other 9,990 become validation data.
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
x_validate, y_validate = x_test[:-10], y_test[:-10]
x_test, y_test = x_test[-10:], y_test[-10:]
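The normalization and slicing can be sanity-checked without downloading anything. This sketch uses a random stand-in array with the same shape and dtype as the MNIST test split; the names x_holdout and y_holdout are hypothetical and only there to avoid reusing x_test.

```python
import numpy as np

# Synthetic stand-in for the 10,000-image MNIST test split (random pixels, not real digits)
rng = np.random.default_rng(0)
x_test = rng.integers(0, 256, size=(10000, 28, 28), dtype=np.uint8)
y_test = rng.integers(0, 10, size=10000, dtype=np.uint8)

# Scale pixel values from 0-255 to 0-1
x_test = x_test / 255.0

# Everything except the last 10 samples is validation data
x_validate, y_validate = x_test[:-10], y_test[:-10]
# The last 10 samples are held out for the final test
x_holdout, y_holdout = x_test[-10:], y_test[-10:]

print(x_validate.shape, x_holdout.shape)  # (9990, 28, 28) (10, 28, 28)
print(x_test.min() >= 0.0, x_test.max() <= 1.0)  # True True
```

The negative slice indices count from the end of the array, so the two pieces never overlap.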
Compile the Keras GRU RNN
Now that we’ve created our GRU and loaded up our data, let’s compile our model. We have to compile our model before we can train or evaluate it. In our model compilation we will specify the loss function (in this case Sparse Categorical Cross Entropy), our optimizer (stochastic gradient descent), and our metrics (accuracy). We pass from_logits=True because our output layer has no softmax activation, so the model produces raw scores rather than probabilities. We can specify multiple metrics, but we’ll just go with accuracy for this example.
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)
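To see what this loss measures, here is a plain Python sketch of sparse categorical cross-entropy computed from raw logits for a single sample. The function name is hypothetical and this is not how Keras implements it internally; it only shows the underlying formula, the negative log of the softmax probability assigned to the true class.

```python
import math

def sparse_cce_from_logits(logits, label):
    """Cross-entropy for one sample, given raw scores and an integer label."""
    # Log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    # -log(softmax(logits)[label]) simplifies to this difference
    return log_sum_exp - logits[label]

# An untrained model that scores all 10 digits equally
uniform = [0.0] * 10
print(round(sparse_cce_from_logits(uniform, 3), 4))  # 2.3026

# A model that strongly favors the correct digit gets a much lower loss
confident = [0.0] * 10
confident[3] = 10.0
print(sparse_cce_from_logits(confident, 3) < 0.001)  # True
```

The value 2.3026 is ln(10), the loss of a pure guess across 10 classes, so a first-epoch loss near that number is a sign the model is starting from scratch rather than broken.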
Train and Fit the GRU RNN Model
Now that the model is compiled, let’s train the model. To train the model in Keras, we just call the fit function. To use the fit function, we’ll need to pass in the training data for x and y, the validation data, the batch_size, and the epochs. For this example, we’ll just train for 10 epochs.
model.fit(
    x_train, y_train, validation_data=(x_validate, y_validate), batch_size=64, epochs=10
)
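As a quick worked example of what these settings mean, the arithmetic below counts the gradient updates this call performs. It assumes the standard 60,000-image MNIST training split; the variable names are just for illustration.

```python
import math

train_samples = 60000  # size of the standard MNIST training split
batch_size = 64
epochs = 10

# Each epoch walks through the training data once, one batch per step
steps_per_epoch = math.ceil(train_samples / batch_size)
# The last batch of an epoch is smaller when the sizes don't divide evenly
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)           # 938
print(last_batch_size)           # 32
print(steps_per_epoch * epochs)  # 9380
```

The 938 here is the step count Keras shows in its progress bar for each epoch.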
Test the Keras Gated Recurrent Unit Model
Now that we’ve built and trained our GRU RNN, let’s test it. We’ll loop through and test all 10 data points we set aside when we created the test dataset. We’ll print the output of the model vs the actual data.
for i in range(10):
    # Add a batch dimension, predict, and take the highest-scoring class
    result = tf.argmax(model.predict(tf.expand_dims(x_test[i], 0)), axis=1)
    print(result.numpy(), y_test[i])
After 10 epochs, the model does quite well, at roughly 95% accuracy on both the training and validation data. It predicts all 10 test data points correctly.
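Predicting one sample at a time works, but the same check can be done on a whole batch at once. The sketch below uses a small hand-written array of hypothetical logits so it runs without a trained model; with the real model, the logits would come from a single call such as model.predict(x_test). The helper names are illustrative.

```python
import numpy as np

def predict_labels(logits):
    # The predicted class is the index of the largest score in each row
    return np.argmax(logits, axis=1)

def accuracy(logits, labels):
    # Fraction of rows where the predicted class matches the true label
    return float(np.mean(predict_labels(logits) == labels))

# Hypothetical logits for 3 samples and 3 classes (made-up numbers)
logits = np.array([
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
])
labels = np.array([1, 0, 1])

print(predict_labels(logits))    # [1 0 2]
print(accuracy(logits, labels))  # two of three predictions are correct
```

Keras also provides model.evaluate, which reports the loss and the compiled metrics for a dataset in one call.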
Build a Simple GRU RNN with Keras Summary
In this post we learned how to build, train, and test a GRU model using Keras. We also learned that a GRU is just a fancy RNN with gates. We built a simple sequential GRU with three layers. Finally, we tested the GRU we built on the MNIST digits dataset, a cornerstone dataset for testing neural networks.
Further Reading
- Why Programming is Easy but Software Engineering is Hard
- Python and How to Run Two Functions in Parallel
- Dijkstra’s Algorithm in Python
- Build Your Own AI Text Summarizer
- Asynchronous Requests in Python
