This is for the actual machine learning enthusiasts who want to know what the code for a neural network in Python looks like. In this post we’re going to build a fully connected deep neural net (DNN) from scratch in Python 3. Before we get started, I just want to say that you don’t need to know how to do this AT ALL to get started with applied machine learning.

This is just for those of you that want to actually understand what’s going on under the hood. **We’re going to be building a neural network from scratch in under 100 lines of code!** This code is adapted from Michael Nielson’s Neural Networks and Deep Learning Book, which was written for Python 2. Michael is way smarter than I am and if you want a more in-depth (math heavy) explanation, I highly suggest reading his book.

In this post we’ll cover:

- Introduction to Neural Network Code in Python
- Overview of the File Structure for Our Neural Network Code in Python 3
- Setting Up Helper Functions

- Building the Neural Network Code from Scratch in Python
- Feed Forward Function
- Gradient Descent

- Backpropagation for Neural Networks
- Feeding Forwards
- Backwards Pass

- Mini-Batch Updating
- Evaluating our Python Neural Network
- Putting All The Neural Network Code in Python Together
- Loading MNIST Data
- Running Tests

- Summary of Building a Python Neural Network from Scratch

You can find the Github Here. To follow along to this tutorial you’ll need to download the `numpy`

Python library. To do so, you can run the following command in the terminal:

`pip install numpy`

## Overview of File Structure for Our Neural Network Code in Python

There will be three files being made here. First, we have the `simple_nn.py`

file which will be outlined in “Setting Up Helper Functions” and “Building the Neural Network from Scratch”. We will also have a file to load the test data called `mnist_loader.py`

, outlined in “Loading MNIST Data”. Finally, we will have a file to test our neural network called `test.py`

that will be run in the terminal. This file is outlined in “Running Tests”.

## Setting Up Helper Functions

At the start of our program we’ll import the only two libraries we need, `random`

, and `numpy`

. We’ve seen `random`

used extensively via the Super Simply Python series in programs like the Random Number Generator, High Low Guessing Game, and Password Generator. We’ll be using the `random`

library to randomize the starting weights in our neural network. We’ll be using `numpy`

or `np`

(by convention it is usually imported as `np`

), to make our calculations faster.

After our imports, we’ll create our two helper functions. A `sigmoid`

function and a `sigmoid_prime`

function. We first learned about the sigmoid function in Introduction to Machine Learning: Logistic Regression. In this program, we’ll be using it as our activation function, the same way as it’s used to do classification in Logistic Regression. The `sigmoid_prime`

function is the derivative and is used in backpropagation to calculate the `delta`

or gradient.

```
import random
import numpy as np
# helpers
def sigmoid(z):
return 1.0/(1.0+np.exp(-z))
def sigmoid_prime(z):
return sigmoid(z)*(1-sigmoid(z))
```

# Building the Neural Network Code in Python from Scratch

This entire section is dedicated to building a fully connected neural network. All of the functions that follow will be under the network class. The full class code will be provided at the end of this section. The first thing we’ll do in our `Network`

class is create the constructor.

The constructor takes one parameter, `sizes`

. The `sizes`

variable is a list of numbers that indicates the number of input nodes at each layer in our neural network. In our `__init__`

function, we initialize four attributes. The number of layers, `num_layers`

, is set to the length of the `sizes`

and the list of the sizes of the layers is set to the input variables, `sizes`

. Next, the initial biases of our network are randomized for each layer after the input layer. Finally, the weights connecting each node are randomized for each connection between the input and output layers. For context, `np.random.randn()`

returns a random sample from the normal distribution.

```
class Network:
# sizes is a list of the number of nodes in each layer
def __init__(self, sizes):
self.num_layers = len(sizes)
self.sizes = sizes
self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
self.weights = [np.random.randn(y, x) for x,y in zip(sizes[:-1], sizes[1:])]
```

## Feedforward Function

The `feedforward`

function is the function that sends information forward in the neural network. This function will take one parameter, `a`

, representing the current activation vector. This function loops through all the biases and weights in the network and calculates the activations at each layer. The `a`

returned is the activations of the last layer, which is the prediction.

```
def feedforward(self, a):
for b, w in zip(self.biases, self.weights):
a = sigmoid(np.dot(w, a) + b)
return a
```

## Gradient Descent

Gradient Descent is the workhorse of our `Network`

class. In this version, we’re doing an altered version of gradient descent known as mini-batch (stochastic) gradient descent. This means that we’re going to update our model using a mini-batch of data points. This function takes four mandatory parameters and one optional parameter. The four mandatory parameters are the set of training data, the number of epochs, the size of the mini-batches, and the learning rate (`eta`

). We can optionally provide test data. When we test this network later, we will provide test data.

This function starts off by converting the `training_data`

into a list type and setting the number of samples to the length of that list. If the test data is passed in, we do the same to that. This is because these are not returned to us as lists, but `zip`

s of lists. We’ll see more about this when we load the MNIST data samples later. Note that this type-casting isn’t strictly necessary if we can ensure that we pass both types of data in as lists.

Once we have the data, we loop through the number of training epochs. A training epoch is simply one round of training the neural network. In each epoch, we start by shuffling the data to ensure randomness, then we create a list of mini-batches. For each mini-batch, we’ll call the `update_mini_batch`

method, which is covered below. If the test data is there, we’ll also return the test accuracy.

```
def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
training_data = list(training_data)
samples = len(training_data)
if test_data:
test_data = list(test_data)
n_test = len(test_data)
for j in range(epochs):
random.shuffle(training_data)
mini_batches = [training_data[k:k+mini_batch_size]
for k in range(0, samples, mini_batch_size)]
for mini_batch in mini_batches:
self.update_mini_batch(mini_batch, eta)
if test_data:
print(f"Epoch {j}: {self.evaluate(test_data)} / {n_test}")
else:
print(f"Epoch {j} complete")
```

## Backpropagation for Neural Networks

Backpropagation is the updating of all the weights and biases after we run a training epoch. We use all the mistakes the network makes to update the weights. Before we actually create the backpropagation function, let’s create a helper function called `cost_derivative`

. The `cost_derivative`

function will determine if we made a mistake in our output layer. It takes two parameters, the `output_activations`

array and the expected output values, `y`

.

```
def cost_derivative(self, output_activations, y):
return(output_activations - y)
```

### Feeding Forwards

Now we’re ready to do backpropagation. Our `backprop`

function will take two values, `x`

, and `y`

. The first thing we’ll do is initialize our nablas or 𝛁 to 0 vectors. This symbol represents the gradients. We also need to keep track of our current activation vector, `activation`

, all of the activation vectors, `activations`

, and the z-vectors, `zs`

. The first activation is the input layer.

After setting these up, we’ll loop through all the biases and weights. In each loop we calculate the `z`

vector as the dot product of the weights and activation, add that to the list of `zs`

, recalculate the activation, and then add the new activation to the list of `activations`

.

### Backward Pass

Now comes the calculus. We start our backward pass by calculating the delta, which is equal to the error from the last layer multiplied by the `sigmoid_prime`

of the last entry of the `zs`

vectors. We set the last layer of `nabla_b`

as the delta and the last layer of `nabla_w`

equal to the dot product of the delta and the second to last layer of activations (transposed so we can actually do the math). After setting these last layers up, we do the same thing for each layer going backwards starting from the second to last layer. Finally, we return the `nabla`

s as a tuple.

```
def backprop(self, x, y):
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
# feedforward
activation = x
activations = [x] # stores activations layer by layer
zs = [] # stores z vectors layer by layer
for b, w in zip(self.biases, self.weights):
z = np.dot(w, activation) + b
zs.append(z)
activation = sigmoid(z)
activations.append(activation)
# backward pass
delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
nabla_b[-1] = delta
nabla_w[-1] = np.dot(delta, activations[-2].transpose())
for _layer in range(2, self.num_layers):
z = zs[-_layer]
sp = sigmoid_prime(z)
delta = np.dot(self.weights[-_layer+1].transpose(), delta) * sp
nabla_b[-_layer] = delta
nabla_w[-_layer] = np.dot(delta, activations[-_layer-1].transpose())
return (nabla_b, nabla_w)
```

## Mini-Batch Updating

Mini-batch updating is part of our `SGD`

(stochastic) gradient descent function from earlier. I went back and forth on where to place this function since it’s used in `SGD`

but also requires `backprop`

. In the end I decided to put it down here. It starts much the same way as our `backprop`

function by creating 0 vectors of the `nabla`

s for the biases and weights. It takes two parameters, the `mini_batch`

, and the learning rate, `eta`

.

Then, for each input, `x`

, and output, `y`

, in the `mini_batch`

, we get the delta of each nabla array via the `backprop`

function. Next, we update the `nabla`

lists with these deltas. Finally, we update the weights and biases of the network using the `nablas`

and the learning rate. Each value is updated to the current value minus the learning rate divided by the size of the minibatch times the nabla value.

```
def update_mini_batch(self, mini_batch, eta):
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
for x, y in mini_batch:
delta_nabla_b, delta_nabla_w = self.backprop(x, y)
nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
self.weights = [w-(eta/len(mini_batch))*nw
for w, nw in zip(self.weights, nabla_w)]
self.biases = [b-(eta/len(mini_batch))*nb
for b, nb in zip(self.biases, nabla_b)]
```

## Evaluating our Python Neural Network

The last function we need to write is the `evaluate`

function. This function takes one parameter, the `test_data`

. In this function, we simply compare the network’s outputs with the expected output, `y`

. The network’s outputs are calculated by feeding forward the input, `x`

.

```
def evaluate(self, test_data):
test_results = [(np.argmax(self.feedforward(x)), y)
for (x, y) in test_data]
return sum(int(y[x]) for (x, y) in test_results)
```

## Putting All the Neural Network Code in Python Together

Here’s what it looks like when we put all the code together.

```
import random
import numpy as np
# helpers
def sigmoid(z):
return 1.0/(1.0+np.exp(-z))
def sigmoid_prime(z):
return sigmoid(z)*(1-sigmoid(z))
class Network:
# sizes is a list of the number of nodes in each layer
def __init__(self, sizes):
self.num_layers = len(sizes)
self.sizes = sizes
self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
self.weights = [np.random.randn(y, x) for x,y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, a):
for b, w in zip(self.biases, self.weights):
a = sigmoid(np.dot(w, a) + b)
return a
def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
training_data = list(training_data)
samples = len(training_data)
if test_data:
test_data = list(test_data)
n_test = len(test_data)
for j in range(epochs):
random.shuffle(training_data)
mini_batches = [training_data[k:k+mini_batch_size]
for k in range(0, samples, mini_batch_size)]
for mini_batch in mini_batches:
self.update_mini_batch(mini_batch, eta)
if test_data:
print(f"Epoch {j}: {self.evaluate(test_data)} / {n_test}")
else:
print(f"Epoch {j} complete")
def cost_derivative(self, output_activations, y):
return(output_activations - y)
def backprop(self, x, y):
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
# feedforward
activation = x
activations = [x] # stores activations layer by layer
zs = [] # stores z vectors layer by layer
for b, w in zip(self.biases, self.weights):
z = np.dot(w, activation) + b
zs.append(z)
activation = sigmoid(z)
activations.append(activation)
# backward pass
delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
nabla_b[-1] = delta
nabla_w[-1] = np.dot(delta, activations[-2].transpose())
for _layer in range(2, self.num_layers):
z = zs[-_layer]
sp = sigmoid_prime(z)
delta = np.dot(self.weights[-_layer+1].transpose(), delta) * sp
nabla_b[-_layer] = delta
nabla_w[-_layer] = np.dot(delta, activations[-_layer-1].transpose())
return (nabla_b, nabla_w)
def update_mini_batch(self, mini_batch, eta):
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
for x, y in mini_batch:
delta_nabla_b, delta_nabla_w = self.backprop(x, y)
nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
self.weights = [w-(eta/len(mini_batch))*nw
for w, nw in zip(self.weights, nabla_w)]
self.biases = [b-(eta/len(mini_batch))*nb
for b, nb in zip(self.biases, nabla_b)]
def evaluate(self, test_data):
test_results = [(np.argmax(self.feedforward(x)), y)
for (x, y) in test_data]
return sum(int(y[x]) for (x, y) in test_results)
```

## Testing our Neural Network

Great, now that we’ve written our Neural Network, we have to test it. We’ll test it using the MNIST dataset. You can download the dataset (and original Python 2.7 code) here.

## Loading MNIST Data

The MNIST data comes in a `.pkl.gz`

file type that we’ll use `gzip`

to open and `pickle`

to load. Let’s create a simple function to load this data as a tuple of size 3 split into the training, validation, and test data. To make our data easier to handle, we’ll create another function to encode the `y`

into an array of size 10. The array will contain all 0s except for a 1 which corresponds to the correct digit of the image.

To load our data into a usable format, we’ll use the simple `load_data`

function we created and the `one_hot_encode`

functions. We will create another function that will transform our `x`

values into a list of size 784, corresponding to the 784 pixels in the image, and our `y`

values into their one hot encoded vector form. Then we’ll zip these `x`

and `y`

values together so that each index corresponds to the other. We need to do this for the training, validation, and test data sets. Finally, we return the modified data.

```
import pickle
import gzip
import numpy as np
def load_data():
with gzip.open('mnist.pkl.gz', 'rb') as f:
training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
return (training_data, validation_data, test_data)
def one_hot_encode(y):
encoded = np.zeros((10, 1))
encoded[y] = 1.0
return encoded
def load_data_together():
train, validate, test = load_data()
train_x = [np.reshape(x, (784, 1)) for x in train[0]]
train_y = [one_hot_encode(y) for y in train[1]]
training_data = zip(train_x, train_y)
validate_x = [np.reshape(x, (784, 1)) for x in validate[0]]
validate_y = [one_hot_encode(y) for y in validate[1]]
validation_data = zip(validate_x, validate_y)
test_x = [np.reshape(x, (784, 1)) for x in test[0]]
test_y = [one_hot_encode(y) for y in test[1]]
testing_data = zip(test_x, test_y)
return (training_data, validation_data, testing_data)
```

## Running Tests

To run tests, we’ll create another file that will import both the neural network we created earlier (`simple_nn`

) and the MNIST data set loader (`mnist_loader`

). All we have to do in this file is load the data, create a `Network`

which has an input layer of size 784 and an output layer of size 10, and run the network’s `SGD`

function on the training data and test with the test data. Note that it doesn’t matter what any of the values in between 784 and 10 are for our list of input layers. Only the input size and output size are set, we can adjust the rest of the layers however we like. We don’t need 3 layers, we could also have 4 or 5, or even just 2. Play around with it and have fun.

```
import simple_nn
import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_together()
net = simple_nn.Network([784, 30, 10])
net.SGD(training_data, 10, 10, 3.0, test_data=test_data)
```

When we run our test, we should see something like the following image:

## Summary of Building a Python Neural Network from Scratch

In this post we build a neural network from scratch in Python 3. We covered not only the high level math, but also got into the implementation details. First, we implemented helper functions. The `sigmoid`

and `sigmoid_prime`

functions are central to operating the neurons.

Next, we implemented the core operation for feeding data into the neural network, the feedforward function. Then, we wrote the work horse of our neural network in Python, the gradient descent function. Gradient descent is what allows our neural network to find “local minima” to optimize it’s weights and biases.

After gradient descent, we wrote the backpropagation function. This function allows the neural network to “learn” by providing updates when the outputs don’t match the correct labels. Finally, we tested our built-from-scratch Python neural network on the MNIST data set. It ran well!

## Further Reading

- Build a GRU RNN in Keras
- Why Programming is Easy but Software Engineering is Hard
- Five Easy Projects for Python Beginners
- Build Your Own AI Text Summarizer in Python
- Nested Lists in Python

I run this site to help you and others like you find cool projects and practice software skills. If this is helpful for you and you enjoy your ad free site, please help fund this site by donating below! If you can’t donate right now, please think of us next time.

## 16 thoughts on “Neural Network Code in Python 3 from Scratch”