Build a Convolutional Neural Network

2023-06-01
6 min read

To understand the basics of image-based machine learning and create a process that you can later analyze in memory, you must first build a machine learning model. You will be using PyTorch to create a model for the CIFAR10 dataset.

  1. Create a Python file and import the necessary libraries.
import torch
import torchvision
import torchvision.transforms as transforms
  2. Download the CIFAR10 train and test datasets and normalize the images.
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset  = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=transform)
  3. For learning purposes, use an 80/10/10 split for the model. This means that 80% of the data will be used to train the model, 10% will be used to validate and tweak hyperparameters, and 10% will be used for testing. PyTorch ships CIFAR10 already split 5:1 (50,000 training and 10,000 test images), so you must merge the default train and test sets in order to achieve the desired split.

Note: It is only acceptable to do this because the default sets are identical in format (i.e., they both come labeled).

from math import floor
from torch.utils.data import random_split
dataset  = torch.utils.data.ConcatDataset([trainset, testset])

train_size = floor(len(dataset) * 0.8)  # 48000 training images
valid_size = floor(len(dataset) * 0.1)  #  6000 validation images
test_size  = valid_size                 #  6000 testing images

trainset, validset, testset = random_split(dataset, [train_size, valid_size, test_size])
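You can verify the arithmetic behind the comments above: CIFAR10 ships 50,000 training and 10,000 test images, so the merged dataset holds 60,000 images and the three splits cover it exactly:

```python
from math import floor

total = 50000 + 10000              # CIFAR10 train + test images
train_size = floor(total * 0.8)    # 48000 training images
valid_size = floor(total * 0.1)    #  6000 validation images
test_size  = valid_size            #  6000 testing images

# the three splits partition the merged dataset with nothing left over
print(train_size + valid_size + test_size == total)  # True
```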

  4. The final preparation step is to initialize a DataLoader for each set. These are iterators that will feed batches of data into the model.
batch_size = 4                  # number of images per training iteration

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)
validloader = torch.utils.data.DataLoader(validset, batch_size=batch_size,
                                          shuffle=False, num_workers=2)
testloader  = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                          shuffle=False, num_workers=2)
  5. Next, you will build a CNN that takes in 3-channel images (RGB) and outputs 10 activation nodes (one per class). The new class should inherit from nn.Module and override the forward method.

In the __init__ function, define fields (conv1, pool, etc.) that are callable and represent the layers and transformations the model can perform. At a high level, here is what they do:

Transformation  Description
Conv2d          applies a 2D convolution over an input image, extracting local features
MaxPool2d       applies 2D max pooling over an input, downsampling it
Linear          applies a linear transformation, projecting features onto the output classes
relu            applies the rectified linear unit activation function, zeroing out negative values
flatten         collapses the dimensions of the input into a single vector

Learn more: https://pytorch.org/vision/stable/index.html
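The spatial sizes in the forward pass below follow from the standard convolution output formula, (size + 2*padding - kernel) / stride + 1. A small helper (hypothetical, for illustration only) traces a 32x32 CIFAR10 image through the layers and explains why the linear layer expects 16 * 5 * 5 inputs:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output spatial size of a Conv2d or MaxPool2d layer."""
    return (size + 2 * padding - kernel) // stride + 1

s = out_size(32, 5)            # conv1, 5x5 kernel:    32 -> 28
s = out_size(s, 2, stride=2)   # pool, 2x2, stride 2:  28 -> 14
s = out_size(s, 5)             # conv2, 5x5 kernel:    14 -> 10
s = out_size(s, 2, stride=2)   # pool, 2x2, stride 2:  10 -> 5

print(16 * s * s)              # 400 features feed the linear layer
```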

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)             # in = 3 (RGB), out = 6 (features expected), kernel_size = 5x5
        self.pool  = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.lin1  = nn.Linear(16 * 5 * 5, 10)      # 16 channels * 5x5 spatial size from conv2 -> 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))        # conv1: 32x32x3 -> 28x28x6, pool: -> 14x14x6
        x = self.pool(F.relu(self.conv2(x)))        # conv2: 14x14x6 -> 10x10x16, pool: -> 5x5x16
        x = torch.flatten(x, 1)                     # flatten(5x5x16) -> 400
        x = self.lin1(x)                            # lin1(400) -> 10

        return x

# create a CNN instance
net = Net()
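For a rough sense of model size, the trainable parameters can be tallied by hand: each convolution has in * out * kernel² weights plus out biases, and the linear layer has in * out weights plus out biases.

```python
conv1 = 3 * 6 * 5 * 5 + 6       # 456 parameters
conv2 = 6 * 16 * 5 * 5 + 16     # 2416 parameters
lin1  = 400 * 10 + 10           # 4010 parameters

print(conv1 + conv2 + lin1)     # 6882 trainable parameters in total
```

This is a very small network by modern standards, which is part of why it trains in minutes on a CPU.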

The ordering above is deliberate: Convolution -> Activation Function -> Pooling is the standard sequence for image models.

  6. Now you must define the loss function, hyperparameters, and the optimizer. The loss function quantifies the error of the model’s predictions. Hyperparameters are parameters that you, the human, rather than the model, control. Finally, the optimizer is the algorithm that, given the loss, updates the neural network parameters to improve accuracy.
import torch.optim as optim
# loss function
criterion = nn.CrossEntropyLoss()

# hyperparameters
lr = 0.001              # learning rate: the magnitude with which parameters are adjusted
momentum = 0.9          # momentum: the level of inertia/memory present in the optimizer
epochs = 2              # epochs: the number of times to run through the training data

optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)
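Under the hood, each call to optimizer.step() applies SGD with momentum. A simplified pure-Python sketch of the update rule for a single scalar parameter (ignoring weight decay and dampening, which default to zero here):

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    # the velocity accumulates a decaying sum of past gradients,
    # and the parameter moves against that accumulated direction
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

p, v = sgd_momentum_step(param=1.0, grad=0.5, velocity=0.0)
print(p, v)  # 0.9995 0.5
```

The momentum term is why the optimizer can keep moving through small flat regions of the loss surface instead of stalling.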
  7. The step you’ve been waiting for. This is where everything you’ve written so far comes together to train and validate the performance of the model.
for epoch in range(epochs):

    # TRAIN
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        images, labels = data
        optimizer.zero_grad()                         # zero the parameter gradients so they do not accumulate

        # forward + backward + optimize
        outputs = net(images)
        loss = criterion(outputs, labels)             # compute loss at the output step
        loss.backward()                               # compute the gradient of the loss for each parameter
                                                      # (backpropagation of error through the graph of functions)
        optimizer.step()                              # updates the parameters based on loss

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0


    # VALIDATE
    total = 0
    correct = 0
    with torch.no_grad():               # gradients are not needed for validation
        for data in validloader:
            images, labels = data
            outputs = net(images)

            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'\nHyperparameters:\n\tbatch_size = {batch_size}\n\tlearning_rate = {lr}\n\tmomentum = {momentum}\n\tepochs = {epochs}')
    print(f'\nAccuracy of the network on {len(validset)} validation images: {100 * correct // total} %')

print('Finished Training and Validating')

OUTPUT:

[1,  2000] loss: 1.968
[1,  4000] loss: 1.651
[1,  6000] loss: 1.516
[1,  8000] loss: 1.449
[1, 10000] loss: 1.409
[1, 12000] loss: 1.390

Hyperparameters:
    batch_size = 4
    learning_rate = 0.001
    momentum = 0.9
    epochs = 2

Accuracy of the network on 6000 validation images: 50 %
[2,  2000] loss: 1.306
[2,  4000] loss: 1.314
[2,  6000] loss: 1.305
[2,  8000] loss: 1.292
[2, 10000] loss: 1.288
[2, 12000] loss: 1.279

Hyperparameters:
	batch_size = 4
	learning_rate = 0.001
	momentum = 0.9
	epochs = 2

Accuracy of the network on 6000 validation images: 53 %
Finished Training and Validating

Note: With these validation results, now would be the time to iteratively adjust hyperparameters and retrain + revalidate to observe better performance.
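One way to put the loss numbers above in context: with 10 balanced classes, a network that guesses uniformly at random has an expected cross-entropy loss of -ln(1/10) ≈ 2.303, so the very first printed loss of 1.968 already indicates learning within the first 2,000 mini-batches.

```python
from math import log

num_classes = 10
# expected cross-entropy loss of a uniform random guesser
random_guess_loss = -log(1 / num_classes)
print(round(random_guess_loss, 3))  # 2.303
```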

  8. In this step, you test the trained model on novel data it has never seen before.
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)       # calculate outputs by running images through the network

        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on {len(testset)} test images: {100 * correct // total} %')

OUTPUT:

Accuracy of the network on 6000 test images: 53 %

Given that this dataset has 10 classes, random guessing would converge to 10% accuracy. At 53% accuracy, it is clear the model has learned. If you so desire, continue to tweak hyperparameters and model characteristics to achieve better performance.