Build a Convolutional Neural Network
To understand the basics of image-based machine learning and produce a process that you can later analyze in memory, you must first build a machine learning model. You will use PyTorch to create a model for the CIFAR10 dataset.
- Create a Python file and import the necessary libraries.
import torch
import torchvision
import torchvision.transforms as transforms
- Download the CIFAR10 train and test datasets and normalize the images.
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
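As a quick sanity check on what `Normalize` does here (a plain-Python sketch, not part of the tutorial's pipeline): with mean and standard deviation both 0.5, each channel value x in [0, 1] is mapped to (x - 0.5) / 0.5, i.e. into [-1, 1].

```python
# Normalize applies (x - mean) / std per channel.
# With mean = std = 0.5, pixel values in [0, 1] land in [-1, 1].
mean, std = 0.5, 0.5

def normalize(x):
    return (x - mean) / std

print(normalize(0.0))  # -1.0 (darkest pixel)
print(normalize(0.5))  # 0.0  (mid-gray)
print(normalize(1.0))  # 1.0  (brightest pixel)
```

Centering the inputs around zero like this tends to make gradient-based training better behaved.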
- For learning purposes, use an 80/10/10 split for the model. This means that 80% of the data will be used to train the model, 10% will be used to validate and tune hyperparameters, and 10% will be used for testing. PyTorch ships CIFAR10 already split 5:1 (50,000 train / 10,000 test), so you must merge the default train and test sets in order to achieve the desired split.
Note: It is only acceptable to do this because the default sets are identical in format (i.e., they both come labeled).
from math import floor
from torch.utils.data import random_split
dataset = torch.utils.data.ConcatDataset([trainset, testset])
train_size = floor(len(dataset) * 0.8) # 48000 training images
valid_size = floor(len(dataset) * 0.1) # 6000 validation images
test_size = valid_size # 6000 testing images
trainset, validset, testset = random_split(dataset, [train_size, valid_size, test_size])
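The arithmetic behind those split sizes can be checked directly. This standalone sketch assumes CIFAR10's standard 50,000 train + 10,000 test images:

```python
from math import floor

total = 50000 + 10000            # CIFAR10 train + test images
train_size = floor(total * 0.8)  # 48000 training images
valid_size = floor(total * 0.1)  # 6000 validation images
test_size = valid_size           # 6000 testing images

# the three splits must account for every image
assert train_size + valid_size + test_size == total
print(train_size, valid_size, test_size)  # 48000 6000 6000
```

If you need a reproducible split, `random_split` also accepts a `generator` argument (e.g. `generator=torch.Generator().manual_seed(42)`).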
- The final preparation step is to initialize a DataLoader for each set. These are iterators that will feed batches of data into the model.
batch_size = 4 # number of images per training iteration
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)
validloader = torch.utils.data.DataLoader(validset, batch_size=batch_size,
                                          shuffle=False, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
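A useful cross-check (plain arithmetic, not part of the pipeline): with 48,000 training images and a batch size of 4, each epoch yields 12,000 mini-batches, which is why the training log later shows loss lines up to iteration 12000.

```python
from math import ceil

train_size = 48000
batch_size = 4

# DataLoader yields one batch per iteration; a partial final
# batch (if any) still counts as one iteration, hence ceil
batches_per_epoch = ceil(train_size / batch_size)
print(batches_per_epoch)  # 12000
```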
- Next, you will build a CNN that takes in 3-channel (RGB) images and outputs 10 activation nodes, one per class. The new class should inherit from nn.Module and override the forward method.
In the __init__ function, define fields (conv1, pool, etc.) that are callable and represent layers and transformations the model should be able to perform. At a high level, here is what they do:
| Transformation | Description |
|---|---|
| Conv2d | applies a 2d convolution over an input image, looking for local features |
| MaxPool2d | applies a 2d max pooling over an input, compressing the input |
| Linear | applies a linear transformation, compressing the input |
| relu | applies an activation function (rectified linear unit), ignores negative values |
| flatten | flattens the dimensions of the input signal |
Learn more: https://pytorch.org/docs/stable/nn.html
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # in = 3 channels (RGB), out = 6 feature maps, kernel_size = 5x5
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.lin1 = nn.Linear(16 * 5 * 5, 10)  # 16 channels * 5x5 spatial output after conv2 + pool -> 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 32x32x3 -> conv1 -> 28x28x6 -> pool -> 14x14x6
        x = self.pool(F.relu(self.conv2(x)))  # 14x14x6 -> conv2 -> 10x10x16 -> pool -> 5x5x16
        x = torch.flatten(x, 1)               # 5x5x16 -> 400
        x = self.lin1(x)                      # 400 -> 10
        return x

# create a CNN instance
net = Net()
There is nothing exotic about this ordering: Convolution -> Activation Function -> Pooling is a standard sequence for training an image model.
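You can verify the shape comments in the class above with a little arithmetic. The sketch below (plain Python, no torch required) applies the standard convolution output formula, out = (in + 2*padding - kernel) / stride + 1, and halves the spatial size for each 2x2 pooling step:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    # spatial output size of a 2d convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1

s = 32                 # CIFAR10 images are 32x32
s = conv2d_out(s, 5)   # conv1: 32 -> 28
s = s // 2             # pool:  28 -> 14
s = conv2d_out(s, 5)   # conv2: 14 -> 10
s = s // 2             # pool:  10 -> 5
features = 16 * s * s  # 16 channels * 5 * 5 = 400
print(features)  # 400, matching nn.Linear(16 * 5 * 5, 10)
```

If you change kernel sizes or add layers, redoing this trace tells you the correct input size for the first Linear layer.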
- Now you must define the loss function, hyperparameters, and the optimizer. The loss function is used to quantify the error of the model’s predictions. Hyperparameters are parameters that you, the human, rather than the model, are in control of. Finally, the optimizer is the algorithm that, given the loss, updates the neural network parameters to improve accuracy.
import torch.optim as optim
# loss function
criterion = nn.CrossEntropyLoss()
# hyperparameters
lr = 0.001 # learning rate: the magnitude with which parameters are adjusted
momentum = 0.9 # momentum: the level of inertia/memory present in the optimizer
epochs = 2 # epochs: the number of times to run through the training data
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)
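To build intuition for what CrossEntropyLoss computes, here is the math for a single sample done by hand, using a hypothetical 3-class example rather than the actual 10-class model: the logits are softmaxed into probabilities, and the loss is the negative log of the probability assigned to the true class.

```python
import math

logits = [2.0, 0.5, 0.1]  # hypothetical raw model outputs for 3 classes
target = 0                # index of the true class

# softmax: exponentiate, then normalize to probabilities
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]

# cross-entropy: negative log-probability of the true class
loss = -math.log(probs[target])
print(round(loss, 4))  # ~0.3168
```

A confident correct prediction drives the loss toward 0; a confident wrong one makes it large, which is exactly the gradient signal the optimizer needs.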
- The step you’ve been waiting for. This is where everything you’ve written so far comes together to train and validate the performance of the model.
for epoch in range(epochs):
    # TRAIN
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        images, labels = data
        optimizer.zero_grad()  # zero the parameter gradients so they do not accumulate

        # forward + backward + optimize
        outputs = net(images)
        loss = criterion(outputs, labels)  # compute loss at the output step
        loss.backward()   # back propagation of the error through the graph of funcs,
                          # computing the gradient of the loss for each parameter
        optimizer.step()  # updates the parameters based on the gradients

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

    # VALIDATE
    total = 0
    correct = 0
    with torch.no_grad():  # no gradients needed when only evaluating
        for i, data in enumerate(validloader, 0):
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'\nHyperparameters:\n\tbatch_size = {batch_size}\n\tlearning_rate = {lr}\n\tmomentum = {momentum}\n\tepochs = {epochs}')
    print(f'\nAccuracy of the network on {len(validset)} validation images: {100 * correct // total} %')

print('Finished Training and Validating')
OUTPUT:
[1, 2000] loss: 1.968
[1, 4000] loss: 1.651
[1, 6000] loss: 1.516
[1, 8000] loss: 1.449
[1, 10000] loss: 1.409
[1, 12000] loss: 1.390
Hyperparameters:
batch_size = 4
learning_rate = 0.001
momentum = 0.9
epochs = 2
Accuracy of the network on 6000 validation images: 50 %
[2, 2000] loss: 1.306
[2, 4000] loss: 1.314
[2, 6000] loss: 1.305
[2, 8000] loss: 1.292
[2, 10000] loss: 1.288
[2, 12000] loss: 1.279
Hyperparameters:
batch_size = 4
learning_rate = 0.001
momentum = 0.9
epochs = 2
Accuracy of the network on 6000 validation images: 53 %
Finished Training and Validating
Note: With these validation results, now would be the time to iteratively adjust hyperparameters and retrain + revalidate to observe better performance.
- In this step, you test the trained model on data it has never seen before.
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)  # calculate outputs by running images through the network
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on {len(testset)} test images: {100 * correct // total} %')
OUTPUT:
Accuracy of the network on 6000 test images: 53 %
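One detail worth noting in the print statement: `100 * correct // total` uses floor division, so the reported accuracy is truncated to a whole percent. For example (illustrative counts, not the actual run):

```python
correct, total = 3215, 6000          # hypothetical prediction counts
print(100 * correct / total)         # 53.58333... (true percentage)
print(100 * correct // total)        # 53 (floor division truncates)
```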
Given this dataset has 10 classes, random guessing would converge to 10% accuracy. At 53% accuracy, the model has clearly learned. If you so desire, continue to tweak hyperparameters and model characteristics to achieve better performance.