Here is my story: I recently gave a university tutoring class on deep learning to MSc students. Specifically, it was about training their first multi-layer perceptron (MLP) in PyTorch. I was stunned by their questions as beginners in the field. At the same time, I could relate to their struggles and thought back to being a beginner myself. That's what this blog post is all about.
If you are used to NumPy or TensorFlow, or if you want to deepen your understanding of deep learning with a hands-on coding tutorial, hop in.
We will train our very first model, a multi-layer perceptron (MLP), in PyTorch while explaining the design choices. Code is available on GitHub.
Shall we begin?
Imports
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
```
The torch.nn package contains all the layers required to build our neural network. Layers need to be instantiated first and then called through their instances. During initialization we specify all our trainable components. The weights typically live in a class that inherits from torch.nn.Module. Alternatives include torch.nn.Sequential and torch.nn.ModuleList, which also inherit from torch.nn.Module. Layer classes typically start with a capital letter even if they don't have any trainable parameters, so feel free to declare them the same way as any other layer.
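To make the instantiate-then-call pattern concrete, here is a minimal sketch. The class name `TinyBlock` and the layer sizes are hypothetical and chosen only for illustration; they are not taken from the tutorial's code.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Hypothetical module, used only to illustrate instantiate-then-call."""

    def __init__(self, in_features: int = 8, out_features: int = 4):
        super().__init__()
        # Trainable layer: owns a weight matrix and a bias vector.
        self.linear = nn.Linear(in_features, out_features)
        # Parameter-free layer, still declared as an instance.
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.activation(self.linear(x))


block = TinyBlock()
x = torch.randn(2, 8)   # batch of 2 samples with 8 features each
out = block(x)          # calling the instance runs forward()

# Only the linear layer contributes parameters: 8 * 4 weights + 4 biases.
num_params = sum(p.numel() for p in block.parameters())
print(out.shape, num_params)
```

Declaring the activation as an attribute is a style choice; it keeps every step of the forward pass visible in `__init__`.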
torch.nn.functional contains functions that can be called directly without prior initialization. Most torch.nn modules have a corresponding function in the functional module. A very handy example of a function I often use is normalize.
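As a rough illustration of the module/functional correspondence, the sketch below compares `nn.ReLU` with `F.relu` and then applies `F.normalize`. The tensor shape and the choice of `p=2, dim=1` are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 16)

# Module style: instantiate first, then call the instance.
relu_layer = nn.ReLU()
out_module = relu_layer(x)

# Functional style: call the function directly, no instance needed.
out_functional = F.relu(x)
same = torch.equal(out_module, out_functional)

# F.normalize rescales each row to unit L2 norm (p=2) along dim=1.
x_unit = F.normalize(x, p=2, dim=1)
row_norms = x_unit.norm(p=2, dim=1)

print(same, row_norms)
```

Both styles compute the same result; the functional form is convenient for operations without trainable parameters.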
Device: GPU
Students despise using the GPU. They don't see any reason to, since they are only using tiny toy datasets. I advise them to think in terms of scaling up the models and the data, but I can see it's not that obvious in the beginning. My solution was to assign them to train a ResNet18 on a 100K-image dataset in Google Colab.
```python
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('device:', device)
```
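Defining `device` is only half of the job: the model and every batch also have to be moved to it. A minimal sketch follows, with a single linear layer as a hypothetical stand-in model and arbitrary sizes.

```python
import torch
import torch.nn as nn

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Hypothetical stand-in model: 32 input features, 10 outputs (arbitrary sizes).
model = nn.Linear(32, 10).to(device)

# Tensors are created on the CPU by default, so each batch is moved as well.
batch = torch.randn(16, 32).to(device)

with torch.no_grad():
    logits = model(batch)

print(logits.shape, logits.device)
```

Because everything goes through the single `device` variable, the same script runs unchanged on a machine without a GPU.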
There is one and only one reason we use the GPU: speed. The same model can be trained much, much faster on a high-end GPU.
Nonetheless, we want to have the option to switch to CPU execution of our notebook/script by declaring a "device" variable at the top. Why? Well, for debugging! It's quite common to have GPU-related errors that are actually simple logical errors, but because the code is executed on the GPU, PyTorch is not able to trace the error back properly. Examples include slicing errors, like assigning a tensor of the wrong shape to a slice of another tensor.
The solution is to run the code on the CPU instead. You will probably get a more accurate error message.
```text
GPU message example:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

CPU message example:
Index 256 is out of bounds
```
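The two kinds of mistakes mentioned above can be reproduced on the CPU with a few lines. This is a sketch with made-up tensor shapes; the exact wording of the error messages depends on the PyTorch version, so none is shown here.

```python
import torch

device = 'cpu'  # switched from 'cuda:0' for debugging

# Case 1: assigning a tensor of the wrong shape to a slice of another tensor.
target = torch.zeros(4, 8, device=device)
wrong_patch = torch.ones(3, 8, device=device)  # 3 rows, but the slice has 2

try:
    target[0:2] = wrong_patch
    slice_error = None
except RuntimeError as err:
    slice_error = str(err)

# Case 2: indexing past the end of a tensor.
table = torch.arange(10, device=device)
bad_index = torch.tensor([256], device=device)

try:
    _ = table[bad_index]
    index_error = None
except (IndexError, RuntimeError) as err:
    index_error = str(err)

print('slice error:', slice_error)
print('index error:', index_error)
```

On the CPU both exceptions are raised at the offending line, so the traceback points at the actual bug.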
Image transforms
We will use an image dataset called CIFAR10, so we need to specify how the data will be fed into the network.
```python
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
```
The rest of the post continues in the same fashion, breaking down the components of training a multi-layer perceptron. It covers the importance of GPU usage for speed, input normalization, the CIFAR10 image dataset, the ideal data splits in machine learning, the dataloader class, building a variable-size MLP, writing the validation and training loops, and design choices when training an MLP, and it concludes with further steps and resources for continued learning. Each part comes with the relevant code snippets.