Part2 lesson 9, 03_minibatch_training | fastai 2019 course -v3

Jul 14, 2020 · 8 min read

The fastai instructors, along with many experienced students, strongly recommend re-implementing the course code from scratch, so that is the approach I have been using.
It worked fine for Part 1, but in Part 2 the same strategy of replicating each notebook from a blank cell without peeking at the original was no longer enough.
First, now that we have moved to the bottom-up phase, there are many sub-topics worth digging into, and by merely replicating the code I couldn’t inspect each topic in depth (sometimes I didn’t even know what I was doing). That was fine for Part 1, but it seemed insufficient for Part 2.
Second, it was simply too hard to rely on my own memory; I had to open and close Jeremy’s notebook far too many times.
So I thought it would help to first draw an overall picture of each notebook, making the replication process easier. And it did (at least for 01_matmul, 02_fully_connected, and 02b_initializing).
So I’m sharing my own supplementary outline, hoping it helps you replicate the code more easily and grasp the lesson Jeremy intended.

From this notebook, I learned how to re-define our NN model and update the training process.

Original notebook link

My replicated Notebook Link

Initial Setup


  1. Fastai setup for the Colab environment
  2. Import the data
    1. Hyperparameters
      1. Number of inputs = size of a data item
      2. Number of output units = number of classes
      3. Number of hidden units = 50
      4. Number of hidden layers = 1
    2. Initialize the model (1st Model)
      1. Init with the sizes of the layers
      2. Calling the instance with the independent variable(s) returns predictions

Loss function: Cross entropy loss

  1. Softmax
    1. Write down the formula of softmax
    2. Why do we need the softmax function at the last layer?
    3. Write down softmax as code.
    4. Why do we need the log of softmax?
  2. Cross Entropy
    1. Write down the formula of cross entropy
    2. Why is this function an adequate loss for categorical targets?
  3. Negative log likelihood function 1
    1. Implement the negative log likelihood function
    2. Why do we take the negative value?
  4. LogSumExp
    1. What is LogSumExp?
    2. Why is it dubbed a ‘trick’?
    3. Implement the logsumexp function
    4. Compare it with torch’s logsumexp
    5. Re-implement the log_softmax function using torch’s logsumexp method
    6. Compare the resulting loss function with the previous version
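
As a reference, here is a minimal sketch of how these pieces fit together (the function and variable names are mine, not necessarily the notebook’s):

```python
import torch
import torch.nn.functional as F

def logsumexp(x):
    # the "trick": subtract the max so exp() can't overflow, then add it back
    m = x.max(-1, keepdim=True)[0]
    return m + (x - m).exp().sum(-1, keepdim=True).log()

def log_softmax(x):
    # log(softmax(x)) simplifies to x - logsumexp(x)
    return x - logsumexp(x)

def nll(log_probs, target):
    # negative log likelihood: pick each row's log-prob of the true class
    return -log_probs[range(target.shape[0]), target].mean()

logits = torch.randn(4, 10)
target = torch.tensor([3, 1, 0, 7])
loss = nll(log_softmax(logits), target)
```

Composing `log_softmax` and `nll` this way should match PyTorch’s built-in `F.cross_entropy`, which is a good sanity check for your own implementation.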

Basic Training Loop

  1. Write down the procedure of the general training loop (4 steps)
  2. Define the accuracy and loss functions.
  3. Grab one batch and test the accuracy/loss functions
  4. Run the above steps over the whole dataset with epoch = 1
  5. Check the loss/accuracy with the trained model
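
The four steps can be sketched as follows. Synthetic random data stands in for MNIST here, and the sizes and learning rate are assumptions of mine:

```python
import torch
from torch import nn
import torch.nn.functional as F

def accuracy(pred, yb):
    # fraction of predictions whose argmax matches the target
    return (pred.argmax(dim=-1) == yb).float().mean()

torch.manual_seed(0)
x_train = torch.randn(256, 784)               # stand-in for MNIST images
y_train = torch.randint(0, 10, (256,))        # stand-in for labels
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
lr, bs = 0.5, 64

for i in range(0, x_train.shape[0], bs):      # one epoch over mini-batches
    xb, yb = x_train[i:i+bs], y_train[i:i+bs]
    pred = model(xb)                          # 1. forward pass
    loss = F.cross_entropy(pred, yb)          # 2. compute the loss
    loss.backward()                           # 3. backward pass
    with torch.no_grad():                     # 4. update weights, zero the grads
        for p in model.parameters():
            p -= p.grad * lr
            p.grad.zero_()
```

The `with torch.no_grad():` block is what the later Optimizer refactoring will absorb into `step()` and `zero_grad()`.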

Using parameters and optim


We will use nn.Module.__setattr__ and move relu to nn.functional. 2

  1. Re-define the Model class using nn.Module. (2nd Model)
    1. How does it differ from the model you made first?
  2. See the layers inside the model using the named_children method
    1. See one layer
  3. Re-define the fit function using parameters instead of layers. (2nd Training Loop)
    1. What’s the difference between nn.Module’s layers and parameters?
    2. Why does this difference make the code shorter?
  4. Make a DummyModule class to simulate PyTorch’s __setattr__ (3rd Model)
    1. 3 dunder methods
      1. __init__
      2. __setattr__
      3. __repr__
    2. The parameters function yields each and every parameter of each layer
    3. What does __setattr__ do here?
      1. How can I check whether I defined the parameters properly?
    4. What does __repr__ do here?
  5. Instantiate DummyModule and see its repr and the shapes of its parameters
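
A minimal sketch of what such a DummyModule can look like, following the outline above (layer sizes are assumptions):

```python
import torch
from torch import nn

class DummyModule():
    # simulates the registration that nn.Module.__setattr__ does for us
    def __init__(self, n_in, nh, n_out):
        self._modules = {}
        self.l1 = nn.Linear(n_in, nh)   # goes through __setattr__ below
        self.l2 = nn.Linear(nh, n_out)

    def __setattr__(self, k, v):
        # register every public attribute as a sub-module
        if not k.startswith('_'): self._modules[k] = v
        super().__setattr__(k, v)

    def __repr__(self):
        # show the registered modules, roughly like nn.Module's repr
        return f'{self._modules}'

    def parameters(self):
        # yield each and every parameter of each registered layer
        for m in self._modules.values():
            yield from m.parameters()

mdl = DummyModule(784, 50, 10)
```

Listing `mdl.parameters()` is one way to check the parameters were registered properly: two `nn.Linear` layers should contribute two weights and two biases.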

Registering modules

  1. Re-define the Model class (4th Model)
    1. Use the original layers approach
    2. Register a module for each layer using add_module
  2. What is registering modules?
  3. See the model instance (i.e. its repr)


  1. Define class SequentialModel (5th Model)
    1. Use nn.ModuleList
  2. What does nn.ModuleList do for us?


  1. Make model instance using nn.Sequential (6th Model)
  2. Why does this class make the job easier?
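
A sketch of the two approaches side by side (layer sizes are my own assumptions):

```python
import torch
from torch import nn

class SequentialModel(nn.Module):
    def __init__(self, layers):
        super().__init__()
        # nn.ModuleList registers each layer (and its parameters) for us
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for l in self.layers: x = l(x)
        return x

model = SequentialModel([nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10)])
# nn.Sequential bundles the same idea into a ready-made container,
# so we don't even need to write forward ourselves
model2 = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
```

Both expose the same parameters to an optimizer; `nn.Sequential` just removes the boilerplate.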


  1. Define the class Optimizer
    1. Init - params, lr
    2. step
      • Why do we need torch.no_grad()?
    3. zero_grad
  2. Run one epoch of learning with an Optimizer instance (3rd Training Loop)
  3. See the loss and accuracy 3
  4. Run one epoch of learning using PyTorch’s optim.SGD (4th Training Loop)
    1. Define a get_model function
      1. rtype: (1) a model instance from nn.Sequential, (2) an optimizer from optim.SGD
    2. See the loss and accuracy 4
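
A sketch of the hand-rolled Optimizer and a `get_model` that instead returns `optim.SGD`, as in the outline (the default learning rate is an assumption):

```python
import torch
from torch import nn, optim

class Optimizer():
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr

    def step(self):
        # no_grad: the parameter update itself must not be traced by autograd
        with torch.no_grad():
            for p in self.params: p -= p.grad * self.lr

    def zero_grad(self):
        # reset gradients so the next backward() doesn't accumulate into them
        for p in self.params: p.grad.data.zero_()

def get_model(lr=0.5):
    model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
    return model, optim.SGD(model.parameters(), lr=lr)

model, opt = get_model()
```

In the training loop, `opt.step(); opt.zero_grad()` replaces the manual `with torch.no_grad():` block.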

Dataset and DataLoader

We will make Dataset and DataLoader classes to iterate through mini-batches more efficiently.


  1. Make a Dataset class with the three essential methods. (hint: there are 3)
    1. What are the three components?
  2. Make a dataset object, then 1) get its length and 2) get items by index
    1. Compare those tensors’ lengths/shapes with the originals
  3. Run one epoch of learning using data from the Dataset (5th Training Loop)
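
A minimal sketch of the three essential methods (the synthetic tensors are stand-ins for the real data):

```python
import torch

class Dataset():
    # the three essentials: __init__, __len__, __getitem__
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)
    def __getitem__(self, i): return self.x[i], self.y[i]

x, y = torch.randn(256, 784), torch.randint(0, 10, (256,))
train_ds = Dataset(x, y)
xb, yb = train_ds[0:64]   # slicing works because tensors support slice indexing
```

With this, `xb, yb = train_ds[i:i+bs]` replaces the two separate `x_train[...]`/`y_train[...]` lines in the loop.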


  1. Make a DataLoader class that takes a dataset and a batch size; its instance is an iterator returning the next batch

  2. Make a fit() function that gets its data from the DataLoader class. (6th Training Loop)
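
A sketch of this first DataLoader (the Dataset class is repeated here so the snippet stands alone):

```python
import torch

class Dataset():
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)
    def __getitem__(self, i): return self.x[i], self.y[i]

class DataLoader():
    # wraps a dataset; iterating yields one mini-batch at a time
    def __init__(self, ds, bs): self.ds, self.bs = ds, bs
    def __iter__(self):
        for i in range(0, len(self.ds), self.bs):
            yield self.ds[i:i+self.bs]

ds = Dataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
dl = DataLoader(ds, bs=64)
```

Now `for xb, yb in dl:` replaces the manual index arithmetic inside fit().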

Random sampling

  1. Why should we shuffle the training set but not the validation set?
  2. When should we re-shuffle the data: at the beginning of each epoch or of each mini-batch? And why?
  3. Make a Sampler class 5
    1. Initialized with 3 params: (int) dataset length, (int) bs, and (bool) shuffle
    2. Make the instance a generator that returns the [shuffled] indices of the next batch
  4. Test the Sampler class with a dataset of size 10 and a bs of 3 6, 7
    1. Suppose the data is training data and see what it returns
    2. Suppose the data is validation data and see what it returns
  5. Re-define the DataLoader (2nd)
    1. Initialized with
      1. a dataset instance
      2. a sampler instance
      3. a collate function: takes the randomly extracted (x, y) pairs and stacks them
    2. Generates stacked tensors
    3. Why do we need a collate function? How can I vary it?
  6. Test the newly defined DataLoader
    1. Grab one batch and plot the first item
    2. Grab the next batch and plot the first item
    3. See the loss and accuracy with the newly fetched batches 8
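
A sketch of the Sampler, a default collate function, and the 2nd DataLoader built from them (the toy dataset of size 10 matches the test above):

```python
import torch

class Sampler():
    # yields the [shuffled] indices of each next batch
    def __init__(self, n, bs, shuffle=False):
        self.n, self.bs, self.shuffle = n, bs, shuffle
    def __iter__(self):
        idxs = torch.randperm(self.n) if self.shuffle else torch.arange(self.n)
        for i in range(0, self.n, self.bs):
            yield idxs[i:i+self.bs]

def collate(batch):
    # stack the individually fetched (x, y) pairs into batch tensors
    xs, ys = zip(*batch)
    return torch.stack(xs), torch.stack(ys)

class DataLoader():
    def __init__(self, ds, sampler, collate_fn=collate):
        self.ds, self.sampler, self.collate_fn = ds, sampler, collate_fn
    def __iter__(self):
        for idxs in self.sampler:
            yield self.collate_fn([self.ds[int(i)] for i in idxs])

# toy dataset of size 10, bs of 3, as in the test above
ds = [(torch.tensor([float(i)]), torch.tensor(i)) for i in range(10)]
train_dl = DataLoader(ds, Sampler(len(ds), 3, shuffle=True))
valid_dl = DataLoader(ds, Sampler(len(ds), 3, shuffle=False))
```

Because batches are now assembled item by item, the collate function is the natural place to vary behavior (e.g. padding variable-length items before stacking).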

PyTorch DataLoader

  1. What’s the difference between PyTorch’s DataLoader and ours?
  2. From torch, import the DataLoader and samplers for training and validation
  3. See the loss and accuracy
    • passing in our own collate function and sampler
    • doing the same without the previously defined collate function and sampler 9


  1. Make a new fit function (7th Training Loop)
    1. When should we validate?
    2. Why should we have a validation set? How can we judge that overfitting happened using the validation set?
    3. Why should we call train()/eval()? How does eval work here? (use the word ‘inference’)
  2. Answer Jeremy’s question: are these validation results correct if the batch size varies?
  3. Make a get_dls function
    1. Returns: data loaders for the train/valid sets
    2. Why does the validation set get a batch size twice as big?
    3. There is no optimization phase during validation, so why do we need with torch.no_grad():?
  4. Implement, in 3 lines of code, obtaining the data loaders and fitting the model
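
Pulling it together with PyTorch’s own DataLoader, here is a sketch of get_dls and the final fit. The data, model, and learning rate are stand-ins of mine; weighting each batch’s loss by its size is what makes the validation result correct even when the batch size varies:

```python
import torch
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def get_dls(train_ds, valid_ds, bs):
    # validation gets 2*bs: no gradients are stored, so memory allows larger batches
    return (DataLoader(train_ds, batch_size=bs, shuffle=True),
            DataLoader(valid_ds, batch_size=bs * 2))

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()                      # training mode
        for xb, yb in train_dl:
            loss = loss_func(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()
        model.eval()                       # inference mode
        with torch.no_grad():              # no optimization, and no grad bookkeeping
            # weight each batch's loss by its size so a short last batch
            # doesn't skew the average
            tot = sum(loss_func(model(xb), yb) * len(xb) for xb, yb in valid_dl)
        print(epoch, (tot / len(valid_dl.dataset)).item())

train_ds = TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
valid_ds = TensorDataset(torch.randn(64, 784), torch.randint(0, 10, (64,)))

# the "3 lines": get the data, get the model, fit
train_dl, valid_dl = get_dls(train_ds, valid_ds, 64)
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
fit(1, model, F.cross_entropy, optim.SGD(model.parameters(), lr=0.5), train_dl, valid_dl)
```

`model.eval()` only switches layer behavior (e.g. dropout, batchnorm); it is `torch.no_grad()` that stops gradient tracking during inference.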


Footnotes

  1. I’ve heard that minimizing (convex downward) and maximizing (convex upward) are not treated the same and there are preferences. Find out which is easier and explain why.

  2. Study other dunder methods.

  3. I don’t know whether I’m getting a better loss because I’m using a better approach or because I’m running the same thing on the same data repeatedly. -> Shallow: check whether I get the same result when I run this cell before the others. Deep: inspect the details of the mechanisms each approach uses.

  4. What does it mean by ‘except we’ll be doing it in a more flexible way!’?

  5. torch.randperm and torch.arange

  6. Why is this returning itself?

  7. Packing and unpacking arguments in Python

  8. Why did he re-create the model when testing the new dataset?

  9. Why is the loss of the second one smaller?