Part2 lesson 9, 03_minibatch_training | fastai 2019 course -v3

Jul 14, 2020 · 8 mins read

Since fastai instructor including many people who studied fastai highly emphasized to copy the code from scratch, I’ve been using that approach.
It was fine with Part1, but from Part2, applying same strategy which was trying to copy nbs from blank cell without peeking up original was not enough.
First, there were lots of sub-topics worth to dig in, since we’ve moved to bottom-up phase. And I couldn’t inspect deep inside of each topic(sometimes even don’t know what I was doing) only to replicating the code. It would be okay with part1 session, but seemed insufficient with part2
Second, literally it was too difficult to rely on my own memory. I should have open and close up the Jeremy’s too many times.
So, I thought it would be great if I could draw overall picture of each notebook, making replicating process easier. And it was. (At least with 01_matmul, 02_fully_connected, 02b_initializing)
So I share my own supplementary, hoping it could help you to replicate code much easily and grasp the lesson that Jeremy intended.

From this notebook, I learned how to re-define our NN model and update training process.

Initial Setup

Data

1. Fastai for Colab env
2. import data
1. Hyper parames
1. (Number of input size = data size
2. Number of output unit = value
3. Number of hidden unit = 50
4. Number of hidden size = 1
2. Initialize the model (1st)
1. init with size of nodes
2. instance call with independent variable(s), returning prediction

Loss function: Cross entropy loss

1. Softmax
1. Write down the formula of softmax
2. Why do we need this softmax function at the last layer?
3. Write down softmax as code.
4. Why do I need a log of softmax?
2. Cross Entropy
1. Write down the formula of cross entropy
2. Why is this function adequate to the loss of categorical variables?
3. Negative log likelihood function 1
1. Implement the negative log likelihood function
2. Why do we find a negative value
4. LogSumExp
1. What is LogSumExp
2. Why this is called trick somehow?
3. re-implement log_softmax function using logsumexp
4. Use pytorch’s function

Basic Training Loop

1. Write down the procedure the training loop repeats (4 steps)
2. Define the accuracy function.
3. Grab one batch and test accuracy function
4. Do the above step for an epoch with model defined at setting (1st training loop)
1. Get the cross entropy function from PyTorch and use it at 1 epoch process
2. You don’t have to do this for reversed sequence since you’ve already got gradients from PyTorch.
3. Don’t forget to check the existence of attribute before you update and zero it before you get out of loop
5. Print the loss and accuracy of one batch 2

Using parameters and optimal

Parameters

we will use nn.Module.__setattr__ and move relu to functional. 3

1. Re-define Model class using nn.Module. (2nd Model)
1. What’s difference from that you made first?
2. See the layers inside model using named_children method
1. see one layer
3. Re-define fit function with using parameters, not layers. (2nd Training Loop)
1. What’s difference between nn.Module’s layer / parameters?
2. Why this diff makes shorter code?
4. Make DummyModule class to simulate pytorch’s __setattr__ (3rd Model)
1. 3 dunder method
1. Init
2. setattr
3. repr
2. parameter function which generates each parameters in the layers
3. What does the setattr do here?
1. How can i check if I defined parameters properly?
4. What does the repr do here?
5. Call the instance of dummymodule and see repr and shape of parameters in instance

Registering modules

1. Re-define Model class (4th Model)
1. Use original layers approach
2. Register the modules
2. What is registering modules?
3. See the model instance (i.e. repr)

nn.ModuleList

1. Define class SequentialModel (5th Model)
1. Use nn.ModuleList
2. What does nn.ModuleList do for us?

nn.Sequential

1. Make model instance using nn.Sequential (6th Model)
2. Why does this class make jobs easier?

Optim

1. Define class Optimizer
1. Init - params, lr
2. step
4. Why do we have to no_grad() at step and zero_grad function?
2. Do the one epoch learning with optimizer instance (3rd Training Loop)
3. See the loss and accuracy 4
4. Do the one epoch learning using PyTorch’s optim.SGD functionality (4th Training Loop)
1. Define get_model function
1. why do we need this function
2. return: (1) model instance from nn.Sequential, (2) Optimizer function from optim.SGD
3. See the loss and accuracy 5

We will make a dataset / dataloader class to make iteration through mini-batches more efficiently.

Dataset

1. Make class Dataset which has three essential components which are indispensable. (hint: 3)
1. what are three components?
2. make dataset object 1) get the length and 2) get items using index
1. compare those tensor’s length / shape with originals
3. Do the one epoch learning using data from Datasest (5th training loop)

1. make class DataLoader which takes dataset and batch size, and instance is iterator returning next batch

2. make def fit() which gets data from dataloader class. *(6th training)

Random sampling

1. Why we should shuffle the training set excluding validation set?
2. When to re-shuffle the data? At the beginning of epoch? beginning of minibatch? and why I should?
3. Make Sampler class 6
1. initialized with 3 params, (int) dataset len, (int) bs, and (bool) shuffle
2. make instance as generator which returns [shuffled] index of the next batch
4. Test class Sampler with 10 size of data, and 3 of bs 7, 8
1. suppose this data is train data and see the returnings
2. suppose this data is valid data and see the returnings
5. re-defin DataLoader (2nd)
1. initialized with
1. dataset instance
2. sampler instance
3. collate function 9
2. generates stacked tensor
3. What do we need collate function? How can I variate them?
1. grab one batch and plot the first item
2. grab next batch and plot the first item
3. see the loss and accuracy with newly clutched batch 10

2. from torch, call dataloader, sampler for train and validation
3. see the loss and accuracy
• rendering collate and sampler
• do above same way without previously defined collate and sampler 11

Validation

1. make the new fit function (7th training)
1. what’s difference with previous version?
2. Go back and see the previous process. Has validation process been executed well?
3. Answer to the Jeremy’s Question: Are these validation results correct if batch size varies?
4. make get_dls funciton
1. return: dataloaders for train/valid sets
5. implement in 3 lines of code which obtain the data loader and fit the model

Notes

1. I’ve heard that finding convex downward / upward is not the same and there are preferences. Find out which is easier and explain why.

2. What’s relationship with above one epoch training and this one batch loss / accuracy?

3. Study other dunders

4. I don’t know if I’m getting better loss since I’m using better approach or doing same thing at same data repeatedly -> shallow: check I would get same result when I run this cell before others deep: inspect details of mechanisms each approach uses

5. What does it mean ‘except we’ll be doing it in a more flexible way!’?

6. torch.randperm and torch.arange

7. why is this returning its own?

8. pack and unpack the argument of python

9. Why do we need to dataset and sampler instance which was already given dataset as argument?

10. Why did he renew the model when to test new dataset?

11. why is loss of the second less?