Part 2 lesson 11, 07a Layer-wise Sequential Unit Variance Initialization | fastai 2019 course-v3

Image source and original paper: link

This notebook follows material from fastai as well as the paper ‘All you need is a good init’. Like the fastai course, the paper contains many practical tips.

The authors describe LSUV as a ‘data-driven weights initialization’. I think the paper’s originality lies in combining two steps: an orthonormal initialization of the weights (which is decoupled from the actual dataset), followed by rescaling each layer so that its output on the first batch of data has unit variance (which ties the initialization to the actual dataset’s distribution).
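The two steps can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the notebook's or the paper's implementation: it uses a toy fully connected network without biases instead of the MNIST model, random data instead of a real batch, and the variance is measured on each linear layer's output before the ReLU. The names `orthonormal_rows`, `linear`, `relu`, `variance` and `lsuv_init`, as well as the `tol` and `max_iters` parameters, are hypothetical and chosen for this example.

```python
import math
import random


def orthonormal_rows(n_out, n_in, rng):
    """Return n_out orthonormal rows of length n_in via Gram-Schmidt (needs n_out <= n_in)."""
    rows = []
    for _ in range(n_out):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        for u in rows:  # remove the components along the rows built so far
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def linear(W, batch):
    """Apply y = W x to every sample in the batch (no bias in this sketch)."""
    return [[sum(w * x for w, x in zip(row, xs)) for row in W] for xs in batch]


def relu(batch):
    return [[max(0.0, v) for v in xs] for xs in batch]


def variance(batch):
    """Variance over all activations of all samples in the batch."""
    vals = [v for xs in batch for v in xs]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def lsuv_init(layer_sizes, batch, tol=0.01, max_iters=10, seed=0):
    """Initialize layers one after another: orthonormal weights, then unit output variance."""
    rng = random.Random(seed)
    weights, x = [], batch
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        W = orthonormal_rows(n_out, n_in, rng)  # step 1: independent of the data
        for _ in range(max_iters):              # step 2: driven by the first batch
            var = variance(linear(W, x))
            if abs(var - 1.0) < tol:
                break
            W = [[w / math.sqrt(var) for w in row] for row in W]
        weights.append(W)
        x = relu(linear(W, x))  # the next layer sees this layer's activations
    return weights


# Toy usage: random numbers stand in for the first batch (an assumption of this sketch).
data_rng = random.Random(1)
batch = [[data_rng.gauss(0.0, 3.0) for _ in range(8)] for _ in range(64)]
weights = lsuv_init([8, 6, 4], batch)
```

Layers are processed in order because each layer's rescaling changes the input that the following layer receives. In a real model the variance would be measured on the layer's actual output for a real batch, typically through a forward hook.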

Q1. Implement the LSUV initialization method and apply it to the MNIST dataset.

Q2. Compare the results with those obtained from the standard initialization.

A1.

A2.
