Stochastic Gradient Descent from scratch
- Tensor means array
- 2D tensor means matrix
- 3D tensor: three axes, e.g. row * height * col
- rank - how many dimensions / axes are there
- Resnet34 is just a function
learn = create_cnn(data, models.resnet34, metrics=error_rate, pretrained=False)
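To check the tensor and rank bullets above, here is a minimal sketch, assuming PyTorch is installed. The names `vector`, `matrix` and `cube` are made up for illustration.

```python
import torch

vector = torch.ones(3)        # rank 1: one axis
matrix = torch.ones(3, 2)     # rank 2: rows and columns
cube = torch.ones(4, 3, 2)    # rank 3: three axes

# dim() returns the rank (number of axes); shape gives the size of each axis.
ranks = [t.dim() for t in (vector, matrix, cube)]
print(ranks)
print(cube.shape)
```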
- Y = a_1 X_1 + a_2 X_2 (X_2=1)
- a: coefficients (a_1: slope, a_2: intercept)
- X: input data (X_2 is fixed at 1, so a_2 acts as the intercept)
- This is a dot product: pairs of values multiplied together and then added up
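- loss.backward(): calculates the gradient of the loss with respect to a and stores it in a.grad
- a.sub_(lr * a.grad): multiply the gradient by the learning rate and subtract the result from the coefficients a, in place (the trailing underscore in sub_ means the tensor itself is modified)
- How the gradient is calculated: see "The Matrix Calculus You Need For Deep Learning"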
def update():
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)
        a.grad.zero_()
- with torch.no_grad(): # turns gradient tracking off while the SGD update is applied
- In real code, we pick a batch size, slice a random mini-batch out of the data (ex: y[:rand_idx]), and update the weights using only that slice.
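To see the pieces above working together, here is a minimal runnable sketch of the full-batch version. The synthetic data, `true_a`, the seed, `lr = 0.1` and the 100 steps are my own assumptions for the demo. Also, `update` takes `t` as an argument here instead of reading the loop variable from outside.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

n = 100
x = torch.ones(n, 2)
x[:, 0].uniform_(-1.0, 1.0)             # column 0 is X_1; column 1 stays 1 (X_2 = 1)
true_a = torch.tensor([3.0, 2.0])       # assumed "true" slope and intercept
y = x @ true_a + 0.1 * torch.randn(n)   # dot product per row, plus a little noise

def mse(y, y_hat):
    # Mean squared error between targets and predictions.
    return ((y_hat - y) ** 2).mean()

a = torch.tensor([-1.0, 1.0], requires_grad=True)  # initial guess for the coefficients
lr = 0.1

def update(t):
    y_hat = x @ a
    loss = mse(y, y_hat)
    if t % 10 == 0:
        print(t, loss.item())
    loss.backward()                # fills a.grad with d(loss)/d(a)
    with torch.no_grad():          # the update itself must not be tracked
        a.sub_(lr * a.grad)        # a = a - lr * gradient, in place
        a.grad.zero_()             # reset, because backward() accumulates gradients
    return loss.item()

losses = [update(t) for t in range(100)]
```

The printed loss should shrink as `a` moves toward `true_a`. Without `a.grad.zero_()`, each `backward()` call would add to the previous gradient instead of replacing it.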
Recap the terminology
- Learning rate: the number we multiply the gradient by to decide how much to update the weights
- Epoch: one complete run through all of our data points (highly related to overfitting: the more epochs, the more often the model sees the same data)
- Mini-batch: a random bunch of points that you use to update your weights
- SGD: gradient descent using mini-batches
- Model / Architecture: these mean roughly the same thing. The architecture is the mathematical function that you’re fitting the parameters to.
- Parameters: also known as coefficients or weights, these are the numbers that you are updating.
- Loss function: the thing that tells you how far away from or how close to the correct answer you are
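The terms above can be mapped onto a mini-batch SGD loop. This is only a sketch under my own assumptions: the data is synthetic and noise-free, and `batch_size`, `epochs`, `lr` and the helper name `loss_fn` are picked for illustration.

```python
import torch

torch.manual_seed(0)

n, batch_size, epochs, lr = 200, 20, 20, 0.1
x = torch.ones(n, 2)
x[:, 0].uniform_(-1.0, 1.0)
y = x @ torch.tensor([3.0, 2.0])            # noise-free targets, assumed for the demo

a = torch.zeros(2, requires_grad=True)      # parameters (weights / coefficients)

def loss_fn(pred, target):                  # loss function: mean squared error
    return ((pred - target) ** 2).mean()

epoch_losses = []
for epoch in range(epochs):                 # one epoch = one full pass over the data
    perm = torch.randperm(n)                # shuffle so each mini-batch is random
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]    # indices of one mini-batch
        loss = loss_fn(x[idx] @ a, y[idx])      # architecture here: a linear function
        loss.backward()
        with torch.no_grad():
            a.sub_(lr * a.grad)             # step size is set by the learning rate
            a.grad.zero_()
    with torch.no_grad():
        epoch_losses.append(loss_fn(x @ a, y).item())  # loss on all data after the epoch

print(epoch_losses[0], epoch_losses[-1])
```

Each inner iteration updates the weights from only `batch_size` points, so one epoch here makes 10 updates instead of one.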
- When I tried to plot the predicted values, I got this error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-58-1650cee19828> in <module>()
      1 plt.scatter(x[:, 0], y)
----> 2 plt.scatter(x[:,0], x@a)

7 frames
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __array__(self, dtype)
    484     def __array__(self, dtype=None):
    485         if dtype is None:
--> 486             return self.numpy()
    487         else:
    488             return self.numpy().astype(dtype, copy=False)

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
According to the PyTorch forum, plt.scatter converts the tensor to a NumPy array, and a NumPy array cannot carry gradient information. So I should call
detach() first to get a tensor that does not require grad, and only then convert it to NumPy.
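Here is a minimal sketch of the fix, assuming PyTorch and NumPy are installed. The small `x` and `a` are stand-ins for the real data, and the plotting call is left out so that only the conversion is shown.

```python
import torch

x = torch.ones(5, 2)
a = torch.tensor([3.0, 2.0], requires_grad=True)

pred = x @ a                      # pred is part of the autograd graph
try:
    pred.numpy()                  # the same conversion plt.scatter triggers
    direct_ok = True
except RuntimeError:
    direct_ok = False             # fails because pred requires grad

pred_np = pred.detach().numpy()   # detach() returns a tensor that does not require grad
print(direct_ok, pred_np)
```

With matplotlib, the failing line would then become `plt.scatter(x[:,0], (x@a).detach().numpy())`.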