## Stochastic Gradient Descent from scratch

- Tensor: an array with a regular (rectangular) shape
- 2D tensor: a matrix
- 3D tensor: three axes, e.g. an image stored as rows x columns x channels
- *Rank*: how many dimensions (axes) the tensor has
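
To make the idea of rank concrete, here is a minimal sketch in PyTorch. The tensor names and sizes are made up for illustration; `dim()` reports the number of axes and `shape` the size along each axis.

```python
import torch

vector = torch.tensor([1.0, 2.0, 3.0])  # rank 1: a single axis
matrix = torch.ones(2, 3)               # rank 2: rows x columns
image = torch.zeros(4, 5, 3)            # rank 3: hypothetical rows x columns x channels

for name, t in [("vector", vector), ("matrix", matrix), ("image", image)]:
    # dim() is the rank, shape lists the length of every axis
    print(name, "rank:", t.dim(), "shape:", tuple(t.shape))
```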

- Resnet34 is just a function

```
learn = create_cnn(data, models.resnet34, metrics=error_rate, pretrained=False)
```

- Y=aX+b
- Y = a_1 X_1 + a_2 X_2 (X_2=1)
- a: coefficients (a_1: slope, a_2: intercept)
- X: inputs (X_2 is a constant column of ones, so a_2 plays the role of the intercept b)
- This is a dot product: pairs of values multiplied together, then summed
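
The sketch below shows why the line equation can be written as a dot product. The input values and coefficients are invented for illustration; the second column of `x` is the constant X_2 = 1.

```python
import torch

# Hypothetical inputs: column 0 is X_1, column 1 is the constant X_2 = 1.
x = torch.tensor([[0.5, 1.0],
                  [-1.0, 1.0],
                  [2.0, 1.0]])
a = torch.tensor([3.0, 2.0])  # made-up values: a_1 = slope, a_2 = intercept

y_matmul = x @ a                             # one dot product per row
y_manual = a[0] * x[:, 0] + a[1] * x[:, 1]   # the same sum written out term by term

print(y_matmul)
print(torch.allclose(y_matmul, y_manual))
```

Both expressions give the same three values (3.5, -1.0 and 8.0), so `x @ a` is just a compact way of writing a_1 X_1 + a_2 X_2 for every row at once.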

- Optimization
- `loss.backward()`: calculates the gradient of the loss with respect to `a` and stores it in `a.grad`
- `a.sub_(lr * a.grad)`: multiplies the gradient by the learning rate and subtracts the result from the coefficients `a`, in place
- How the gradient is calculated: see "The Matrix Calculus You Need for Deep Learning"
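
As a rough check of what `loss.backward()` computes, the sketch below compares `a.grad` with the hand-derived gradient of mean squared error, (2/n) Xᵀ(Xa − y). The data, the starting coefficients, and the `mse` helper are assumptions made for this example (the notes use `mse` without defining it).

```python
import torch

torch.manual_seed(0)
n = 100
x = torch.ones(n, 2)
x[:, 0].uniform_(-1.0, 1.0)                       # random X_1, constant X_2 = 1
y = x @ torch.tensor([3.0, 2.0]) + torch.rand(n)  # made-up targets with noise

def mse(y_hat, y):
    # Assumed definition of mean squared error
    return ((y_hat - y) ** 2).mean()

a = torch.tensor([-1.0, 1.0], requires_grad=True)
loss = mse(x @ a, y)
loss.backward()                                   # autograd fills a.grad

with torch.no_grad():
    # Analytic gradient of mean((x @ a - y)^2) with respect to a
    manual_grad = (2.0 / n) * x.t() @ (x @ a - y)

print(a.grad)
print(manual_grad)
```

The two printed tensors should agree up to floating-point error, which is all that `backward()` is doing for this simple model.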

```
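def update():
    # x, y, a, lr and mse are assumed to be defined beforehand;
    # t is the iteration counter, assumed to be set by the calling loop
    y_hat = x@a                      # predictions for the current coefficients
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()                  # fills a.grad
    with torch.no_grad():            # do not track the update itself
        a.sub_(lr * a.grad)          # in-place gradient step
        a.grad.zero_()               # reset, because gradients accumulate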
```

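- `with torch.no_grad():` turns off gradient tracking while doing the SGD update, so the update itself is not recorded by autograd
- In real code we choose a batch size, pick a random subset of rows on each step (e.g. `x[rand_idx]`, `y[rand_idx]`), and update the coefficients using only that minibatch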
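
A minimal sketch of the minibatch version, under stated assumptions: the synthetic data, `batch_size`, `lr`, `epochs`, and the `mse` helper are all chosen for illustration and are not taken from the lesson code.

```python
import torch

torch.manual_seed(0)
n, batch_size, lr, epochs = 100, 10, 0.1, 20      # hypothetical settings

x = torch.ones(n, 2)
x[:, 0].uniform_(-1.0, 1.0)
y = x @ torch.tensor([3.0, 2.0]) + 0.1 * torch.randn(n)  # made-up noisy line

def mse(y_hat, y):
    # Assumed definition of mean squared error
    return ((y_hat - y) ** 2).mean()

a = torch.tensor([-1.0, 1.0], requires_grad=True)
initial_loss = mse(x @ a, y).item()

for epoch in range(epochs):                       # one epoch = one pass over all points
    perm = torch.randperm(n)                      # reshuffle the rows every epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]      # indices of one random minibatch
        loss = mse(x[idx] @ a, y[idx])
        loss.backward()
        with torch.no_grad():
            a.sub_(lr * a.grad)                   # update from this minibatch only
            a.grad.zero_()

final_loss = mse(x @ a, y).item()
print(initial_loss, final_loss, a.detach())
```

Each pass of the inner loop uses only `batch_size` rows, so the coefficients are updated several times per epoch instead of once.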

Recap the terminology

- Learning rate: a thing that we multiply our gradient by, to decide how much to update the weights by
- Epoch: one complete run through all of our data points (running too many epochs tends to lead to overfitting)
- Minibatch: a random bunch of points that you use to update your weights
- SGD: gradient descent using minibatches
- Model / Architecture: these mean roughly the same thing. The architecture is the mathematical function whose parameters you are fitting.
- Parameters: also known as coefficients or weights; the numbers that you are updating.
- Loss function: the thing that tells you how far away from, or how close to, the correct answer you are

Bonus note

- When I tried to plot the predicted values, I got this error:

```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-58-1650cee19828> in <module>()
1 plt.scatter(x[:, 0], y)
----> 2 plt.scatter(x[:,0], x@a)
7 frames
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __array__(self, dtype)
484 def __array__(self, dtype=None):
485 if dtype is None:
--> 486 return self.numpy()
487 else:
488 return self.numpy().astype(dtype, copy=False)
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
```

According to the PyTorch forum, `plt.scatter` converts the tensor to a NumPy array, and a tensor that requires grad cannot be converted directly because the gradient information would be lost. Calling `detach()` first gives a tensor that does not require grad, which can then be converted to NumPy.
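
The behaviour can be reproduced without plotting. In this sketch the values of `x` and `a` are made up; it only shows that `numpy()` fails on a tensor that requires grad and succeeds after `detach()`.

```python
import torch

x = torch.ones(5, 2)
x[:, 0] = torch.linspace(-1.0, 1.0, 5)             # hypothetical inputs
a = torch.tensor([-1.0, 1.0], requires_grad=True)  # hypothetical coefficients

pred = x @ a                  # part of the autograd graph, so it requires grad

try:
    pred.numpy()              # same call that plt.scatter triggers internally
    raised = False
except RuntimeError as err:
    raised = True
    print("numpy() failed:", err)

pred_np = pred.detach().numpy()   # detached copy of the values, no grad tracking
print(type(pred_np), pred_np.shape)
```

For the plot itself, passing the detached predictions should be enough, for example `plt.scatter(x[:, 0], (x @ a).detach())`. Note that `a` itself still requires grad afterwards, so training can continue.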