Why ResNet can train a neural network that has more than 1000 layers
Here is the mathematical interpretation.
Take $y = f(x)$ as an example. When we talk about gradient descent, we use the formula $w \leftarrow w - \eta \frac{\partial \ell}{\partial w}$ to update our weights $w$, where $\ell$ is the loss and $\eta$ is the learning rate.
$f(x)$ could be 10 conv_layers; based on it, we add another 10 conv_layers $g$ into our model, so the deeper network computes $y' = g(f(x))$.
Then we continue to use the formula, and by the chain rule we get

$$\frac{\partial \ell}{\partial w_f} = \frac{\partial \ell}{\partial y'} \cdot \frac{\partial g}{\partial f} \cdot \frac{\partial f}{\partial w_f},$$

where $w_f$ are the weights inside the original layers $f$.
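To see this chain rule at work, here is a minimal PyTorch sketch. The Linear layers, sizes, and squared-mean loss are assumptions for illustration (Linear layers stand in for the conv_layers above), not the actual model:

```python
import torch

# f is the original 10-layer stack, g is the 10 layers added on top
# (Linear layers stand in for conv_layers; the sizes are arbitrary).
x = torch.randn(16, 32)
f = torch.nn.Sequential(*[torch.nn.Linear(32, 32) for _ in range(10)])
g = torch.nn.Sequential(*[torch.nn.Linear(32, 32) for _ in range(10)])

loss = g(f(x)).pow(2).mean()  # a toy loss on the deeper network g(f(x))
loss.backward()               # autograd applies the chain rule above

# The gradient that reaches f's bottom layer has been multiplied by the
# local derivative of every layer in g on its way down:
print(f[0].weight.grad.abs().mean())
```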
If the loss value decreases close to 0 as the network gets deeper, then differentiating the newly added layers yields a very small value for $\frac{\partial g}{\partial f}$. Multiplying that very small value by the normal-sized value $\frac{\partial f}{\partial w_f}$ obtained from the previous layers can make the result smaller and smaller, preventing $w_f$ from being updated in time: the gradient is so small that you can't make much of a difference even if you increase the learning rate.
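A tiny numeric sketch makes the effect visible; the per-layer derivative of 0.5 and the 20-layer depth below are made-up numbers:

```python
# If each added layer's local derivative is a value below 1, the product
# that reaches the bottom of the network shrinks geometrically.
grad = 1.0
for _ in range(20):
    grad *= 0.5   # one multiplication per layer during backpropagation
print(grad)       # 0.5 ** 20 ≈ 9.5e-7, far too small to move w
```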
This is where ResNet comes in.
The weights $w_f$ inside $f$ are still the things we need to train, and we still add layers to make the network much deeper, but a residual block computes $y'' = f(x) + g(f(x))$ instead of $g(f(x))$. Differentiating this new output gives

$$\frac{\partial \ell}{\partial w_f} = \frac{\partial \ell}{\partial y''} \cdot \left( \frac{\partial f}{\partial w_f} + \frac{\partial g}{\partial f} \cdot \frac{\partial f}{\partial w_f} \right).$$
Even if $\frac{\partial g}{\partial f}$ is tiny, the extra term $\frac{\partial f}{\partial w_f}$ gives the gradient a direct path back to the lower layers. This keeps the gradient from getting smaller and smaller and disappearing as it is multiplied layer by layer during backpropagation, which would otherwise make it impossible to update the weights.
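As a concrete illustration, here is a minimal sketch of a residual block in PyTorch. It uses Linear layers and an arbitrary dimension instead of the conv layers a real ResNet uses, and the class name is hypothetical; the point is only the `x + self.g(x)` shortcut:

```python
import torch

class ResidualBlock(torch.nn.Module):
    """Minimal residual block sketch: the output is x + g(x), so the
    identity shortcut gives the gradient a direct path around g."""
    def __init__(self, dim):
        super().__init__()
        self.g = torch.nn.Sequential(
            torch.nn.Linear(dim, dim),
            torch.nn.ReLU(),
            torch.nn.Linear(dim, dim),
        )

    def forward(self, x):
        # The "+ x" is the shortcut; its derivative w.r.t. x is exactly 1,
        # which is what adds the extra term to the gradient above.
        return x + self.g(x)
```

Stacking many such blocks is how ResNet reaches 1000+ layers: every block keeps an additive identity path for the gradient, no matter how deep the stack gets.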