Some_Interpretation_About_Resnet

Why ResNet makes it possible to train neural networks with more than 1000 layers

Here is the mathematical interpretation.

Take a plain network $f(x)$ as an example; it could be a stack of 10 conv layers, with weights $W$. When we talk about gradient descent, we use this formula to update the weights:

$$W \leftarrow W - \eta \frac{\partial \mathcal{L}(f(x))}{\partial W}$$

Based on it, we add another 10 conv layers $g(\cdot)$ into our model, so the output becomes

$$y = g(f(x))$$

Then we continue to use the formula; by the chain rule, the gradient reaching $W$ is

$$\frac{\partial \mathcal{L}}{\partial W} = \frac{\partial \mathcal{L}}{\partial g(f(x))} \cdot \frac{\partial g(f(x))}{\partial f(x)} \cdot \frac{\partial f(x)}{\partial W}$$
If the loss value decreases to close to 0 as the network gets deeper, the derivative $\frac{\partial \mathcal{L}}{\partial g(f(x))}$ becomes very small. Multiplying that very small value by the normal-sized value $\frac{\partial f(x)}{\partial W}$ obtained from the earlier layers still yields a very small result, and the product only shrinks further as more layers are stacked. This prevents $W$ from being updated in time: the gradient is so small that you cannot make much of a difference even by increasing the learning rate.
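
To see the effect numerically, here is a toy sketch (my own illustration, not from the original text): if each added layer is assumed to contribute a gradient factor of about 0.1, the product that reaches $W$ collapses toward zero.

```python
# Toy illustration of the vanishing product: each of 20 stacked layers
# is assumed to contribute a gradient factor of about 0.1.
factors = [0.1] * 20

grad = 1.0
for f in factors:
    grad *= f

print(grad)  # 1e-20 -- far too small to update W, even with a large learning rate
```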

So this is where ResNet comes in.

The newly added layers $g(\cdot)$ are the things we need to train when we add layers to make the network much deeper. ResNet adds them together with an identity shortcut, so the output becomes

$$y = f(x) + g(f(x))$$

and the gradient reaching $W$ becomes

$$\frac{\partial \mathcal{L}}{\partial W} = \frac{\partial \mathcal{L}}{\partial y} \cdot \left(1 + \frac{\partial g(f(x))}{\partial f(x)}\right) \cdot \frac{\partial f(x)}{\partial W}$$

Even if the term coming through $g$ is tiny, the $1$ contributed by the shortcut carries the original gradient of the earlier layers through unchanged.
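
As a quick check of that "+1" term, here is a small sketch using PyTorch autograd (my own illustration; the function `g` below is a made-up stand-in whose gradient is deliberately tiny):

```python
import torch

def g(t):
    # stand-in for the added layers; its own gradient is deliberately tiny
    return 1e-6 * t.pow(2)

x = torch.ones(4, requires_grad=True)
(plain,) = torch.autograd.grad(g(x).sum(), x)        # y = g(x):     dy/dx ~= 2e-6
(res,)   = torch.autograd.grad((x + g(x)).sum(), x)  # y = x + g(x): dy/dx ~= 1 + 2e-6

print(plain)  # nearly vanished
print(res)    # the identity shortcut keeps the gradient close to 1
```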

This keeps the gradient from shrinking toward zero as it is multiplied layer by layer during backpropagation. Because the identity shortcut always provides its own path for the gradient, the gradient does not vanish, and the weights of the earlier layers can still be updated.
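
In code, the idea is simply "output = input + g(input)". Below is a minimal sketch of a residual block in PyTorch (my own illustration; the class name `BasicResidualBlock` and the exact layer choices are assumptions, not taken from the original text):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = relu(x + g(x)): the identity shortcut is what preserves the gradient."""

    def __init__(self, channels):
        super().__init__()
        # g(.) is the part we actually train: two conv layers with a nonlinearity
        self.g = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # the "+ x" adds the 1 to the gradient, so backprop never has to rely on g alone
        return torch.relu(x + self.g(x))

# usage: stacking many such blocks stays trainable because each shortcut passes the gradient through
block = BasicResidualBlock(channels=64)
out = block(torch.randn(1, 64, 32, 32))
```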