Deep Residual Learning for Image Recognition

Kaiming He    Xiangyu Zhang    Shaoqing Ren    Jian Sun

Microsoft Research

{kahe, v-xiangz, v-shren, jiansun}@microsoft.com
arXiv:1512.03385v1 [cs.CV] 10 Dec 2015

Abstract

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8

[Figure 1: plots of training error (%) (left) and test error (%) (right) versus iterations (1e4), each with curves for 20-layer and 56-layer networks.]
Figure 1. Training error (left) and test error (right) on CIFAR-10 with 20-layer and 56-layer “plain” networks. The deeper network has higher training error, and thus test error. Similar phenomena on ImageNet are presented in Fig. 4.
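
To make concrete what "learning residual functions with reference to the layer inputs" means in practice, the sketch below implements a single residual block that computes y = F(x) + x, where F is a small stack of convolutional layers and x is carried around them by an identity shortcut. This is a minimal PyTorch-style sketch for illustration only, not the paper's exact architecture: the class name BasicResidualBlock, the channel count, and the choice of two 3x3 convolutions with batch normalization are assumptions modeled on common ResNet implementations.

    import torch
    import torch.nn as nn

    class BasicResidualBlock(nn.Module):
        """Minimal residual block (illustrative sketch): output = F(x) + x,
        where F is two 3x3 conv layers with batch norm and a ReLU in between."""

        def __init__(self, channels):
            super().__init__()
            # F(x): the residual function to be learned
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            identity = x                              # the "reference to the layer inputs"
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            out = out + identity                      # residual F(x) plus identity shortcut
            return self.relu(out)

    # Usage: the block preserves spatial size and channel count
    block = BasicResidualBlock(channels=64)
    x = torch.randn(1, 64, 32, 32)
    y = block(x)                                      # same shape as x: (1, 64, 32, 32)

Because the shortcut is an identity mapping, the layers only have to learn the residual F(x) = y - x rather than the full unreferenced mapping; if the identity is already close to optimal, the weights of F can simply be driven toward zero.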