Published as a conference paper at ICLR 2016
DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING
Song Han
Stanford University, Stanford, CA 94305, USA
songhan@stanford.edu
Huizi Mao
Tsinghua University, Beijing, 100084, China
mhz12@mails.tsinghua.edu.cn
William J. Dally
Stanford University, Stanford, CA 94305, USA
NVIDIA, Santa Clara, CA 95050, USA
dally@stanford.edu
ABSTRACT
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
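
To make the three stages concrete, below is a minimal NumPy sketch of the pipeline applied to a single weight matrix: magnitude pruning, k-means weight sharing, and Huffman coding of the resulting cluster indices. The function names and parameters (prune, quantize, huffman_code_lengths, the threshold, the cluster count) are illustrative assumptions, not the paper's implementation; in particular, the paper additionally retrains the shared centroids ("trained quantization") and stores sparse indices, both omitted here.

    import heapq
    from collections import Counter
    import numpy as np

    def prune(weights, threshold):
        # Stage 1: magnitude pruning -- zero out weights below a threshold.
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

    def quantize(weights, mask, n_clusters=16, n_iters=20):
        # Stage 2: weight sharing -- cluster surviving weights into 2^b
        # centroids so each weight is stored as a b-bit index.
        # (The paper also fine-tunes the centroids during retraining.)
        vals = weights[mask]
        centroids = np.linspace(vals.min(), vals.max(), n_clusters)  # linear init
        for _ in range(n_iters):
            idx = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
            for k in range(n_clusters):
                if np.any(idx == k):
                    centroids[k] = vals[idx == k].mean()
        return idx, centroids  # quantized weights are centroids[idx]

    def huffman_code_lengths(symbols):
        # Stage 3: Huffman coding -- frequent cluster indices get shorter codes.
        heap = [[w, [s, 0]] for s, w in Counter(symbols).items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            for pair in lo[1:] + hi[1:]:
                pair[1] += 1  # symbol moves one level deeper in the code tree
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return {s: max(d, 1) for s, d in heap[0][1:]}

    W = np.random.randn(512, 512).astype(np.float32)
    Wp, mask = prune(W, threshold=1.0)            # keeps ~32% of the weights
    idx, _ = quantize(Wp, mask, n_clusters=16)
    lengths = huffman_code_lengths(idx.tolist())
    bits = sum(lengths[s] for s in idx.tolist())
    print(f"avg bits per surviving weight: {bits / idx.size:.2f} (vs. 32-bit float)")

With 16 clusters the fixed-width index cost would be 4 bits per surviving weight; Huffman coding pushes the average below that whenever the index distribution is skewed, which is why it fits as the final, lossless stage of the pipeline.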