Published as a conference paper at ICLR 2016

DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING

Song Han
Stanford University, Stanford, CA 94305, USA
songhan@stanford.edu

Huizi Mao
Tsinghua University, Beijing, 100084, China
mhz12@mails.tsinghua.edu.cn

William J. Dally
Stanford University, Stanford, CA 94305, USA
NVIDIA, Santa Clara, CA 95050, USA
dally@stanford.edu

arXiv:1510.00149v5 [cs.CV] 15 Feb 2016

ABSTRACT

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35