Michele Pratusevich


Back to the Deep Learning project page.

June 15, 2018

Deep Learning Approximation

During my time at Amazon I worked on a deep learning project called Deep Learning Approximation that I got approval to publish my work on ArXiv.

Deep Learning Approximation speeds up the runtime of neural networks, which is especially relevant for pay-per-compute or limited-compute embedded environments. It decouples the task of speeding up neural networks from the task of making them more accurate. It enables tuning off-the-shelf networks for runtime, to complement fine-tuning for accuracy.


Neural networks offer high-accuracy solutions to a range of problems, but are costly to run in production systems because of computational and memory requirements during a forward pass. Given a trained network, we propose a techique called Deep Learning Approximation to build a faster network in a tiny fraction of the time required for training by only manipulating the network structure and coefficients without requiring re-training or access to the training data. Speedup is achieved by by applying a sequential series of independent optimizations that reduce the floating-point operations (FLOPs) required to perform a forward pass. First, lossless optimizations are applied, followed by lossy approximations using singular value decomposition (SVD) and low-rank matrix decomposition. The optimal approximation is chosen by weighing the relative accuracy loss and FLOP reduction according to a single parameter specified by the user. On PASCAL VOC 2007 with the YOLO network, we show an end-to-end 2x speedup in a network forward pass with a 5% drop in mAP that can be re-gained by finetuning.


Read the full paper for all the juicy details!

comments powered by Disqus

All Posts about Deep Learning

2018 June 15 -- Deep Learning Approximation