Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., & Temam, O. (2014). DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. SIGARCH Comput. Archit. News, 42(1), 269-284. DOI:10.1145/2654822.2541967
Chen, Y.-H., Krishna, T., Emer, J. S., & Sze, V. (2017). Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits, 52(1), 127-138. DOI:10.1109/JSSC.2016.2616357
Chen, Y.-H., Emer, J., & Sze, V. (2016). Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. SIGARCH Comput. Archit. News, 44(3), 367-379. DOI:10.1145/3007787.3001177
Dettmers, T. (2016). 8-Bit Approximations for Parallelism in Deep Learning. International Conference on Learning Representations (ICLR 2016). Retrieved from http://arxiv.org/abs/1511.04561
Fu, Y., Wu, E., Sirasao, A., Attia, S., Khan, K., & Wittig, R. (2016). Deep Learning with INT8 Optimization on Xilinx Devices (White Paper WP486, v1.0.1). Xilinx. Retrieved from https://www.xilinx.com
Fürer, M. (2007). Faster integer multiplication. Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC '07), San Diego, California, USA, 57-66.
Gangadharan, S., & Churiwala, S. (2013). Constraining Designs for Synthesis and Timing Analysis: A Practical Guide to Synopsys Design Constraints (SDC). Springer.
Garland, J., & Gregg, D. (2017). Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks. IEEE Computer Architecture Letters, 16(2), 132-135. DOI:10.1109/LCA.2017.2656880
Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). Deep Learning with Limited Numerical Precision. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 1737-1746. Retrieved from http://arxiv.org/abs/1502.02551
Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., & Dally, W. J. (2016). EIE: Efficient Inference Engine on Compressed Deep Neural Network. Proceedings of the 43rd International Symposium on Computer Architecture (ISCA 2016), 243-254. DOI:10.1109/ISCA.2016.30
Han, S., Mao, H., & Dally, W. J. (2016). Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. International Conference on Learning Representations (ICLR 2016). arXiv:1510.00149.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778. DOI:10.1109/CVPR.2016.90
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25 (NIPS 2012), 1097-1105.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1(4), 541-551. DOI:10.1162/neco.1989.1.4.541
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. DOI:10.1109/5.726791
Ma, Y., Cao, Y., Vrudhula, S., & Seo, J.-S. (2017). Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '17), 45-54. DOI:10.1145/3020078.3021736
Sabeetha, S., Ajayan, J., Shriram, S., Vivek, K., & Rajesh, V. (2015, February). A study of performance comparison of digital multipliers using 22nm strained silicon technology. Paper presented at the 2015 2nd International Conference on Electronics and Communication Systems (ICECS).
Seide, F., Fu, H., Droppo, J., Li, G., & Yu, D. (2014, September). 1-Bit Stochastic Gradient Descent and Its Application to Data-Parallel Distributed Training of Speech DNNs. Proceedings of Interspeech 2014, 1058-1062.
Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556. Retrieved from http://arxiv.org/abs/1409.1556
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., . . . Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9. DOI:10.1109/CVPR.2015.7298594
Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015). Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '15), Monterey, California, USA, 161-170. DOI:10.1145/2684746.2689060
Other Interesting / Useful Papers
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Proceedings of the 26th International Conference on Machine Learning (ICML 2009), Montreal, Canada, 609-616. DOI:10.1145/1553374.1553453
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop. arXiv:1312.5602.
Orseau, L., & Armstrong, S. (2016, June). Safely Interruptible Agents. Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016).
Yang, Y., Li, Y., Fermüller, C., & Aloimonos, Y. (2015). Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web. Proceedings of the 29th AAAI Conference on Artificial Intelligence, 3686-3692.
Venieris, S. I., Kouris, A., & Bouganis, C.-S. (2018). Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions. ACM Computing Surveys, 51(3), Article 56. DOI:10.1145/3186332
Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., & Mordatch, I. (2019). Emergent Tool Use From Multi-Agent Autocurricula. arXiv preprint arXiv:1909.07528.