Publications

This page lists the publications from my PhD and other research to date.

Poster(s)
Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

HiPEAC, ACACES 2017
@MISC{ACACES2017POSTER,
  author       = {{Garland}, J. and {Gregg}, D.},
  title        = "{Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks}",
  howpublished = "Poster at ACACES 2017",
  publisher    = "HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation",
  year         = 2017,
  month        = jul,
  day          = 12}

The poster for the Thirteenth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES 2017), 9-15 July 2017, Fiuggi, Italy.

Book(s)
Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

PASM Block Diagram
HiPEAC, ACACES 2017
@INCOLLECTION{ACACES2017,
    author    = {{Garland}, J. and {Gregg}, D.},
    title     = "{Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks}",
    booktitle = "{ACACES 2017 Poster Abstracts}",
    publisher = "HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation",
    year      = 2017,
    month     = jul,
    ISBN      = "978-88-905806-5-9"
}

The abstract for the poster for the Thirteenth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES 2017), 9-15 July 2017, Fiuggi, Italy.

Delay- and Disruption-Tolerant Networking
@BOOK{1596930632,
  author    = {Stephen Farrell and Vinny Cahill},
  title     = {Delay- and Disruption-Tolerant Networking},
  publisher = {Artech House Publishers},
  year      = {2006},
  ISBN      = {1596930632},
  URL       = {https://www.amazon.com/Delay-Disruption-Tolerant-Networking-Stephen-Farrell/dp/1596930632}
}

Acknowledged by Stephen Farrell and Vinny Cahill in their book “Delay- and Disruption-Tolerant Networking” for the hardware research I assisted with while employed as a Research Assistant on the SeNDT Project.



Journal(s)
Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

PASM Block Diagram

IEEE CAL, 23 January 2017
@ARTICLE{7829315,
  author   = {J. Garland and D. Gregg},
  journal  = {IEEE Computer Architecture Letters},
  title    = {Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks},
  year     = {2017},
  volume   = {PP},
  number   = {99},
  pages    = {1-1},
  keywords = {Hardware;Indexes;Kernel;Logic gates;Registers;Standards;Timing;Convolutional neural network;arithmetic hardware circuits;multiply accumulate;power efficiency},
  doi      = {10.1109/LCA.2017.2656880},
  ISSN     = {1556-6056}
}

Presentation Slides

Convolutional Neural Networks (CNNs) are one of the most successful deep machine learning technologies for processing image, voice and video data. CNNs require large amounts of processing capacity and memory, which can exceed the resources of low power mobile and embedded systems. Several designs for hardware accelerators have been proposed for CNNs which typically contain large numbers of Multiply Accumulate (MAC) units. One approach to reducing data sizes and memory traffic in CNN accelerators is “weight sharing”, where the full range of values in a trained CNN are put in bins and the bin index is stored instead of the original weight value. In this paper we propose a novel MAC circuit that exploits binning in weight-sharing CNNs. Rather than computing the MAC directly we instead count the frequency of each weight and place it in a bin. We then compute the accumulated value in a subsequent multiply phase. This allows hardware multipliers in the MAC circuit to be replaced with adders and selection logic. Experiments show that for the same clock speed our approach results in fewer gates, smaller logic, and reduced power.
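
To make the two-phase scheme concrete, the following is a minimal software sketch of the idea described in the abstract: inputs are first accumulated into one partial sum per weight bin (selection logic and adders only), and the small set of shared weight values is multiplied in afterwards, once per bin rather than once per input. This is an illustrative Python analogue with made-up names (binned_mac, shared_weights, and so on), not code from the paper or a model of the PASM hardware itself.

# Illustrative sketch: accumulate inputs per weight bin first,
# multiply by the shared weight values afterwards.

def conventional_mac(inputs, weights):
    # Standard MAC: one multiply and one add per input.
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc

def binned_mac(inputs, bin_indices, shared_weights):
    # Phase 1: route each input into the partial sum selected by its
    # weight's bin index -- adders and selection logic only.
    partial_sums = [0] * len(shared_weights)
    for x, b in zip(inputs, bin_indices):
        partial_sums[b] += x
    # Phase 2: one multiply per bin, then sum the products.
    return sum(w * s for w, s in zip(shared_weights, partial_sums))

if __name__ == "__main__":
    shared_weights = [-0.5, 0.25, 1.0]   # small table of shared (binned) weight values
    bin_indices    = [2, 0, 1, 2, 1]     # stored per weight instead of the original value
    inputs         = [3, 1, 4, 1, 5]
    full_weights   = [shared_weights[b] for b in bin_indices]
    assert abs(conventional_mac(inputs, full_weights)
               - binned_mac(inputs, bin_indices, shared_weights)) < 1e-9

With N inputs and K shared weights, the sketch performs K multiplies instead of N; the accumulation phase needs only adders and bin selection, which mirrors the abstract's point that the hardware multipliers in the MAC circuit can be replaced with adders and selection logic.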

 


Pre-Print Server(s)
Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

PASM Block Diagram
arXiv, 30 August 2016
@ARTICLE{2016arXiv160905132G,
   author = {{Garland}, J. and {Gregg}, D.},
    title = "{Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks}",
  journal = {ArXiv e-prints},
archivePrefix = "arXiv",
   eprint = {1609.05132},
 keywords = {Computer Science - Neural and Evolutionary Computing},
     year = 2016,
    month = aug,
   adsurl = {http://adsabs.harvard.edu/abs/2016arXiv160905132G},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Presentation Slides

Convolutional Neural Networks (CNNs) are one of the most successful deep machine learning technologies for processing image, voice and video data. CNNs require large amounts of processing capacity and memory, which can exceed the resources of low power mobile and embedded systems. Several designs for hardware accelerators have been proposed for CNNs which typically contain large numbers of Multiply Accumulate (MAC) units. One approach to reducing data sizes and memory traffic in CNN accelerators is “weight sharing”, where the full range of values in a trained CNN are put in bins and the bin index is stored instead of the original weight value. In this paper we propose a novel MAC circuit that exploits binning in weight-sharing CNNs. Rather than computing the MAC directly we instead count the frequency of each weight and place it in a bin. We then compute the accumulated value in a subsequent multiply phase. This allows hardware multipliers in the MAC circuit to be replaced with adders and selection logic. Experiments show that for the same clock speed our approach results in fewer gates, smaller logic, and reduced power.