This page lists the publications and research from my Ph.D., along with my other research to date.

**Journal(s)**

*Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing*

**ACM TACO, August 2018**
@article{Garland:2018:LCM:3274266.3233300,
author = {Garland, James and Gregg, David},
title = {Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing},
journal = {ACM Trans. Archit. Code Optim.},
issue_date = {August 2018},
volume = {15},
number = {3},
month = sep,
year = {2018},
issn = {1544-3566},
pages = {31:1--31:24},
articleno = {31},
numpages = {24},
url = {http://doi.acm.org/10.1145/3233300},
doi = {10.1145/3233300},
acmid = {3233300},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {ASIC, CNN, FPGA, arithmetic hardware circuits, multiply accumulate, power efficiency},
}

Presented at HiPEAC 2019, the European Network of Excellence on High Performance and Embedded Architecture and Compilation Conference, 21-23 January 2019, Valencia, Spain – Presentation Slides

*Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks*

**IEEE CAL, 23 January 2017**
@ARTICLE{7829315,
author={J. Garland and D. Gregg},
journal={IEEE Computer Architecture Letters},
title={Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks},
year={2017},
volume={16},
number={2},
pages={132-135},
keywords={adders;convolution;embedded systems;feedforward neural nets;learning (artificial intelligence);multiplying circuits;CNN accelerators;MAC circuit;bin index;deep machine learning technologies;embedded systems;hardware accelerators;hardware multipliers;memory traffic;multiply accumulate units;original weight value;subsequent multiply phase;video data;weight-sharing CNN;weight-sharing convolutional neural networks;Convolutional neural networks;Energy efficiency;Logic gates;Machine learning;Neural networks;Convolutional neural network;arithmetic hardware circuits;multiply accumulate;power efficiency},
doi={10.1109/LCA.2017.2656880},
ISSN={1556-6056},
month={July},}

**Poster(s)**

**Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing**

**HiPEAC, 2019**
@PROCEEDINGS{HiPEAC2019POSTER,
author = {{Garland}, J. and {Gregg}, D.},
title = "{Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing}",
publisher = "HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation",
year = 2019,
month = jan,
day = 23
}

The poster for the HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation Conference, 21-23 January 2019, Valencia, Spain.

*Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks*

**HiPEAC, ACACES 2017**
@ARTICLE{ACACES2017POSTER,
author = {{Garland}, J. and {Gregg}, D.},
title = "{Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks}",
publisher = "HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation",
year = 2017,
month = jul,
day = 12
}

The poster for the Thirteenth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES 2017), 9-15 July 2017, Fiuggi, Italy.

**Book(s)**

**Many-Core Computing: Hardware and Software**

Computing has moved away from a focus on performance-centric serial computation towards energy-efficient parallel computation. This provides continued performance increases without increasing clock frequencies and overcomes the thermal and power limitations of the dark-silicon era. As the number of parallel cores increases, we transition into the many-core computing era. There is considerable interest in developing methods, tools, architectures, and applications to support many-core computing. The primary aim of this edited book is to provide a timely and coherent account of the recent advances in many-core computing research. Starting with programming models, operating systems and their applications, the authors present runtime management techniques, followed by system modelling, verification, and testing methods, and architectures and systems. The book ends with some examples of innovative applications.

*Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks*

**HiPEAC, ACACES 2017**
@ARTICLE{ACACES2017,
author = {{Garland}, J. and {Gregg}, D.},
title = "{Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks}",
book = "{ACACES 2017 Poster Abstracts}",
publisher = "HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation",
year = 2017,
month = jul,
pages = {53-56},
ISBN = "978-88-905806-5-9"
}

The abstract for the poster for the Thirteenth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES 2017), 9-15 July 2017, Fiuggi, Italy.

**Delay- and Disruption-Tolerant Networking**
@book{1596930632,
Author = {Stephen Farrell and Vinny Cahill},
Title = {Delay- and Disruption-Tolerant Networking},
Publisher = {Artech House Publishers},
Year = {2006},
ISBN = {1596930632},
URL = {https://www.amazon.com/Delay-Disruption-Tolerant-Networking-Stephen-Farrell/dp/1596930632?SubscriptionId=0JYN1NVW651KCA56C102&tag=techkie-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=1596930632}
}

Stephen and Vinny acknowledge me in their book “Delay- and Disruption-Tolerant Networking” for the hardware research I assisted with while employed as a Research Assistant on the SeNDT Project.

**Magazines**

*HiPEAC Info 52*

**Pre-Print Server(s)**

*Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks*

**arXiv, 30 August 2016**
@ARTICLE{2016arXiv160905132G,
author = {{Garland}, J. and {Gregg}, D.},
title = "{Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1609.05132},
keywords = {Computer Science - Neural and Evolutionary Computing},
year = 2016,
month = aug,
adsurl = {http://adsabs.harvard.edu/abs/2016arXiv160905132G},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Convolutional Neural Networks (CNNs) are one of the most successful deep machine learning technologies for processing image, voice and video data. CNNs require large amounts of processing capacity and memory, which can exceed the resources of low-power mobile and embedded systems. Several designs for hardware accelerators have been proposed for CNNs, which typically contain large numbers of Multiply-Accumulate (MAC) units. One approach to reducing data sizes and memory traffic in CNN accelerators is “weight sharing”, where the full range of values in a trained CNN is put in bins and the bin index is stored instead of the original weight value. In this paper we propose a novel MAC circuit that exploits binning in weight-sharing CNNs. Rather than computing the MAC directly, we instead count the frequency of each weight and place it in a bin. We then compute the accumulated value in a subsequent multiply phase. This allows hardware multipliers in the MAC circuit to be replaced with adders and selection logic. Experiments show that for the same clock speed our approach results in fewer gates, smaller logic, and reduced power.
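
The two-phase idea in the abstract can be illustrated in software. The following is a minimal Python sketch, not the circuit from the paper: it assumes that each weight is stored as a bin index into a small table of shared weight values, and that the accumulate phase adds each input into the bin selected by its index. The names `direct_mac`, `weight_shared_mac`, `bin_indices` and `codebook` are hypothetical and chosen only for illustration.

```python
def direct_mac(inputs, bin_indices, codebook):
    """Conventional MAC: look up each shared weight, one multiply per input."""
    total = 0
    for x, b in zip(inputs, bin_indices):
        total += x * codebook[b]
    return total


def weight_shared_mac(inputs, bin_indices, codebook):
    """Two-phase MAC (assumed reading of the abstract).

    Phase 1 uses only selection and addition: each input is added to the
    accumulator of the bin its weight index points to.
    Phase 2 performs one multiply per bin, not one per input.
    """
    bins = [0] * len(codebook)
    for x, b in zip(inputs, bin_indices):
        bins[b] += x  # accumulate phase: adders and selection logic only
    # multiply phase: scale each bin total by its shared weight value
    return sum(acc * w for acc, w in zip(bins, codebook))


# Hypothetical example data: six inputs, three shared weight values.
inputs = [3, 1, 4, 1, 5, 9]
bin_indices = [0, 2, 1, 0, 2, 1]
codebook = [2, -1, 3]

print(direct_mac(inputs, bin_indices, codebook))
print(weight_shared_mac(inputs, bin_indices, codebook))
```

Both functions return the same value for the same data. In this sketch the number of multiplications drops from one per input to one per bin, which is the software analogue of replacing multipliers with adders and selection logic.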

**Thesis**

**James Garland, Structural Implementation of the CAN Communications Protocol Onto An ASIC Using VHDL, MSc Thesis, Department of Engineering, University of Central England, UK, December 1995.**

**M.Sc. Supervision**

Meehan, Eoin (2006), M.Sc. in Computer Science (Ubiquitous Computing), “Are you looking at me?”, Trinity College Dublin

McKnight, Joseph (2006), M.Sc. in Computer Science (Ubiquitous Computing), “Investigating an Integrated Inertial Gesture Recognition System and Vibrotactile Display”, Trinity College Dublin