Figure 2 - uploaded by Mohsan Ali
Baseline performance (in cycles, per loop iteration) of the code sequence

Contexts in source publication

Context 1
... the sample program results in Figure 2 ...
Context 2
... multiprocessor memory systems, lower levels of the memory hierarchy may not be able to be saturated by a single processor but should be able to be saturated by multiple processors working together. Modify the code in Figure 2.29, and run multiple copies at the same time. ...
Context 3
... the memory bandwidth is sometimes 2X this, it would be 5120 MB/sec. From Figure 2.14, this is just barely within the bandwidth provided by DDR2-667 DIMMs, so just one memory channel would suffice. ...
