Figure 1 - uploaded by Peter Marwedel
Memory configuration flow diagram

Source publication
Technical Report
Full-text available
In this report we evaluate the options for low power on-chip memories during system design and configuration. Specifically, we compare the use of scratch pad memories with that of cache on the basis of performance, area and energy. The target architecture used in our experiments is the AT91M40400 microcontroller containing an ARM7TDMI core. A packi...

Contexts in source publication

Context 1
... have varied the cache and the scratch pad size from 64 bytes to 8192 bytes. For the benchmark suite we chose, beyond a knee point around 1024 bytes for both the cache and the scratch pad, increasing their size yields no further performance improvement. This reflects the overall memory requirements of these benchmarks. Fig. 11 shows the performance variation for the bubble sort, lattice and selection sort benchmarks for the two on-chip memory options in the same ...
Context 2
... area is represented in terms of the number of transistors. These figures are obtained from the cache and scratch pad organization using [9]. Fig. 12 shows area vs. performance for the cache and the scratch pad for biquad_N_sections; Table 3 gives the area/performance tradeoff. Column 1 is the size of the scratch pad and cache in bytes. Columns 2 and 3 are the cache and scratch pad area in transistors. Columns 4 and 5 are the number of CPU cycles for cache- and scratch-pad based ...
Context 3
... we describe the effect of varying the address width on the energy for scratch pad and cache. Next, we give an example of the energy consumption required for a main memory access. Finally, we describe the total energy consumption for the various benchmarks used in the experimental setup. Fig. 15 shows the graph of cache size and scratch pad size in bytes vs. the per-access energy estimates obtained using CACTI. The x-axis is the size in bytes and the y-axis represents energy in ...

Similar publications

Conference Paper
Full-text available
Wireless sensor networks are an active research topic. The sensor nodes (i.e. motes) are the main building blocks of these networks. There is a constant drive to build ever more efficient motes in order to satisfy the demanding specifications of a sensor network. This paper introduces a new mote device, aceMOTE, which is based on a 32 bit...
Preprint
Full-text available
Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence due to their ability to make accurate predictions when being trained on huge datasets. With advancing technologies, such as the Internet of Things, interpreting large quantities of data generated by sensors is becoming an increasingly important task. However, in m...
Chapter
Full-text available
This article presents an extensive study which was performed on Infineon’s Aurix2G microcontroller to evaluate the impact of overclocking and underclocking on several parameters such as throughput, current, and power. Guided by the study, both overclocking and underclocking were utilized together to develop a dynamic frequency control approach to a...
Conference Paper
Full-text available
This paper presents initial experimental results toward realizing a multi-core CPU for wireless sensor nodes. The multi-core CPU reduces power consumption while enabling users to easily manage hard real-time tasks. The results show that a sensor node with three CPUs can eliminate about 76% of the power consumption of a single-CPU sensor node.
Article
Full-text available
The article proposes the development of a low-cost, well-performing open-source residential energy demand meter, comparable to commercial meters, in order to achieve accurate results and remain within the limits set by technical standards, with the objective of measuring real and instantaneous energy consumption through microcontroller and...

Citations

... There is a simple relationship between the two: (number of swap-ins In) − (number of swap-outs Out) = SPM size (KB). As the SPM size increases, the number of swap-ins and swap-outs caused by its operation drops significantly, which is consistent with the principle of reducing miss losses by increasing the cache size (Banakar et al., 2001). The C/H value shows that the random sampling algorithm can effectively identify the most frequently accessed data set in the program, namely the core working set. ...
Article
Traditional compiler-based SPM management often fails to accurately predict the memory access characteristics of system scheduling and task switching in a multi-core processor environment, which limits the effectiveness of SPM management. Runtime dynamic detection can compensate for this flaw and provide an accurate and efficient dynamic management method. This research analyses the similarities and differences between SPM management in multi-core and single-task environments, and builds a real-time operating system (RTOS) supporting multi-task scheduling to meet the experimental requirements. On this basis, the random sampling SPM allocation algorithm is adapted and improved to support adaptive SPM allocation at program runtime in a multi-core processor environment. The behaviour of the random sampling algorithm in a multi-core processor environment is analysed, demonstrating the effectiveness of the allocation algorithm for that environment.
... In the last few decades, researchers have been exploring the possibility of adopting scratchpad memories (SPMs) as an alternative to CPU caches [1][2][3]: these are on-chip memories that are directly accessible by the processor (through specific load/store instructions) and do not need to be managed by a cache controller. Being intrinsically simpler, they have better latency and energy characteristics than caches of the same size [4,5]. This is also shown later in this article, with the data presented in Table 3, extracted with a widely employed memory modeling tool [6]. ...
... A technical report from Banakar et al. [5] presented a size-varying comparative analysis of caches and scratchpads, using the SPM to store frequently used portions of both code and data. They generated an energy and area model for both devices and extracted performance data through simulation, showing that a scratchpad wins in every aspect if the memory size is big enough. ...
... These unique weights (since there are very few of them) can be stored close to compute on memory units, processing element scratchpads or SRAM cache depending on the hardware flavour [19,20], all of which have minimal (almost free) access costs. The storage and data movement cost of the encoded weights plus the close-to-compute weight access should be smaller than the storage and movement costs of directly representing the network with the weight values. ...
Preprint
Full-text available
Modern iterations of deep learning models contain millions (billions) of unique parameters, each represented by a b-bit number. Popular attempts at compressing neural networks (such as pruning and quantisation) have shown that many of the parameters are superfluous, which we can remove (pruning) or express with less than b-bits (quantisation) without hindering performance. Here we look to go much further in minimising the information content of networks. Rather than a channel or layer-wise encoding, we look to lossless whole-network quantisation to minimise the entropy and number of unique parameters in a network. We propose a new method, which we call Weight Fixing Networks (WFN) that we design to realise four model outcome objectives: i) very few unique weights, ii) low-entropy weight encodings, iii) unique weight values which are amenable to energy-saving versions of hardware multiplication, and iv) lossless task-performance. Some of these goals are conflicting. To best balance these conflicts, we combine a few novel (and some well-trodden) tricks; a novel regularisation term, (i, ii) a view of clustering cost as relative distance change (i, ii, iv), and a focus on whole-network re-use of weights (i, iii). Our Imagenet experiments demonstrate lossless compression using 56x fewer unique weights and a 1.9x lower weight-space entropy than SOTA quantisation approaches.
... Consequently, evaluating the performance compared to a cache-based system is important. Banakar et al. [43] carried out a comprehensive evaluation of SPM and cache memories. However, their work focused on a single-level shared SPM and cache memory, which is not representative of the architectures used in modern high-performance microprocessors. ...
... We adopt the performance model described in [43] and formalize the timing relation between the SPMs and the caches in the following equations. In Eq. (1), T s1 and T c1 are the latencies of the first-level SPM and cache, respectively. ...
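The excerpt names the symbols of Eq. (1) but does not reproduce the equation itself. As a hedged illustration (our sketch, not the paper's formula), an average-latency comparison using the same symbols might read:

```latex
% T_{s1}, T_{c1}: first-level SPM and cache latencies (as in the excerpt);
% h: cache hit rate; T_{m}: penalty for fetching from the next level.
% An access mapped to the SPM always hits:
T_{\text{spm}} = T_{s1}
% whereas a cached access hits only with probability h:
T_{\text{cache}} = h\,T_{c1} + (1-h)\,(T_{c1} + T_{m})
```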
... The memory objects assignment algorithms used to offer the full-time predictability should avoid both intracore and inter-core conflicts. In the related work section, we mentioned that the ILP based method can avoid intracore conflicts for either the static based method [43] or the dynamic based method [15,16] (called the baseline methods). Consequently, it is attractive to extend the static and dynamic based ILP methods to support the proposed SPM based multicore architectures to maintain time predictability while maximizing performance/energy. ...
Article
Time predictability is crucial in hard real-time and safety-critical systems. Cache memories, while useful for improving the average-case memory performance, are not time predictable, especially when they are shared in multicore processors. To achieve time predictability while minimizing the impact on performance, this paper explores several time-predictable scratch-pad memory (SPM) based architectures for multicore processors. To support these architectures, we propose the dynamic memory objects allocation based partition, the static allocation based partition, and the static allocation based priority L2 SPM strategy to retain the characteristic of time predictability while attempting to maximize the performance and energy efficiency. The SPM based multicore architectural design and the related allocation methods thus form a comprehensive solution to hard real-time multicore based computing. Our experimental results indicate the strengths and weaknesses of each proposed architecture and the allocation method, which offers interesting on-chip memory design options to enable multicore platforms for hard real-time systems.
... SPM is similar to an L1 cache, but it requires explicit instructions to move data to and from main memory, often using DMA (direct memory access) based transfers. A comparative study [69,10] shows that using a scratchpad memory instead of a cache gives an 18% performance improvement for bubble sort and a 34% reduction in chip area, and that it uses less energy per access because of the absence of tag comparisons. From here onwards in this paper we use the abbreviation SMC to denote these memories. ...
... There are also research efforts [32,17,16,18,23] where SMCs have been developed and tested for use in a GPP. The main advantages of using SMCs, as noted in [69,10], are the savings they provide in area and energy. They can also speed up a program because of their close proximity to the CPU. ...
Article
Full-text available
Processors are unable to achieve significant gains in speed using conventional methods. For example, increasing the clock rate increases the average access time to on-chip caches, which in turn lowers the processor's average number of instructions per cycle. The on-chip memory system will be the major bottleneck in future processors. Software-managed on-chip memories (SMCs) are on-chip caches where software can explicitly read and write some or all of the memory references within a block of caches. This paper analyzes the current trends in optimizing the use of these SMCs. We separate and compare these trends based on general classifications developed during our study. The paper not only serves as a collection of recent references, information and classifications for easy comparison and analysis, but also as motivation for improving the SMC management framework for embedded systems. It also makes a first step towards making SMCs useful for general-purpose multicore processors.
... Banakar et al. [6] presented a comprehensive evaluation of scratchpad and cache memories in their research. However, their work focused on a single-level shared scratchpad and cache memory, which is not representative of the architectures used in modern high-performance microprocessors. ...
Article
In modern computer architectures, caches are widely used to shorten the gap between processor speed and memory access time. However, caches are time-unpredictable, and thus can significantly increase the complexity of worst-case execution time (WCET) analysis, which is crucial for real-time systems. This paper proposes a time-predictable two-level scratchpad-based architecture and an ILP-based static memory objects assignment algorithm to support real-time computing. Moreover, to exploit the load/store latencies that are known statically in this architecture, we study a Scratch-pad Sensitive Scheduling method to further improve the performance. Our experimental results indicate that the performance and energy consumption of the two-level scratchpad-based architecture are superior to the similar cache based architecture for most of the benchmarks we studied.
... When focusing on embedded systems and applications, one of the biggest advantages of caches, versatility, is often unneeded, while power consumption and die area (where caches are at a disadvantage due to the overhead of tag area and control logic) play much more important roles. The authors of a detailed study [3] trading off caches against SPMs found in their experiments that the latter exhibit 34% smaller area and 40% lower power consumption than a cache of the same capacity. Even more surprisingly, the runtime measured in cycles was 18% better with an SPM using a simple static knapsack-based allocation algorithm. ...
Conference Paper
Full-text available
This paper presents an energy-aware electronic design automation (EDA) methodology for the system-level exploration of hierarchical storage organizations, focusing mainly on data-intensive signal processing applications. Starting from the high-level behavioral specification of a given application, several memory management tasks are addressed in a common algebraic framework, using data-dependence analysis techniques similar to those used in modern compilers. Within this memory management software system, the designer can explore different algorithmic specifications functionally equivalent, by computing their minimum storage requirements. The system can perform an exploration based on energy consumption of signal assignments to the off- and on-chip memory layers, followed by a storage-efficient mapping of signals to the physical memories. The last phase of the methodology is an exploration approach for energy-aware banking of the on-chip memory, which takes into account both the static and dynamic energy consumption.
... The consequence is that there is no need to check for the availability of the data in the SPM. Hence, the SPM does not possess a comparator and the miss/hit acknowledging circuitry [3]. This contributes to a significant energy (as well as area) reduction. ...
... The first bar is the reference and corresponds to a "flat" memory design, in which all operands have to be retrieved from the large background memory. [Fig. 3: Variation of the dynamic energy consumption with the SPM size for the illustrative example in Fig. 2(a).] The CACTI model can be used to estimate the energy consumption for SPMs, as explained and exemplified in [3] (Appendix A). Storing all the signals on-chip is, obviously, the most desirable scenario from the point of view of dynamic energy consumption. ...
Article
Full-text available
SUMMARY In real-time data-dominated communication and multimedia processing applications, a multi-layer memory hierarchy is typically used to enhance the system performance and also to reduce the energy consumption. Savings of dynamic energy can be obtained by accessing frequently used data from smaller on-chip memories rather than from large background memories. This paper focuses on the reduction of the dynamic energy consumption in the memory subsystem of multidimensional signal processing systems, starting from the high-level algorithmic specification of the application. The paper presents a formal model which identifies those parts of arrays more intensely accessed, taking also into account the relative lifetimes of the signals. Tested on a two-layer memory hierarchy, this model led to savings of dynamic energy from 40% to over 70% relative to the energy used in the case of flat memory designs.
... A further line of investigation would therefore be a comparison of the energy consumption of the two on-chip memories at equal chip area. The investigations showed that the use of on-chip memory leads to a significant reduction in the energy consumption of processor systems. Similar investigations were carried out by Banakar et al. [BS01]. In those investigations, besides energy consumption and performance, the chip-area requirements of the two on-chip memories were also considered. ...