Conference Paper

Hardware Support for Real-time Operating Systems

November 2003

DOI:10.1109/CODESS.2003.1275254

Source
IEEE Xplore

Conference: Hardware/Software Codesign and System Synthesis, 2003. First IEEE/ACM/IFIP International Conference on

Authors:

Bruce Jacob

University of Maryland Global Campus

The growing complexity of embedded applications and pressure on time-to-market have resulted in the increasing use of embedded real-time operating systems (RTOSes). Unfortunately, RTOSes can introduce significant performance degradation. This paper presents the Real-Time Task Manager (RTM), a processor extension that minimizes the performance drawbacks associated with RTOSes. The RTM accomplishes this by supporting, in hardware, a few of the common RTOS operations that are performance bottlenecks: task scheduling, time management, and event management. By exploiting the inherent parallelism of these operations, the RTM completes them in constant time, thereby significantly reducing RTOS overhead. It decreases both the processor time used by the RTOS and the maximum response time by an order of magnitude.
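To make the bottleneck concrete, here is a minimal sketch of how a software RTOS might perform two of the operations named above, time management and task scheduling. It is not taken from the paper: the `Task` record, the `tick` and `schedule` functions, and the convention that a lower number means higher priority are all assumptions made for illustration. Both routines walk the whole task table, so their cost grows with the number of tasks. The per-task checks are independent of one another, which is the kind of parallelism the abstract says the RTM exploits.

```python
from dataclasses import dataclass

# Hypothetical task states; real RTOSes usually have more.
READY, DELAYED = "ready", "delayed"

@dataclass
class Task:
    name: str
    priority: int          # assumption: lower number = higher priority
    state: str = READY
    delay: int = 0         # remaining ticks while DELAYED

def tick(tasks):
    """Time management: age every delayed task by one tick (linear scan)."""
    for t in tasks:
        if t.state == DELAYED:
            t.delay -= 1
            if t.delay <= 0:
                t.delay = 0
                t.state = READY   # delay expired, task may run again

def schedule(tasks):
    """Task scheduling: return the highest-priority ready task, or None."""
    best = None
    for t in tasks:
        if t.state == READY and (best is None or t.priority < best.priority):
            best = t
    return best
```

A short usage example: with tasks `logger` (priority 5), `control` (priority 1, delayed for 2 ticks), and `comm` (priority 3), `schedule` picks `comm` until two calls to `tick` have released `control`, after which `control` is selected.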

This paper and many others can be found in PDF form on my website: http://www.ece.umd.edu/~blj (alternatively, try searching for my name and/or the article name at google.com). Every single thing I have ever published (other than books) is available on my website in PDF.

Hardware Support for Efficient and Low-Power Data Sorting in Massive Data Application: The 3-D Sorting Method

Article

Apr 2021

Sorting is an inseparable part of applications that process massive amounts of data. A hardware sorter increases performance at the cost of additional resources, which are limited on FPGA chips. This paper proposes a new ultra-low-power 3-dimensional hardware sorting architecture, called ULP-Sorter, based on the Multi-Dimensional Sorting Algorithm (MDSA). Hardware resources and power consumption are significantly lower in the ULP-Sorter architecture than in previous state-of-the-art techniques. The simulation results show that ULP-Sorter reduces the number of Look-Up Tables (LUTs) and registers by 70% and 35.7%, respectively, and reduces the FPGA's power consumption by 48.7% on average, in comparison with previous state-of-the-art techniques. The results indicate that ULP-Sorter is a suitable architecture for edge computing devices with power and area constraints.

Hardware support in a middleware for distributed and real-time embedded applications

Article

Sep 2007

One of the main challenges in the development of tools and methodologies for multiprocessor real-time embedded systems is to reuse already developed software while still achieving a low memory footprint, low energy consumption, and minimal area, and while meeting the real-time constraints. This work addresses these problems at the middleware level. We show that adaptations in the platform architecture, for instance exploring hardware implementations of middleware services such as task scheduling and communication, can yield better gains in application requirements like energy and performance, which are essential for embedded applications. This approach offers high flexibility in choosing either a hardware or a software implementation, because services are encapsulated into objects, and application development and design space exploration at the middleware level can be performed independently of each other, in a fully transparent way. Furthermore, the use of the object-oriented approach reduces time-to-market and development costs.

3D-Sorter: 3D Design of a Resource-Aware Hardware Sorter for Edge Computing Platforms Under Area and Energy Consumption Constraints

Conference Paper

Jul 2020

In this paper, we propose a 3-dimensional hardware sorting architecture (3D-Sorter) based on the Multi-Dimensional Sorting Algorithm (MDSA). The proposed architecture transforms a sequence of input records into a 3-dimensional matrix. Records of every dimension are sorted in several MDSA phases, using partial sorting methods. Our synthesis results, obtained with Xilinx Vivado, indicate that the 3D-Sorter design decreases the number of Look-Up Tables (LUTs) and registers by 54% and 42.7%, respectively, compared to the state-of-the-art hardware sorter. Power consumption is also reduced by 48.15% on average. The results show that the proposed architecture offers a remarkable power and area saving for edge components. 3D-Sorter is the extended version of the Real-Time Hardware Sorter (RTHS) that we proposed in previous work.

A TrustZone-assisted Secure Silicon on a Co-Design Framework

Thesis

Dec 2018

Sérgio Pereira

Embedded systems were, for a long time, single-purpose and closed systems, characterized by hardware resource constraints and real-time requirements. Nowadays, their functionality is ever-growing, coupled with increasing complexity and heterogeneity. Embedded applications increasingly demand general-purpose operating systems (GPOSs) to handle operator interfaces and general-purpose computing tasks, while simultaneously meeting strict timing requirements. Virtualization, which enables multiple operating systems (OSs) to run on top of the same hardware platform, is gaining momentum in the embedded systems arena, driven by the growing interest in consolidating and isolating multiple heterogeneous environments. The penalties incurred by classic virtualization approaches are pushing research towards hardware-assisted solutions. Among the existing commercial off-the-shelf (COTS) technologies for virtualization, ARM TrustZone technology is gaining momentum due to the supremacy and lower cost of TrustZone-enabled processors. Programmable systems-on-chip (SoCs) are becoming leading players in the embedded systems space, because the combination of a plethora of hard resources with programmable logic enables the efficient implementation of systems that fit the heterogeneous nature of embedded applications. Moreover, novel disruptive approaches make use of field-programmable gate array (FPGA) technology to enhance virtualization mechanisms. This master's thesis proposes a hardware-software co-design framework to ease the cost of addressing the requirements of the new generation of embedded systems. ARM TrustZone is exploited to implement the root of trust of a virtualization-based architecture that allows a GPOS to execute side by side with a real-time OS (RTOS). RTOS services were offloaded to hardware to improve performance and determinism simultaneously. Instead of focusing on a concrete application, the goal is to provide a complete framework, specifically tailored for Zynq-based devices, that developers can use to accelerate a range of distinct applications across different embedded industries.

HW SW Co-design of Adaptive Task Scheduler for Real Time Systems

Article

Mar 2016

Dinesh G Harkut

In embedded systems, a real-time operating system (RTOS) is often used to structure the application code and to ensure that deadlines are met by reacting to events in the environment and executing the corresponding functions within a precise time. Most embedded systems are bound by real-time constraints, with determinism and latency as critical metrics. RTOSes are generally implemented in software, which increases computational overhead, jitter, and memory footprint. These costs can be reduced, if not removed completely, by using recent FPGA technology, which enables the implementation of a full-featured and flexible hardware-based RTOS. Scheduling algorithms play an important role in the design of real-time systems. This paper proposes a novel FIS-based adaptive hardware task scheduler for multiprocessor systems that minimizes the processor time spent on scheduling. It uses fuzzy logic to model uncertainty in the first stage, together with an adaptive framework whose feedback allows the processor share of tasks running on the multiprocessor to be controlled dynamically at run time. This fuzzy-logic-based adaptive hardware scheduler breaks through the limit on the total number of tasks and thus improves the efficiency of the entire real-time system. The increased computational overhead of the proposed model can be compensated for by exploiting the parallelism of the hardware once the scheduler is migrated to an FPGA.

Arrival Order Processing of Service Requests in Full Hardware Implementation of RTOS-Based Systems

Conference Paper

Jun 2023

Automatic Generation of Management Module for Full Hardware Implementation of RTOS-Based Systems

Conference Paper

Jun 2023

Full Hardware Implementation of FreeRTOS-Based Real-Time Systems

Conference Paper

Dec 2021

A Method for Designing and Implementing a Real-Time Operating System for Industrial Devices

Conference Paper

May 2019

Synthesis of Full Hardware Implementation of RTOS-Based Systems

Conference Paper

Oct 2018

The performance and energy consumption of three embedded real-time operating systems

Conference Paper

Jan 2001

This paper presents the modeling of embedded systems with SimBed, an execution-driven simulation testbed that measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. We briefly describe the simulation environment and present a study that compares three RTOSs: μC/OS-II, a popular public-domain embedded real-time operating system; Echidna, a sophisticated, industrial-strength (commercial) RTOS; and NOS, a bare-bones multi-rate task scheduler reminiscent of typical "roll-your-own" RTOSs found in many commercial embedded systems. The microcontroller simulated in this study is the Motorola M-CORE processor: a low-power, 32-bit CPU core with 16-bit instructions, running at 20MHz.

MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems

Conference Paper

Dec 1997

Significant advances have been made in compilation technology for capitalizing on instruction-level parallelism (ILP). The vast majority of ILP compilation research has been conducted in the context of general-purpose computing, and more specifically the SPEC benchmark suite. At the same time, a number of microprocessor architectures have emerged which have VLIW and SIMD structures that are well matched to the needs of the ILP compilers. Most of these processors are targeted at embedded applications such as multimedia and communications, rather than general-purpose systems. Conventional wisdom, and a history of hand optimization of inner-loops, suggests that ILP compilation techniques are well suited to these applications. Unfortunately, there currently exists a gap between the compiler community and embedded applications developers. This paper presents MediaBench, a benchmark suite that has been designed to fill this gap. This suite has been constructed through a three-step process: intuition and market driven initial selection, experimental measurement to establish uniqueness, and integration with system synthesis algorithms to establish usefulness.

The system-on-a-chip lock cache

Article

Sep 2002

Lock synchronization overheads may be significant in a shared-memory multiprocessor system-on-a-chip (SoC) implementation. These overheads are observed in terms of lock latency, lock delay and memory bandwidth consumption in the system. There has been much previous work to speed up access to lock variables via specialized caches [1], software queues [2]-[5] and delayed loops, e.g., exponential backoff [2]. However, in the context of SoC, these previously reported techniques all have drawbacks not present in our technique. We present a novel, efficient, small and very simple hardware unit, SoC Lock Cache (SoCLC), which resolves the critical section (CS) interactions among multiple processors and improves the performance criteria in terms of lock latency, lock delay and bandwidth consumption in a shared-memory multiprocessor SoC. Our mechanism is capable of handling short CSs as well as long CSs. This combined support has been established at both the hardware architecture level and the software architecture level including the real-time operating system (RTOS) kernel level facilities (such as support for preemptive versus non-preemptive synchronization, scheduling of lock variable accesses, interrupt handling and RTOS initialization). The experimental results of a microbenchmark program, which simulates an application with high-contention critical section accesses under a four-processor platform with shared memory, showed an overall speedup of 55%. Furthermore, a database application example with client-server pairs of tasks, run on the same platform, showed that our mechanism achieved an overall speedup of 27%.
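For readers unfamiliar with the "delayed loop" family of software techniques that the abstract contrasts with SoCLC, the following is a minimal sketch of lock acquisition with exponential backoff. It illustrates the general prior technique only, not SoCLC and not code from the article. The function name `acquire_with_backoff`, its parameters, and the injectable `sleep` callback are hypothetical, and the random jitter that practical implementations often add is omitted to keep the behavior deterministic.

```python
import time

def acquire_with_backoff(try_acquire, base_delay=1e-6, max_delay=1e-3,
                         max_attempts=20, sleep=time.sleep):
    """Poll a lock, doubling the wait after every failed attempt.

    try_acquire: callable returning True once the lock has been taken.
    Returns the number of attempts used, or None if the lock was never taken.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if try_acquire():
            return attempt
        if attempt < max_attempts:
            sleep(delay)                       # back off before polling again
            delay = min(delay * 2, max_delay)  # exponential growth, capped
    return None
```

With a standard `threading.Lock` named `lock`, a caller could pass `lambda: lock.acquire(blocking=False)` as `try_acquire`. Each failed poll still costs a memory access and the growing delay adds latency once the lock is released, which is the kind of overhead the abstract measures as lock latency, lock delay and bandwidth consumption.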

Power analysis of embedded operating systems

Conference Paper

Jan 2000

MicroC/OS-II - The Real-Time Kernel

Book

Feb 2002

Jean J Labrosse

Hardware/Software Co-Design of Run-Time Schedulers for Real-Time Systems

Article

Sep 2000

We present the Serra Run-Time Scheduler Synthesis and Analysis Tool, which automatically generates a run-time scheduler from a heterogeneous system-level specification in both Verilog HDL and C. Part of the run-time scheduler is implemented in hardware, which allows the scheduler to be predictable in being able to meet hard real-time constraints, while part is implemented in software, thus supporting features typical of software schedulers. Serra's real-time analysis generates a priority assignment for the software tasks in the mixed hardware-software system. The tasks in hardware and software have precedence constraints, resource constraints, relative timing constraints, and a rate constraint. A heuristic scheduling algorithm assigns the static priorities such that a hard real-time rate constraint can be predictably met. Serra supports the specification of critical regions in software, thus providing the same functionality as semaphores. We describe the task control/data-flow extraction, synthesis of the control portion of the run-time scheduler in hardware, real-time analysis and priority scheduler template. We also show how our approach fits into an overall tool flow and target architecture. Finally, we conclude with a sample application of the novel run-time scheduler synthesis and analysis tool to a robotics design example.

TI's New 'C6x DSP Screams at 1,600 MIPS

Article

Jan 1997

The Design of the Unix Operating System

Book

Jan 1986

Maurice J. Bach

Contents: General overview of the system; Introduction to the kernel; The buffer cache; Internal representation of files; System calls for the file system; The structure of processes; Process control; Process scheduling and time; Memory management policies; The I/O subsystem; Interprocess communication; Multiprocessor systems; Distributed UNIX systems.
