Conference Paper

Hardware Support for Real-time Operating Systems

Abstract

The growing complexity of embedded applications and pressure on time-to-market have resulted in the increasing use of embedded real-time operating systems. Unfortunately, RTOSes can introduce a significant performance degradation. The paper presents the Real-Time Task Manager (RTM), a processor extension that minimizes the performance drawbacks associated with RTOSes. The RTM accomplishes this by supporting, in hardware, a few of the common RTOS operations that are performance bottlenecks: task scheduling, time management, and event management. By exploiting the inherent parallelism of these operations, the RTM completes them in constant time, thereby significantly reducing RTOS overhead. It decreases both the processor time used by the RTOS and the maximum response time by an order of magnitude.
This paper and many others can be found in PDF form on my website: http://www.ece.umd.edu/~blj (alternatively, try searching for my name and/or the article name at google.com). Every single thing I have ever published (other than books) is available on my website in PDF.
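The three operations the abstract names (task scheduling, time management, and event management) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the RTM's actual design: the record fields, the status names, the "lower number means higher priority" convention, and the choice to release every task waiting on an event are all hypothetical. Each loop below visits every task record in turn; the abstract's point is that hardware can examine all records in parallel and so finish in constant time.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical status values for this sketch.
READY, DELAYED, BLOCKED = "ready", "delayed", "blocked"


@dataclass
class TaskRecord:
    priority: int                 # assumption: lower number = higher priority
    status: str = READY
    delay: int = 0                # remaining ticks while DELAYED
    event: Optional[int] = None   # event id awaited while BLOCKED


class TaskTableModel:
    """Software stand-in for a table of per-task records."""

    def __init__(self, tasks: Iterable[TaskRecord]):
        self.tasks = list(tasks)

    def schedule(self) -> Optional[int]:
        """Task scheduling: index of the highest-priority ready task."""
        ready = [i for i, t in enumerate(self.tasks) if t.status == READY]
        return min(ready, key=lambda i: self.tasks[i].priority, default=None)

    def tick(self) -> None:
        """Time management: age every delayed task by one clock tick."""
        for t in self.tasks:
            if t.status == DELAYED:
                t.delay -= 1
                if t.delay <= 0:
                    t.status = READY

    def signal(self, event: int) -> None:
        """Event management: release tasks waiting on `event`.

        Assumption: all waiters are released; a real design might
        release only the highest-priority one.
        """
        for t in self.tasks:
            if t.status == BLOCKED and t.event == event:
                t.status = READY
                t.event = None
```

In software each of these calls costs time proportional to the number of tasks, which is the overhead the abstract attributes to RTOSes.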
... Among the various data processing tasks of these applications, sorting is one of the most important [2], [3]. Various applications such as real-time schedulers [4], [5], searching [6], [7], robotics [8], [9], video and image processing [10], Artificial Intelligence (AI) [11], [12] and multimedia systems [13], [14] use sorting methods in their structures. ...
[Fig. 1: Sorting is an essential and important computing process in data centers.]
Article
Sorting is an inseparable part of applications that process massive amounts of data. A hardware-designed sorter increases performance at the cost of additional resources, which are limited in FPGA chips. This paper proposes a new Ultra-Low-Power 3-dimensional hardware sorting architecture, the so-called ULP-Sorter, based on the Multi-Dimensional Sorting Algorithm (MDSA). The hardware resources and power consumption are significantly decreased in the ULP-Sorter architecture in comparison with previous state-of-the-art techniques. The simulation results show that ULP-Sorter reduces the number of Look-Up Tables (LUT) and registers by 70% and 35.7%, respectively, in comparison to previous state-of-the-art techniques. ULP-Sorter reduces the FPGA's power consumption on average by 48.7% in comparison with previous state-of-the-art techniques. The results indicate that ULP-Sorter is a suitable architecture for edge computing devices with limited power and area constraints.
... They extend the multithreaded programming model to abstract the FPGA components, which are attached to the CPU bus. The work presented in [8] describes the Real-Time Task Manager (RTM). It supports in hardware a few of the common RTOS operations that represent performance bottlenecks, like task scheduling, time management, and event management. ...
Article
One of the main challenges in the development of tools and methodologies for a multiprocessor real-time embedded system is to reuse already developed software while at the same time obtaining a low memory footprint, low energy consumption, and minimal area, and still meeting the real-time constraints. This work aims to face these problems at the middleware level. We show that adaptations in the platform architecture, for instance exploring hardware implementations of middleware services such as task scheduling and communication, can drive better gains in application requirements like energy and performance, which are essential for embedded applications. This approach is coupled with high flexibility in choosing either a hardware or a software implementation, because services are encapsulated into objects, and the application development and the design space exploration at the middleware level can be performed independently of each other, in a fully transparent way. Furthermore, the use of the object-oriented approach reduces time-to-market and development costs.
... Much research has been conducted on the sorting problem to find the best and fastest way to sort big data. Sorting is a frequently used operation in various applications such as search [1], scheduling and real-time systems [2], [3], robotics [4], video and image processing [5], [6], Artificial Neural Networks (ANN) [7] and MIMO systems [8]. The computational overhead and memory requirements of high-performance sorting methods make the sorting process a bottleneck. ...
Conference Paper
In this paper, we propose a 3-dimensional hardware sorting architecture (3D-Sorter) based on the Multi-Dimensional Sorting Algorithm (MDSA). The proposed architecture transforms a sequence of input records into a 3-dimensional matrix. Records of every dimension are sorted in several MDSA phases, using partial sorting methods. Our synthesis results, provided by Xilinx Vivado, indicate that the 3D-Sorter design decreases the number of Look-Up Tables (LUT) and registers by 54% and 42.7%, respectively, compared to the state-of-the-art hardware sorter. Also, the power consumption is reduced by 48.15% on average. The results show that the proposed architecture offers remarkable power and area savings for edge components. 3D-Sorter is the extended version of the Real-Time Hardware Sorter (RTHS) that we proposed in previous work.
... To ensure the hard real-time requirements of critical software, some RTOS services were offloaded to the reconfigurable logic fabric. The scheduler, mutexes and kernel software timer services represent the major sources of jitter and overhead [KGJ03] in RTOSes and are therefore major candidates for hardware offloading. The RTOS hardware subsystem is agnostic of the RTOS software implementation that might be running in the secure world. ...
Thesis
Embedded systems were, for a long time, single-purpose and closed systems, characterized by hardware resource constraints and real-time requirements. Nowadays, their functionality is ever-growing, coupled with increasing complexity and heterogeneity. Embedded applications increasingly demand the use of general-purpose operating systems (GPOSs) to handle operator interfaces and general-purpose computing tasks, while simultaneously ensuring strict timing requirements. Virtualization, which enables multiple operating systems (OSs) to run on top of the same hardware platform, is gaining momentum in the embedded systems arena, driven by the growing interest in consolidating and isolating multiple and heterogeneous environments. The penalties incurred by classic virtualization approaches are pushing research towards hardware-assisted solutions. Among the existing commercial off-the-shelf (COTS) technologies for virtualization, ARM TrustZone technology is gaining momentum due to the supremacy and lower cost of TrustZone-enabled processors. Programmable system-on-chips (SoCs) are becoming leading players in the embedded systems space, because the combination of a plethora of hard resources with programmable logic enables the efficient implementation of systems that perfectly fit the heterogeneous nature of embedded applications. Moreover, novel disruptive approaches make use of field-programmable gate array (FPGA) technology to enhance virtualization mechanisms. This master’s thesis proposes a hardware-software co-design framework for easing the economy of addressing the requirements of the new generation of embedded systems. ARM TrustZone is exploited to implement the root-of-trust of a virtualization-based architecture that allows the execution of a GPOS side-by-side with a real-time OS (RTOS). RTOS services were offloaded to hardware, so that the architecture could deliver simultaneous improvements in performance and determinism.
Instead of focusing on a concrete application, the goal is to provide a complete framework, specifically tailored for Zynq-based devices, that developers can use to accelerate a range of distinct applications across different embedded industries.
... Paul Kohout et al. [33] developed the Real-Time Manager (RTM), which leverages the potential of hardware parallelism. In this system, routine housekeeping tasks are implemented in hardware, which frees the processor for critical functions and boosts overall performance. RTM supports static priority scheduling and handles task, time and event management. ...
Article
In embedded systems, a real-time operating system (RTOS) is often used to structure the application code and to ensure that deadlines are met by reacting to events in the environment and executing functions within precise time bounds. Most embedded systems are bound by real-time constraints, with determinism and latency as critical metrics. RTOSes are generally implemented in software, which increases computational overhead, jitter and memory footprint. These costs can be reduced, even if not removed completely, by using the latest FPGA technology, which enables the implementation of a full-featured and flexible hardware-based RTOS. Scheduling algorithms play an important role in the design of real-time systems. This paper proposes a novel FIS-based adaptive hardware task scheduler for multiprocessor systems that minimizes the processor time spent on scheduling. It uses fuzzy logic to model uncertainty in the first stage, together with an adaptive framework whose feedback allows the processor share of tasks running on the multiprocessor to be controlled dynamically at run time. This fuzzy-logic-based adaptive hardware scheduler breaks through the limit on the total number of tasks and thus improves the efficiency of the entire real-time system. The increased computation overhead of the proposed model can be compensated for by exploiting the parallelism of the hardware once the scheduler is migrated to the FPGA.
Conference Paper
This paper presents the modeling of embedded systems with SimBed, an execution-driven simulation testbed that measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. We briefly describe the simulation environment and present a study that compares three RTOSs: μC/OS-II, a popular public-domain embedded real-time operating system; Echidna, a sophisticated, industrial-strength (commercial) RTOS; and NOS, a bare-bones multi-rate task scheduler reminiscent of typical "roll-your-own" RTOSs found in many commercial embedded systems. The microcontroller simulated in this study is the Motorola M-CORE processor: a low-power, 32-bit CPU core with 16-bit instructions, running at 20MHz.
Conference Paper
Significant advances have been made in compilation technology for capitalizing on instruction-level parallelism (ILP). The vast majority of ILP compilation research has been conducted in the context of general-purpose computing, and more specifically the SPEC benchmark suite. At the same time, a number of microprocessor architectures have emerged which have VLIW and SIMD structures that are well matched to the needs of the ILP compilers. Most of these processors are targeted at embedded applications such as multimedia and communications, rather than general-purpose systems. Conventional wisdom, and a history of hand optimization of inner-loops, suggests that ILP compilation techniques are well suited to these applications. Unfortunately, there currently exists a gap between the compiler community and embedded applications developers. This paper presents MediaBench, a benchmark suite that has been designed to fill this gap. This suite has been constructed through a three-step process: intuition and market driven initial selection, experimental measurement to establish uniqueness, and integration with system synthesis algorithms to establish usefulness.
Article
Lock synchronization overheads may be significant in a shared-memory multiprocessor system-on-a-chip (SoC) implementation. These overheads are observed in terms of lock latency, lock delay and memory bandwidth consumption in the system. There has been much previous work to speed up access of lock variables via specialized caches [1], software queues [2]-[5] and delayed loops, e.g., exponential backoff [2]. However, in the context of SoC, these previously reported techniques all have drawbacks not present in our technique. We present a novel, efficient, small and very simple hardware unit, SoC Lock Cache (SoCLC), which resolves the critical section (CS) interactions among multiple processors and improves the performance criteria in terms of lock latency, lock delay and bandwidth consumption in a shared-memory multiprocessor SoC. Our mechanism is capable of handling short CSs as well as long CSs. This combined support has been established at both the hardware architecture level and the software architecture level including the real-time operating system (RTOS) kernel level facilities (such as support for preemptive versus non-preemptive synchronization, scheduling of lock variable accesses, interrupt handling and RTOS initialization). The experimental results of a microbenchmark program, which simulates an application with high-contention critical section accesses under a four-processor platform with shared memory, showed an overall speedup of 55%. Furthermore, a database application example with client-server pairs of tasks, run on the same platform, showed that our mechanism achieved an overall speedup of 27%.
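For readers unfamiliar with the "delayed loops, e.g., exponential backoff" that the abstract lists among earlier software techniques, the following is a minimal sketch of a spin lock with randomized exponential backoff. It is not the SoCLC mechanism, which is a hardware unit. The class name `BackoffSpinLock` and the delay parameters are hypothetical, and a non-blocking `threading.Lock.acquire` call stands in for an atomic test-and-set on a shared lock variable.

```python
import random
import threading
import time


class BackoffSpinLock:
    """Spin lock that waits longer after each failed attempt (illustrative)."""

    def __init__(self, base_delay: float = 1e-6, max_delay: float = 1e-3):
        # Assumption: this Lock models the shared lock word; a non-blocking
        # acquire plays the role of an atomic test-and-set instruction.
        self._flag = threading.Lock()
        self.base_delay = base_delay
        self.max_delay = max_delay

    def acquire(self) -> int:
        """Spin until the lock is obtained; return the number of attempts."""
        delay = self.base_delay
        attempts = 1
        while not self._flag.acquire(blocking=False):
            # Back off for a random time, then double the bound (capped),
            # so contending threads probe the lock less and less often.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, self.max_delay)
            attempts += 1
        return attempts

    def release(self) -> None:
        self._flag.release()
```

The trade-off is visible in the loop: backing off reduces traffic on the lock variable, but a waiting thread may keep sleeping after the lock has been freed, which adds to the lock delay the abstract mentions.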
Article
We present the Serra Run-Time Scheduler Synthesis and Analysis Tool which automatically generates a run-time scheduler from a heterogeneous system-level specification in both Verilog HDL and C. Part of the run-time scheduler is implemented in hardware, which allows the scheduler to be predictable in being able to meet hard real-time constraints, while part is implemented in software, thus supporting features typical of software schedulers. Serra's real-time analysis generates a priority assignment for the software tasks in the mixed hardware-software system. The tasks in hardware and software have precedence constraints, resource constraints, relative timing constraints, and a rate constraint. A heuristic scheduling algorithm assigns the static priorities such that a hard real-time rate constraint can be predictably met. Serra supports the specification of critical regions in software, thus providing the same functionality as semaphores. We describe the task control/data-flow extraction, synthesis of the control portion of the run-time scheduler in hardware, real-time analysis and priority scheduler template. We also show how our approach fits into an overall tool flow and target architecture. Finally, we conclude with a sample application of the novel run-time scheduler synthesis and analysis tool to a robotics design example.
Book
Contents: General overview of the system; Introduction to the kernel; Cache; Internal representation of files; System calls for the file system; The structure of processes; Process control; Process scheduling and time; Memory management policies; The input/output subsystem; Interprocess communication; Multiprocessor systems; Distributed UNIX systems.