RTOS Scheduler Implementation in Hardware and Software for Real Time Applications

June 2006
Proceedings of the International Workshop on Rapid System Prototyping

DOI:10.1109/RSP.2006.34

Source
IEEE Xplore

Conference: Rapid System Prototyping, IEEE International Workshop on

Authors:

Luciano Ost

Loughborough University

César A. M. Marcon

Pontifícia Universidade Católica do Rio Grande do Sul

Carlos Eduardo Reif

Pontifícia Universidade Católica do Rio Grande do Sul

RTOS Scheduler Implementation in Hardware and Software for Real Time Applications

Melissa Vetromille, Luciano Ost, César A. M. Marcon, Carlos Reif, Fabiano Hessel

PPGCC - FACIN – PUCRS - Av. Ipiranga, 6681, Porto Alegre, RS – Brazil

{mvetromille, ost, marcon, reif, hessel}@inf.pucrs.br

Abstract

To enhance performance and improve the predictability of real-time systems, implementing some critical operating system functionalities, like time management and task scheduling, in software and others in hardware is an interesting approach. Scheduling for real-time embedded software applications is an important problem in real-time operating systems (RTOS) and has a great impact on system performance. In this paper, we evaluate the pros and cons of migrating the RTOS scheduler implementation from software to hardware. We investigate three different RTOS scheduler implementation approaches: (i) implemented in software running on the same processor as the application tasks, (ii) implemented in software running on a co-processor, and (iii) implemented in hardware, while application tasks run on a processor. We demonstrate the effectiveness of each approach by simulating and analyzing a set of benchmarks representing different embedded application classes.

1 Introduction

The development of real-time embedded systems is continuously increasing. Real-time applications, which require fast response and tight synchronization, are becoming ever more popular. The operating system is without doubt the most important of all system programs in a real-time embedded system. Hence, a Real-Time Operating System (RTOS), which handles both soft and hard real-time tasks, is essential to the effectiveness of those designs.

As the system becomes larger, the scheduling of tasks and communications becomes more complex and its impact on overall system performance becomes more significant [1]. Furthermore, real-time demands add a correctness criterion to embedded systems: not only the result matters, but also the time at which it is produced. Moving RTOS scheduler functionality from software to hardware can enhance the performance of RTOS-based systems. However, this approach can increase design complexity and silicon area.

This work investigates and discusses the pros and cons of three different scheduler implementations: software, software/software, and hardware/software. A software implementation uses one processor that runs both the scheduler and the application tasks. A software/software implementation uses a processor that runs the application tasks and a co-processor that runs the scheduler. In a hardware/software implementation, the scheduler is implemented directly in hardware, while a processor runs the application tasks.

Figure 1 and Figure 2 illustrate a comparison of a

scheduler implemented in the same processor where

application tasks are running, with the one implemented in

a different processing element (hardware or software).

[Figure 1 (timeline): CPU activity over time slices t1 to t9, showing tasks T1 and T2 separated by CS+OS intervals. Legend: ti: i-th slice of time; Ti: i-th task; CS+OS: context switching and RTOS time; INT: interrupt handling time; Csi: context saving of i-th task (Ti); CRi: context recovering of i-th task (Ti); RTOS: RTOS slice of time; CsRTOS: context saving of RTOS; CRRTOS: context recovering of RTOS.]

Figure 1: Example of an RTOS scheduler implemented in software running on a single processor.

In Figure 1, the system is entirely implemented in software running on a single processor. As a result, at every predetermined time slice (t2, t4, t6 and t8) the processor is interrupted to enable a new task scheduling. When the interrupt occurs, if the scheduler is implemented as a process [11][12], the processor spends time on the interrupt routine, on the RTOS functionality and on four context-switch steps: (i) saving the state of the executing task; (ii) recovering the RTOS execution state; (iii) saving the RTOS execution state; and (iv) recovering the state of the next task. This happens even if the previous task is elected to continue running, as illustrated in time slices t2, t4 and t8 of Figure 1. Here, the processor clearly wastes time on OS task scheduling and unnecessary context switches.
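The per-interrupt cost described for the single-processor case can be made concrete with a small model. This is only an illustrative sketch: the step names follow the legends of Figures 1 and 2, but the durations in `STEP_COSTS_US` and the function `tick_overhead_us` are assumptions introduced here, not values or code from the paper.

```python
# Hypothetical per-step costs in microseconds (not taken from the paper).
STEP_COSTS_US = {
    "INT": 5.0,      # interrupt handling
    "CsT": 10.0,     # save context of the running task
    "CRRTOS": 10.0,  # recover context of the RTOS
    "RTOS": 20.0,    # scheduling decision itself
    "CsRTOS": 10.0,  # save context of the RTOS
    "CRT": 10.0,     # recover context of the elected task
}


def tick_overhead_us(scheduler_on_cpu: bool, task_changes: bool) -> float:
    """Time the application processor spends away from task code at one scheduling point."""
    if scheduler_on_cpu:
        # Single-processor case: every step runs on every interrupt,
        # even when the same task is elected again.
        steps = ["INT", "CsT", "CRRTOS", "RTOS", "CsRTOS", "CRT"]
    elif task_changes:
        # Scheduler runs elsewhere: the CPU is interrupted only for a real switch.
        steps = ["INT", "CsT", "CRT"]
    else:
        # Scheduler runs elsewhere and the running task keeps the CPU.
        steps = []
    return sum(STEP_COSTS_US[step] for step in steps)
```

With these assumed costs, a scheduling point that re-elects the running task costs the single-processor system the full sequence, while an offloaded scheduler costs the application processor nothing.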

In Figure 2, the RTOS scheduler executes in parallel with the application tasks, so task processing is interrupted only if a higher-priority task has to be executed. With this implementation the processor does not waste time on unnecessary context switches, which increases overall system performance.

Proceedings of the Seventeenth IEEE International Workshop on Rapid System Prototyping (RSP'06)

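[Figure 2 (timeline): CPU activity over time instants ta, tb and tc, showing task T1 followed by task T2. An interruption from the RTOS scheduler triggers the switch, labelled INT, CsT1 and CRT2.]

Figure 2: Example of an RTOS scheduler running on a co-processor or specific hardware.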

Let the RTOS interrupt interval be the time between two consecutive instants at which the processor is interrupted to perform RTOS scheduling. When the scheduler is implemented entirely in software, we must therefore consider the time necessary to fulfill deadlines, which varies from zero to one RTOS interrupt interval.

Normally, RTOS interrupt intervals are a multiple of the system clock period and, in general, not all application task deadlines are multiples of this value, which causes variations in deadline fulfillment. These variations characterize the jitter problem (Figure 3). For some applications, jitter can harm real-time operation, since real-time applications depend not only on the results achieved, but also on the instants at which they are achieved.
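A worked example helps to see how a deadline that is not a multiple of the interrupt interval produces jitter. The sketch below reads jitter as the gap between a deadline and the first RTOS interrupt instant at or after it, which is one interpretation of Figure 3. The function name `detection_delay_us` and the 900 µs deadline are hypothetical; the three intervals are the ones used later in the case study.

```python
def detection_delay_us(deadline_us: int, interval_us: int) -> int:
    """Gap between a deadline and the first RTOS interrupt at or after it."""
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    # Integer ceiling division gives the index of the first interrupt
    # instant that is not earlier than the deadline.
    next_interrupt_us = -(-deadline_us // interval_us) * interval_us
    return next_interrupt_us - deadline_us


# Hypothetical deadline, chosen so that it is not a multiple of any interval.
HYPOTHETICAL_DEADLINE_US = 900

delays = {
    interval: detection_delay_us(HYPOTHETICAL_DEADLINE_US, interval)
    for interval in (250, 400, 500)
}
```

Under this reading the delay is always at least zero and strictly less than one interrupt interval, and it is zero only when the deadline falls exactly on an interrupt instant.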

[Figure 3 (timeline): time slices t1 to t7, marking the task deadline, the RTOS interrupt instant and the jitter.]

Figure 3: Jitter problem.

The remainder of this paper is organized as follows. Related work is presented in Section 2. Section 3 provides an overview of the three scheduler implementation models. Section 4 shows a case study and Section 5 presents our conclusions and future work.

2 Related work

Few works address hardware/software RTOS implementation. Table 1 compares them.

Mooney [8] proposed a framework to generate a partitioned hardware/software RTOS. Regardless of task requirements, this approach generates only one OS, which is replicated on every processor. The designer does not have the flexibility to choose which components are implemented in software or hardware. Additionally, the designer cannot control the task mapping onto the target processors.

Nakano [9] implemented a partitioned OS, called

STRON (Silicon TRON). Nevertheless, the system does not

allow choosing which components are going to be

implemented in hardware and which ones are going to be

developed in software.

Ortiz [3] described the implementation of the scheduler and the process control queues directly in hardware. Like the other approaches presented, this one implements just a few predetermined components in hardware.

In order to investigate area overhead and performance,

Cho [1] proposed the implementation of centralized and

distributed schedulers in a multiprocessor SoC. This

approach considers only static scheduler implementation.

Samuelsson [10] presented a performance comparison

between a real-time kernel implemented in hardware and an

equivalent one implemented in software. They used a

hardware multiprocessor platform, called SARA. The

hardware kernel implements the scheduler, inter-process

communication methods, semaphores and timer.

Table 1: Comparison among related work.

| Work | Kernel | Functions | Comparison | Result |
|---|---|---|---|---|
| Mooney | Atalanta | Deadlock control unit, block cache and memory management | Performance (kernel software x partitioned kernel) | Better performance in hardware |
| Nakano | μITRON | Event flags, task queues, module control, scheduler and timer | Performance (kernel software x partitioned kernel) | Better performance in hardware |
| Ortiz | KURT-Linux | Scheduler, event queues, interrupt handling | Performance (kernel software x partitioned kernel) | Better performance in hardware just for tasks executing in WCET |
| Cho | - | Scheduler | Performance and area (centralized scheduler x distributed scheduler) | Distributed scheduler occupies greater area, but presents better performance |
| Samuelsson | Kernel model | Scheduler, IPC methods, semaphores and timer | Performance (kernel software x kernel hardware) | Better performance in hardware |
| Vetromille | Kernel model | Scheduler and task queues | Performance (kernel software x partitioned kernel, software/software and software/hardware) | Better performance in hardware or software, depending on the application class |

Applications can be classified according to relevant characteristics, such as communication and computation requirements. Applications of a given class behave similarly in response to a given stimulus and require similar mechanisms to work properly. For instance, hard real-time applications operate under hard timing constraints, which implies a hard real-time scheduling policy.

Unlike the other works, this work analyzes which scheduler implementation is more suitable for a given application class: software on a single processor, software partitioned across two processors, or hardware/software partitioned across a processor and a dedicated hardware block. These topics are discussed further in Sections 4 and 5.

3 Scheduler models implementation for real-time applications

This Section provides an overview of three scheduler implementation models: (i) SoRTS (Software Real-Time Scheduler), (ii) Co-SoRTS (Co-processor Software Real-Time Scheduler), and (iii) HaRTS (Hardware Real-Time Scheduler). The architecture and the execution of the scheduling policy for each approach are discussed next.

We implemented these schedulers on a Xilinx Virtex-II Pro XC2VP30 FPGA, using the Xilinx Embedded Development Kit (EDK) and ModelSim.

To validate the software scheduler implementations, we used the MicroBlaze processor available in the EDK environment. The MicroBlaze is a 32-bit Harvard RISC architecture; its operating frequency was set to 50 MHz for prototyping purposes.

3.1 SoRTS

The SoRTS architecture (Figure 4) consists of six

components: (i) MicroBlaze processor, (ii) Block RAM

memory, (iii) OPB (On-chip Peripheral Bus), (iv)

communication interface, (v) interrupt and time control, and

(vi) UART.

Figure 4: SoRTS block diagram architecture.

The MicroBlaze executes application tasks, which are characterized by: (i) period, (ii) deadline, (iii) task ID, and (iv) execution time. The Block RAM stores two structures: the ready queue and the idle queue. The ready queue contains a list of tasks that can be executed, ordered by their priorities, which are determined by the scheduling policy. The idle queue holds a list of executed tasks that are waiting for a new time slice to execute. Communication among architecture components is performed over a 32-bit OPB. The communication interface is a specific module instantiated by EDK that allows communication between software (RTOS and application tasks running on the MicroBlaze) and proprietary hardware (interrupt and time control) via the OPB. This communication is based on two registers: (i) time, which returns the system time, and (ii) int, which is used to send an interrupt to the MicroBlaze. These registers are accessed through functions available in the EDK tool. Finally, the UART provides communication between the Xilinx development board and the host computer (EDK development kit), which has been used to validate our experiments.
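The two structures kept in Block RAM can be pictured with a short sketch. It is an illustration only, not the authors' implementation: the class and method names (`Task`, `Queues`, `release`, `pick_next`, `finish`) are hypothetical, and the convention that a lower number means a higher priority is an assumption. The task fields mirror the four characteristics listed above.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                            # lower value = higher priority (assumption)
    task_id: int = field(compare=False)
    period_us: int = field(compare=False)
    deadline_us: int = field(compare=False)
    exec_time_us: int = field(compare=False)


class Queues:
    """Ready queue ordered by priority plus an idle queue of finished tasks."""

    def __init__(self):
        self.ready = []  # heap: the highest-priority task is always at index 0
        self.idle = []   # executed tasks waiting for their next time slice

    def release(self, task):
        """Make a task eligible for execution."""
        heapq.heappush(self.ready, task)

    def pick_next(self):
        """Remove and return the highest-priority ready task, or None."""
        return heapq.heappop(self.ready) if self.ready else None

    def finish(self, task):
        """Park a task that has completed its current execution."""
        self.idle.append(task)
```

A heap is used here only because it keeps the ready queue ordered cheaply in software; the paper does not state which data structure backs its queues.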

3.2 Co-SoRTS

Co-SoRTS extends the SoRTS architecture with an additional MicroBlaze. The first MicroBlaze processor executes a set of tasks stored in the Block RAM. The second MicroBlaze is used as a co-processor that runs the RTOS scheduler, as illustrated in Figure 5.

Figure 5: Co-SoRTS block diagram architecture.

This approach eliminates unnecessary context switches and reduces the jitter problem. Here, a context switch occurs only if a new task is scheduled. At that moment, the co-processor sends an interrupt signal to the processor, which performs the context switch.
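The rule that the co-processor interrupts the application processor only when a different task must run can be sketched as a single predicate. The function `needs_interrupt` is a hypothetical name, and the lower-number-wins priority convention is an assumption made for this illustration.

```python
def needs_interrupt(running_priority, ready_priorities):
    """Decide whether the scheduler should interrupt the application processor.

    running_priority: priority of the task now on the CPU, or None if it is idle.
    ready_priorities: priorities of the tasks waiting to run.
    Lower numbers mean higher priority (assumption).
    """
    if not ready_priorities:
        return False  # nothing else to run, so leave the CPU alone
    best_ready = min(ready_priorities)
    # Interrupt only when the CPU is idle or a ready task outranks the running one.
    return running_priority is None or best_ready < running_priority
```

When the predicate is false the running task simply continues, which is the situation where the single-processor design would still pay for a full scheduling pass.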

For some application classes, the reduction in context switches is a potential advantage over a scheduler running on a single processor. Three new internal registers were used to meet the communication requirements between the two MicroBlazes. The remaining system components provide the same functionality described in Section 3.1.

3.3 HaRTS

Similar to the Co-SoRTS, the HaRTS architecture uses a

dedicated hardware component for scheduling tasks and list

management, as shown in Figure 6.

The dedicated hardware (Figure 7) has four main modules: (i) scheduler module, (ii) queue control, (iii) communication interface, and (iv) time control.

The scheduler module is composed of three blocks: (i) fail process; (ii) running process; and (iii) ready process. The running and ready processes are responsible for task scheduling, according to parameters previously stored in the queue control. To reduce area cost, the queue control implements only one list for task management. This list is accessed to find out the current task state (fail, running or ready) in order to apply the scheduling policy. No task is removed from the list; only its state is updated. The fail process checks for task failures and signals them to the scheduler module.
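The single-list design can be modelled in software to show how scheduling works when entries are never removed and only their state changes. This is a behavioural sketch under assumptions, not the hardware description: `State`, `Entry` and `elect` are hypothetical names, a failed task is simply skipped, and a lower number means a higher priority.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    FAIL = "fail"


@dataclass
class Entry:
    task_id: int
    priority: int  # lower value = higher priority (assumption)
    state: State


def elect(task_list):
    """Scan the one list, mark the best non-failed task RUNNING, and return its id.

    No entry is added or removed; only the state fields are rewritten.
    """
    candidates = [entry for entry in task_list if entry.state is not State.FAIL]
    if not candidates:
        return None
    winner = min(candidates, key=lambda entry: entry.priority)
    for entry in candidates:
        entry.state = State.RUNNING if entry is winner else State.READY
    return winner.task_id
```

Keeping one list trades a scan on every election for less storage, which matches the area-saving motivation given in the text; in hardware the scan could be done in parallel, which this sequential sketch does not capture.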

Figure 6: HaRTS block diagram architecture.

6FKHGXOHU0RGXOH

)DLO3URFHVV

5XQQLQJ

3URFHVV

5HDG\3URFHVV

4XHXH

&RQWURO

0RGXOH

7LPH

&RQWURO

0RGXOH

&RPPXQLFDWLRQ

,QWHUIDFH

0RGXOH

Figure 7: Dedicated hardware architecture.

The HaRTS communication interface is much more complex than the one implemented in SoRTS and Co-SoRTS: it requires sixteen internal registers because of the native MicroBlaze communication protocol.

Finally, the time control is responsible for managing the system time.

4 Case study

This Section presents a case study composed of a set of synthetic benchmarks representing different embedded application classes. We are interested in comparing the number of deadline fails, the number of context switches and the CPU time dedicated to task execution. This allows us to verify which scheduler implementation approach (SoRTS, Co-SoRTS or HaRTS) is better suited to a specific application class. Each benchmark is composed of a set of tasks modeled by period, deadline and average-case execution time. We vary the context switch time of each benchmark over 25, 50, 75 and 100 µs. For each context switch time we applied three different values of the RTOS interrupt interval (250, 400 and 500 µs). Results were obtained from 10 seconds of execution of each benchmark. In all benchmarks we used RM (Rate Monotonic) as the scheduling policy.
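The experimental setup can be summarized in a few lines of code. The parameter values below come from the text; everything else is an assumption made for illustration, including the function name `rate_monotonic_priorities`, the rank-0-is-highest convention and the tie-break on task id. Rate Monotonic itself is the standard rule that a shorter period gets a higher priority.

```python
from itertools import product


def rate_monotonic_priorities(periods_us):
    """Map task id -> priority rank (0 = highest): shorter period, higher priority."""
    # Ties on period are broken by task id so the result is deterministic.
    ordered = sorted(periods_us, key=lambda tid: (periods_us[tid], tid))
    return {tid: rank for rank, tid in enumerate(ordered)}


CONTEXT_SWITCH_US = (25, 50, 75, 100)      # context switch times from the text
INTERRUPT_INTERVAL_US = (250, 400, 500)    # RTOS interrupt intervals from the text
RUN_TIME_US = 10_000_000                   # 10 seconds of execution

# Every (context switch time, interrupt interval) pair that is evaluated.
configs = list(product(CONTEXT_SWITCH_US, INTERRUPT_INTERVAL_US))

# Number of RTOS interrupts that fit in one run for each interval.
interrupts_per_run = {i: RUN_TIME_US // i for i in INTERRUPT_INTERVAL_US}
```

The sweep gives twelve configurations per benchmark, and a 250 µs interval produces twice as many scheduler interrupts in a run as a 500 µs interval.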

4.1 Context switch

Figure 8 illustrates the number of context switches

(vertical axis) and the context switches execution time

(horizontal axis) after 10 seconds of system execution,

considering SoRTS implementations.

[Figure 8 (chart): number of context switches (vertical axis, 40000 to 90000) versus time of context switches (horizontal axis: 25, 50, 75 and 100 µs) for SoRTS (Timer 250us), SoRTS (Timer 400us) and SoRTS (Timer 500us). Data labels: 85577, 84919, 84357 and 83471 (Timer 250us); 56149, 55891, 55615 and 55233 (Timer 400us); between 45605 and 46379 (Timer 500us).]

Figure 8: Comparison of number of context switches between different SoRTS implementations.

Figure 8 shows that when the context switch time increases, the number of context switches decreases. This reduction is due to the increase in task fails (Figure 10). Additionally, the number of context switches decreases as the RTOS interrupt interval increases (250, 400 and 500 µs), because tasks are interrupted less frequently. However, a larger interval also increases jitter, so some real-time tasks may not deliver the correct result at the correct time.

Analyzing the HaRTS and Co-SoRTS scheduler implementations, we conclude that HaRTS presents fewer context switches than Co-SoRTS (Figure 9). This is because the communication protocol used in Co-SoRTS is more complex, owing to the intrinsic processor communication interface, which generates a larger communication overhead.

[Figure 9 (chart): number of context switches (vertical axis, 8500 to 9050) versus time of context switches (horizontal axis: 25, 50, 75 and 100 µs) for Co-SoRTS and HaRTS. Data labels range from 8773 to 8997 for Co-SoRTS and from 8565 to 8727 for HaRTS.]

Figure 9: Comparison between Co-SoRTS and HaRTS.

Comparing Figure 8 and Figure 9, we can observe that Co-SoRTS and HaRTS present fewer context switches than SoRTS. This is because application tasks run concurrently with the Co-SoRTS or HaRTS scheduler, which eliminates unnecessary context switches. In addition, these approaches reduce jitter, increasing the predictability of the real-time system.

4.2 Deadline fails

Figure 10 illustrates the number of fails (vertical axis) for different context switch times (horizontal axis).

[Figure 10 (chart): number of fails (vertical axis, 0 to 2000) versus time of context switches (horizontal axis: 25, 50, 75 and 100 µs) for SoRTS (Timer 250us), SoRTS (Timer 400us), SoRTS (Timer 500us) and Co-SoRTS / HaRTS.]

Figure 10: Number of fails after 10 seconds of execution.

We can observe that an increase in the context switch time also increases the number of fails for the SoRTS scheduler implementation. Figure 10 also shows that, for the SoRTS approach, enlarging the RTOS interrupt interval (250, 400 and 500 µs) reduces the number of fails. Furthermore, the larger the context switch time, the less CPU time is available for task execution (Figure 11), which increases the number of context switches and leads the system to fail. Figure 10 shows that Co-SoRTS and HaRTS do not present deadline fails. We conclude that Co-SoRTS and HaRTS are suitable scheduling approaches for hard real-time applications.

4.3 CPU utilization

Figure 11 shows the CPU availability for task execution (vertical axis) for different context switch times (horizontal axis).

In the SoRTS approach, increasing the context switch time decreases the CPU availability for task execution. This behavior is expected, since the number of context switches and fails increases. As a result, the CPU spends more time on context switches and has less time to execute tasks. The context switch time does not affect the CPU availability for Co-SoRTS and HaRTS, since in these approaches the scheduler executes in parallel with the application tasks.
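The trend for the single-processor case follows from a first-order estimate: if a fixed overhead is paid at every RTOS interrupt, the fraction of CPU time left for tasks is one minus the overhead divided by the interrupt interval. The sketch below encodes only that estimate. The function `cpu_availability` and its inputs are assumptions, and the model ignores deadline fails and idle time, so it is not expected to reproduce the measured values in Figure 11.

```python
def cpu_availability(interval_us: float, overhead_us: float) -> float:
    """First-order fraction of CPU time left for tasks between RTOS interrupts.

    interval_us: RTOS interrupt interval.
    overhead_us: time lost to interrupt handling, scheduling and context
                 switching at each interrupt (hypothetical lump value).
    """
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    # Clamp at zero: overhead larger than the interval leaves no time for tasks.
    return max(0.0, 1.0 - overhead_us / interval_us)
```

For a given overhead the estimate rises with the interrupt interval and falls as the overhead grows. An offloaded scheduler corresponds to an overhead of zero at scheduling points that do not change the running task.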

[Figure 11 (chart): CPU availability in % (vertical axis, 0 to 90%) versus time of context switches (horizontal axis: 25, 50, 75 and 100 µs) for SoRTS (Timer 250us), SoRTS (Timer 400us), SoRTS (Timer 500us) and Co-SoRTS / HaRTS. Data labels range from 24.74% to 80.73%.]

Figure 11: CPU availability.

5 Conclusions

This paper compares three scheduler implementations:

SoRTS, Co-SoRTS and HaRTS, in order to relate

application classes with scheduler approaches. The idea is

to find out which approach is the most suitable for a given

application class.

For all applications, the HaRTS scheduler implementation achieved the best performance results, fulfilling all application deadlines. Co-SoRTS and HaRTS have similar results. However, a scheduler implemented in the dedicated hardware of HaRTS can consume less energy and area than an equivalent one implemented in a Co-SoRTS co-processor.

The overall system performance of schedulers implemented on the same processor as the application tasks (SoRTS approach) is more affected by relative variations of the RTOS interrupt interval and of the time necessary for a context switch.

It is important to consider the implementation effort and cost of each approach. The HaRTS approach is more complex and expensive than the Co-SoRTS or SoRTS approach because of the complex nature of the hardware implementation. Co-SoRTS, in turn, adds complexity compared with SoRTS. Considering the points discussed here, we conclude that Co-SoRTS and HaRTS present the best results for hard real-time applications. On the other hand, SoRTS is suitable for soft real-time systems.

Future work includes the development of an MPSoC RTOS scheduler that supports efficient task migration between processors.

6 Acknowledgments

The authors gratefully acknowledge the support from

CNPq and FINEP (project # 1929/04) agencies for R&D in

the form of scholarships and grants.

7 References

[1] Y. Cho, S. Yoo, K. Choi, N-E. Zergainoh, A. Jerraya. Scheduler implementation in MPSoC Design. In: Asia South Pacific Design Automation Conference (ASP-DAC’05), 2005, pp. 151-156.

[2] D. Andrews, D. Niehaus, and P. Ashenden. Programming models for hybrid CPU/FPGA chips. IEEE Computer, v. 37(1), 2004, pp. 118–120.

[3] J. Ortiz. Hardware/Software co-design of schedulers for real time and embedded systems. Master’s thesis on Computer Science, University of Kansas. 2004. Available at: http://www.ittc.ku.edu/research/thesis/documents/jorge_ortiz_thesis.pdf.

[4] P. Kohout, B. Ganesh, and B. Jacob. Hardware support for real-time operating systems. In: Design, Automation and Test in Europe Conference (DATE’03), 2003, pp. 45–51.

[5] V. Mooney III, J. Lee, A. Daleby, K. Ingstrom, T. Klevin, and L. Lindh. A comparison of the RTU hardware RTOS with a hardware/software RTOS. In: Design Automation Conference (DAC’03), 2003, pp. 683–688.

[6] M. Barabanov. A Linux-based Real-Time Operating System. Master’s thesis, New Mexico Institute of Mining and Technology. 1997. Available at: http://www.fsmlabs.com/images/stories/pdf/archive/thesis.ps.

[7] K. Lahiri, S. Raghunathan, and S. Dey. System-level performance analysis for designing on-chip communication architecture. IEEE Transaction on Computer-Aided Design of Integrated Circuits and Systems, v. 20(6), 2001, pp. 768-783.

[8] V. Mooney III. Hardware/software partitioning of operating systems. In: Design, Automation and Test in Europe Conference (DATE’03), 2003, pp. 338–339.

[9] T. Nakano, A. Utama, M. Itabashi, A. Shiomi, and M. Imai. Hardware implementation of a real-time operating system. In: 12th TRON Project International Symposium (TRON’95), 1995, pp. 34–42.

[10] T. Samuelsson, M. Åkerholm, P. Nygren, J. Stärner, L. Lindh. A Comparison of Multiprocessor Real-Time Operating Systems Implemented in Hardware and Software. In: International Workshop on Advanced Real-Time Operating System Services (ARTOSS’03), 2003.

[11] W. Wolf. Computers as components: principles of embedded system design. Morgan Kaufmann Publishers, 2001, pp. 688.

[12] G. Buttazzo. Hard real-time computing systems: predictable scheduling algorithms and applications. Kluwer Academic Publishers, 1997, pp. 400.

Proceedings of the Seventeenth IEEE International Workshop on Rapid System Prototyping (RSP'06)

Exploration of Power-Savings on Multi-Core Architectures With Offloaded Real-Time Operating System

Article

Full-text available

Jan 2024

A Real-time Operating System (RTOS) manages the execution order of tasks with a scheduling algorithm to meet timing requirements. The scheduler frequently checks for ready tasks during context-switching. However, high task numbers can cause longer processing time in this routine. RTOSs are mainly implemented in software, but reconfigurable computing enables offloading to reduce, e.g., the processing time of context-switching. On the other hand, optimizing the energy efficiency of running applications is desirable. Power-saving techniques allow adapting current dissipation to required operating conditions. However, unplanned use can lead to missed deadlines in real-time applications. Therefore, real-time capability and energy efficiency have to be appropriately balanced. This work explores the impact of power-saving techniques on real-time requirements while supporting RTOS with offloading methodologies. A mapping strategy assigns tasks to Processing Elements (PEs) based on task dependency, inter-task/processor communication, and power consumption metrics. A multi-core architecture is designed with a Network-on-Chip (NoC) and four PEs in a 2D-mesh topology. The master PE manages the system architecture, executes the mapping strategy, and dynamically scales voltage to reduce power consumption while running an RTOS. The task scheduling is offloaded to the co-processor. On the other hand, each slave PE executes assigned tasks with an RTOS and performs an inter-task/processor communication. The task scheduling here runs on the reconfigurable hardware. Each slave PE locally adapts power with frequency scaling and clock gating. The experimental results show that co-processor offloading reduces scheduling overhead by 26.58%, and hardware offloading reduces it by 33.33%. Additionally, the proposed solution has reduced overall power by 47.27% and energy consumption by 89.47%.

Operating Systems for Reconfigurable Computing: Concepts and Survey

Chapter

Jan 2021

Operating systems for reconfigurable computing (RCOS) facilitate the usage of Field Programmable Gate Arrays (FPGAs). RCOSes abstract from hardware details, utilise virtualisation, and provide standardised functionality. They allow different applications to run hardware tasks concurrently on the same FPGA by managing shared resources like FPGA area, I/O, and memory. Next to spatial partitioning, time multiplexed sharing of the FPGA can be reached via Dynamic Partial Reconfiguration (DPR). In this way, operating systems for reconfigurable computing support user applications to increase their performance and decrease energy consumption without the need to know the underlying concepts. Therefore, RCOSes pave the way for applications to exploit the advantages of FPGAs under consideration of their limitations like limited area and limited accessibility of configuration ports. Furthermore, RCOS can benefit from outsourcing parts of the OS into the FPGA. This survey outlines key concepts and gives an overview over state-of-the-art operating systems for reconfigurable computing. It points out general and specific limitations of RCOS. Finally, future trends are identified, which include a specialisation of RCOS with respect to their application’s requirements like real-time processing, low energy consumption, reliability, safety, and security.

Building Fine-Grained Configurable ITRON Based RTOS

Article

Jan 2020

As IoT (Internet of Things) is prevailing, the number of devices which have strict resource constraints is increasing. In developing such a system, RTOS (Real Time Operating System) helps to increase productivity. However, in the view of cost reduction, it is desirable that resources for RTOS be small and the execution time be short. In this paper, we propose a method to develop an application-specific system with RTOS. Methods of removing unnecessary code for the application from RTOS kernel are explained. In addition, we implemented a reconfigurable hardware RTOS on an FPGA and applied the method for removing unnecessary code from the hardware implementation. The evaluation results show that the proposed methods reduce hardware resources, RTOS kernel execution time, and the size of the software parts in each application.

Arrival Order Processing of Service Requests in Full Hardware Implementation of RTOS-Based Systems

Conference Paper

Jun 2023

Automatic Generation of Management Module for Full Hardware Implementation of RTOS-Based Systems

Conference Paper

Jun 2023

Hardware Fuzzy Scheduler for Real-Time Independent Tasks

Article

Feb 2022

Several scheduling algorithms that have been proposed for Real-Time Operating System (RTOS) are supposed to be optimal. However, optimal scheduling is only theoretical due to the possibility of system overload where it cannot meet the deadlines of tasks. Besides, these algorithms are implemented in the RTOS, which generates additional overheads that can lead to the “nonscheduling” of certain independent tasks. In this paper, we propose an original solution for nonschedulable independent tasks in embedded systems. This solution, named Hybrid Fuzzy Earliest Deadline First Scheduling algorithm (HFEDFS), is based on the Earliest Deadline First algorithm (EDF) and Fuzzy Logic. It is characterized by a rejection policy and a rescheduling mechanism. The experimental results show that our proposed algorithm improves the system’s performance. To reduce extra overheads of RTOS, this algorithm is implemented on a Field-Programmable Gate Array (FPGA) circuit (Xilinx Virtex-5 LX50T-1156 board from DIGILENT).

Full Hardware Implementation of FreeRTOS-Based Real-Time Systems

Conference Paper

Dec 2021

Power-Aware Real-Time Operating Systems on Reconfigurable Architectures

Conference Paper

Aug 2021

Implementation of Direct Memory Access for Parallel Processing

Conference Paper

Nov 2020

VisSched: An Auction-Based Scheduler for Vision Workloads on Heterogeneous Processors

Article

Oct 2020

With the growth of edge computing, application-specific workloads based on computer vision are steadily migrating to edge cloudlets. Scheduling has been identified to be a major problem in these cloudlets. In this article, we propose a generic architectural solution, VisSched, that leverages the fact that most vision workloads share similar code kernels (such as library code for linear algebra), and as a result, they tend to exhibit similar phase behavior. This allows us to create an auction theory-based scheduling mechanism, where we give each thread a replenishable virtual wallet, and threads are scheduled based on the amounts that they bid for executing on a free core. We show that in 20%–40% of the cases, our scheduling algorithm is theoretically optimal, and in the remaining cases, it reaches a global optimum obtained using Monte Carlo simulations 90%–95% of the time. Our results for the MEVBench vision workloads show a 17% higher performance and a 14% lower ED^2 as compared to the nearest competing algorithm in the literature. Read the full paper here: https://www.cse.iitd.ac.in/~diksha/files/papers/vissched.pdf

Scheduler implementation in MPSoC Design

Conference Paper

Feb 2005

In the design of a heterogeneous multiprocessor system on chip, we face a new design problem; scheduler implementation. In this paper, we present an approach to implementing a static scheduler, which controls all the task executions and communication transactions of a system according to a pre-determined schedule. For the scheduler implementation, we consider both intra-processor and inter-processor synchronization. We also consider scheduler overhead, which is often neglected. In particular, we address the issue of centralized implementation versus distributed implementation. We investigate the pros and cons of the two different scheduler implementations. Through experiments with synthetic examples and a real world multimedia application, we show the effectiveness of our approach.

Hardware Support for Real-time Operating Systems

Conference Paper

Nov 2003

The growing complexity of embedded applications and pressure on time-to-market has resulted in the increasing use of embedded real-time operating systems. Unfortunately, RTOSes can introduce a significant performance degradation. The paper presents the Real-Time Task Manager (RTM) - a processor extension that minimizes the performance drawbacks associated with RTOSes. The RTM accomplishes this by supporting, in hardware, a few of the common RTOS operations that are performance bottlenecks: task scheduling, time management, and event management. By exploiting the inherent parallelism of these operations, the RTM completes them in constant time, thereby significantly reducing RTOS overhead. It decreases both the processor time used by the RTOS and the maximum response time by an order of magnitude.

A comparison of the RTU hardware RTOS with a hardware/software RTOS

Conference Paper

Feb 2003

In this paper, we show the performance comparison and analysis result among three RTOSs: the real-time unit (RTU) hardware RTOS (real-time operating system), the pure software Atalanta RTOS and a hardware/software RTOS composed of part of Atalanta interfaced to the system-on-a-chip lock cache (SoCLC) hardware. We also present our RTOS configuration framework that can automatically configure these three RTOSs. The average-case simulation result of a database application example on a three-processor system running thirty tasks with RTU and the same system with SoCLC showed 36% and 19% overall speedups, respectively, as compared to the pure software RTOS system.

Computers as Components: Principles of Embedded Computer Systems Design

Article

Jan 2001

W. Wolf

A Linux-based Real-Time Operating System; New Mexico Institute of Mining and Technology

Article

M. Barabanov

Real-Time Computing Systems-Predictable Scheduling Algorithms and Applications

Article

Jan 2004

Buttazzo GC

Hardware/Software Co-design of Schedulers for Real Time and Embedded Systems

Article

Jorge Ortiz

Embedded systems can no longer depend on independent hardware or software solutions to real time problems due to cost, efficiency, flexibility, upgradeability, and development time. System designers are now turning to hardware/software co-design approaches that offer real time capabilities while maintaining flexibility to support increasingly complex systems. Although long desired, reconfigurable technologies and supporting design tools are finally reaching a level of maturity that allows system designers to perform hardware/software co-design of operating system core functionality such as time management and task scheduling

Hardware/Software Partitioning of Operating Systems

Chapter

Jan 2004

Vincent J. Mooney

We present a few specific hardware/software partitions for real-time operating systems and a framework able to automatically generate a large variety of such partitioned RTOSes. Starting from the traditional view of an operating system, we explore novel ways to partition the OS functionality between hardware and software. We show how such partitioning can result in large performance gains in specific cases involving multiprocessor System-on-a-Chip scenarios.

Hardware/software partitioning of operating systems [SoC applications]

Conference Paper

Feb 2003

Vincent John Mooney III

We present a hardware/software RTOS (real-time operating systems) generation framework for system-on-a-chip (SoC) applications. We claim that current SoC designs tend to ignore the RTOS until late in the SoC design phase. In contrast, we propose RTOS/SoC codesign where both the multiprocessor SoC architecture and a custom RTOS (with part potentially in hardware) are designed together. Thus, this paper introduces a hardware/software RTOS generation framework for customized design of an RTOS within specific predefined RTOS services and capabilities available in software and/or hardware (depending on the service or capability).

Hardware implementation of a real-time operating system

Conference Paper

Jan 1995

This paper proposes a new approach to realize a very high performance real-time OS using VLSI technology. In this method, quick and steady response can be guaranteed by implementing basic operations of a real-time OS as a peripheral chip (Silicon TRON) to be connected to general purpose microprocessors. In order to confirm the effectiveness of this method, most basic system calls of μITRON have been designed using an HDL. Synthesis results using a 0.8 μm CMOS technology show that the most important part of the system calls can be realized as a VLSI chip. According to the evaluation results based on an FPGA implementation, the hardware portion of these functionalities can be executed within 250 ns and the task scheduling can be performed within 750 ns simultaneously, which are about 6 to 50 times faster than software implementation. Accordingly, very high performance real-time systems can be realized by the proposed method.
