Figure 18 - uploaded by Lambert Schaelicke
Physical Memory Layout for 2 Processes

Source publication
Article
Full-text available
ML-RSIM is an execution-driven computer system simulator that combines detailed models of modern computer hardware, including the I/O subsystem, with a fully-functional operating system kernel. These features make the simulation environment particularly attractive for studies involving applications with significant I/O or operating system activity.

Similar publications

Conference Paper
Full-text available
Given a list of filtering rules with individual hitting probabilities, it is known that the average processing time of a linear-search based firewall can be minimized by searching rules in some appropriate order. This paper proposes a new yet simple technique called the linear-tree structure. It utilizes an advanced feature of modern firewalls,...
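The ordering intuition behind such rule-reordering schemes can be sketched in a few lines (the rule list and probabilities below are hypothetical, an illustration of the general principle rather than code from the paper): when every rule costs the same to test, searching rules in decreasing order of hit probability minimizes the expected number of comparisons in a linear scan.

```python
# Expected number of rule comparisons for a linear-search firewall:
# a packet matching rule i (probability p_i) costs i + 1 comparisons.
def expected_comparisons(hit_probs):
    return sum(p * (i + 1) for i, p in enumerate(hit_probs))

# Hypothetical hit probabilities for four rules, in their original order.
probs = [0.1, 0.5, 0.1, 0.3]

original = expected_comparisons(probs)
reordered = expected_comparisons(sorted(probs, reverse=True))
assert reordered <= original  # descending hit probability never does worse
```

Here the original order costs 2.6 expected comparisons while the descending order costs 1.8; with unequal per-rule costs the optimal order instead sorts by probability-to-cost ratio.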
Conference Paper
Full-text available
Multi-constrained Vehicle Routing Problems are gaining steadily in importance. In particular, the dynamic version of the problem has received more emphasis due to modern service requirements, such as short-term or express delivery. With a growing number of dedicated solution approaches for these problems, we investigate a simulation-based supervised lea...
Article
Full-text available
A simulation-based project is designed which can be practically implemented in a workspace (in this case, a barber shop). The design algorithm provides the user with different time-varying features such as the number of people arriving, the number of people being served, the number of people waiting in the queue, etc., depending on input criteria and system capabilit...
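A single-server queue of this kind can be sketched with a short discrete-event loop (the rates, seed, and function name below are illustrative assumptions, not taken from the paper):

```python
import random

def simulate_barber(arrival_rate, service_rate, n_customers, seed=0):
    """Single-chair barber shop with exponential interarrival and service
    times. Returns (customers who had to wait, mean waiting time)."""
    rng = random.Random(seed)
    t_arrival = 0.0          # arrival time of the current customer
    server_free_at = 0.0     # time at which the barber becomes idle
    waits = []
    for _ in range(n_customers):
        t_arrival += rng.expovariate(arrival_rate)
        start = max(t_arrival, server_free_at)   # wait if the barber is busy
        waits.append(start - t_arrival)
        server_free_at = start + rng.expovariate(service_rate)
    return sum(1 for w in waits if w > 0), sum(waits) / len(waits)

waited, mean_wait = simulate_barber(1.0, 2.0, 100)
```

Extending the loop to track queue length over time or to cap the waiting area yields the kind of time-varying statistics the abstract describes.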
Article
Full-text available
The specific features of detecting signals and estimating their parameters in the far-field zone of an antenna in the presence of intense interferences in the near-field zone are considered using modern adaptive algorithms. An imitative simulation shows the possibilities of the adaptive methods for localizing sources in the far- and near-field ante...

Citations

... I/O devices needed to boot an OS are also simulated. Gem5 [8], ML-RSim [23], MARSSx86 [24] and PTLSim [10] are some examples of full system timing simulators. ...
... As a result, full-system simulation is a necessity. This research requires additional capabilities beyond previously existing full-system simulators [26, 27, 28], such as a detailed and accurate model of the I/O subsystem, including the network interface controller (NIC), and the ability to simulate multiple networked systems in a controlled and timing-accurate fashion. A key goal of M5 is modularity. ...
Article
Many important workloads today, such as web-hosted services, are limited not by processor core performance but by interactions among the cores, the memory system, I/O devices, and the complex software layers that tie these components together. Architects who optimize system designs for these workloads are challenged to identify performance bottlenecks before the systems are built. This identification is challenging because, as in any concurrent system, overheads in one component may be hidden due to overlapping with other operations. These overlaps span the user/kernel and software/hardware boundaries, making traditional tools inadequate. Common software profiling techniques cannot account for hardware bottlenecks or situations in which software overheads are hidden due to overlapping with hardware operations. This thesis presents a methodology for identifying true end-to-end critical paths in systems composed of multiple layers of hardware and software, particularly in the domain of high-speed networking. The state machines that implicitly or explicitly govern the behavior of all the layers are modeled and their local interactions captured to build an end-to-end dependence graph that can be used to locate bottlenecks. This is done incrementally, with modest effort and only local understanding. Furthermore, it is shown that queue-based interactions are necessary and sufficient to capture information from complex protocols, multiple connections and multiple processors. The resulting dependence graph is created and analyzed, distilling the huge amount of collected data into a set of bottleneck locations, including where the most un-overlapped time is spent and locations where the addition of some buffering could improve the system's performance without any other optimizations. Additionally, this technique provides accurate quantitative predictions of the benefit of eliminating bottlenecks.
The end result of this analysis, minutes after the data is gathered, is: 1) the identity of the component that causes the bottleneck; 2) the extent to which a component must be improved before it is no longer the bottleneck; 3) the next bottleneck that will be exposed in the system; and 4) the performance improvement that will occur before the next bottleneck is reached. The analysis can be repeated for successive bottlenecks and is far faster than the available alternatives.
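The critical-path idea can be illustrated with a toy dependence graph (the node names and edge weights below are invented for illustration; the thesis derives such graphs automatically from state-machine interactions): the bottleneck lies along the longest weighted path from source to sink.

```python
# Toy end-to-end dependence DAG: edges (src -> dst) weighted by
# un-overlapped time. Names and weights are illustrative only.
edges = {
    "app_send":    [("kernel_copy", 3.0)],
    "kernel_copy": [("nic_dma", 5.0), ("tcp_stack", 2.0)],
    "tcp_stack":   [("nic_dma", 1.0)],
    "nic_dma":     [("wire", 4.0)],
    "wire":        [],
}

def critical_path(node):
    """Longest weighted path from node to a sink, via a recursive DAG walk."""
    best, best_path = 0.0, [node]
    for nxt, weight in edges[node]:
        cost, path = critical_path(nxt)
        if weight + cost > best:
            best, best_path = weight + cost, [node] + path
    return best, best_path
```

For this graph, `critical_path("app_send")` reports a 12.0-unit path through `kernel_copy` and `nic_dma`, so speeding up `tcp_stack` alone would not improve end-to-end time; memoizing the walk keeps it linear in the graph size for larger graphs.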
... Some simulators have been developed to enable specific functionality. For example, SimOS [8] and ML-RSIM [20] support the execution of an OS. Simulators such as simg4 [16], were developed to model a specific processor (the PowerPC 7400) in detail. ...
Conference Paper
Full-text available
Supercomputers are increasingly complex systems merging conventional microprocessors with system-on-a-chip level designs that provide the network interface and router. At Sandia National Labs, we are developing a simulator to explore the complex interactions that occur at the system level. This paper presents an overview of the simulation framework with a focus on the enhancements needed to transform traditional simulation tools into a simulator capable of modeling system level hardware interactions and running native software. Initial validation results demonstrate simulated performance that matches the Cray Red Storm system installed at Sandia. In addition, we include a "what if" study of performance implications on the Red Storm network interface.
... A prototype of the user-level I/O architecture described here has been implemented in the ML-RSIM simulation environment [18]. Based on RSIM [19], this simulation system combines detailed models of a dynamically scheduled CPU with caches, a memory controller and several I/O devices with a fully-functional Unix-like operating system. ...
Article
Full-text available
To address the growing I/O bottleneck, next-generation distributed I/O architectures employ scalable point-to-point interconnects and minimize operating system overhead by providing user-level access to the I/O subsystem. Reduced I/O overhead allows I/O intensive applications to efficiently employ latency hiding techniques for improved throughput. This paper presents the design of a novel scalable user-level I/O architecture and evaluates the impact of various architectural mechanisms in terms of overall performance improvement. Results demonstrate that eliminating data movement across protection domains is the dominant contributor to improved scalability. Eliminating system call and interrupt overhead only has a small additional benefit that may not justify the additional hardware support required. While this evaluation is based on one specific design, the conclusions can be generalized to other user-level I/O architectures.
... Conventional architectural simulators, which execute only user-level application code and functionally emulate kernel behavior, do not provide meaningful results for these workloads. Of the handful of existing full-system simulators [16,22,24], none provide detailed network I/O modeling nor are easily extendable to do so. As a result, we developed our own simulator, called M5, to meet our specific needs [2]. ...
Conference Paper
Current high-performance computer systems are unable to saturate the latest available high-bandwidth networks such as 10 Gigabit Ethernet. A key obstacle in achieving 10 gigabits per second is the high overhead of communication between the CPU and network interface controller (NIC), which typically resides on a standard I/O bus with high access latency. Using several network-intensive benchmarks, we investigate the impact of this overhead by analyzing the performance of hypothetical systems in which the NIC is more closely coupled to the CPU, including integration on the CPU die. We find that systems with high-latency NICs spend a significant amount of time in the device driver. NIC integration can substantially reduce this overhead, providing significant throughput benefits when other CPU processing is not a bottleneck. NIC integration also enables cache placement of DMA data. This feature has tremendous benefits when payloads are touched quickly, but potentially can harm performance in other situations due to cache pollution.
... Thus conventional architectural simulators, which execute only user-level application code with functional emulation of kernel interactions, would not provide meaningful results for networking workloads. While a few full-system simulators exist [14, 15, 21, 22], none provided the detailed network I/O modeling we required, and none seemed easy to modify to provide this feature. We thus decided to extend our existing application-only architectural simulator, which executes the Alpha ISA, to support full-system simulation. ...
Article
Full-text available
High-bandwidth TCP/IP networking is a core component of current and future computer systems. Though networking is central to computing today, the vast majority of end-host networking research focuses on the current paradigm of the network interface being merely a peripheral device. Most optimizations focus solely on software changes or on moving some of the computation from the primary CPU to the off-chip network interface controller (NIC). We present an alternative approach for achieving high performance networking. Rather than increasing the complexity of the NIC, we directly integrate a conventional NIC on the CPU die. To evaluate this approach, we have developed a simulation environment specifically targeted for networked systems. It simulates server and client systems along with a network in a single process. Full-system simulation captures the execution of both application and OS code. Our model includes a detailed out-of-order CPU, event-driven memory hierarchy, and Ethernet interface device. Using this simulator, we find that tighter integration of the network interface can provide benefits in TCP/IP throughput and latency. We also see that the interaction of the NIC with the on-chip memory hierarchy has a greater impact on performance than the raw improvements in bandwidth and latency that come from integration.
... ML-RSIM [21] is an event-driven cycle-accurate simulator that integrates detailed processor and cache models with a complete I/O subsystem. Combined with the Unix-compatible Lamix operating system, ML-RSIM provides a unique tool that allows researchers to study the interaction of computer architecture, I/O activity, system software and applications. ...
Conference Paper
Full-text available
The constant increase in levels of integration and the reduction of the time-to-market have led to the definition of new methodologies stressing reuse. This involves not only the reuse of pre-designed processing components in the form of intellectual properties (IPs) but also that of pre-designed architectures. For such architectures to be reused for various applications they have to be heavily parameterized. Several manufacturers, in fact, produce pre-packed solutions for various classes of applications, in the form of parameterized system-on-a-chip (SOC) platforms. In this paper we present EPIC-Explorer, a framework to simulate a parameterized VLIW-based platform that will allow an embedded system designer to evaluate any instance of the platform in terms of performance, area and power consumption. The results obtained show that the framework can be effectively used to explore the space of possible configurations to evaluate the area/performance/power trade-off. The increase in levels of integration forecast for the coming decade indicate an enormous increase in the number of transistors as compared with the previous decade and the implementation of a whole system on a single chip. Unfortunately,
... ML-RSIM (Schaelicke and Parker 2002) is a derivative of URSIM (Zhang 2001), which is based on the original RSIM. It models the entire input/output subsystem and includes a functional operating system kernel called Lamix, which is System V compatible. ...
Article
Full-text available
In this paper we present RSIM x86, a port of the widely used RSIM performance simulator for cc-NUMA multiprocessors to GNU/Linux and x86 hardware. Then, we evaluate the simulation throughput obtained by RSIM on several platforms with respect to the hardware cost of each platform. We show that this port of RSIM obtains much better execution times using cheaper and more easily available hardware than the original RSIM, allowing a more efficient usage of our research resources.
... As a result, fullsystem simulation is a necessity. However, previously existing full-system simulators [5], [6], [7] lack other necessary capabilities, such as a detailed and accurate model of the I/O subsystem and network interface controller (NIC) and the ability to simulate multiple networked systems in a controlled and timing-accurate fashion. The difficulty of adapting an existing full-system simulator to meet our needs seemed larger than that of developing our own system, so we embarked on the development of M5. ...
Article
Full-text available
Performance accuracy is a critical but often neglected aspect of architectural performance simulators. One approach to evaluating performance accuracy is to attempt to reproduce observed performance results from a real machine. In this paper, we attempt to model the performance of a Compaq Alpha XP1000 workstation using the M5 full-system simulator. There are two novel aspects to this work. First, we simulate complex TCP/IP networking workloads and use network bandwidth as our primary performance metric. Unlike conventional CPU-intensive applications, these workloads spend most of their time in the operating system kernel and include significant interactions with platform hardware such as the interrupt controller and network interface device. Second, we attempt to achieve performance accuracy without extremely precise modeling of the reference hardware. Instead, we use simple generic component models and tune them to achieve appropriate bandwidths and latencies. Overall, we were able to achieve reasonable accuracy even with our relatively imprecise model, matching the bandwidth of the real system within 15% in most cases. We also used profiling to break CPU time down into categories, and found that the simulation results correlated well with the real machine.