Figure 2 - uploaded by Peng Wu
Examples of call stack comparison

Source publication
Conference Paper
Full-text available
Trace-based compilation is a promising technique for language compilers and binary translators. It offers the potential to expand the compilation scopes that have traditionally been limited by method boundaries. Detecting repeating cyclic execution paths and capturing the detected repetitions into traces is a key requirement for trace selection alg...

Contexts in source publication

Context 1
... fact the call stacks are longer in actual execution (there are call stack elements corresponding to the callers of method f), but only the relevant parts are shown in Figure 2, and the irrelevant parts are shown as boxes with "...". The irrelevant parts are always excluded from the top k elements compared. ...
Context 2
... element of a call stack contains a return address. For example, the top element B of the call stack "...-B" in the first row of Figure 2 (b) means that control will return to B after the next method return. ...
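The comparison described in these contexts can be sketched in Python. This is an illustrative reconstruction under assumed data structures (a call stack as a list of return addresses, top of stack last), not the authors' implementation:

```python
def same_cyclic_context(stack_a, stack_b, k=2):
    """Compare the top k return addresses of two call stacks.

    Hypothetical sketch: two occurrences of the same program point are
    treated as a repetition of the same cyclic path only if their calling
    contexts (the top k elements of the call stack) match; the deeper,
    irrelevant parts of the stack are excluded from the comparison.
    """
    return stack_a[-k:] == stack_b[-k:]

# Mirroring Figure 2's notation: stacks "...-A-B" and "...-C-B" share the
# top element B (control returns to B next) but differ one level down,
# so with k=2 they are not considered the same context.
print(same_cyclic_context(["...", "A", "B"], ["...", "C", "B"], k=2))  # False
print(same_cyclic_context(["...", "A", "B"], ["...", "A", "B"], k=2))  # True
```

With k=1 the first pair would compare equal, which illustrates why looking at more than the topmost element matters.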

Similar publications

Article
Full-text available
The paper concerns the selection of an optimal decision from a set of n decisions provided by experts. Three-dimensional, or spatial, decisions describe three attributes of the chosen problem. The paper introduces the notion of the closeness of two three-dimensional decisions and of the closeness of a decision to a set of mutually close decisions. Decisions that are not close are eliminat...
Article
Full-text available
We examined two overarching topics: What are the Fundamental Interpersonal Relations Orientation-Behavior (FIRO-B), Perceived Stress Scale (PSS), and Self-rating Depression Scale (SDS) scores in medical students? Do their interpersonal needs correlate with stress and depression? FIRO-B, PSS-10, and SDS were administered to 82 freshmen in College of...
Conference Paper
Full-text available
Monitoring enterprise applications that consist of multiple heterogeneous components executing in different runtimes is a challenging problem particularly from a business centric perspective. We propose a business centric monitoring approach that involves using business information fields (invariants) to relate service activity to business composit...

Citations

... • Call Stack Comparison: To detect recursive functions, Hayashizaki et al. (2011) inspect the call stack to determine when the target of the current function call is already on the stack. ...
... imprecise) in the presence of higher-order functions and the possibility of storing functions as values in the heap. In this section, we describe two runtime approaches to detecting appropriate loops, both of which are dynamic variants of the "false loop filtering" technique of Hayashizaki et al. (2011). ...
... Such a "phase shift" of the loop has no impact on performance. This approach is a simplified version of the one proposed by Hayashizaki et al. (2011). Their solution looks at more levels of the stack to decide whether something is a false loop. ...
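The call-stack inspection these citations describe can be sketched as follows; this is a minimal illustration of the idea (all names are hypothetical), not the cited implementation:

```python
def is_recursive_call(call_stack, callee):
    """Return True if the callee is already active on the call stack.

    Hypothetical sketch: if the target of the current call already
    appears on the stack, the back edge comes from recursion rather
    than a genuine loop, so trace selection can treat it differently
    (a dynamic variant of false loop filtering).
    """
    return callee in call_stack

stack = ["main", "f", "g"]
print(is_recursive_call(stack, "f"))  # True: f is already active
print(is_recursive_call(stack, "h"))  # False: h would be a fresh call
```

A fuller variant would, as noted above, examine more levels of the stack before classifying something as a false loop.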
Article
Full-text available
We present Pycket, a high-performance tracing JIT compiler for Racket. Pycket supports a wide variety of the sophisticated features in Racket such as contracts, continuations, classes, structures, dynamic binding, and more. On average, over a standard suite of benchmarks, Pycket outperforms existing compilers, both Racket's JIT and other highly-optimizing Scheme compilers. Further, Pycket provides much better performance for Racket proxies than existing systems, dramatically reducing the overhead of contracts and gradual typing. We validate this claim with performance evaluation on multiple existing benchmark suites. The Pycket implementation is of independent interest as an application of the RPython meta-tracing framework (originally created for PyPy), which automatically generates tracing JIT compilers from interpreters. Prior work on meta-tracing focuses on bytecode interpreters, whereas Pycket is a high-level interpreter based on the CEK abstract machine and operates directly on abstract syntax trees. Pycket supports proper tail calls and first-class continuations. In the setting of a functional language, where recursion and higher-order functions are more prevalent than explicit loops, the most significant performance challenge for a tracing JIT is identifying which control flows constitute a loop; we discuss two strategies for identifying loops and measure their impact.
... The performance obtained was not comparable to that achieved by Lee et al., but the technique proved advantageous for applying various code optimizations [37]. Five years later, Hayashizaki et al. proposed a technique for detecting and removing false loops within an execution trace and achieved a 37% performance improvement on the programs evaluated [38]. ...
... The next trend in JIT compiler construction will probably use trace-based selection of execution paths to detect more efficient critical regions. Hayashizaki et al. [38] demonstrated that removing false loops from the detected traces is a good strategy for improving performance, although another work, by Inoue et al. [70], showed that detecting large execution traces and scheduling them for compilation is also a good strategy. Additionally, the advantage of keeping large execution traces is that the larger the trace, the greater the optimization opportunities [37], which translate into more efficient code. ...
... Recently, trace-based compilation has gained popularity in dynamic scripting languages [5,11] and high-level language virtual machines [12,16,17]. Inoue et al. [16,17] employ trace-based optimization based only on variations of NET in their trace-based Java virtual machine. ...
Conference Paper
Full-text available
Most dynamic binary translators (DBT) and optimizers (DBO) target binary traces, i.e. frequently executed paths, as code regions to be translated and optimized. Code region formation is the most important first step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques. Many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer from a major problem: the formed traces may contain a large number of early exits that may be taken during execution. If this happens frequently, the program execution will spend more time in the slow binary interpreter or in the unoptimized code regions than in the optimized traces in the code cache. The benefit of the trace optimization is thus lost. Traces/regions with frequently taken early exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces. In this paper, we propose a lightweight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies and merges delinquent regions into larger code regions. We have implemented our EEG algorithm in two LLVM-based multi-threaded DBTs targeting the ARM and IA32 instruction-set architectures (ISAs), respectively. Using the SPEC CPU2006 benchmark suite with reference inputs, our results show that compared to a NET variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG can achieve a significant performance improvement of up to 72% (27% on average) for IA32 and up to 49% (23% on average) for ARM.
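The delinquent-trace notion in this abstract can be illustrated with a short sketch. This is an assumed, simplified model (the names, counter representation, and 0.5 threshold are all illustrative, not from the EEG paper):

```python
def find_delinquent_traces(exit_counts, completion_counts, threshold=0.5):
    """Flag traces whose early-exit rate exceeds a threshold.

    Hypothetical sketch: a trace is delinquent when executions leave it
    through an early side exit more often than they run it to completion.
    An EEG-style scheme would select such traces as candidates for
    merging into larger code regions.
    """
    delinquent = []
    for trace_id, exits in exit_counts.items():
        total = exits + completion_counts.get(trace_id, 0)
        if total and exits / total > threshold:
            delinquent.append(trace_id)
    return delinquent

# t1 exits early 90% of the time; t2 almost always completes.
print(find_delinquent_traces({"t1": 90, "t2": 5}, {"t1": 10, "t2": 95}))  # ['t1']
```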
... We relax such a backward-branch constraint, and stop trace formation only when the same program counter (PC) is executed again. This relaxed algorithm is similar to the cyclic-path-based repetition detection scheme in [14]: counter ← counter + 1; if counter ≥ threshold then enable_predict ← TRUE ...
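The relaxed stop condition quoted above can be sketched as follows; this is an illustrative reconstruction (the `execute_one` callback and the toy control flow are assumptions), not the cited implementation:

```python
def record_trace(execute_one, start_pc):
    """Record instructions until an already-seen PC is executed again.

    Hypothetical sketch: instead of ending trace recording only at a
    backward branch, recording stops as soon as any program counter (PC)
    repeats, which closes a cyclic path.
    """
    seen = set()
    trace = []
    pc = start_pc
    while pc not in seen:
        seen.add(pc)
        trace.append(pc)
        pc = execute_one(pc)  # callback returning the next PC
    return trace

# Toy control flow: 0 -> 1 -> 2 -> 1; PC 1 repeats, ending the trace.
nxt = {0: 1, 1: 2, 2: 1}
print(record_trace(lambda pc: nxt[pc], 0))  # [0, 1, 2]
```

In a real system the recorded trace would only be compiled once a hotness counter for its head reaches a threshold, as the fragment above indicates.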
Article
Full-text available
Dynamic binary translation (DBT) is a core technology for many important applications such as system virtualization, dynamic binary instrumentation and security. However, several factors often impede its performance: (1) emulation overhead before translation; (2) translation and optimization overhead; and (3) translated code quality. For the translator itself, a further issue is retargetability: supporting guest applications from different instruction-set architectures (ISAs) on host machines that also have different ISAs, an important feature for system virtualization. In this work, we take advantage of ubiquitous multicore platforms and use a multithreaded approach to implement DBT. By running the translators and the dynamic binary optimizers on different threads on different cores, we can off-load the overhead caused by DBT from the target applications, thus affording DBT more sophisticated optimization techniques as well as support for retargetability. Using QEMU (a popular retargetable DBT for system virtualization) and LLVM (Low Level Virtual Machine) as our building blocks, we demonstrated in a multi-threaded DBT prototype, called HQEMU, that it could improve QEMU performance by a factor of 2.4X and 4X on the SPEC 2006 integer and floating point benchmarks for x86 to x86-64 emulation, respectively; i.e., it is only 2.5X and 2.1X slower than native execution of the same benchmarks on x86-64, as opposed to the 6X and 8.4X slowdown of QEMU. For ARM to x86-64 emulation, HQEMU gains a factor of 2.4X speedup over QEMU for the SPEC 2006 integer benchmarks.
... To provide users with a low-overhead system for infinite loop detection and escape, we have designed Jolt around two components: Compiler: Jolt's compiler enables a developer or user to compile the source code of his or her application to obtain a binary executable that is amenable to infinite loop detection. In particular, Jolt's compiler adds lightweight instrumentation to the source of the application to identify the boundaries of loops, which can be difficult to identify accurately from a binary executable [15, 34]. Detector: Jolt's detector can, at the user's request, dynamically attach to and analyze a running instance of an application that the user believes is caught in an infinite loop (if the application has been compiled with Jolt's compiler). ...
Conference Paper
Full-text available
Infinite loops can make applications unresponsive. Potential problems include lost work or output, denied access to application functionality, and a lack of responses to urgent events. We present Jolt, a novel system for dynamically detecting and escaping infinite loops. At the user’s request, Jolt attaches to an application to monitor its progress. Specifically, Jolt records the program state at the start of each loop iteration. If two consecutive loop iterations produce the same state, Jolt reports to the user that the application is in an infinite loop. At the user’s option, Jolt can then transfer control to a statement following the loop, thereby allowing the application to escape the infinite loop and ideally continue its productive execution. The immediate goal is to enable the application to execute long enough to save any pending work, finish any in-progress computations, or respond to any urgent events. We evaluated Jolt by applying it to detect and escape eight infinite loops in five benchmark applications. Jolt was able to detect seven of the eight infinite loops (the eighth changes the state on every iteration). We also evaluated the effect of escaping an infinite loop as an alternative to terminating the application. In all of our benchmark applications, escaping an infinite loop produced a more useful output than terminating the application. Finally, we evaluated how well escaping from an infinite loop approximated the correction that the developers later made to the application. For two out of our eight loops, escaping the infinite loop produced the same output as the corrected version of the application.
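Jolt's detection rule, as stated in this abstract, can be sketched in a few lines. This is a simplified model (state snapshots as hashable values, offline rather than attached to a live process; all names are illustrative):

```python
def detect_infinite_loop(states):
    """Report the first iteration whose loop-head state repeats its predecessor.

    Hypothetical sketch of the rule above: record the program state at
    the start of each loop iteration; if two consecutive iterations
    produce the same state, the loop can make no further progress and
    is reported as infinite.
    """
    prev = object()  # sentinel that never equals a real snapshot
    for i, state in enumerate(states):
        if state == prev:
            return i  # this iteration saw no state change
        prev = state
    return None  # no repetition observed

# State stops changing at the fourth recorded iteration (index 3):
print(detect_infinite_loop([(0,), (1,), (2,), (2,)]))  # 3
```

As the abstract notes, a loop that changes some state on every iteration (the eighth benchmark loop) evades this check by construction.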
Chapter
Static flow analyses compute a safe approximation of a program’s dataflow without executing it. Dynamic flow analyses compute a similar safe approximation by running the program on test data such that it achieves sufficient coverage. We design and implement a dynamic flow analysis for JavaScript. Our formalization and implementation observe a program’s execution in a training run and generate flow constraints from the observations. We show that a solution of the constraints yields a safe approximation to the program’s dataflow if each path in every function is executed at least once in the training run. As a by-product, we can reconstruct types for JavaScript functions from the results of the flow analysis. Our implementation shows that dynamic flow analysis is feasible for JavaScript. While our formalization concentrates on a core language, the implementation covers full JavaScript. We evaluated the implementation using the SunSpider benchmark.
Chapter
In widely-used actor-based programming languages, such as Erlang, sequential execution performance is as important as scalability of concurrency. In order to improve sequential performance of Erlang, we develop Pyrlang, an Erlang virtual machine with a just-in-time (JIT) compiler by applying an existing meta-tracing JIT compiler. In this paper, we overview our implementation and present the optimization techniques for Erlang programs, most of which heavily rely on function recursion. Our preliminary evaluation showed approximately 38% speedup over the standard Erlang interpreter.
Article
Full-text available
Region formation is an important step in dynamic binary translation to select hot code regions for translation and optimization. The quality of the formed regions determines the extent of optimizations and thus determines the final execution performance. Moreover, the overall performance is very sensitive to the formation overhead, because region formation can have a non-trivial cost. For addressing the dual issues of region quality and region formation overhead, this article presents a lightweight region formation method guided by processor tracing, e.g., Intel PT. We leverage the branch history information stored in the processor to reconstruct the program execution profile and effectively form high-quality regions with low cost. Furthermore, we present the designs of lightweight hardware performance monitoring sampling and the branch instruction decode cache to minimize region formation overhead. Using ARM64 to x86-64 translations, the experiment results show that our method achieves a performance speedup of up to 1.53× (1.16× on average) for SPEC CPU2006 benchmarks with reference inputs, compared to the well-known software-based trace formation method, Next Executing Tail (NET). The performance results of x86-64 to ARM64 translations also show a speedup of up to 1.25× over NET for CINT2006 benchmarks with reference inputs. The comparison with a relaxed NETPlus region formation method further demonstrates that our method achieves the best performance and lowest compilation overhead.
Conference Paper
In widely-used actor-based programming languages, such as Erlang, sequential execution performance is as important as scalability of concurrency. We are developing a virtual machine called Pyrlang for the Erlang BEAM bytecode with a just-in-time (JIT) compiler. By using RPython's tracing JIT compiler, our preliminary evaluation showed approximately twice the speedup over the standard Erlang interpreter. In this poster, we overview the design of Pyrlang and the techniques for applying RPython's tracing JIT compiler to BEAM bytecode programs written in Erlang's functional style of programming.
Conference Paper
With the growing number of both parallel architectures and related programming models, benchmarking becomes very tricky, since parallel programming requires architecture-dependent compilers and languages as well as high programming expertise. Beyond comparing architectures with synthetic benchmarks, benchmarking is increasingly used to design specialized systems composed of heterogeneous computing resources that optimize the performance or performance/watt ratio (e.g. embedded systems designers build a System-on-Chip (SoC) out of dedicated and well-chosen components). In the High-Performance-Computing (HPC) domain, systems are designed with symmetric and scalable computing nodes built to deliver the highest performance on a wide variety of applications. However, HPC is now facing cost and power consumption issues which motivate the design of heterogeneous systems. This is one of the rationales of the European FiPS project, which proposes to develop a hardware architecture and a software methodology that ease the design of such systems. Thus, a fair comparison between architectures with respect to a given application is of growing importance. Unfortunately, porting the application to all available architectures using the related programming models is impossible. To tackle this challenge, we introduce a novel methodology to evaluate and compare parallel architectures in order to ease the work of the programmer. Based on the use of micro-benchmarks, code profiling and characterization tools, this methodology introduces a semi-automatic prediction of the performance of sequential applications on a set of parallel architectures. In addition, the performance estimation is correlated with the cost of other criteria such as power or portability effort. Introduced for targeting vision-based embedded applications, our methodology is currently being extended to target more complex applications from the HPC world.
This paper extends our work with new experiments and early results on a real HPC application of DNA sequencing.