Figure 2 - uploaded by Michael Klemm
Native memory layout vs. DCL-managed application memory layout.


Source publication
Article
Multicore designers often add a small local memory close to each core to speed up access and to reduce off-chip IO. But this approach puts a burden on the programmer, the compiler, and the runtime system, since this memory lacks hardware support (cache logic, MMU, ...) and hence needs to be managed in software to exploit its performance potential....

Similar publications

Article
This paper describes a novel approach to reduce the memory consumption of Java programs, by reducing the string memory waste in the runtime. In recent Java applications, string data occupies a large amount of the heap area. For example, more than 30% of the live heap area is used for string data when WebSphere Application Server with Trade6 is run...
Article
Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., pa...

Citations

... Hammacher et al. [4] present an approach that analyzes dynamic data dependences of a program run and uses that information to identify independent computation paths that could have been handled by individual cores in a multicore machine. Werth et al. [5] describe an automated technique for creating overlays to improve performance on certain specialized architectures. Schaefer [6] proposes a method to improve auto-tuning of concurrent programs using knowledge of the patterns used to implement the program. ...
Conference Paper
Microprocessor performance can no longer be greatly improved by simply increasing clock frequencies; instead, higher performance will have to come from parallelism. As multi/manycore processors with multiple CPUs on a chip become standard and affordable for everyone, software engineers face the challenge of parallelizing applications of all sorts. However, compared to sequential applications, our repertoire of tools and methods for cost-effectively developing reliable, parallel applications is spotty. The mission of this workshop is to bring together researchers and practitioners with diverse backgrounds in order to advance the state of the art in software engineering for multi/manycore parallel applications. This is the second in a series of workshops specifically focusing on software engineering challenges of multi/manycore.
Conference Paper
The potential of heterogeneous multicores, like the Cell BE, can only be exploited if the host and the accelerator cores are used in parallel and if the specific features of the cores are considered. Parallel programming, especially when applied to irregular task-parallel problems, is challenging itself. However, heterogeneous multicores add to that complexity due to their memory hierarchy and specialized accelerators. As a solution for these issues we present CellCilk, a prototype implementation of Cilk for heterogeneous multicores with a host/accelerator design, using the Cell BE in particular. CellCilk introduces a new keyword (spu spawn) for task creation on the accelerator cores. Task scheduling and load balancing are done by a novel dynamic cross-hierarchy work-stealing regime. Furthermore, the CellCilk runtime employs a garbage collection mechanism for distributed data structures that are created during scheduling. On benchmarks we achieve a good speedup and reasonable runtimes, even when compared to manually parallelized codes.
Article
Memory is a key parameter in embedded systems since both the code complexity of embedded applications and the amount of data they process are increasing. While it is true that the memory capacity of embedded systems is continuously increasing, the increases in application complexity and dataset sizes are far greater. As a consequence, the memory space demand of code and data should be kept to a minimum. To reduce the memory space consumption of embedded systems, this paper proposes a control flow graph (CFG) based technique. Specifically, it tracks the lifetime of instructions at the basic block level. Based on the CFG analysis, if a basic block is known to be not accessible in the rest of the program execution, the instruction memory space allocated to this basic block is reclaimed. On the other hand, if the memory allocated to this basic block cannot be reclaimed, we try to compress this basic block. This way, it is possible to effectively use the available on-chip memory, thereby satisfying most instruction/data requests from the on-chip memory. Our experiments with this framework show that it outperforms the previously proposed CFG-based memory reduction approaches.