Larry Rudolph
Two Sigma Investments

About

Publications: 212

Reads: 21,516

Citations: 9,018

Skills and Expertise

Parallel Computing

January 2009 - present

VMware

January 2007 - present

Singapore-MIT Alliance

September 1996 - February 2007

Singapore-MIT Alliance

Cambridge, United States

Position

Principal Research Scientist

Publications

Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

Preprint

May 2020

We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). Specifically, we investigate the consequences of "code-level optimizations:" algorithm augmentations found only in implementations or described...

Anergy to Synergy – The Energy Fueling the RxCOVEA Framework

Article

Jan 2020

We write to introduce our novel group formed to confront some of the issues raised by the COVID-19 pandemic. Information about the group, which we named "cure COVid for Ever and for All" (RxCOVEA), its dynamic membership (changing regularly), and some of its activities (described in more technical detail for expert perusal and commentary) are availab...

D3N: A multi-layer cache for the rest of us

Conference Paper

Full-text available

Dec 2019

Supporting Security Sensitive Tenants in a Bare-Metal Cloud

Preprint

Full-text available

Jul 2019

Bolted is a new architecture for bare-metal clouds that enables tenants to control tradeoffs between security, price, and performance. Security-sensitive tenants can minimize their trust in the public cloud provider and achieve similar levels of security and control that they can obtain in their own private data centers. At the same time, Bolted ne...

A Secure Cloud with Minimal Provider Trust

Preprint

Full-text available

Jul 2019

Bolted is a new architecture for a bare metal cloud with the goal of providing security-sensitive customers of a cloud the same level of security and control that they can obtain in their own private data centers. It allows tenants to elastically allocate secure resources within a cloud while being protected from other previous, current, and future...

Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

Preprint

Nov 2018

We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: gradient estimation, value prediction, optimization landscapes, and trust region enforcement. We find that from this persp...

A Secure Cloud with Minimal Provider Trust

Conference Paper

Jul 2018

Sundial: harmonizing concurrency control and caching in a distributed OLTP database management system

Article

Jun 2018

Distributed transactions suffer from poor performance due to two major limiting factors. First, distributed transactions suffer from high latency because each of their accesses to remote data incurs a long network delay. Second, this high latency increases the likelihood of contention among distributed transactions, leading to high abort rates and...

Managing travel demand: Location recommendation for system efficiency based on mobile phone data

Article

Oct 2016

Growth in leisure travel has become increasingly significant economically, socially, and environmentally. However, flexible but uncoordinated travel behaviors exacerbate traffic congestion. Mobile phone records not only reveal human mobility patterns, but also enable us to manage travel demand for system efficiency. In this paper, we propose a loca...

Thunderstrike

Conference Paper

May 2015

There are several flaws in Apple's MacBook firmware security that allow untrusted modifications to be written to the SPI Flash boot ROM of these laptops. This capability represents a new class of persistent firmware rootkits, or 'bootkits', for the popular Apple MacBook product line. Stealthy bootkits can conceal themselves from detection and prev...

Using OpenStack for an Open Cloud Exchange (OCX)

Article

Full-text available

Mar 2015

We are developing a new public cloud, the Massachusetts Open Cloud (MOC) based on the model of an Open Cloud eXchange (OCX). We discuss in this paper the vision of an OCX and how we intend to realize it using the OpenStack open-source cloud platform in the MOC. A limited form of an OCX can be achieved today by layering new services on top of OpenSt...

Analytical cache models with applications to cache partitioning

Article

Jun 2014

This paper presents the author retrospective on the analytical cache modeling work published in the 2001 International Conference on Supercomputing (ICS). We summarize the history of the work, revisit primary observations and lessons that we learned from the modeling effort, and also briefly describe follow-up work to show how the research directio...

A Virtualization Infrastructure that Supports Pervasive Computing

Article

Full-text available

Jan 2010

Larry Rudolph

Smart phones might well be the most powerful pervasive embedded device and the ideal platform for pervasive computing. Virtualization technology offers a practical means for the widespread deployment of the necessary middleware.

Dynamic Optical Circuit Switching Applied to Storage Area Networks

Conference Paper

Nov 2009

Dynamic wavelength addressing in optical fiber networks utilizing wavelength division multiplexing (WDM) can form the basis for a high-performance, high-bandwidth, low-latency any-to-any interconnection network. WDM optical fiber networks exploit the fact that photons of different wavelen...

Cognitive personal positioning based on activity map and adaptive particle filter

Conference Paper

Full-text available

Oct 2009

This paper presents a cognitive approach for reliable yet battery-friendly personal positioning. A user's position is learned from both the historical log and possible measurements. First, the user's past activities recorded in the log are summarized into an activity map. Accordingly, a user-habit guided particle filtering algorithm is presented for po...

Bluetooth Essentials for Programmers

Book

Sep 2009

This book provides an introduction to Bluetooth programming, with a specific focus on developing real code. The authors discuss the major concepts and techniques involved in Bluetooth programming, with special emphasis on how they relate to other networking technologies. They provide specific descriptions and examples for creating applications in a...

Mining User Position Log for Construction of Personalized Activity Map

Conference Paper

Full-text available

Aug 2009

Consider a scenario in which a smart phone automatically saves the user’s positional records for personalized location-based applications. The smart phone will infer patterns of user activities from the historical records and predict the user’s future movements. In this paper, we present algorithms for mining the evolving positional logs in order to id...

Controlling Uncertainty in Personal Positioning at Minimal Measurement Cost

Conference Paper

Full-text available

Jun 2008

One interesting scenario in personal positioning involves an energy-conscious mobile user who tries to obtain estimates about his positions with sufficiently high confidence while consuming as little battery energy as possible. Besides obtaining estimates directly from a position measuring device, the user can rely on extrapolative calculations bas...

"Zen" and the art of petascale ocean modeling: a conceptual analysis of how virtualization could be key to bringing individual science back to petascale ocean modeling

Article

Mar 2008

We speculate on a novel role virtualization could play in creating a rounded, balanced physical science and engineering software ecosystem to support petascale computational science. The motivation for this analysis is a quest for ways to engage a broader spectrum of expertise in state-of-the-art petascale modeling activities. Current generation pe...

How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation

Conference Paper

Full-text available

Mar 2008

Application debugging is a tedious but inevitable chore in any software development project. An effective debugger can make programmers more productive by allowing them to pause execution and inspect the state of the process, or monitor writes to memory to detect data corruption. The latter is a notoriously difficult category of bugs to diagnose an...

Report on education roundtable: experimentation in the computer science curriculum

Conference Paper

Jun 2007

Measuring the indirect cost of context switch is a challenging problem. In this paper, we show our results of experimentally quantifying the indirect cost of context switch using a synthetic workload. Specifically, we measure the impact of program data ...

General-purpose operating systems, such as Linux,

Conference Paper

Apr 2007

Modern memory systems play a critical role in the performance of applications, but a detailed understanding of the application behavior in the memory system is not trivial to attain. It requires time consuming simulations and detailed modeling of the memory hierarchy, often using long address traces. It is increasingly possible to access hardware p...

Ubiquitous Memory Introspection.

Conference Paper

Full-text available

Jan 2007

Modern memory systems play a critical role in the performance of applications, but a detailed understanding of the application behavior in the memory system is not trivial to attain. It requires time consuming simulations and detailed modeling of the memory hierarchy, often using long address traces. It is increasingly possible to access hard-...

Ubiquitous Memory Introspection (Preliminary Manuscript)

Article

Full-text available

Sep 2006

Modern memory systems play a critical role in the performance of applications, but a detailed understanding of the application behavior in the memory system is not trivial to attain. It requires time consuming simulations of the memory hierarchy using long traces, and often using detailed modeling. It is increasingly possible to access hardware performa...

DEP: Detailed execution profile

Conference Paper

Full-text available

Sep 2006

In many areas of computer architecture design and program development, the knowledge of dynamic program behavior can be very handy. Several challenges beset the accurate and complete collection of dynamic control flow and memory reference information. These include scalability issues, runtime-overhead, and code coverage. For example, while Tallam...

Cooperative checkpointing

Conference Paper

Full-text available

Jun 2006

Cooperative checkpointing uses global knowledge of the state and health of the machine to improve performance and reliability by dynamically deciding when to skip checkpoint requests made by applications. Using results from cooperative checkpointing theory, this paper proves that periodic checkpointing is not expected to be competitive with the off...

Cooperative checkpointing: A robust approach to large-scale systems reliability

Conference Paper

Full-text available

Jun 2006

Cooperative checkpointing increases the performance and robustness of a system by allowing checkpoints requested by applications to be dynamically skipped at runtime. A robust system must be more than merely resilient to failures; it must be adaptable and flexible in the face of new and evolving challenges. A simulation-based experimental analysis...

Robust network connectivity

Article

Jun 2006

This work analyzes the connectivity of large diameter networks where every link has an independent probability p of failure. We give a (relatively simple) topological condition that guarantees good connectivity between regions of such a network. Good connectivity means that the regions are connected by nearly as many disjoint, fault-free paths as t...

Parallel job scheduling: Issues and approaches

Conference Paper

Full-text available

Jan 2006

Parallel job scheduling is beginning to gain recognition as an important topic that is distinct from the scheduling of tasks within a parallel job by the programmer or runtime system. The main issue is how to share the resources of the parallel machine among a number of competing jobs, giving each the required level of service. This level of schedu...

A complexity theory of efficient parallel algorithms

Conference Paper

Jan 2006

Theoretical research on parallel algorithms has focused on NC theory. This motivates the development of parallel algorithms that are extremely fast, but possibly wasteful in their use of processors. Such algorithms seem of limited interest for real applications currently run on parallel computers. This paper explores an alternative approach that em...

Robust network connectivity: when it’s the big picture that matters

Conference Paper

Full-text available

Jan 2006

This work analyzes the connectivity of large diameter networks where every link has an independent probability p of failure. We give a (relatively simple) topological condition that guarantees good connectivity between regions of such a network. Good connectivity means that the regions are connected by nearly as many disjoint, fault-free paths...

Modeling Information Flow in Face-to-Face Meetings while Protecting Privacy

Article

Full-text available

Dec 2005

Social networks have been used to understand how information flows through an organization as well as identifying individuals that appear to have control over this information flow. Such individuals are identified as being central nodes in a graph representation of the social network and have high "betweenness" values. Rather than looking at graphs...

Relaxing Routing Table to Alleviate Dynamism in P2P Systems

Article

Full-text available

Dec 2005

In dynamic P2P networks, nodes join and depart from the system frequently, which partially damages the predefined P2P structure, and impairs the system performance such as basic lookup functionality. Therefore stabilization process has to be done to restore the logical topology. This paper presents an approach to relax the requirement on routing ta...

Probabilistic QoS Guarantees for Supercomputing Systems.

Conference Paper

Full-text available

Jan 2005

Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the system and users to negotiate a mutually desirable risk strategy; in order to accomplish this, the system makes probabilistic guarantees on quality of service (QoS), of the...

Kimono: Kiosk-mobile phone knowledge sharing system

Conference Paper

Full-text available

Jan 2005

The functionality of an information kiosk can be extended by allowing it to interact with a smartphone, as demonstrated by the Kimono system, and the user interface can be greatly simplified by "associations" between pieces of information. A kiosk provides information that is relevant to a particular location and can use valuable context informat...

Job Scheduling Strategies for Parallel Processing, 11th International Workshop, JSSPP 2005, Cambridge, MA, USA, June 19, 2005, Revised Selected Papers

Article

Jan 2005

Job Scheduling Strategies for Parallel Processing, 10th International Workshop, JSSPP 2004, New York, NY, USA, June 13, 2004, Revised Selected Papers

Article

Jan 2005

Proceedings of the 19th Annual International Conference on Supercomputing, ICS 2005, Cambridge, Massachusetts, USA, June 20-22, 2005

Article

Jan 2005

Lecture Notes in Computer Science: Preface

Article

Jan 2005

Job Scheduling Strategies for Parallel Processing: 11th International Workshop, JSSPP 2005, Cambridge, MA, USA, June 19, 2005, Revised Selected Papers

Book

Jan 2005

A Privacy Conscious Bluetooth Infrastructure for Location Aware Computing

Article

Full-text available

Dec 2004

We present a low cost and easily deployed infrastructure for location aware computing that is built using standard Bluetooth® technologies and personal computers. Mobile devices are able to determine their location to room-level granularity with existing Bluetooth technology, and to even greater resolution with the use of the recently adopted bluet...

Proactive Detection and Recovery of Lost Mobile Phones

Article

Full-text available

Dec 2004

This paper describes the successful implementation of a prototype software application that independently and proactively detects whether a mobile phone is lost or misused. When the mobile phone is detected as being lost or misused, the application takes steps to mitigate the impact of loss and to gather evidence. The goal is to aid in the recovery...

Parallel Job Scheduling --- A Status Report

Conference Paper

Jul 2004

Scheduling parallel jobs has been a popular research topic for many years. A couple of surveys have been written on this topic in the context of parallel supercomputers [17, 20]. The purpose of the present paper is to update this material, and to extend it to include work concerning clusters and the grid. The first part of the paper de...

Dynamic Partitioning of Shared Cache Memory

Article

Full-text available

Apr 2004

This paper proposes dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches. Since memory reference characteristics of processes/threads can change over time, our method collects the cache miss characteristics of processes/threads at run-ti...

PADCAM: A Human-Centric Perceptual Interface for Temporal Recovery of Pen-Based Input

Article

Full-text available

Jan 2004

We present a perceptual interface for pen-based input that uses live video of handwriting and recovers the time-ordered sequence of strokes that were written. Our system employs a novel algorithm for page detection and explores the use of frame differencing and pen tracking to reconstruct the temporal information in the input video sequence. We pr...

Passdoodles; a Lightweight Authentication Method

Article

Full-text available

Jan 2004

Content Exchange Appliances

Article

Full-text available

Dec 2003

WWW and current Information Technology have made it easy to display a wide variety of content on desktops and personal devices. Unfortunately, little progress has been made for access to the content in public areas. Some technologies, such as Internet Kiosks and narrowcast, enable content access (primarily viewing), but not exchange. There is a gro...

commanimation: Creating and managing animations via speech

Article

Full-text available

Dec 2003

A speech controlled animation system is both a useful application program and a laboratory in which to investigate context aware applications and error control. The user need not have prior knowledge or experience in animation and is yet able to create interesting and meaningful animation naturally and fluently. The system can be...

A Dynamically Partitionable Compressed Cache

Article

Full-text available

Nov 2003

The effective size of an L2 cache can be increased by using a dictionary-based compression scheme. Naive application of this idea performs poorly since the data values in a cache greatly vary in their “compressibility.” The novelty of this paper is a scheme that dynamically partitions the cache into sections of different compressibilities. While co...

What I did on my fall vacation - A pervasive computing class

Article

Full-text available

May 2003

Larry Rudolph

This column describes a project-based, hands-on pervasive computing course offered at MIT during the fall 2001 semester. Later, I helped distill the course into an intensive one-week experience that MIT offered in spring 2002 and once again in winter 2003. Much has been carried from these three instantiations.

Job Scheduling Strategies for Parallel Processing, 9th International Workshop, JSSPP 2003, Seattle, WA, USA, June 24, 2003, Revised Papers

Article

Jan 2003

Job Scheduling Strategies for Parallel Processing: 9th International Workshop, JSSPP 2003, Seattle, WA, USA, June 24, 2003. Revised Paper

Book

Jan 2003

Malleable Caches

Article

Full-text available

Nov 2002

Managing the memory hierarchy is important for providing good performance of data intensive computation. This effort has explored several techniques for managing the cache in a microprocessor. This report examines column caching, cache partitioning, and cache compression techniques, especially in regards to the Data Intensive System (DIS) benchmark...

Scheduling and Load Balancing

Conference Paper

Aug 2002

Despite the large number of papers that have been published, scheduling and load balancing continue to be an active area of research. The topic covers all aspects related to scheduling and load balancing including application and system level techniques, theoretical foundations and practical tools. New aspects of parallel and distributed systems, s...

ParC - An Extension of C for Shared Memory Parallel Processing

Article

Jun 2002

In this paper we describe the features and semantics of ParC. The rest of this section explains the motivation for designing a new language, the effect of the motivating forces on the design, and the structure of the software environment that surrounds it. The next section describes the parallel constructs and scoping rules. The exact semantics of paral...

A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning

Conference Paper

Full-text available

Mar 2002

We propose a low overhead, online memory monitoring scheme utilizing a set of novel hardware counters. The counters indicate the marginal gain in cache hits as the size of the cache is increased, which gives the cache miss-rate as a function of cache size. Using the counters, we describe a scheme that enables an accurate estimate of the isolated mi...
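As a rough illustration of how marginal-gain counters yield a miss-rate curve, here is a minimal Python sketch. It is not the paper's hardware scheme: the function name `miss_rate_curve`, the list layout of the counters, and the sample numbers are all assumptions made for this example. It assumes `marginal_hits[i]` holds the extra hits gained when the cache grows from `i` to `i + 1` allocation units.

```python
def miss_rate_curve(marginal_hits, total_accesses):
    """Turn per-unit marginal hit counts into a miss rate for each cache size.

    marginal_hits[i] is assumed to be the number of additional hits gained
    when the cache grows from i to i + 1 units (hypothetical counter layout).
    Returns a list whose entry i is the miss rate with i + 1 units of cache.
    """
    if total_accesses <= 0:
        raise ValueError("total_accesses must be positive")
    curve = []
    hits = 0
    for gain in marginal_hits:
        hits += gain  # cumulative hits at this cache size
        curve.append((total_accesses - hits) / total_accesses)
    return curve


# Hypothetical counter values: 100 accesses, three allocation units.
print(miss_rate_curve([50, 30, 10], 100))
```

Because each counter records only the incremental benefit of one more unit, a single pass of cumulative sums is enough to recover the whole curve.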

Job Scheduling Strategies for Parallel Processing, 8th International Workshop, JSSPP 2002, Edinburgh, Scotland, UK, July 24, 2002, Revised Papers

Article

Jan 2002

Job Scheduling Strategies for Parallel Processing: 8th International Workshop, JSSPP 2002 Edinburgh, Scotland, UK, July 24, 2002 Revised Papers

Book

Nov 2002

Effects of Memory Performance on Parallel Job Scheduling

Conference Paper

Full-text available

Sep 2001

We develop a new metric for job scheduling that includes the effects of memory contention amongst simultaneously-executing jobs that share a given level of memory. Rather than assuming each job or process has a fixed, static memory requirement, we consider a general scenario wherein a process' performance monotonically increases as a function of al...

Dynamic Cache Partitioning for Simultaneous Multithreading Systems

Article

Full-text available

Sep 2001

This paper proposes a dynamic cache partitioning method for simultaneous multithreading systems. We present a general partitioning scheme that can be applied to set-associative caches at any partition granularity. Furthermore, in our scheme threads can have overlapping partitions, which provides more degrees of freedom when partitioning caches with...

Evaluation of Design Choices for Gang Scheduling Using Distributed Hierarchical Control

Article

Aug 2001

Gang scheduling --- the scheduling of a number of related threads to execute simultaneously on distinct processors --- appears to meet the requirements of interactive, multiuser, general-purpose parallel systems. Distributed Hierarchical Control (DHC) has been proposed as an efficient mechanism for coping with the dynamic processor partitioning nece...

Application-Specific Memory Management for Embedded Systems Using Software-Controlled Caches

Article

Full-text available

Jun 2001

We propose a way to improve the performance of embedded processors running data-intensive applications by allowing software to allocate on-chip memory on an application-specific basis. On-chip memory in the form of cache can be made to act like scratchpad memory via a novel hardware mechanism, which we call column caching. Column caching enables dyn...

Analytical Cache Models with Applications to Cache Partitioning

Article

Full-text available

Jun 2001

An accurate, tractable, analytic cache model for time-shared systems is presented, which estimates the overall cache miss-rate of a multiprocessing system with any cache size and time quanta. The input to the model consists of the isolated miss-rate curves for each process, the time quanta for each of the executing processes, and the total cache siz...

Message Passing Support for Multi-grained,

Article

Apr 2001

In order to become generally useful, message passing mechanisms not only need to provide high performance, but also the three M's: multi-granularity, multi-threading and multiprocessing.

Computation Structures Group Memo 392

Article

Apr 2001

This paper presents the architecture of a network interface unit (NIU) to provide a wide range of shared memory and message passing (S&M) semantics within the confines of an open system composed of a cluster of SMP's and a high speed interconnection network. An open system allows commodity items to be easily assembled and replaced without requiring...

Computation Structures Group Memo 387

Article

Apr 2001

No single message passing mechanism can efficiently support all the different types of communication that occur naturally in most parallel or distributed programs. MIT's StarT-Voyager, a hybrid message passing/shared memory parallel machine, provides four message passing mechanisms to achieve very high performance over a wide spectrum of communicat...

Developing and Refining an Adaptive Token-Passing Strategy.

Conference Paper

Full-text available

Apr 2001

Token rotation algorithms play an important role in distributed computing, to support such activities as mutual exclusion, round-robin scheduling, group membership and group communication protocols. Ring-based protocols maximize throughput in busy systems but can incur a linear (in the number of processors) delay when a processor needs to obtain a...

Software-assisted Cache Replacement Mechanisms for Embedded Systems

Conference Paper

Full-text available

Feb 2001

We address the problem of improving cache predictability and performance in embedded systems through the use of software-assisted replacement mechanisms. These mechanisms require additional software controlled state information that affects the cache replacement decision. Software instructions allow a program to kill a particular cache element, i.e...

Project Oxygen: Pervasive, Human-Centric Computing – An Initial Experience

Conference Paper

Full-text available

Jan 2001

Larry Rudolph

For the past six months, I have been integrating several experimental, cutting-edge technologies developed by my colleagues at MIT as part of the MIT LCS/AIL Oxygen project. This paper gives a snapshot of this work-in-progress. Project Oxygen is a collaborative effort involving many research activities throughout the Laboratory for Computer Science...

Job Scheduling Strategies for Parallel Processing, 7th International Workshop, JSSPP 2001, Cambridge, MA, USA, June 16, 2001, Revised Papers

Book

Jan 2001

Job-Speculative Prefetching: Eliminating Page Faults From Context Switches in Time-Shared Systems

Article

Full-text available

Jan 2001

When multiple applications have to time-share limited physical memory resources, they can incur significant performance degradation at the beginning of their respective time slices due to page faults. We propose a method to significantly improve memory system and overall performance in time-shared computers using job-speculative prefetchin...

Scheduler-Based Prefetching for Multilevel Memories

Article

Jan 2001

Memory latency is a significant bottleneck in modern computing systems. With few exceptions, the process/thread/task currently running on the CPU implicitly owns all caches and main memory. Essentially, the memory is scheduled with the CPU. Generally, however, at least one memory resource is larger than any one process/thread/task and thus there is...

Valuation of Ultra-scale Computing Systems

Conference Paper

May 2000

The goal of the Ultra-Scale Computing Valuation Project is to understand utilization issues for both users and managers of the largest scientific computing systems and to begin developing appropriate metrics and models for such systems. This paper describes a few aspects of the project.

Micro-Architectures of High Performance, Multi-User System Area Network Interface Cards.

Conference Paper

Full-text available

Jan 2000

This paper examines two Network Interface Card micro-architectures that support low latency, high bandwidth user level message passing in multi-user environments. The two are at different ends of a design spectrum: the Resident queues design relies completely on hardware, while the Non-resident queues design is heavily firmware driven. Through actua...

Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, JSSPP 2000, Cancun, Mexico, May 1, 2000, Proceedings

Article

Jan 2000

Application-Specific Memory Management for Embedded Systems

Article

Jan 2000

We propose a way to improve the performance of embedded processors running data-intensive applications by allowing software to allocate on-chip memory on an application-specific basis. On-chip memory in the form of cache can be made to act like scratchpad memory via a novel hardware mechanism, which we call column caching. Column caching enables dynamic...

Dynamic Cache Partitioning via Columnization

Article

Jan 2000

This paper introduces column caching, a flexible mechanism that allows software to dynamically customize cache behavior through fine-grain control of its placement policy. For a set-associative cache, specific data can be restricted to a subset of the usual target cache set during replacement. Through this simple enhancement, column caching enables...

Job Scheduling Strategies for Parallel Processing: IPDPS 2000 Workshop, JSSPP 2000 Cancun, Mexico, May 1, 2000 Proceedings

Book

Jan 2000

CACHET: An Adaptive Cache Coherence Protocol for Distributed Shared-Memory Systems

Conference Paper

May 1999

An adaptive cache coherence protocol changes its actions to address changing program behaviors. We present an adaptive protocol called Cachet for distributed shared-memory systems. Cachet is a seamless integration of several micro-protocols, each of which has been optimized for a particular memory access pattern. Cachet embodies bot...

Commit-reconcile & fences (CRF)

Article

May 1999

We present a new mechanism-oriented memory model called Commit-Reconcile & Fences (CRF) and define it using algebraic rules. Many existing memory models can be described as restricted versions of CRF. The model has been designed so that it is both easy for architects to implement, and stable enough to serve as a target machine interface for compile...

Message passing support on StarT-Voyager

Conference Paper

Full-text available

Jan 1999

No single message passing mechanism can efficiently support all types of communication that commonly occur in most parallel or distributed programs. MIT's StarT-Voyager, a hybrid message passing/shared memory parallel machine, provides four message passing mechanisms to achieve high performance over a wide spectrum of communication types and sizes....

Commit-Reconcile & Fences (CRF): A New Memory Model for Architects and Compiler Writers.

Conference Paper

Full-text available

Jan 1999

We present a new mechanism-oriented memory model called Commit-Reconcile & Fences (CRF) and define it using algebraic rules. Many existing memory models can be described as restricted versions of CRF. The model has been designed so that it is both easy for architects to implement and stable enough to serve as a target machine interface for compiler...

Job Scheduling Strategies for Parallel Processing, IPPS/SPDP'99 Workshop, JSSPP'99, San Juan, Puerto Rico, April 16, 1999, Proceedings

Article

Jan 1999

Job Scheduling Strategies for Parallel Processing: IPPS/SPDP’99 Workshop, JSSPP’99 San Juan, Puerto Rico, April 16, 1999 Proceedings

Book

Jan 1999

Optical switchcubes for communications in parallel processors

Article

Jan 1999

We propose the concept of an “optical switchcube” to help handle the communication load in a parallel computer composed of tens or hundreds of processor clusters operating under the loosely synchronous model of parallel computing. A switchcube can establish a communication and send one packet of data from any node to any combination of the other...

Limitations on Optical Free-Space Crossbar-Like Interconnection Networks

Article

Full-text available

Dec 1998

Providing the required interconnections between the processors of a parallel computer is a difficult problem: latency, switching control, cost, and crosstalk effects have to be taken into account. It is widely believed that the design might be simplified if optical technology is used. However, even optical interconnections cannot cater for an unlim...

StarT-Voyager: A Flexible Platform for Exploring Scalable SMP Issues

Conference Paper

Dec 1998

This paper describes StarT-Voyager, a machine designed as an experimental platform for research in cluster system communication. The heart of StarT-Voyager is a network interface unit (NIU) that connects the memory bus of a PowerPC-based SMP to the MIT Arctic network. The NIU is highly flexible, with its set of functions easily modified by firmware...

Accelerating Multi-Media Processing by Implementing Memoing in Multiplication and Division Units

Article

Nov 1998

This paper proposes a technique that enables performing multi-cycle (multiplication, division, square-root ...) computations in a single cycle. The technique is based on the notion of memoing: saving the input and output of previous calculations and using the output if the input is encountered again. This technique is especially suitable for Multi-...

The START-VOYAGER parallel system

Conference Paper

Full-text available

Nov 1998

This paper presents the communication architecture of the START-VOYAGER system, a parallel machine composed of a cluster of unmodified IBM 604e-based SMPs connected via a high speed interconnection network. A custom network interface unit (NIU) plugs into a processor card slot of each SMP, providing a high-performance message passing substrate tha...

Accelerating MultiMedia Processing by Implementing Memoing in Multiplication and Division Units

Article

Oct 1998

This paper proposes a technique that enables performing multi-cycle (multiplication, division, square-root …) computations in a single cycle. The technique is based on the notion of memoing: saving the input and output of previous calculations and using the output if the input is encountered again. This technique is especially suitable for Multi-Me...

Toward Convergence in Job Schedulers for Parallel Supercomputers

Article

Full-text available

Sep 1998

The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is p...

Theory and Practice in Parallel Job Scheduling

Article

Full-text available

Sep 1998

The scheduling of jobs on parallel supercomputers is becoming the subject of much research. However, there is concern about the divergence of theory and practice. We review theoretical research in this area, and recommendations based on recent results. This is contrasted with a proposal for standard interfaces among the components of a scheduling sy...

The NYU ultracomputer—designing a MIMD, shared-memory parallel machine

Conference Paper

Aug 1998

Do parallel computers really need optical interconnection networks?

Conference Paper

Jul 1998

Larry Rudolph

Several recent architectural trends appear to require some of the special properties of optical interconnection networks. The memory hierarchy will probably get deeper and the hardware will be more dynamic. Reconfigurable hardware and software are the subject of intense research. It is likely that the basic word size and the number of communication...

Design Options for Interconnecting a 100+ TFlop/sec Parallel Supercomputer in 2004

Conference Paper

Jul 1998

Metrics and Benchmarking for Parallel Job Scheduling

Conference Paper

Full-text available

Mar 1998

The evaluation of parallel job schedulers hinges on two things: the use of appropriate metrics, and the use of appropriate workloads on which the scheduler can operate. We argue that the focus should be on on-line open systems, and propose that a standard workload should be used as a benchmark for schedulers. This benchmark will specify distributio...

Job Scheduling Strategies for Parallel Processing, IPPS/SPDP'98 Workshop, Orlando, Florida, USA, March 30, 1998, Proceedings

Article

Jan 1998

Job scheduling for parallel supercomputers

Article

Jan 1998

Theory and Practice in Parallel Job Scheduling

Conference Paper

Full-text available

Sep 1997

Parallel job scheduling has gained increasing recognition in recent years as a distinct area of study. However, there is concern about the divergence of theory and practice in the field. We review theoretical research in this area, and recommendations based on recent results. This is contrasted with a proposal for standard interfaces among the comp...

Implications of I/O for Gang Scheduled Workloads

Conference Paper

Full-text available

Apr 1997

The job workloads of general-purpose multiprocessors usually include both compute-bound parallel jobs, which often require gang scheduling, as well as I/O-bound jobs, which require high CPU priority for the individual gang members of the job in order to achieve interactive response times. Our results indicate that an effective interactive multiproc...

Network

Eli Upfal
Brown University
Miron Livny
University of Wisconsin–Madison
Ramin Yahyapour
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
Nancy Lynch
Massachusetts Institute of Technology
Joel L. Wolf
IBM