Larry Rudolph

Two Sigma Investments

About

212 Publications
21,516 Reads
9,018 Citations
Additional affiliations
January 2009 - present
January 2007 - present
September 1996 - February 2007
Singapore-MIT Alliance
Position
  • Principal Research Scientist

Publications (212)
Preprint
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). Specifically, we investigate the consequences of "code-level optimizations:" algorithm augmentations found only in implementations or described...
Article
We write to introduce our novel group formed to confront some of the issues raised by the COVID-19 pandemic. Information about the group, which we named "cure COVid for Ever and for All" (RxCOVEA), its dynamic membership (changing regularly), and some of its activities-described in more technical detail for expert perusal and commentary-are availab...
Preprint
Full-text available
Bolted is a new architecture for bare-metal clouds that enables tenants to control tradeoffs between security, price, and performance. Security-sensitive tenants can minimize their trust in the public cloud provider and achieve similar levels of security and control that they can obtain in their own private data centers. At the same time, Bolted ne...
Preprint
Full-text available
Bolted is a new architecture for a bare metal cloud with the goal of providing security-sensitive customers of a cloud the same level of security and control that they can obtain in their own private data centers. It allows tenants to elastically allocate secure resources within a cloud while being protected from other previous, current, and future...
Preprint
We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: gradient estimation, value prediction, optimization landscapes, and trust region enforcement. We find that from this persp...
Conference Paper
Bolted is a new architecture for a bare metal cloud with the goal of providing security-sensitive customers of a cloud the same level of security and control that they can obtain in their own private data centers. It allows tenants to elastically allocate secure resources within a cloud while being protected from other previous, current, and future...
Article
Distributed transactions suffer from poor performance due to two major limiting factors. First, distributed transactions suffer from high latency because each of their accesses to remote data incurs a long network delay. Second, this high latency increases the likelihood of contention among distributed transactions, leading to high abort rates and...
Article
Growth in leisure travel has become increasingly significant economically, socially, and environmentally. However, flexible but uncoordinated travel behaviors exacerbate traffic congestion. Mobile phone records not only reveal human mobility patterns, but also enable us to manage travel demand for system efficiency. In this paper, we propose a loca...
Conference Paper
There are several flaws in Apple's MacBook firmware security that allow untrusted modifications to be written to the SPI Flash boot ROM of these laptops. This capability represents a new class of persistent firmware rootkits, or 'bootkits', for the popular Apple MacBook product line. Stealthy bootkits can conceal themselves from detection and prev...
Article
Full-text available
We are developing a new public cloud, the Massachusetts Open Cloud (MOC) based on the model of an Open Cloud eXchange (OCX). We discuss in this paper the vision of an OCX and how we intend to realize it using the OpenStack open-source cloud platform in the MOC. A limited form of an OCX can be achieved today by layering new services on top of OpenSt...
Article
This paper presents the author retrospective on the analytical cache modeling work published in the 2001 International Conference on Supercomputing (ICS). We summarize the history of the work, revisit primary observations and lessons that we learned from the modeling effort, and also briefly describe follow-up work to show how the research directio...
Article
Full-text available
Smart phones might well be the most powerful pervasive embedded device and the ideal platform for pervasive computing. Virtualization technology offers a practical means for the widespread deployment of the necessary middleware.
Conference Paper
Dynamic wavelength addressing in optical fiber networks utilizing wavelength division multiplexing (WDM) can form the basis for a high-performance, high-bandwidth, low-latency any-to-any interconnection network. WDM optical fiber networks exploit the fact that photons of different wavelen...
Conference Paper
Full-text available
This paper presents a cognitive approach for reliable yet battery-friendly personal positioning. A user's position is learned from both the historical log and possible measurements. First, the user's past activities recorded in the log are summarized into an activity map. Accordingly, a user-habit guided particle filtering algorithm is presented for po...
Book
This book provides an introduction to Bluetooth programming, with a specific focus on developing real code. The authors discuss the major concepts and techniques involved in Bluetooth programming, with special emphasis on how they relate to other networking technologies. They provide specific descriptions and examples for creating applications in a...
Conference Paper
Full-text available
Consider a scenario in which a smart phone automatically saves the user’s positional records for personalized location-based applications. The smart phone will infer patterns of user activities from the historical records and predict user’s future movements. In this paper, we present algorithms for mining the evolving positional logs in order to id...
Conference Paper
Full-text available
One interesting scenario in personal positioning involves an energy-conscious mobile user who tries to obtain estimates about his positions with sufficiently high confidence while consuming as little battery energy as possible. Besides obtaining estimates directly from a position measuring device, the user can rely on extrapolative calculations bas...
Article
We speculate on a novel role virtualization could play in creating a rounded, balanced physical science and engineering software ecosystem to support petascale computational science. The motivation for this analysis is a quest for ways to engage a broader spectrum of expertise in state-of-the-art petascale modeling activities. Current generation pe...
Conference Paper
Full-text available
Application debugging is a tedious but inevitable chore in any software development project. An effective debugger can make programmers more productive by allowing them to pause execution and inspect the state of the process, or monitor writes to memory to detect data corruption. The latter is a notoriously difficult category of bugs to diagnose an...
Conference Paper
Measuring the indirect cost of context switch is a challenging problem. In this paper, we show our results of experimentally quantifying the indirect cost of context switch using a synthetic workload. Specifically, we measure the impact of program data ...
Conference Paper
Modern memory systems play a critical role in the performance of applications, but a detailed understanding of the application behavior in the memory system is not trivial to attain. It requires time consuming simulations and detailed modeling of the memory hierarchy, often using long address traces. It is increasingly possible to access hardware p...
Conference Paper
Full-text available
Modern memory systems play a critical role in the performance of applications, but a detailed understanding of the application behavior in the memory system is not trivial to attain. It requires time consuming simulations and detailed modeling of the memory hierarchy, often using long address traces. It is increasingly possible to access hard-...
Article
Full-text available
Modern memory systems play a critical role in the performance of applications, but a detailed understanding of the application behavior in the memory system is not trivial to attain. It requires time consuming simulations of the memory hierarchy using long traces, and often using detailed modeling. It is increasingly possible to access hardware performa...
Conference Paper
Full-text available
In many areas of computer architecture design and program development, the knowledge of dynamic program behavior can be very handy. Several challenges beset the accurate and complete collection of dynamic control flow and memory reference information. These include scalability issues, runtime-overhead, and code coverage. For example, while Tallam...
Conference Paper
Full-text available
Cooperative checkpointing uses global knowledge of the state and health of the machine to improve performance and reliability by dynamically deciding when to skip checkpoint requests made by applications. Using results from cooperative checkpointing theory, this paper proves that periodic checkpointing is not expected to be competitive with the off...
Conference Paper
Full-text available
Cooperative checkpointing increases the performance and robustness of a system by allowing checkpoints requested by applications to be dynamically skipped at runtime. A robust system must be more than merely resilient to failures; it must be adaptable and flexible in the face of new and evolving challenges. A simulation-based experimental analysis...
Article
This work analyzes the connectivity of large diameter networks where every link has an independent probability p of failure. We give a (relatively simple) topological condition that guarantees good connectivity between regions of such a network. Good connectivity means that the regions are connected by nearly as many disjoint, fault-free paths as t...
Conference Paper
Full-text available
Parallel job scheduling is beginning to gain recognition as an important topic that is distinct from the scheduling of tasks within a parallel job by the programmer or runtime system. The main issue is how to share the resources of the parallel machine among a number of competing jobs, giving each the required level of service. This level of schedu...
Conference Paper
Theoretical research on parallel algorithms has focused on NC theory. This motivates the development of parallel algorithms that are extremely fast, but possibly wasteful in their use of processors. Such algorithms seem of limited interest for real applications currently run on parallel computers. This paper explores an alternative approach that em...
Conference Paper
Full-text available
This work analyzes the connectivity of large diameter networks where every link has an independent probability p of failure. We give a (relatively simple) topological condition that guarantees good connectivity between regions of such a network. Good connectivity means that the regions are connected by nearly as many disjoint, fault-free paths...
Article
Full-text available
Social networks have been used to understand how information flows through an organization as well as identifying individuals that appear to have control over this information flow. Such individuals are identified as being central nodes in a graph representation of the social network and have high "betweenness" values. Rather than looking at graphs...
Article
Full-text available
In dynamic P2P networks, nodes join and depart from the system frequently, which partially damages the predefined P2P structure and impairs system performance, such as basic lookup functionality. Therefore, a stabilization process must be performed to restore the logical topology. This paper presents an approach to relax the requirement on routing ta...
Conference Paper
Full-text available
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the system and users to negotiate a mutually desirable risk strategy; in order to accomplish this, the system makes probabilistic guarantees on quality of service (QoS), of the...
Conference Paper
Full-text available
The functionality of an information kiosk can be extended by allowing it to interact with a smartphone, as demonstrated by the Kimono system, and the user interface can be greatly simplified by "associations" between pieces of information. A kiosk provides information that is relevant to a particular location and can use valuable context informat...
Article
Full-text available
We present a low cost and easily deployed infrastructure for location aware computing that is built using standard Bluetooth® technologies and personal computers. Mobile devices are able to determine their location to room-level granularity with existing bluetooth technology, and to even greater resolution with the use of the recently adopted bluet...
Article
Full-text available
This paper describes the successful implementation of a prototype software application that independently and proactively detects whether a mobile phone is lost or misused. When the mobile phone is detected as being lost or misused, the application takes steps to mitigate the impact of loss and to gather evidence. The goal is to aid in the recovery...
Conference Paper
Scheduling parallel jobs has been a popular research topic for many years. A couple of surveys have been written on this topic in the context of parallel supercomputers [17, 20]. The purpose of the present paper is to update this material, and to extend it to include work concerning clusters and the grid. The first part of the paper de...
Article
Full-text available
This paper proposes dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches. Since memory reference characteristics of processes/threads can change over time, our method collects the cache miss characteristics of processes/threads at run-ti...
Article
Full-text available
We present a perceptual interface for pen-based input that uses live video of handwriting and recovers the time-ordered sequence of strokes that were written. Our system employs a novel algorithm for page detection and explores the use of frame differencing and pen tracking to reconstruct the temporal information in the input video sequence. We pr...
Article
Full-text available
WWW and current Information Technology have made it easy to display a wide variety of content on desktops and personal devices. Unfortunately, little progress has been made for access to the content in public areas. Some technologies, such as Internet Kiosks and narrowcast, enable content access (primarily viewing), but not exchange. There is a gro...
Article
Full-text available
A speech controlled animation system is both a useful application program and a laboratory in which to investigate context aware applications and controlling errors. The user need not have prior knowledge or experience in animation, yet is able to create interesting and meaningful animation naturally and fluently. The system can be...
Article
Full-text available
The effective size of an L2 cache can be increased by using a dictionary-based compression scheme. Naive application of this idea performs poorly since the data values in a cache greatly vary in their “compressibility.” The novelty of this paper is a scheme that dynamically partitions the cache into sections of different compressibilities. While co...
Article
Full-text available
This column describes a project-based, hands-on pervasive computing course offered at MIT during the fall 2001 semester. Later, I helped distill the course into an intensive one-week experience that MIT offered in spring 2002 and once again in winter 2003. Much has been carried from these three instantiations.
Article
Full-text available
Managing the memory hierarchy is important for providing good performance of data intensive computation. This effort has explored several techniques for managing the cache in a microprocessor. This report examines column caching, cache partitioning, and cache compression techniques, especially in regards to the Data Intensive System (DIS) benchmark...
Conference Paper
Despite the large number of papers that have been published, scheduling and load balancing continue to be an active area of research. The topic covers all aspects related to scheduling and load balancing including application and system level techniques, theoretical foundations and practical tools. New aspects of parallel and distributed systems, s...
Article
In this paper we describe the features and semantics of ParC. The rest of this section explains the motivation for designing a new language, the effect of the motivating forces on the design, and the structure of the software environment that surrounds it. The next section describes the parallel constructs and scoping rules. The exact semantics of paral...
Conference Paper
Full-text available
We propose a low overhead, online memory monitoring scheme utilizing a set of novel hardware counters. The counters indicate the marginal gain in cache hits as the size of the cache is increased, which gives the cache miss-rate as a function of cache size. Using the counters, we describe a scheme that enables an accurate estimate of the isolated mi...
Conference Paper
Full-text available
We develop a new metric for job scheduling that includes the effects of memory contention amongst simultaneously-executing jobs that share a given level of memory. Rather than assuming each job or process has a fixed, static memory requirement, we consider a general scenario wherein a process' performance monotonically increases as a function of al...
Article
Full-text available
This paper proposes a dynamic cache partitioning method for simultaneous multithreading systems. We present a general partitioning scheme that can be applied to set-associative caches at any partition granularity. Furthermore, in our scheme threads can have overlapping partitions, which provides more degrees of freedom when partitioning caches with...
Article
Gang scheduling --- the scheduling of a number of related threads to execute simultaneously on distinct processors --- appears to meet the requirements of interactive, multiuser, general-purpose parallel systems. Distributed Hierarchical Control (DHC) has been proposed as an efficient mechanism for coping with the dynamic processor partitioning nece...
Article
Full-text available
We propose a way to improve the performance of embedded processors running data-intensive applications by allowing software to allocate on-chip memory on an application-specific basis. On-chip memory in the form of cache can be made to act like scratchpad memory via a novel hardware mechanism, which we call column caching. Column caching enables dyn...
Article
Full-text available
An accurate, tractable, analytic cache model for time-shared systems is presented, which estimates the overall cache miss-rate of a multiprocessing system with any cache size and time quanta. The input to the model consists of the isolated miss-rate curves for each process, the time quanta for each of the executing processes, and the total cache siz...
Article
In order to become generally useful, message passing mechanisms not only need to provide high performance, but also the three M's: multi-granularity, multi-threading and multiprocessing.
Article
This paper presents the architecture of a network interface unit (NIU) to provide a wide range of shared memory and message passing (S&M) semantics within the confines of an open system composed of a cluster of SMP's and a high speed interconnection network. An open system allows commodity items to be easily assembled and replaced without requiring...
Article
No single message passing mechanism can efficiently support all the different types of communication that occur naturally in most parallel or distributed programs. MIT's StarT-Voyager, a hybrid message passing/shared memory parallel machine, provides four message passing mechanisms to achieve very high performance over a wide spectrum of communicat...
Conference Paper
Full-text available
Token rotation algorithms play an important role in distributed computing, to support such activities as mutual exclusion, round-robin scheduling, group membership and group communication protocols. Ring-based protocols maximize throughput in busy systems but can incur a linear (in the number of processors) delay when a processor needs to obtain a...
Conference Paper
Full-text available
We address the problem of improving cache predictability and performance in embedded systems through the use of software-assisted replacement mechanisms. These mechanisms require additional software controlled state information that affects the cache replacement decision. Software instructions allow a program to kill a particular cache element, i.e...
Conference Paper
Full-text available
For the past six months, I have been integrating several experimental, cutting-edge technologies developed by my colleagues at MIT as part of the MIT LCS/AIL Oxygen project. This paper gives a snapshot of this work-in-progress. Project Oxygen is a collaborative effort involving many research activities throughout the Laboratory for Computer Science...
Article
Full-text available
When multiple applications have to time-share limited physical memory resources, they can incur significant performance degradation at the beginning of their respective time slices due to page faults. We propose a method to significantly improve memory system and overall performance in time-shared computers using job-speculative prefetchin...
Article
Memory latency is a significant bottleneck in modern computing systems. With few exceptions, the process/thread/task currently running on the CPU implicitly owns all caches and main memory. Essentially, the memory is scheduled with the CPU. Generally, however, at least one memory resource is larger than any one process/thread/task and thus there is...
Conference Paper
The goal of the Ultra-Scale Computing Valuation Project is to understand utilization issues for both users and managers of the largest scientific computing systems and to begin developing appropriate metrics and models for such systems. This paper describes a few aspects of the project.
Conference Paper
Full-text available
This paper examines two Network Interface Card micro-architectures that support low latency, high bandwidth user level message passing in multi-user environments. The two are at different ends of a design spectrum-the Resident queues design relies completely on hardware, while the Non-resident queues design is heavily firmware driven. Through actua...
Article
We propose a way to improve the performance of embedded processors running data-intensive applications by allowing software to allocate on-chip memory on an application-specific basis. On-chip memory in the form of cache can be made to act like scratchpad memory via a novel hardware mechanism, which we call column caching. Column caching enables dynamic...
Article
This paper introduces column caching, a flexible mechanism that allows software to dynamically customize cache behavior through fine-grain control of its placement policy. For a set-associative cache, specific data can be restricted to a subset of the usual target cache set during replacement. Through this simple enhancement, column caching enables...
Conference Paper
An adaptive cache coherence protocol changes its actions to address changing program behaviors. We present an adaptive protocol called Cachet for distributed shared-memory systems. Cachet is a seamless integration of several micro-protocols, each of which has been optimized for a particular memory access pattern. Cachet embodies bot...
Article
We present a new mechanism-oriented memory model called Commit-Reconcile & Fences (CRF) and define it using algebraic rules. Many existing memory models can be described as restricted versions of CRF. The model has been designed so that it is both easy for architects to implement, and stable enough to serve as a target machine interface for compile...
Conference Paper
Full-text available
No single message passing mechanism can efficiently support all types of communication that commonly occur in most parallel or distributed programs. MIT's StarT-Voyager, a hybrid message passing/shared memory parallel machine, provides four message passing mechanisms to achieve high performance over a wide spectrum of communication types and sizes....
Conference Paper
Full-text available
We present a new mechanism-oriented memory model called Commit-Reconcile & Fences (CRF) and define it using algebraic rules. Many existing memory models can be described as restricted versions of CRF. The model has been designed so that it is both easy for architects to implement and stable enough to serve as a target machine interface for compiler...
Article
We propose the concept of an “optical switchcube” to help handle the communication load in a parallel computer composed of tens or hundreds of processor clusters operating under the loosely synchronous model of parallel computing. A switchcube can establish a communication and send one packet of data from any node to any combination of the other...
Article
Full-text available
Providing the required interconnections between the processors of a parallel computer is a difficult problem: latency, switching control, cost, and crosstalk effects have to be taken into account. It is widely believed that the design might be simplified if optical technology is used. However, even optical interconnections cannot cater for an unlim...
Conference Paper
This paper describes StarT-Voyager, a machine designed as an experimental platform for research in cluster system communication. The heart of StarT-Voyager is a network interface unit (NIU) that connects the memory bus of a PowerPC-based SMP to the MIT Arctic network. The NIU is highly flexible, with its set of functions easily modified by firmware...
Article
This paper proposes a technique that enables performing multi-cycle (multiplication, division, square-root ...) computations in a single cycle. The technique is based on the notion of memoing: saving the input and output of previous calculations and using the output if the input is encountered again. This technique is especially suitable for Multi-...
Conference Paper
Full-text available
This paper presents the communication architecture of the START-VOYAGER system, a parallel machine composed of a cluster of unmodified IBM 604e-based SMP's connected via a high speed interconnection network. A custom network interface unit (NIU) plugs into a processor card slot of each SMP, providing a high-performance message passing substrate tha...
Article
This paper proposes a technique that enables performing multi-cycle (multiplication, division, square-root …) computations in a single cycle. The technique is based on the notion of memoing: saving the input and output of previous calculations and using the output if the input is encountered again. This technique is especially suitable for Multi-Me...
Article
Full-text available
The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is p...
Article
Full-text available
The scheduling of jobs on parallel supercomputers is becoming the subject of much research. However, there is concern about the divergence of theory and practice. We review theoretical research in this area, and recommendations based on recent results. This is contrasted with a proposal for standard interfaces among the components of a scheduling sy...
Conference Paper
Several recent architectural trends appear to require some of the special properties of optical interconnection networks. The memory hierarchy will probably get deeper and the hardware will be more dynamic. Reconfigurable hardware and software are the subject of intense research. It is likely that the basic word size and the number of communication...
Conference Paper
Full-text available
The evaluation of parallel job schedulers hinges on two things: the use of appropriate metrics, and the use of appropriate workloads on which the scheduler can operate. We argue that the focus should be on on-line open systems, and propose that a standard workload should be used as a benchmark for schedulers. This benchmark will specify distributio...
Conference Paper
Full-text available
Parallel job scheduling has gained increasing recognition in recent years as a distinct area of study. However, there is concern about the divergence of theory and practice in the field. We review theoretical research in this area, and recommendations based on recent results. This is contrasted with a proposal for standard interfaces among the comp...
Conference Paper
Full-text available
The job workloads of general-purpose multiprocessors usually include both compute-bound parallel jobs, which often require gang scheduling, as well as I/O-bound jobs, which require high CPU priority for the individual gang members of the job in order to achieve interactive response times. Our results indicate that an effective interactive multiproc...
