![Francois Abel](https://i1.rgstatic.net/ii/profile.image/535227136253952-1504619501178_Q128/Francois-Abel-2.jpg)
Francois Abel
IBM · IBM Research Zurich
About
46 Publications · 13,380 Reads
737 Citations
Introduction
I'm a research staff member at the IBM Zurich Research Laboratory (Switzerland). I am currently working on a disaggregated cloud and computing infrastructure for FPGAs. The goal of this project is to deploy FPGAs at large scale in hyperscale data centers (see https://www.zurich.ibm.com/cci/cloudFPGA/).
My area of research is high-speed data networking, with an emphasis on the architecture and VLSI design of server interconnect fabrics and accelerators for computer interconnection networks. I maintain a personal home page at http://researcher.watson.ibm.com/researcher/view.php?person=zurich-fab and a LinkedIn page at https://www.linkedin.com/in/francois-abel/.
Publications (46)
The slow-down of technology scaling combined with the exponential growth of modern machine learning and artificial intelligence models has created a demand for specialized accelerators, such as GPUs, ASICs, and field-programmable gate arrays (FPGAs). FPGAs can be reconfigured and have the potential to outperform other accelerators, while also being...
Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost on high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs. Fie...
The evolution of cloud applications into loosely-coupled microservices opens new opportunities for hardware accelerators to improve workload performance. Existing accelerator techniques for cloud sacrifice the consolidation benefits of microservices. This paper presents CloudiFi, a framework to deploy and compare accelerators as a cloud service. We...
Slide deck for the paper:
“Programming Reconfigurable Heterogeneous Computing Clusters Using MPI With Transpilation,”
by B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey,
presented in 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC), 2020.
With the slowdown of Moore's law and the end of Dennard scaling, the energy efficiency of compute hardware translates into compute power. Therefore, High-Performance Computing (HPC) systems tend to rely more and more on accelerators such as Field-Programmable Gate Arrays (FPGAs) to fuel highly demanding workloads, like Big Data applications or Deep Neuron...
The miniaturization of CMOS technology has reached a scale at which FPGAs are starting to integrate scalar CPUs, specialized AI engines, and an ever increasing number of hard IP controllers such as PCIe, DDR4, Ethernet and encryption cores. Equipped with such a compute density and reconfigurable capability, FPGAs have the potential to dis...
Presentation of our FCCM'20 extended abstract about our framework ZRLMPI.
Emerging applications such as deep neural networks, bioinformatics or video encoding impose a high computing pressure on the Cloud.
Reconfigurable technologies like Field-Programmable Gate Arrays (FPGAs) can handle such compute-intensive workloads in an efficient and performant way.
To seamlessly incorporate FPGAs into existing Cloud environments...
A presentation by Alexander Ditter at the Fourth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC‘18), November 11, 2018, Dallas, TX. The slide deck describes an architecture for our cloudFPGA platform to acquire network-attached FPGAs, execute distributed applications, protect user specific IP and support lar...
Slides of the presentation at Hot Interconnects 25, Santa Clara, CA, Aug. 29-30, 2017
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. Meanwhile, DC infrastructures are being redesigned to pack ever more compute capacity into the same volume and power envelopes. This rede...
Many computational workloads from commercial and scientific fields have high demands on total throughput and energy efficiency. For example, the largest radio telescope, to be built in South Africa and Australia, combines cost, performance, and power targets that cannot be met by technological development until its installation. In processor arch...
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. Meanwhile, DC infrastructures are being redesigned to pack ever more compute capacity into the same volume and power envelopes. This redes...
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. However, this approach limits the number of FPGAs per node and hinders the acceleration of large-scale distributed applications. We propo...
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DCs) and are used to offload and accelerate specific services, but they are not yet available to cloud users. This puts the cloud deployment of compute-intensive workloads at a disadvantage compared with on-site infrastructure installations, where the performance and ene...
Overlays are increasingly used to implement virtual networks in multi-tenant data centers. However, the encapsulation layer of these overlay virtual networks increases the processing cost and degrades the system performance when the tunnel endpoints are implemented in software. In this paper, we investigate this processing cost in terms of clock cy...
Disclosed is a method for validating a data packet by a network processor supporting a first-network protocol and a second network protocol and utilizing shared hardware. The network processor receives a data packet; identifies a network packet protocol for the data packet; and processes the data packet according to the network packet protocol comp...
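As a rough illustration of the dispatch idea in the patent abstract above (identify a packet's protocol, then validate it with logic shared between two protocols), here is a minimal sketch. The protocol names, length checks, and toy checksum are illustrative assumptions, not details from the patent.

```python
# Hedged sketch of protocol-dispatch validation with a shared check.
# "proto_a"/"proto_b" and the length/checksum rules are invented here
# purely for illustration; packets are modeled as lists of byte values.

def checksum_ok(payload):
    # Shared validation step reused by both protocols:
    # the last byte must equal the sum of the preceding bytes mod 256.
    return sum(payload[:-1]) % 256 == payload[-1]

VALIDATORS = {
    "proto_a": lambda pkt: len(pkt) >= 4 and checksum_ok(pkt),
    "proto_b": lambda pkt: len(pkt) >= 8 and checksum_ok(pkt),
}

def validate(protocol, packet):
    """Identify the validator for the protocol and apply it;
    unknown protocols are rejected."""
    check = VALIDATORS.get(protocol)
    return check(packet) if check else False
```

The shared `checksum_ok` routine stands in for the "shared hardware" of the claim: both protocol paths funnel into the same common check.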
The miniaturization of CMOS technology has reached a scale at which server processors are starting to integrate multi-gigabit network interface controllers (NIC). While transistors are becoming cheap and abundant in solid-state circuits, they remain at a premium on a processor die if they do not contribute to increase the number of cores and caches...
Packet-switch fabrics with widely varying characteristics are currently deployed in the domains of both communications and computer interconnection networks. For economical reasons, it would be highly desirable that a single switch fabric could accommodate the needs of a variety of heterogeneous services and applications from both domains. In this...
The OSMOSIS project explores the role of optics in large-scale interconnection networks for high-performance computing (HPC) systems. Its main objectives are solving the technical challenges to meet the stringent HPC requirements of high bandwidth, low latency, low error rates, and cost-effective scalability. We discuss the technologies and archite...
A crucial part of any high-performance computing (HPC) system is its interconnection network. Corning and IBM are jointly developing a demonstration interconnect based on optical cell switching with electronic control. The Corning-IBM joint optical shared memory supercomputer interconnect system (Osmosis) project explores the opportunity to advance...
The goal of this work is to enable distributed (multi-chip) implementations of iterative matching algorithms for crossbar-based packet switches, as opposed to the traditional monolithic (single-chip) ones. The practical motivation for this effort is the design and implementation in FPGAs of a scheduler for a 64-port optical crossbar switch. Sizing...
A crucial part of any high-performance computing system is its interconnection network. In the OSMOSIS project, Corning and IBM are jointly developing a demonstrator interconnect based on optical cell switching with electronic control. Starting from the core set of requirements, we present the system design rationale and show how it impacts the pra...
We describe an incremental request-grant protocol between line cards comprising virtual output queues and a central arbitration unit in a crossbar-based packet switch. Moreover, we introduce a method to make this protocol reliable in the presence of transmission errors that might lead to permanent inconsistency of the queue state information mainta...
Heuristic, parallel, iterative matching algorithms for input-queued cell switches with virtual output queuing require O(log N) iterations to achieve good performance. If the hardware implementation of the number of iterations required is not feasible within the cell duration, the matching process can be pipelined to obtain a matching in every cell...
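To make the request-grant-accept structure of such iterative matching algorithms concrete, here is a small sketch of one scheme for an input-queued switch with virtual output queues. It is an iSLIP-style simplification, not the pipelined algorithm of the paper: grants go to the lowest-index requester (where a real scheduler would use round-robin pointers), and each input accepts its first grant.

```python
# Hedged sketch of parallel iterative matching (iSLIP-style), not the
# paper's pipelined algorithm. voq[i][j] > 0 means input i has cells
# queued for output j.

def iterative_match(voq, iterations):
    """Build a partial matching {input: output} over the given number
    of request-grant-accept iterations."""
    n = len(voq)
    match = {}              # input -> output, grown across iterations
    matched_out = set()     # outputs already taken
    for _ in range(iterations):
        # Request: every unmatched input requests each free output
        # for which it has queued cells.
        grants = {}         # output -> list of requesting inputs
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if voq[i][j] > 0 and j not in matched_out:
                    grants.setdefault(j, []).append(i)
        # Grant + accept: each output grants one requester; each input
        # accepts at most one grant per iteration.
        accepted = {}       # input -> output accepted this iteration
        for j, reqs in grants.items():
            i = min(reqs)                 # lowest-index grant policy
            if i not in accepted:         # input accepts its first grant
                accepted[i] = j
        for i, j in accepted.items():
            match[i] = j
            matched_out.add(j)
    return match
```

Running this on a fully loaded 2x2 switch shows why multiple iterations help: one iteration matches only one input-output pair, while a second iteration completes the matching.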
We answer the question on how much memory a packet switch/router needs; more specifically, we propose a systematic method that is simple, rigorous and general for determining the absolute lower bound of packet buffering required by practical switching systems. Accordingly, we introduce a deterministic traffic scenario that stresses the global stabi...
This 4-TBPS packet switch uses a combined input- and crosspoint-queued (CICQ) structure with virtual output queuing at the ingress to achieve the scalability of input-buffered switches, the performance of output-buffered switches, and low latency.
We propose a systematic method to determine the lower bound for the internal buffering of practical CIOQ (combined input-output queued) switching systems. We introduce a deterministic traffic scenario that stresses the global stability of finite output queues. We demonstrate its usefulness by dimensioning the buffer capacity of the CIOQ under such traf...
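The general idea of stressing a finite queue with a deterministic arrival pattern to observe its worst-case occupancy can be sketched in a few lines. This toy model is an assumption-laden illustration only; it does not reproduce the paper's traffic scenario or its dimensioning method.

```python
# Toy sketch: drive a single output queue with a deterministic arrival
# pattern and record its peak occupancy, i.e. the buffering it needs
# under that pattern. The burst pattern below is illustrative.

def peak_occupancy(arrivals, drain_rate=1):
    """arrivals[t] = cells arriving at the output queue in slot t;
    the queue drains at most drain_rate cells per slot."""
    q = peak = 0
    for a in arrivals:
        q = max(0, q + a - drain_rate)   # queue evolution per slot
        peak = max(peak, q)
    return peak
```

For example, a burst of 4 cells in one slot against a drain rate of 1 leaves a peak backlog of 3 cells, while a smooth arrival of 1 cell per slot never builds a backlog at all.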
Addressing the ever growing capacity demand for packet switches, current research focuses on scheduling algorithms or buffer bandwidth reductions. Although these topics remain relevant, our position is that the primary design focus for systems beyond 1 Tb/s must be shifted to aspects resulting from packaging disruptions. Based on trends such as inc...
We present the architecture and practical VLSI implementation of a 4-Tb/s single-stage switch. It is based on a combined input- and crosspoint-queued structure with virtual output queuing at the ingress, which has the scalability of input-buffered switches and the performance of output-buffered switches. Our system handles the large fabric-internal...
Traditional improvements in packet switch architecture are aimed at increasing switch performance in terms of utilization, fairness and QoS. This paper focuses on improving the architecture to achieve implementation feasibility of terabit aggregate data rates while maintaining such performance. Terabit class shared-memory switch chips are simple in...
A method for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a switching system in successive time slots, including a matching method including the steps of providing a first request information in a first time slot indicating data packets at the inputs requesting transmission to the outputs...