![Francois Abel](https://i1.rgstatic.net/ii/profile.image/535227136253952-1504619501178_Q128/Francois-Abel-2.jpg)
Francois Abel
IBM · IBM Research Zurich
About
46 Publications · 13,380 Reads
737 Citations
Introduction
I'm a research staff member at the IBM Zurich Research Laboratory (Switzerland). I am currently working on a disaggregated cloud and computing infrastructure for FPGAs. The goal of this project is to deploy FPGAs at large scale in hyperscale data centers (see https://www.zurich.ibm.com/cci/cloudFPGA/).
My area of research is high-speed data networking, with an emphasis on the architecture and VLSI design of server interconnect fabrics and accelerators for computer interconnection networks. I maintain a personal home page at http://researcher.watson.ibm.com/researcher/view.php?person=zurich-fab and a LinkedIn page at https://www.linkedin.com/in/francois-abel/.
Publications (46)
The slow-down of technology scaling combined with the exponential growth of modern machine learning and artificial intelligence models has created a demand for specialized accelerators, such as GPUs, ASICs, and field-programmable gate arrays (FPGAs). FPGAs can be reconfigured and have the potential to outperform other accelerators, while also being...
Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost on high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs. Fie...
The evolution of cloud applications into loosely-coupled microservices opens new opportunities for hardware accelerators to improve workload performance. Existing accelerator techniques for cloud sacrifice the consolidation benefits of microservices. This paper presents CloudiFi, a framework to deploy and compare accelerators as a cloud service. We...
Slide deck for the paper:
“Programming Reconfigurable Heterogeneous Computing Clusters Using MPI With Transpilation,”
by B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey,
presented in 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC), 2020.
With the slowdown of Moore's law and the end of Dennard scaling, the energy efficiency of compute hardware translates into compute power. Therefore, High-Performance Computing (HPC) systems tend to rely more and more on accelerators such as Field-Programmable Gate Arrays (FPGAs) to fuel highly demanding workloads, like Big Data applications or Deep Neuron...
The miniaturization of CMOS technology has reached a scale at which FPGAs are starting to integrate scalar CPUs, specialized AI engines, and an ever increasing number of hard IP controllers such as PCIe, DDR4, Ethernet and encryption cores. Equipped with such a compute density and reconfigurable capability, FPGAs have the potential to dis...
Presentation of our FCCM'20 extended abstract about our framework ZRLMPI.
Emerging applications such as deep neural networks, bioinformatics or video encoding impose a high computing pressure on the Cloud.
Reconfigurable technologies like Field-Programmable Gate Arrays (FPGAs) can handle such compute-intensive workloads in an efficient and performant way.
To seamlessly incorporate FPGAs into existing Cloud environments...
A presentation by Alexander Ditter at the Fourth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC‘18), November 11, 2018, Dallas, TX. The slide deck describes an architecture for our cloudFPGA platform to acquire network-attached FPGAs, execute distributed applications, protect user specific IP and support lar...
Slides of the presentation at Hot Interconnects 25, Santa Clara, CA, Aug. 29-30, 2017
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. Meanwhile, DC infrastructures are being redesigned to pack ever more compute capacity into the same volume and power envelopes. This rede...
Many computational workloads from commercial and scientific fields have high demands on total throughput and energy efficiency. For example, the largest radio telescope, to be built in South Africa and Australia, combines cost, performance, and power targets that cannot be met by technological development until its installation. In processor arch...
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. Meanwhile, DC infrastructures are being redesigned to pack ever more compute capacity into the same volume and power envelopes. This redes...
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. However, this approach limits the number of FPGAs per node and hinders the acceleration of large-scale distributed applications. We propo...
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DCs) and are used to offload and accelerate specific services, but they are not yet available to cloud users. This puts the cloud deployment of compute-intensive workloads at a disadvantage compared with on-site infrastructure installations, where the performance and ene...
Overlays are increasingly used to implement virtual networks in multi-tenant data centers. However, the encapsulation layer of these overlay virtual networks increases the processing cost and degrades the system performance when the tunnel endpoints are implemented in software. In this paper, we investigate this processing cost in terms of clock cy...
Disclosed is a method for validating a data packet by a network processor supporting a first-network protocol and a second network protocol and utilizing shared hardware. The network processor receives a data packet; identifies a network packet protocol for the data packet; and processes the data packet according to the network packet protocol comp...
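As a rough illustration of the dispatch idea in the patent abstract above (identify a packet's protocol, then validate it with logic shared between two protocols), here is a minimal sketch. The protocol names, length checks, and toy checksum are illustrative assumptions, not details from the patent.

```python
# Hedged sketch of protocol-dispatch validation with a shared check.
# "proto_a"/"proto_b" and the length/checksum rules are invented here
# purely for illustration; packets are modeled as lists of byte values.

def checksum_ok(payload):
    # Shared validation step reused by both protocols:
    # the last byte must equal the sum of the preceding bytes mod 256.
    return sum(payload[:-1]) % 256 == payload[-1]

VALIDATORS = {
    "proto_a": lambda pkt: len(pkt) >= 4 and checksum_ok(pkt),
    "proto_b": lambda pkt: len(pkt) >= 8 and checksum_ok(pkt),
}

def validate(protocol, packet):
    """Identify the validator for the protocol and apply it;
    unknown protocols are rejected."""
    check = VALIDATORS.get(protocol)
    return check(packet) if check else False
```

The shared `checksum_ok` routine stands in for the "shared hardware" of the claim: both protocol paths funnel into the same common check.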
The miniaturization of CMOS technology has reached a scale at which server processors are starting to integrate multi-gigabit network interface controllers (NIC). While transistors are becoming cheap and abundant in solid-state circuits, they remain at a premium on a processor die if they do not contribute to increase the number of cores and caches...
Packet-switch fabrics with widely varying characteristics are currently deployed in the domains of both communications and computer interconnection networks. For economical reasons, it would be highly desirable that a single switch fabric could accommodate the needs of a variety of heterogeneous services and applications from both domains. In this...
The OSMOSIS project explores the role of optics in large-scale interconnection networks for high-performance computing (HPC) systems. Its main objectives are solving the technical challenges to meet the stringent HPC requirements of high bandwidth, low latency, low error rates, and cost-effective scalability. We discuss the technologies and archite...
A crucial part of any high-performance computing (HPC) system is its interconnection network. Corning and IBM are jointly developing a demonstration interconnect based on optical cell switching with electronic control. The Corning-IBM joint optical shared memory supercomputer interconnect system (Osmosis) project explores the opportunity to advance...
The goal of this work is to enable distributed (multi-chip) implementations of iterative matching algorithms for crossbar-based packet switches, as opposed to the traditional monolithic (single-chip) ones. The practical motivation for this effort is the design and implementation in FPGAs of a scheduler for a 64-port optical crossbar switch. Sizing...
A crucial part of any high-performance computing system is its interconnection network. In the OSMOSIS project, Corning and IBM are jointly developing a demonstrator interconnect based on optical cell switching with electronic control. Starting from the core set of requirements, we present the system design rationale and show how it impacts the pra...
We describe an incremental request-grant protocol between line cards comprising virtual output queues and a central arbitration unit in a crossbar-based packet switch. Moreover, we introduce a method to make this protocol reliable in the presence of transmission errors that might lead to permanent inconsistency of the queue state information mainta...
Heuristic, parallel, iterative matching algorithms for input-queued cell switches with virtual output queuing require O(log N) iterations to achieve good performance. If the hardware implementation of the number of iterations required is not feasible within the cell duration, the matching process can be pipelined to obtain a matching in every cell...
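To make the request-grant-accept structure of such iterative matching algorithms concrete, here is a small sketch of one scheme for an input-queued switch with virtual output queues. It is an iSLIP-style simplification, not the pipelined algorithm of the paper: grants go to the lowest-index requester (where a real scheduler would use round-robin pointers), and each input accepts its first grant.

```python
# Hedged sketch of parallel iterative matching (iSLIP-style), not the
# paper's pipelined algorithm. voq[i][j] > 0 means input i has cells
# queued for output j.

def iterative_match(voq, iterations):
    """Build a partial matching {input: output} over the given number
    of request-grant-accept iterations."""
    n = len(voq)
    match = {}              # input -> output, grown across iterations
    matched_out = set()     # outputs already taken
    for _ in range(iterations):
        # Request: every unmatched input requests each free output
        # for which it has queued cells.
        grants = {}         # output -> list of requesting inputs
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if voq[i][j] > 0 and j not in matched_out:
                    grants.setdefault(j, []).append(i)
        # Grant + accept: each output grants one requester; each input
        # accepts at most one grant per iteration.
        accepted = {}       # input -> output accepted this iteration
        for j, reqs in grants.items():
            i = min(reqs)                 # lowest-index grant policy
            if i not in accepted:         # input accepts its first grant
                accepted[i] = j
        for i, j in accepted.items():
            match[i] = j
            matched_out.add(j)
    return match
```

Running this on a fully loaded 2x2 switch shows why multiple iterations help: one iteration matches only one input-output pair, while a second iteration completes the matching.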
We answer the question on how much memory a packet switch/router needs; more specifically, we propose a systematic method that is simple, rigorous and general for determining the absolute lower bound of packet buffering required by practical switching systems. Accordingly, we introduce a deterministic traffic scenario that stresses the global stabi...
This 4-TBPS packet switch uses a combined input- and crosspoint-queued (CICQ) structure with virtual output queuing at the ingress to achieve the scalability of input-buffered switches, the performance of output-buffered switches, and low latency.
We propose a systematic method to determine the lower bound for the internal buffering of practical CIOQ (combined input-output queued) switching systems. We introduce a deterministic traffic scenario that stresses the global stability of finite output queues. We demonstrate its usefulness by dimensioning the buffer capacity of the CIOQ under such traf...
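The general idea of stressing a finite queue with a deterministic arrival pattern to observe its worst-case occupancy can be sketched in a few lines. This toy model is an assumption-laden illustration only; it does not reproduce the paper's traffic scenario or its dimensioning method.

```python
# Toy sketch: drive a single output queue with a deterministic arrival
# pattern and record its peak occupancy, i.e. the buffering it needs
# under that pattern. The burst pattern below is illustrative.

def peak_occupancy(arrivals, drain_rate=1):
    """arrivals[t] = cells arriving at the output queue in slot t;
    the queue drains at most drain_rate cells per slot."""
    q = peak = 0
    for a in arrivals:
        q = max(0, q + a - drain_rate)   # queue evolution per slot
        peak = max(peak, q)
    return peak
```

For example, a burst of 4 cells in one slot against a drain rate of 1 leaves a peak backlog of 3 cells, while a smooth arrival of 1 cell per slot never builds a backlog at all.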
Addressing the ever growing capacity demand for packet switches, current research focuses on scheduling algorithms or buffer bandwidth reductions. Although these topics remain relevant, our position is that the primary design focus for systems beyond 1 Tb/s must be shifted to aspects resulting from packaging disruptions. Based on trends such as inc...
We present the architecture and practical VLSI implementation of a 4-Tb/s single-stage switch. It is based on a combined input- and crosspoint-queued structure with virtual output queuing at the ingress, which has the scalability of input-buffered switches and the performance of output-buffered switches. Our system handles the large fabric-internal...
Traditional improvements in packet switch architecture are aimed at increasing switch performance in terms of utilization, fairness and QoS. This paper focuses on improving the architecture to achieve implementation feasibility of terabit aggregate data rates while maintaining such performance. Terabit class shared-memory switch chips are simple in...
A method for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a switching system in successive time slots, including a matching method including the steps of providing a first request information in a first time slot indicating data packets at the inputs requesting transmission to the outputs...