Francois Abel
IBM · IBM Research Zurich

About

46 Publications · 13,380 Reads
737 Citations
Introduction
I am a research staff member at the IBM Zurich Research Laboratory (Switzerland), currently working on a disaggregated cloud and computing infrastructure for FPGAs. The goal of this project is to deploy FPGAs at large scale in hyperscale data centers (see https://www.zurich.ibm.com/cci/cloudFPGA/). My area of research is high-speed data networking, with an emphasis on the architecture and VLSI design of server interconnect fabrics and accelerators for computer interconnection networks. I maintain a personal home page at http://researcher.watson.ibm.com/researcher/view.php?person=zurich-fab and a LinkedIn page at https://www.linkedin.com/in/francois-abel/.

Publications (46)
Article
The slow-down of technology scaling combined with the exponential growth of modern machine learning and artificial intelligence models has created a demand for specialized accelerators, such as GPUs, ASICs, and field-programmable gate arrays (FPGAs). FPGAs can be reconfigured and have the potential to outperform other accelerators, while also being...
Conference Paper
Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost for high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs. Fie...
Preprint
Full-text available
The evolution of cloud applications into loosely-coupled microservices opens new opportunities for hardware accelerators to improve workload performance. Existing accelerator techniques for cloud sacrifice the consolidation benefits of microservices. This paper presents CloudiFi, a framework to deploy and compare accelerators as a cloud service. We...
Presentation
Full-text available
Slide deck for the paper: “Programming Reconfigurable Heterogeneous Computing Clusters Using MPI With Transpilation,” by B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey, presented at the 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC).
Conference Paper
With the slowdown of Moore's law and the end of Dennard scaling, energy efficiency of compute hardware translates to compute power. Therefore, High-Performance Computing (HPC) systems tend to rely more and more on accelerators such as Field-Programmable Gate Arrays (FPGAs) to fuel highly demanding workloads, like Big Data applications or Deep Neuron...
Presentation
Full-text available
Abstract: The miniaturization of CMOS technology has reached a scale at which FPGAs are starting to integrate scalar CPUs, specialized AI engines, and an ever increasing number of hard IP controllers such as PCIe, DDR4, Ethernet and encryption cores. Equipped with such a compute density and reconfigurable capability, FPGAs have the potential to dis...
Presentation
Full-text available
Presentation accompanying our FCCM20 extended abstract about our ZRLMPI framework.
Poster
Full-text available
Emerging applications such as deep neural networks, bioinformatics or video encoding impose a high computing pressure on the Cloud. Reconfigurable technologies like Field-Programmable Gate Arrays (FPGAs) can handle such compute-intensive workloads in an efficient and performant way. To seamlessly incorporate FPGAs into existing Cloud environments...
Presentation
Full-text available
Emerging applications such as deep neural networks, bioinformatics or video encoding impose a high computing pressure on the Cloud. Reconfigurable technologies like Field-Programmable Gate Arrays (FPGAs) can handle such compute-intensive workloads in an efficient and performant way. To seamlessly incorporate FPGAs into existing Cloud environments...
Presentation
Full-text available
A presentation by Alexander Ditter at the Fourth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC‘18), November 11, 2018, Dallas, TX. The slide deck describes an architecture for our cloudFPGA platform to acquire network-attached FPGAs, execute distributed applications, protect user specific IP and support lar...
Presentation
Full-text available
Slides of the presentation at Hot Interconnects 25, Santa Clara, CA, Aug. 29-30, 2017
Conference Paper
Full-text available
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. Meanwhile, DC infrastructures are being redesigned to pack ever more compute capacity into the same volume and power envelopes. This rede...
Conference Paper
Many computational workloads from commercial and scientific fields place high demands on total throughput and energy efficiency. For example, the largest radio telescope, to be built in South Africa and Australia, combines cost, performance, and power targets that cannot be met by technological development before its installation. In processor arch...
Conference Paper
Full-text available
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. Meanwhile, DC infrastructures are being redesigned to pack ever more compute capacity into the same volume and power envelopes. This redes...
Conference Paper
Full-text available
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DC). They are used as accelerators to boost the compute power of individual server nodes and to improve the overall power efficiency. However, this approach limits the number of FPGAs per node and hinders the acceleration of large-scale distributed applications. We propo...
Conference Paper
Full-text available
FPGAs (Field Programmable Gate Arrays) are making their way into data centers (DCs) and are used to offload and accelerate specific services, but they are not yet available to cloud users. This puts the cloud deployment of compute-intensive workloads at a disadvantage compared with on-site infrastructure installations, where the performance and ene...
Presentation
Full-text available
Overlays are increasingly used to implement virtual networks in multi-tenant data centers. However, the encapsulation layer of these overlay virtual networks increases the processing cost and degrades the system performance when the tunnel endpoints are implemented in software. In this paper, we investigate this processing cost in terms of clock cy...
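For readers unfamiliar with the overhead such an encapsulation layer introduces, the sketch below tallies the extra bytes that a VXLAN-style overlay adds per tunneled Ethernet frame. This is a generic illustration using standard header sizes; VXLAN is an assumed example, and the figures are not taken from the paper, which measures the processing cost in clock cycles.

```python
# Hypothetical illustration (not from the paper): byte overhead added per packet
# by a VXLAN-style overlay when tunneling Ethernet frames between endpoints.

VXLAN_OVERHEAD_BYTES = {
    "outer Ethernet": 14,  # outer MAC header
    "outer IPv4": 20,      # outer IP header, no options
    "outer UDP": 8,        # UDP header carrying the VXLAN payload
    "VXLAN": 8,            # VXLAN header with the 24-bit network identifier
}

def overhead_ratio(inner_frame_bytes):
    """Fraction of extra bytes the overlay adds for a given inner frame size."""
    return sum(VXLAN_OVERHEAD_BYTES.values()) / inner_frame_bytes

if __name__ == "__main__":
    for size in (64, 512, 1500):
        print(f"{size:>5} B inner frame -> +{overhead_ratio(size):.1%} on the wire")
```

As the output shows, the relative overhead is largest for small frames.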
Patent
Full-text available
Disclosed is a method for validating a data packet by a network processor supporting a first network protocol and a second network protocol and utilizing shared hardware. The network processor receives a data packet; identifies a network packet protocol for the data packet; and processes the data packet according to the network packet protocol comp...
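As a rough illustration of the flow described in this abstract (receive, identify the protocol, then validate), here is a minimal software sketch that dispatches validation by IP version while reusing a shared length check. The choice of IPv4/IPv6 and all helper names are assumptions made for illustration, not the protocols or structure of the patented method.

```python
# Illustrative sketch only: dispatch packet validation by protocol while reusing
# a shared check, loosely following the receive/identify/process flow above.
# IPv4/IPv6 and the helper names are assumptions, not taken from the patent.

def identify_protocol(packet):
    """Classify the packet by the IP version nibble of its first header byte."""
    version = packet[0] >> 4
    return {4: "ipv4", 6: "ipv6"}.get(version, "unknown")

def shared_length_check(packet, min_header_bytes):
    """Common validation step shared by both protocol paths."""
    return len(packet) >= min_header_bytes

def validate(packet):
    proto = identify_protocol(packet)
    if proto == "ipv4":
        return shared_length_check(packet, 20)   # minimal IPv4 header
    if proto == "ipv6":
        return shared_length_check(packet, 40)   # fixed IPv6 header
    return False
```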
Conference Paper
Full-text available
The miniaturization of CMOS technology has reached a scale at which server processors are starting to integrate multi-gigabit network interface controllers (NIC). While transistors are becoming cheap and abundant in solid-state circuits, they remain at a premium on a processor die if they do not contribute to increasing the number of cores and caches...
Article
Full-text available
Packet-switch fabrics with widely varying characteristics are currently deployed in the domains of both communications and computer interconnection networks. For economic reasons, it would be highly desirable that a single switch fabric could accommodate the needs of a variety of heterogeneous services and applications from both domains. In this...
Conference Paper
Full-text available
The OSMOSIS project explores the role of optics in large-scale interconnection networks for high-performance computing (HPC) systems. Its main objectives are solving the technical challenges to meet the stringent HPC requirements of high bandwidth, low latency, low error rates, and cost-effective scalability. We discuss the technologies and archite...
Article
Full-text available
A crucial part of any high-performance computing (HPC) system is its interconnection network. Corning and IBM are jointly developing a demonstration interconnect based on optical cell switching with electronic control. The Corning-IBM joint optical shared memory supercomputer interconnect system (Osmosis) project explores the opportunity to advance...
Conference Paper
Full-text available
The goal of this work is to enable distributed (multi-chip) implementations of iterative matching algorithms for crossbar-based packet switches, as opposed to the traditional monolithic (single-chip) ones. The practical motivation for this effort is the design and implementation in FPGAs of a scheduler for a 64-port optical crossbar switch. Sizing...
Conference Paper
Full-text available
A crucial part of any high-performance computing system is its interconnection network. In the OSMOSIS project, Corning and IBM are jointly developing a demonstrator interconnect based on optical cell switching with electronic control. Starting from the core set of requirements, we present the system design rationale and show how it impacts the pra...
Article
Full-text available
We describe an incremental request-grant protocol between line cards comprising virtual output queues and a central arbitration unit in a crossbar-based packet switch. Moreover, we introduce a method to make this protocol reliable in the presence of transmission errors that might lead to permanent inconsistency of the queue state information mainta...
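A minimal software sketch of the general idea of incremental request reporting, assuming per-VOQ counters on the line card and per-(input, output) pending counters at the arbiter; the class and method names are illustrative, and the error-recovery mechanism that the paper adds is omitted.

```python
# Generic sketch (not the paper's protocol): a line card reports only the
# increase of each VOQ since its last report, and the central arbiter
# accumulates these deltas into pending-request counters that grants decrement.

from collections import defaultdict

class LineCard:
    def __init__(self, num_outputs):
        self.voq = [0] * num_outputs        # cells queued per output port
        self.reported = [0] * num_outputs   # occupancy already reported

    def enqueue(self, output, cells=1):
        self.voq[output] += cells

    def incremental_requests(self):
        """Return only the increase since the last report (the delta)."""
        deltas = {}
        for out, depth in enumerate(self.voq):
            if depth > self.reported[out]:
                deltas[out] = depth - self.reported[out]
                self.reported[out] = depth
        return deltas

class CentralArbiter:
    def __init__(self):
        self.pending = defaultdict(int)     # (input, output) -> open requests

    def absorb(self, input_port, deltas):
        for out, delta in deltas.items():
            self.pending[(input_port, out)] += delta

    def grant(self, input_port, output):
        """Issue a grant if there is an outstanding request for this pair."""
        if self.pending[(input_port, output)] > 0:
            self.pending[(input_port, output)] -= 1
            return True
        return False
```

In such a scheme, a single lost delta or grant would leave the two counter sets permanently inconsistent, which is exactly the failure mode the reliability mechanism described in the paper addresses.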
Conference Paper
Full-text available
Heuristic, parallel, iterative matching algorithms for input-queued cell switches with virtual output queuing require O(log N) iterations to achieve good performance. If implementing the required number of iterations in hardware is not feasible within the cell duration, the matching process can be pipelined to obtain a matching in every cell...
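For context, the sketch below shows one request-grant-accept round of a generic parallel iterative matching scheduler (PIM-style, with random tie-breaking). It is a plain-software illustration of the class of algorithms the abstract refers to, not the pipelined scheduler proposed in the paper.

```python
# One request-grant-accept round of a PIM-style parallel iterative matcher.
# Generic illustration only; the paper's contribution is pipelining such rounds
# across cell slots so that a matching is produced every cell time.

import random

def match_iteration(requests, match):
    """requests[i]: set of outputs that input i has queued cells for.
    match: dict of already-matched input -> output, updated in place."""
    n = len(requests)
    matched_outputs = set(match.values())

    # Grant phase: every unmatched output grants to one unmatched requesting input.
    grants = {}
    for out in range(n):
        if out in matched_outputs:
            continue
        candidates = [i for i in range(n) if out in requests[i] and i not in match]
        if candidates:
            grants[out] = random.choice(candidates)

    # Accept phase: every input accepts at most one of the grants it received.
    offers = {}
    for out, inp in grants.items():
        offers.setdefault(inp, []).append(out)
    for inp, outs in offers.items():
        match[inp] = random.choice(outs)
    return match
```

Repeating this round on the still-unmatched ports converges to a maximal matching in O(log N) expected iterations, which is why fitting all iterations into one cell duration becomes the hardware bottleneck that pipelining relaxes.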
Article
We answer the question of how much memory a packet switch/router needs; more specifically, we propose a systematic method that is simple, rigorous, and general for determining the absolute lower bound of packet buffering required by practical switching systems. Accordingly, we introduce a deterministic traffic scenario that stresses the global stabi...
Article
Full-text available
This 4-Tb/s packet switch uses a combined input- and crosspoint-queued (CICQ) structure with virtual output queuing at the ingress to achieve the scalability of input-buffered switches, the performance of output-buffered switches, and low latency.
Conference Paper
Full-text available
We propose a systematic method to determine the lower bound for internal buffering of practical CIOQ (combined input-output queued) switching systems. We introduce a deterministic traffic scenario that stresses the global stability of finite output queues. We demonstrate its usefulness by dimensioning the buffer capacity of the CIOQ under such traf...
Article
Full-text available
Addressing the ever-growing capacity demand for packet switches, current research focuses on scheduling algorithms or buffer bandwidth reductions. Although these topics remain relevant, our position is that the primary design focus for systems beyond 1 Tb/s must be shifted to aspects resulting from packaging disruptions. Based on trends such as inc...
Conference Paper
Full-text available
We present the architecture and practical VLSI implementation of a 4-Tb/s single-stage switch. It is based on a combined input- and crosspoint-queued structure with virtual output queuing at the ingress, which has the scalability of input-buffered switches and the performance of output-buffered switches. Our system handles the large fabric-internal...
Conference Paper
Full-text available
Traditional improvements in packet switch architecture are aimed at increasing switch performance in terms of utilization, fairness and QoS. This paper focuses on improving the architecture to achieve implementation feasibility of terabit aggregate data rates while maintaining such performance. Terabit class shared-memory switch chips are simple in...
Article
A method for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a switching system in successive time slots, including a matching method including the steps of providing a first request information in a first time slot indicating data packets at the inputs requesting transmission to the outputs...
