Efficient Analysis of Cyclic Redundancy Architectures via Boolean Fault Propagation

Authors: Marco Bozzano, Alessandro Cimatti, Alberto Griggio, and Martin Jonáš

Abstract

Many safety-critical systems guarantee fault tolerance by using several redundant copies of their components. When designing such redundancy architectures, it is crucial to analyze their fault trees, which describe combinations of faults of individual components that may cause malfunction of the system. State-of-the-art techniques for fault tree computation use first-order formulas with uninterpreted functions to model the transformations of signals performed by the redundancy system, and an AllSMT query for computation of the fault tree from this encoding. Scalability of the analysis can be further improved by techniques such as predicate abstraction, which reduces the problem to the Boolean case. In this paper, we show that as far as fault trees of redundancy architectures are concerned, signal transformation can be equivalently viewed in a purely Boolean way as fault propagation. This alternative view has important practical consequences. First, it applies also to general redundancy architectures with cyclic dependencies among components, to which the current state-of-the-art methods based on AllSMT are not applicable, and which currently require expensive sequential reasoning. Second, it allows for a simpler encoding of the problem and usage of efficient algorithms for analysis of fault propagation, which can significantly improve the runtime of the analyses. A thorough experimental evaluation demonstrates the superiority of the proposed techniques.
28th International Conference, TACAS 2022
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2022
Munich, Germany, April 2–7, 2022
Proceedings, Part II
Tools and Algorithms
for the Construction
and Analysis of Systems
LNCS 13244 ARCoSS
DanaFisman
GrigoreRosu (Eds.)
Lecture Notes in Computer Science 13244
Founding Editors
Gerhard Goos, Germany
Juris Hartmanis, USA
Editorial Board Members
Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Gerhard Woeginger, Germany
Moti Yung, USA
Advanced Research in Computing and Software Science
Subline of Lecture Notes in Computer Science
Subline Series Editors
Giorgio Ausiello, University of Rome La Sapienza, Italy
Vladimiro Sassone, University of Southampton, UK
Subline Advisory Board
Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen, University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA
More information about this series at https://link.springer.com/bookseries/558
Dana Fisman Grigore Rosu (Eds.)
Tools and Algorithms
for the Construction
and Analysis of Systems
28th International Conference, TACAS 2022
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2022
Munich, Germany, April 2–7, 2022
Proceedings, Part II
Editors
Dana Fisman
Ben-Gurion University of the Negev
Beer Sheva, Israel
Grigore Rosu
University of Illinois Urbana-Champaign
Urbana, IL, USA
ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-99526-3 ISBN 978-3-030-99527-0 (eBook)
https://doi.org/10.1007/978-3-030-99527-0
© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book's Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ETAPS Foreword
Welcome to the 25th ETAPS! ETAPS 2022 took place in Munich, the beautiful capital
of Bavaria, in Germany.
ETAPS 2022 is the 25th instance of the European Joint Conferences on Theory and
Practice of Software. ETAPS is an annual federated conference established in 1998,
and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each
conference has its own Program Committee (PC) and its own Steering Committee
(SC). The conferences cover various aspects of software systems, ranging from theo-
retical computer science to foundations of programming languages, analysis tools, and
formal approaches to software engineering. Organizing these conferences in a coherent,
highly synchronized conference program enables researchers to participate in an
exciting event, having the possibility to meet many colleagues working in different
directions in the field, and to easily attend talks of different conferences. On the
weekend before the main conference, numerous satellite workshops took place,
attracting many researchers from all over the globe.
ETAPS 2022 received 362 submissions in total, 111 of which were accepted,
yielding an overall acceptance rate of 30.7%. I thank all the authors for their interest in
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con-
tributions, and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted
papers!
ETAPS 2022 featured the unifying invited speakers Alexandra Silva (University
College London, UK, and Cornell University, USA) and Tomáš Vojnar (Brno
University of Technology, Czech Republic) and the conference-specific invited
speakers Nathalie Bertrand (Inria Rennes, France) for FoSSaCS and Lenore Zuck
(University of Illinois at Chicago, USA) for TACAS. Invited tutorials were provided by
Stacey Jeffery (CWI and QuSoft, The Netherlands) on quantum computing and
Nicholas Lane (University of Cambridge and Samsung AI Lab, UK) on federated
learning.
As this event was the 25th edition of ETAPS, part of the program was a special
celebration where we looked back on the achievements of ETAPS and its constituting
conferences in the past, but we also looked into the future, and discussed the challenges
ahead for research in software science. This edition also reinstated the ETAPS men-
toring workshop for PhD students.
ETAPS 2022 took place in Munich, Germany, and was organized jointly by the
Technical University of Munich (TUM) and the LMU Munich. The former was
founded in 1868, and the latter in 1472 as the 6th oldest German university still running
today. Together, they have 100,000 enrolled students, regularly rank among the top
100 universities worldwide (with TUM's computer-science department ranked #1 in
the European Union), and their researchers and alumni include 60 Nobel laureates.
The local organization team consisted of Jan Křetínský (general chair), Dirk Beyer
(general, financial, and workshop chair), Julia Eisentraut (organization chair), and
Alexandros Evangelidis (local proceedings chair).
ETAPS 2022 was further supported by the following associations and societies:
ETAPS e.V., EATCS (European Association for Theoretical Computer Science),
EAPLS (European Association for Programming Languages and Systems), and EASST
(European Association of Software Science and Technology).
The ETAPS Steering Committee consists of an Executive Board, and representa-
tives of the individual ETAPS conferences, as well as representatives of EATCS,
EAPLS, and EASST. The Executive Board consists of Holger Hermanns
(Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König
(Duisburg), Thomas Noll (Aachen), Caterina Urban (Paris), Tarmo Uustalu (Reykjavik
and Tallinn), and Lenore Zuck (Chicago).
Other members of the Steering Committee are Patricia Bouyer (Paris), Einar Broch
Johnsen (Oslo), Dana Fisman (Beer Sheva), Reiko Heckel (Leicester), Joost-Pieter
Katoen (Aachen and Twente), Fabrice Kordon (Paris), Jan Křetínský (Munich), Orna
Kupferman (Jerusalem), Leen Lambers (Cottbus), Tiziana Margaria (Limerick),
Andrew M. Pitts (Cambridge), Elizabeth Polgreen (Edinburgh), Grigore Roşu (Illinois),
Peter Ryan (Luxembourg), Sriram Sankaranarayanan (Boulder), Don Sannella
(Edinburgh), Lutz Schröder (Erlangen), Ilya Sergey (Singapore), Natasha Sharygina
(Lugano), Pawel Sobocinski (Tallinn), Peter Thiemann (Freiburg), Sebastián Uchitel
(London and Buenos Aires), Jan Vitek (Prague), Andrzej Wasowski (Copenhagen),
Thomas Wies (New York), Anton Wijs (Eindhoven), and Manuel Wimmer (Linz).
I'd like to take this opportunity to thank all authors, attendees, organizers of the
satellite workshops, and Springer-Verlag GmbH for their support. I hope you all
enjoyed ETAPS 2022.
Finally, a big thanks to Jan, Julia, Dirk, and their local organization team for all their
enormous efforts to make ETAPS a fantastic event.
February 2022 Marieke Huisman
ETAPS SC Chair
ETAPS e.V. President
Preface
TACAS 2022 was the 28th edition of the International Conference on Tools and
Algorithms for the Construction and Analysis of Systems. TACAS 2022 was part of the
25th European Joint Conferences on Theory and Practice of Software (ETAPS 2022),
which was held from April 2 to April 7 in Munich, Germany, as well as online due to the
COVID-19 pandemic. TACAS is a forum for researchers, developers, and users inter-
ested in rigorous tools and algorithms for the construction and analysis of systems. The
conference aims to bridge the gaps between different communities with this common
interest and to support them in their quest to improve the utility, reliability, flexibility,
and efficiency of tools and algorithms for building computer-controlled systems.
There were four submission categories for TACAS 2022:
1. Research papers advancing the theoretical foundations for the construction and
analysis of systems.
2. Case study papers with an emphasis on a real-world setting.
3. Regular tool papers presenting a new tool, a new tool component, or novel
extensions to an existing tool.
4. Tool demonstration papers focusing on the usage aspects of tools.
Papers of categories 1–3 were restricted to 16 pages, and papers of category 4 to six
pages.
This year 159 papers were submitted to TACAS, consisting of 112 research papers,
five case study papers, 33 regular tool papers, and nine tool demo papers. Authors were
allowed to submit up to four papers. Each paper was reviewed by three Program
Committee (PC) members, who made use of subreviewers. Similarly to previous years,
it was possible to submit an artifact alongside a paper, which was mandatory for regular
tool and tool demo papers.
An artifact might consist of a tool, models, proofs, or other data required for vali-
dation of the results of the paper. The Artifact Evaluation Committee (AEC) was tasked
with reviewing the artifacts based on their documentation, ease of use, and, most
importantly, whether the results presented in the corresponding paper could be accu-
rately reproduced. Most of the evaluation was carried out using a standardized virtual
machine to ensure consistency of the results, except for those artifacts that had special
hardware or software requirements. The evaluation consisted of two rounds. The first
round was carried out in parallel with the work of the PC. The judgment of the AEC
was communicated to the PC and weighed in their discussion. The second round took
place after paper acceptance notifications were sent out; authors of accepted research
papers who did not submit an artifact in the first round could submit their artifact at this
time. In total, 86 artifacts were submitted (79 in the first round and seven in the second)
and evaluated by the AEC regarding their availability, functionality, and/or reusability.
Papers with an artifact that was successfully evaluated include one or more badges on
the first page, certifying the respective properties.
Selected authors were requested to provide a rebuttal for both papers and artifacts in
case a review gave rise to questions. Using the review reports and rebuttals, the
Program and the Artifact Evaluation Committees extensively discussed the papers and
artifacts and ultimately decided to accept 33 research papers, one case study, 12 tool
papers, and four tool demos.
This corresponds to an acceptance rate of 29.46% for research papers and an overall
acceptance rate of 31.44%.
Besides the regular conference papers, this two-volume proceedings also contains
16 short papers that describe the participating verification systems and a competition
report presenting the results of the 11th SV-COMP, the competition on automatic
software verifiers for C and Java programs. These papers were reviewed by a separate
Program Committee (PC); each of the papers was assessed by at least three reviewers.
A total of 47 verification systems with developers from 11 countries entered the sys-
tematic comparative evaluation, including four submissions from industry. Two ses-
sions in the TACAS program were reserved for the presentation of the results: (1) a
summary by the competition chair and of the participating tools by the developer teams
in the first session, and (2) an open community meeting in the second session.
We would like to thank all the people who helped to make TACAS 2022 successful.
First, we would like to thank the authors for submitting their papers to TACAS 2022.
The PC members and additional reviewers did a great job in reviewing papers: they
contributed informed and detailed reports and engaged in the PC discussions. We also
thank the steering committee, and especially its chair, Joost-Pieter Katoen, for his
valuable advice. Lastly, we would like to thank the overall organization team of
ETAPS 2022.
April 2022 Dana Fisman
Grigore Rosu
PC Chairs
Swen Jacobs
Andrew Reynolds
AEC Chairs, Tools, and Case-study Chairs
Dirk Beyer
Competition Chair
Organization
Program Committee
Parosh Aziz Abdulla Uppsala University, Sweden
Luca Aceto Reykjavik University, Iceland
Timos Antonopoulos Yale University, USA
Saddek Bensalem Verimag, France
Dirk Beyer LMU Munich, Germany
Nikolaj Bjorner Microsoft, USA
Jasmin Blanchette Vrije Universiteit Amsterdam, The Netherlands
Udi Boker Interdisciplinary Center Herzliya, Israel
Hana Chockler King's College London, UK
Rance Cleaveland University of Maryland, USA
Alessandro Coglio Kestrel Institute, USA
Pedro R. D'Argenio Universidad Nacional de Córdoba, Argentina
Javier Esparza Technical University of Munich, Germany
Bernd Finkbeiner CISPA Helmholtz Center for Information Security,
Germany
Dana Fisman (Chair) Ben-Gurion University, Israel
Martin Fränzle University of Oldenburg, Germany
Felipe Gorostiaga IMDEA Software Institute, Spain
Susanne Graf Université Joseph Fourier, France
Radu Grosu Stony Brook University, USA
Arie Gurfinkel University of Waterloo, Canada
Klaus Havelund Jet Propulsion Laboratory, USA
Holger Hermanns Saarland University, Germany
Falk Howar TU Clausthal / IPSSE, Germany
Swen Jacobs CISPA Helmholtz Center for Information Security,
Germany
Ranjit Jhala University of California, San Diego, USA
Jan Kretinsky Technical University of Munich, Germany
Viktor Kuncak Ecole Polytechnique Fédérale de Lausanne,
Switzerland
Kim Larsen Aalborg University, Denmark
Konstantinos Mamouras Rice University, USA
Daniel Neider Max Planck Institute for Software Systems, Germany
Dejan Nickovic AIT Austrian Institute of Technology, Austria
Corina Pasareanu Carnegie Mellon University, NASA, and KBR, USA
Doron Peled Bar Ilan University, Israel
Anna Philippou University of Cyprus, Cyprus
Andrew Reynolds University of Iowa, USA
Grigore Rosu (Chair) University of Illinois at Urbana-Champaign, USA
Kristin Yvonne Rozier Iowa State University, USA
Cesar Sanchez IMDEA Software Institute, Spain
Sven Schewe University of Liverpool, UK
Natasha Sharygina Università della Svizzera italiana, Switzerland
Jan Strejček Masaryk University, Czech Republic
Cesare Tinelli University of Iowa, USA
Stavros Tripakis Northeastern University, USA
Frits Vaandrager Radboud University, The Netherlands
Tomas Vojnar Brno University of Technology, Czech Republic
Christoph M. Wintersteiger Microsoft, USA
Lijun Zhang Institute of Software, Chinese Academy of Sciences,
China
Lingming Zhang University of Illinois at Urbana-Champaign, USA
Lenore Zuck University of Illinois at Chicago, USA
Artifact Evaluation Committee
Pavel Andrianov Ivannikov Institute for System Programming
of the RAS, Russia
Michael Backenköhler Saarland University, Germany
Sebastian Biewer Saarland University, Germany
Benjamin Bisping TU Berlin, Germany
Olav Bunte Eindhoven University of Technology, The Netherlands
Damien Busatto-Gaston Université Libre de Bruxelles, Belgium
Marek Chalupa IST Austria, Austria, and Masaryk University,
Czech Republic
Priyanka Darke Tata Consultancy Services, India
Alexandre Duret-Lutz LRDE, France
Shenghua Feng Institute of Software, Chinese Academy of Sciences,
Beijing, China
Mathias Fleury University of Freiburg, Germany
Kush Grover Technical University of Munich, Germany
Dominik Harmim Brno University of Technology, Czech Republic
Swen Jacobs (Chair) CISPA Helmholtz Center for Information Security,
Germany
Xiangyu Jin Institute of Software, Chinese Academy of Sciences
Juraj Sič Masaryk University, Czech Republic
Daniela Kaufmann Johannes Kepler University Linz, Austria
Maximilian Alexander Köhl Saarland University, Germany
Mitja Kulczynski Kiel University, Germany
Maurice Laveaux Eindhoven University of Technology, The Netherlands
Yong Li Institute of Software, Chinese Academy of Sciences,
China
Debasmita Lohar Max Planck Institute for Software Systems, Germany
Makai Mann Stanford University, USA
Fabian Meyer RWTH Aachen University, Germany
Stefanie Mohr Technical University of Munich, Germany
Malte Mues TU Dortmund, Germany
Yuki Nishida Kyoto University, Japan
Philip Offtermatt Université de Sherbrooke, Canada
Muhammad Osama Eindhoven University of Technology, The Netherlands
Jiří Pavela Brno University of Technology, Czech Republic
Adrien Pommellet LRDE, France
Mathias Preiner Stanford University, USA
José Proença CISTER-ISEP and HASLab-INESC TEC, Portugal
Tim Quatmann RWTH Aachen University, Germany
Etienne Renault LRDE, France
Andrew Reynolds (Chair) University of Iowa, USA
Mouhammad Sakr University of Luxembourg, Luxembourg
Morten Konggaard Schou Aalborg University, Denmark
Philipp Schlehuber-Caissier LRDE, France
Hans-Jörg Schurr Inria Nancy - Grand Est, France
Michael Schwarz Technische Universität München, Germany
Joseph Scott University of Waterloo, Canada
Ali Shamakhi Tehran Institute for Advanced Studies, Iran
Lei Shi University of Pennsylvania, USA
Matthew Sotoudeh University of California, Davis, USA
Jip Spel RWTH Aachen University, Germany
Veronika Šoková Brno University of Technology, Czech Republic
Program Committee and Jury SV-COMP
Fatimah Aljaafari University of Manchester, UK
Lei Bu Nanjing University, China
Thomas Bunk LMU Munich, Germany
Marek Chalupa Masaryk University, Czech Republic
Priyanka Darke Tata Consultancy Services, India
Daniel Dietsch University of Freiburg, Germany
Gidon Ernst LMU Munich, Germany
Fei He Tsinghua University, China
Matthias Heizmann University of Freiburg, Germany
Jera Hensel RWTH Aachen University, Germany
Falk Howar TU Dortmund, Germany
Soha Hussein University of Minnesota, USA
Dominik Klumpp University of Freiburg, Germany
Henrich Lauko Masaryk University, Czech Republic
Will Leeson University of Virginia, USA
Xie Li Chinese Academy of Sciences, China
Viktor Malík Brno University of Technology, Czech Republic
Raveendra Kumar Medicherla Tata Consultancy Services, India
Rafael Sá Menezes University of Manchester, UK
Vince Molnár Budapest University of Technology and Economics,
Hungary
Hernán Ponce de León Bundeswehr University Munich, Germany
Cedric Richter University of Oldenburg, Germany
Simmo Saan University of Tartu, Estonia
Emerson Sales Gran Sasso Science Institute, Italy
Peter Schrammel University of Sussex and Diffblue, UK
Frank Schüssele University of Freiburg, Germany
Ryan Scott Galois, USA
Ali Shamakhi Tehran Institute for Advanced Studies, Iran
Martin Spiessl LMU Munich, Germany
Michael Tautschnig Queen Mary University of London, UK
Anton Vasilyev ISP RAS, Russia
Vesal Vojdani University of Tartu, Estonia
Steering Committee
Dirk Beyer Ludwig-Maximilians-Universität München, Germany
Rance Cleaveland University of Maryland, USA
Holger Hermanns Universität des Saarlandes, Germany
Joost-Pieter Katoen (Chair) RWTH Aachen University, Germany, and Universiteit
Twente, The Netherlands
Kim G. Larsen Aalborg University, Denmark
Bernhard Steffen Technische Universität Dortmund, Germany
Additional Reviewers
Abraham, Erika
Aguilar, Edgar
Akshay, S.
Asadi, Sepideh
Attard, Duncan
Avni, Guy
Azeem, Muqsit
Bacci, Giorgio
Balasubramanian, A. R.
Barbanera, Franco
Bard, Joachim
Basset, Nicolas
Bendík, Jaroslav
Berani Abdelwahab, Erzana
Beutner, Raven
Bhandary, Shrajan
Biewer, Sebastian
Blicha, Martin
Brandstätter, Andreas
Bright, Curtis
Britikov, Konstantin
Brunnbauer, Axel
Capretto, Margarita
Castiglioni, Valentina
Castro, Pablo
Ceska, Milan
Chadha, Rohit
Chalupa, Marek
Changshun, Wu
Chen, Xiaohong
Cruciani, Emilio
Dahmen, Sander
Dang, Thao
Danielsson, Luis Miguel
Degiovanni, Renzo
Dell'Erba, Daniele
Demasi, Ramiro
Desharnais, Martin
Dierl, Simon
Dubslaff, Clemens
Egolf, Derek
Evangelidis, Alexandros
Fedyukovich, Grigory
Fiedor, Jan
Fitzpatrick, Stephen
Fleury, Mathias
Frenkel, Hadar
Gamboa Guzman, Laura P.
Garcia-Contreras, Isabel
Gianola, Alessandro
Goorden, Martijn
Gorostiaga, Felipe
Gorrieri, Roberto
Grahn, Samuel
Grastien, Alban
Grover, Kush
Grünbacher, Sophie
Guha, Shibashis
Gutiérrez Brida, Simón Emmanuel
Havlena, Vojtěch
He, Jie
Helfrich, Martin
Henkel, Elisabeth
Hicks, Michael
Hirschkoff, Daniel
Hofmann, Jana
Hojjat, Hossein
Holík, Lukáš
Hospodár, Michal
Huang, Chao
Hyvärinen, Antti
Inverso, Omar
Itzhaky, Shachar
Jaksic, Stefan
Jansen, David N.
Jin, Xiangyu
Jonas, Martin
Kanav, Sudeep
Karra, Shyam Lal
Katsaros, Panagiotis
Kempa, Brian
Klauck, Michaela
Kreitz, Christoph
Kröger, Paul
Köhl, Maximilian Alexander
König, Barbara
Lahijanian, Morteza
Larraz, Daniel
Le, Nham
Lemberger, Thomas
Lengal, Ondrej
Li, Chunxiao
Li, Jianlin
Lorber, Florian
Lung, David
Luppen, Zachary
Lybech, Stian
Major, Juraj
Manganini, Giorgio
McCarthy, Eric
Mediouni, Braham Lotfi
Meggendorfer, Tobias
Meira-Goes, Romulo
Melcer, Daniel
Metzger, Niklas
Milovancevic, Dragana
Mohr, Stefanie
Najib, Muhammad
Noetzli, Andres
Nouri, Ayoub
Offtermatt, Philip
Otoni, Rodrigo
Paoletti, Nicola
Parizek, Pavel
Parker, Dave
Parys, Paweł
Passing, Noemi
Perez Dominguez, Ivan
Perez, Guillermo
Pinna, G. Michele
Pous, Damien
Priya, Siddharth
Putruele, Luciano
Pérez, Jorge A.
Qu, Meixun
Raskin, Mikhail
Rauh, Andreas
Reger, Giles
Reynouard, Raphaël
Riener, Heinz
Rogalewicz, Adam
Roy, Rajarshi
Ruemmer, Philipp
Ruijters, Enno
Schilling, Christian
Schmitt, Frederik
Schneider, Tibor
Scholl, Christoph
Schultz, William
Schupp, Stefan
Schurr, Hans-Jörg
Schwammberger, Maike
Shafiei, Nastaran
Siber, Julian
Sickert, Salomon
Singh, Gagandeep
Smith, Douglas
Somenzi, Fabio
Stewing, Richard
Stock, Gregory
Su, Yusen
Tang, Qiyi
Tibo, Alessandro
Trefler, Richard
Trtík, Marek
Turrini, Andrea
Vaezipoor, Pashootan
van Dijk, Tom
Vašíček, Ondřej
Vediramana Krishnan, Hari Govind
Wang, Wenxi
Wendler, Philipp
Westfold, Stephen
Winter, Stefan
Wolovick, Nicolás
Yakusheva, Sophia
Yang, Pengfei
Zeljić, Aleksandar
Zhou, Yuhao
Zimmermann, Martin
Contents – Part II

Probabilistic Systems

A Probabilistic Logic for Verifying Continuous-time Markov Chains
Ji Guan and Nengkun Yu

Under-Approximating Expected Total Rewards in POMDPs
Alexander Bork, Joost-Pieter Katoen, and Tim Quatmann

Correct Probabilistic Model Checking with Floating-Point Arithmetic
Arnd Hartmanns

Correlated Equilibria and Fairness in Concurrent Stochastic Games
Marta Kwiatkowska, Gethin Norman, David Parker, and Gabriel Santos

Omega Automata

A Direct Symbolic Algorithm for Solving Stochastic Rabin Games
Tamajit Banerjee, Rupak Majumdar, Kaushik Mallik, Anne-Kathrin Schmuck, and Sadegh Soudjani

Practical Applications of the Alternating Cycle Decomposition
Antonio Casares, Alexandre Duret-Lutz, Klara J. Meyer, Florian Renkin, and Salomon Sickert

Sky Is Not the Limit: Tighter Rank Bounds for Elevator Automata in Büchi Automata Complementation
Vojtěch Havlena, Ondřej Lengál, and Barbora Šmahlíková

On-The-Fly Solving for Symbolic Parity Games
Maurice Laveaux, Wieger Wesselink, and Tim A. C. Willemse

Equivalence Checking

Distributed Coalgebraic Partition Refinement
Fabian Birkmann, Hans-Peter Deifel, and Stefan Milius

From Bounded Checking to Verification of Equivalence via Symbolic Up-to Techniques
Vasileios Koutavas, Yu-Yang Lin, and Nikos Tzevelekos

Equivalence Checking for Orthocomplemented Bisemilattices in Log-Linear Time
Simon Guilloud and Viktor Kunčak

Monitoring and Analysis

A Theoretical Analysis of Random Regression Test Prioritization
Pu Yi, Hao Wang, Tao Xie, Darko Marinov, and Wing Lam

Verified First-Order Monitoring with Recursive Rules
Sheila Zingg, Srđan Krstić, Martin Raszyk, Joshua Schneider, and Dmitriy Traytel

Maximizing Branch Coverage with Constrained Horn Clauses
Ilia Zlatkin and Grigory Fedyukovich

Efficient Analysis of Cyclic Redundancy Architectures via Boolean Fault Propagation
Marco Bozzano, Alessandro Cimatti, Alberto Griggio, and Martin Jonáš

Tools | Optimizations, Repair and Explainability

Adiar: Binary Decision Diagrams in External Memory
Steffan Christ Sølvsten, Jaco van de Pol, Anna Blume Jakobsen, and Mathias Weller Berg Thomasen

Forest GUMP: A Tool for Explanation
Alnis Murtovi, Alexander Bainczyk, and Bernhard Steffen

ALPINIST: An Annotation-Aware GPU Program Optimizer
Ömer Şakar, Mohsen Safari, Marieke Huisman, and Anton Wijs

Automatic Repair for Network Programs
Lei Shi, Yuepeng Wang, Rajeev Alur, and Boon Thau Loo

11th Competition on Software Verification: SV-COMP 2022

Progress on Software Verification: SV-COMP 2022
Dirk Beyer

AProVE: Non-Termination Witnesses for C Programs (Competition Contribution)
Jera Hensel, Constantin Mensendiek, and Jürgen Giesl
BRICK: Path Enumeration Based Bounded Reachability Checking of C Program (Competition Contribution)
Lei Bu, Zhunyi Xie, Lecheng Lyu, Yichao Li, Xiao Guo, Jianhua Zhao, and Xuandong Li

A Prototype for Data Race Detection in CSeq 3 (Competition Contribution)
Alex Coto, Omar Inverso, Emerson Sales, and Emilio Tuosto

DARTAGNAN: SMT-based Violation Witness Validation (Competition Contribution)
Hernán Ponce-de-León, Thomas Haas, and Roland Meyer

Deagle: An SMT-based Verifier for Multi-threaded Programs (Competition Contribution)
Fei He, Zhihang Sun, and Hongyu Fan

The Static Analyzer Frama-C in SV-COMP (Competition Contribution)
Dirk Beyer and Martin Spiessl

GDART: An Ensemble of Tools for Dynamic Symbolic Execution on the Java Virtual Machine (Competition Contribution)
Malte Mues and Falk Howar

Graves-CPA: A Graph-Attention Verifier Selector (Competition Contribution)
Will Leeson and Matthew B. Dwyer

GWIT: A Witness Validator for Java based on GraalVM (Competition Contribution)
Falk Howar and Malte Mues

The Static Analyzer Infer in SV-COMP (Competition Contribution)
Matthias Kettl and Thomas Lemberger

LART: Compiled Abstract Execution (Competition Contribution)
Henrich Lauko and Petr Ročkai

SYMBIOTIC 9: String Analysis and Backward Symbolic Execution with Loop Folding (Competition Contribution)
Marek Chalupa, Vincent Mihalkovič, Anna Řechtáčková, Lukáš Zaoral, and Jan Strejček

SYMBIOTIC-WITCH: A KLEE-Based Violation Witness Checker (Competition Contribution)
Paulína Ayaziová, Marek Chalupa, and Jan Strejček

THETA: portfolio of CEGAR-based analyses with dynamic algorithm selection (Competition Contribution)
Zsófia Ádám, Levente Bajczi, Mihály Dobos-Kovács, Ákos Hajdu, and Vince Molnár

ULTIMATE GEMCUTTER and the Axes of Generalization (Competition Contribution)
Dominik Klumpp, Daniel Dietsch, Matthias Heizmann, Frank Schüssele, Marcel Ebbinghaus, Azadeh Farzan, and Andreas Podelski

Wit4Java: A Violation-Witness Validator for Java Verifiers (Competition Contribution)
Tong Wu, Peter Schrammel, and Lucas C. Cordeiro

Author Index
Contents – Part I

Synthesis

HOLL: Program Synthesis for Higher Order Logic Locking
Gourav Takhar, Ramesh Karri, Christian Pilato, and Subhajit Roy

The Complexity of LTL Rational Synthesis
Orna Kupferman and Noam Shenwald

Synthesis of Compact Strategies for Coordination Programs
Kedar S. Namjoshi and Nisarg Patel

ZDD Boolean Synthesis
Yi Lin, Lucas M. Tabajara, and Moshe Y. Vardi

Verification

Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems
André Greiner-Petter, Howard S. Cohl, Abdou Youssef, Moritz Schubotz, Avi Trost, Rajen Dey, Akiko Aizawa, and Bela Gipp

Verifying Fortran Programs with CIVL
Wenhao Wu, Jan Hückelheim, Paul D. Hovland, and Stephen F. Siegel

NORMA: a tool for the analysis of Relay-based Railway Interlocking Systems
Arturo Amendola, Anna Becchi, Roberto Cavada, Alessandro Cimatti, Andrea Ferrando, Lorenzo Pilati, Giuseppe Scaglione, Alberto Tacchella, and Marco Zamboni

Efficient Neural Network Analysis with Sum-of-Infeasibilities
Haoze Wu, Aleksandar Zeljić, Guy Katz, and Clark Barrett

Blockchain

Formal Verification of the Ethereum 2.0 Beacon Chain
Franck Cassez, Joanne Fuller, and Aditya Asgaonkar

Fast and Reliable Formal Verification of Smart Contracts with the Move Prover
David Dill, Wolfgang Grieskamp, Junkil Park, Shaz Qadeer, Meng Xu, and Emma Zhong

A Max-SMT Superoptimizer for EVM handling Memory and Storage
Elvira Albert, Pablo Gordillo, Alejandro Hernández-Cerezo, and Albert Rubio

Grammatical Inference

A New Approach for Active Automata Learning Based on Apartness
Frits Vaandrager, Bharat Garhewal, Jurriaan Rot, and Thorsten Wißmann

Learning Realtime One-Counter Automata
Véronique Bruyère, Guillermo A. Pérez, and Gaëtan Staquet

Scalable Anytime Algorithms for Learning Fragments of Linear Temporal Logic
Ritam Raha, Rajarshi Roy, Nathanaël Fijalkow, and Daniel Neider

Learning Model Checking and the Kernel Trick for Signal Temporal Logic on Stochastic Processes
Luca Bortolussi, Giuseppe Maria Gallo, Jan Křetínský, and Laura Nenzi

Verification Inference

Inferring Interval-Valued Floating-Point Preconditions
Jonas Krämer, Lionel Blatter, Eva Darulova, and Mattias Ulbrich

NeuReach: Learning Reachability Functions from Simulations
Dawei Sun and Sayan Mitra

Inferring Invariants with Quantifier Alternations: Taming the Search Space Explosion
Jason R. Koenig, Oded Padon, Sharon Shoham, and Alex Aiken

LinSyn: Synthesizing Tight Linear Bounds for Arbitrary Neural Network Activation Functions
Brandon Paulsen and Chao Wang

Short papers

Kmclib: Automated Inference and Verification of Session Types from OCaml Programs
Keigo Imai, Julien Lange, and Rumyana Neykova

Automated Translation of Natural Language Requirements to Runtime Monitors
Ivan Perez, Anastasia Mavridou, Tom Pressburger, Alwyn Goodloe, and Dimitra Giannakopoulou
MaskD: A Tool for Measuring Masking Fault-Tolerance
Luciano Putruele, Ramiro Demasi, Pablo F. Castro, and Pedro R. D'Argenio

Better Counterexamples for Dafny
Aleksandar Chakarov, Aleksandr Fedchin, Zvonimir Rakamarić, and Neha Rungta

Constraint Solving

cvc5: A Versatile and Industrial-Strength SMT Solver
Haniel Barbosa, Clark Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar

Clausal Proofs for Pseudo-Boolean Reasoning
Randal E. Bryant, Armin Biere, and Marijn J. H. Heule

Moving Definition Variables in Quantified Boolean Formulas
Joseph E. Reeves, Marijn J. H. Heule, and Randal E. Bryant

A Sorted Datalog Hammer for Supervisor Verification Conditions Modulo Simple Linear Arithmetic
Martin Bromberger, Irina Dragoste, Rasha Faqeh, Christof Fetzer, Larry González, Markus Krötzsch, Maximilian Marx, Harish K Murali, and Christoph Weidenbach

Model Checking and Verification

Property Directed Reachability for Generalized Petri Nets
Nicolas Amat, Silvano Dal Zilio, and Thomas Hujsa

Transition Power Abstractions for Deep Counterexample Detection
Martin Blicha, Grigory Fedyukovich, Antti E. J. Hyvärinen, and Natasha Sharygina

Searching for Ribbon-Shaped Paths in Fair Transition Systems
Marco Bozzano, Alessandro Cimatti, Stefano Tonetta, and Viktoria Vozarova

CoVeriTeam: On-Demand Composition of Cooperative Verification Systems
Dirk Beyer and Sudeep Kanav

Author Index
Probabilistic Systems
A Probabilistic Logic for Verifying
Continuous-time Markov Chains
Ji Guan¹ and Nengkun Yu²
¹ State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
guanji1992@gmail.com
² Centre for Quantum Software and Information, University of Technology Sydney, Sydney, Australia
nengkunyu@gmail.com
Abstract. A continuous-time Markov chain (CTMC) execution is a continuous
class of probability distributions over states. This paper proposes a probabilistic
linear-time temporal logic, namely continuous-time linear logic (CLL), to reason
about the probability distribution execution of CTMCs. We define the syntax of
CLL on the space of probability distributions. The syntax of CLL includes
multiphase timed until formulas, and the semantics of CLL allows time reset to
study relatively temporal properties. We derive a corresponding model-checking
algorithm for CLL formulas. The correctness of the model-checking algorithm
depends on Schanuel's conjecture, a central open problem in transcendental
number theory. Furthermore, we provide a running example of CTMCs to
illustrate our method.
1 Introduction
As a popular model of probabilistic continuous-time systems, continuous-time
Markov chains (CTMCs) have been extensively studied since Kolmogorov [25].
In the past 20 years, probabilistic continuous-time model checking has received
much attention. Adapting probabilistic computational tree logic (PCTL) [22] to
this context with extra multiphase timed until formulas
$\Phi_1 U^{T_1} \Phi_2 \cdots U^{T_K} \Phi_{K+1}$, for state formulas $\Phi$ and time
intervals $T$, Aziz et al. proposed continuous stochastic logic (CSL) to specify the
branching-time properties of CTMCs, and the model-checking problem for CSL is
decidable [8]. After that, efficient model-checking algorithms were developed by
transient analysis of CTMCs using uniformization [9] and stratification [41] for a
restricted version (path formulas are restricted to single until formulas
$\Phi_1 U^I \Phi_2$) and the full version of CSL, respectively. These algorithms have
been practically implemented in the model checkers PRISM [26], MRMC [24] and
STORM [18]. Further details can be found in an excellent survey [23].
There are also different ways to specify the linear-time properties of CTMCs.
Timed automata were first used to achieve this task [11,13,14,15,19], and then
metric temporal logic (MTL) [12] was also considered in this context. Subsequently,
the probability of “the system being in state $s_0$ within five time units after
having continuously remained in state $s_1$” can be computed. However, some
statements cannot be specified and verified because of the lack of a probabilistic
linear-time temporal logic, for instance “the system being in state $s_0$ with high
probability ($\geq 0.9$) within five time units after having continuously remained
in state $s_1$ with low probability ($\leq 0.1$)”. Furthermore, this probabilistic
property cannot be expressed in CSL, because CSL cannot express properties that
are defined across several state transitions of the same time length in the execution
of a CTMC.
In this paper, aiming to express the aforementioned probabilistic linear-time
properties, we introduce continuous-time linear logic (CLL). In particular, we
adopt the viewpoint used in [2] of regarding CTMCs as transformers of probability
distributions over states. CLL studies the properties of the probability distribution
execution generated by a given initial probability distribution over time. Owing to
the fundamental difference between the views of state executions and probability
distribution executions of CTMCs, CLL and CSL are incomparable and
complementary, mirroring the relation between probabilistic linear-time temporal
logic (PLTL) and PCTL in model checking discrete-time Markov chains [2,
Section 3.3].

The atomic propositions of CLL are interpreted on the space of probability
distributions over states of CTMCs. We apply the method of symbolic dynamics to
the probability distributions of CTMCs. To be specific, we symbolize the
probability value space $[0,1]$ into a finite set of intervals
$\mathcal{I} = \{I_k \subseteq [0,1]\}_{k=1}^{m}$. A probability distribution $\mu$ over
the set of states $S = \{s_0, s_1, \ldots, s_{d-1}\}$ is then represented symbolically
as a set of symbols

$$S(\mu) = \{\langle s, I \rangle \in S \times \mathcal{I} : \mu(s) \in I\},$$

where each symbol $\langle s, I \rangle$ asserts $\mu(s) \in I$, i.e., the probability of
state $s$ in distribution $\mu$ falls in the interval $I$. For example,
$\langle s_0, [0.9, 1] \rangle$ means that the system is in state $s_0$ with a probability
between 0.9 and 1. The symbolization idea for distributions has been considered in
[2], choosing a disjoint cover of $[0,1]$:

$$\mathcal{I} = \{[0, p_1), [p_1, p_2), \ldots, [p_n, 1]\}.$$

Here, we remove this restriction and enrich the expressiveness of $\mathcal{I}$. A
crucial fact about this symbolization is that the set $S \times \mathcal{I}$ is finite.
Consequently, the (probability distribution) execution path generated by an initial
probability distribution $\mu$ induces a sequence of symbols in $S \times \mathcal{I}$
over time. Therefore, the dynamics of CTMCs can be studied in terms of a
(real-time) language over the alphabet $S \times \mathcal{I}$, which is the set of atomic
propositions of CLL.
Different from the non-probabilistic linear-time temporal logics LTL and MTL,
CLL has two types of formulas: state formulas and path formulas. The state
formulas are constructed using propositional connectives. The path formulas are
obtained by propositional connectives and a temporal modal operator timed until
$U^T$ for a bounded time interval $T$, as in MTL and CSL. The standard next-step
temporal operator of LTL is meaningless in continuous-time systems, since the
time domain (the real numbers) is uncountable. As a result, CLL can express the
above-mentioned probabilistic property “the system is at state $s_0$ with high
probability ($\geq 0.9$) within 5 time units after having continuously remained at
state $s_1$ with low probability ($\leq 0.1$)” as a path formula:

$$\varphi = \langle s_1, [0, 0.1]\rangle\, U^{[0,5]}\, \langle s_0, [0.9, 1]\rangle.$$

In this single until formula, there is a time instant $0 \leq t \leq 5$ at which state
$s_1$ with low probability transits to state $s_0$ with high probability. We illustrate
this on the following timeline.

[Timeline: $\langle s_1, [0, 0.1]\rangle$ holds from time 0 up to some instant $t \leq 5$, at which $\langle s_0, [0.9, 1]\rangle$ holds.]
Furthermore, CLL allows multiphase timed until formulas. The semantics of these
formulas focuses on relative time intervals, i.e., time can be reset as in timed
automata [5,6], while those of CSL [8] are for absolute time intervals.
Consequently, CLL can express not only relatively but also absolutely temporal
properties of CTMCs.

We illustrate the significant difference between relatively and absolutely temporal
properties of CTMCs. For instance, “before the probability distribution transition
$\varphi$ happens in 3 to 7 time units, the system always stays at state $s_0$ with a
high probability ($\geq 0.9$)” can be formalized as the path formula

$$\varphi' = \langle s_0, [0.9, 1]\rangle\, U^{[3,7]}\, (\langle s_1, [0, 0.1]\rangle\, U^{[0,5]}\, \langle s_0, [0.9, 1]\rangle).$$

As we can see, there are two time instants, namely $t_1$ and $t_2$, at which
distribution transitions happen. Time is reset to 0 after the first distribution
transition happens, and thus $t_2$ is relative to $t_1$. More clearly, we depict this
on the following timeline.

[Timeline: $\langle s_0, [0.9, 1]\rangle$ holds up to some $t_1$ with $3 \leq t_1 \leq 7$; then $\langle s_1, [0, 0.1]\rangle$ holds up to a further $t_2 \leq 5$ measured from $t_1$ (time reset), at which $\langle s_0, [0.9, 1]\rangle$ holds at absolute time $t_1 + t_2$.]

An absolute version is “the probability distribution transition $\varphi$ happens and
the system always stays at state $s_0$ with a high probability ($\geq 0.9$) in 3 to 7
time units”:

$$\varphi'' = \Box^{[3,7]}\langle s_0, [0.9, 1]\rangle \wedge (\langle s_1, [0, 0.1]\rangle\, U^{[0,5]}\, \langle s_0, [0.9, 1]\rangle).$$

We can get a clear timeline representation by simply adding
$\Box^{[3,7]}\langle s_0, [0.9, 1]\rangle$ to that of $\varphi$. Assume that $t < 3$:

[Timeline: $\langle s_1, [0, 0.1]\rangle$ holds up to some $t < 3$, at which $\langle s_0, [0.9, 1]\rangle$ holds; in addition, $\langle s_0, [0.9, 1]\rangle$ holds throughout the absolute interval $[3, 7]$.]
Time reset enriches the expressiveness of CLL but makes model checking CLL
more difficult than model checking CSL. We overcome this by translating relative
time to absolute time. As a result, we develop an algorithm to model check
CTMCs against CLL formulas. More precisely, we reduce the model-checking
problem to a reachability problem over absolute time intervals. The reachability
problem corresponds to the real root isolation problem for real
polynomial-exponential functions (PEFs) over the field of algebraic numbers, an
extensively studied question in the recent symbolic and algebraic computation
community (e.g., [1,20,28]). By developing a state-of-the-art real root isolation
algorithm, we resolve the latter problem under the assumption of the validity of
Schanuel's conjecture, a central open question in transcendental number theory
[27]. This conjecture has also been the foundation of the correctness of many
recent model-checking algorithms, including the decidability of continuous-time
Markov decision processes [30], the synthesis of inductive invariants for
continuous linear dynamical systems [4], termination analysis for probabilistic
programs with delays [39], and reachability analysis for dynamical systems [20].
In summary, the main contributions of this paper are as follows.
– Introducing a probabilistic logic, namely continuous-time linear logic (CLL), for reasoning about CTMCs;
– Developing a state-of-the-art real root isolation algorithm for PEFs over the field of algebraic numbers for checking atomic propositions of CLL;
– Proving that model checking CTMCs against CLL formulas is decidable subject to Schanuel's conjecture.
Organization of this paper. In the next section, we give the mathematical
preliminaries used in this paper. In Section 3, we recall the view of CTMCs as
distribution transformers. After that, the symbolic dynamics of CTMCs are
introduced by symbolizing distributions over states of CTMCs in Section 4. In the
subsequent section, we present our continuous-time probabilistic temporal logic
CLL. In Section 6, we develop an algorithm to solve the CLL model-checking
problem. A case study and related works are shown in Sections 7 and 8,
respectively. We summarize our results and point out future research directions in
the final section.
2 Preliminaries

For the convenience of the reader, we review basic definitions and notations of
number theory, in particular Schanuel's conjecture.

Throughout this paper, we write $\mathbb{C}$, $\mathbb{R}$, $\mathbb{Q}$ and $\mathbb{A}$
for the fields of all complex, real, rational and algebraic numbers, respectively. In
addition, $\mathbb{Z}$ denotes the set of all integers. For
$\mathbb{F} \in \{\mathbb{C}, \mathbb{R}, \mathbb{Q}, \mathbb{Z}, \mathbb{A}\}$, we use
$\mathbb{F}[t]$ and $\mathbb{F}^{n \times m}$ to denote the set of polynomials in $t$
with coefficients in $\mathbb{F}$ and the set of $n$-by-$m$ matrices with entries in
$\mathbb{F}$, respectively. Furthermore, for
$\mathbb{F} \in \{\mathbb{R}, \mathbb{Q}, \mathbb{Z}\}$, we use $\mathbb{F}^+$ to
denote the set of nonnegative elements (including 0) of $\mathbb{F}$.
A bounded (time) interval $T$ is a subset of $\mathbb{R}^+$, which may be open,
half-open or closed, with one of the following forms:

$$[t_1, t_2], \quad [t_1, t_2), \quad (t_1, t_2], \quad (t_1, t_2),$$

where $t_1, t_2 \in \mathbb{R}^+$ and $t_2 \geq t_1$ ($t_1 = t_2$ is only allowed in the
case of $[t_1, t_2]$). Here, $t_1$ and $t_2$ are called the left and right endpoints
of $T$, respectively. Conveniently, we use $\inf T$ and $\sup T$ to denote $t_1$ and
$t_2$, respectively. In this paper, we only consider bounded intervals.

For reasoning about temporal properties, we further define the addition and
subtraction of (time) intervals. The expression $T + t$ or $t + T$, for
$t \in \mathbb{R}^+$, denotes the interval $\{t + t' : t' \in T\}$. Similarly, $T - t$
stands for the interval $\{t' - t : t' \in T\}$ if $t \leq \inf T$. Furthermore, for two
intervals $T_1$ and $T_2$,

$$T_1 + T_2 = \bigcup_{t \in T_1} (t + T_2) = \{t_1 + t_2 : t_1 \in T_1 \text{ and } t_2 \in T_2\}.$$

Two intervals $T_1$ and $T_2$ are disjoint if their intersection is empty, i.e.,
$T_1 \cap T_2 = \emptyset$. Some concrete examples: $1 + (2, 3) = (3, 4)$,
$(2, 3) - 1 = (1, 2)$, $(2, 3) + [3, 4] = (5, 7)$, and $(2, 3)$, $[3, 4]$ are disjoint.
All such interval calculations are easy to compute.
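These operations are simple to implement; the following Python sketch (our illustration, not code from the paper) tracks open/closed endpoints and reproduces the examples above:

```python
# Interval arithmetic on bounded time intervals, tracking open/closed ends.
from dataclasses import dataclass

@dataclass(frozen=True)
class Ivl:
    lo: float
    hi: float
    lo_closed: bool = True
    hi_closed: bool = True

    def __str__(self):
        return (('[' if self.lo_closed else '(') + f"{self.lo}, {self.hi}"
                + (']' if self.hi_closed else ')'))

def shift(T: Ivl, t: float) -> Ivl:
    """t + T = { t + t' : t' in T }; use a negative t for T - t (requires t <= inf T)."""
    return Ivl(T.lo + t, T.hi + t, T.lo_closed, T.hi_closed)

def add(T1: Ivl, T2: Ivl) -> Ivl:
    """T1 + T2 = { t1 + t2 : t1 in T1, t2 in T2 }; an endpoint of the sum is
    closed only if both contributing endpoints are closed."""
    return Ivl(T1.lo + T2.lo, T1.hi + T2.hi,
               T1.lo_closed and T2.lo_closed,
               T1.hi_closed and T2.hi_closed)

print(shift(Ivl(2, 3, False, False), 1))        # 1 + (2,3) = (3, 4)
print(shift(Ivl(2, 3, False, False), -1))       # (2,3) - 1 = (1, 2)
print(add(Ivl(2, 3, False, False), Ivl(3, 4)))  # (2,3) + [3,4] = (5, 7)
```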
An algebraic number is a complex number that is a root of a non-zero polynomial
in one variable with rational coefficients (or, equivalently, integer coefficients, by
eliminating denominators). An algebraic number $\alpha$ is represented by a triple
$(P, (a, b), \varepsilon)$, where $P$ is the minimal polynomial of $\alpha$,
$a, b \in \mathbb{Q}$, and $a + bi$ is an approximation of $\alpha$ such that
$|\alpha - (a + bi)| < \varepsilon$ and $\alpha$ is the only root of $P$ in the open ball
$B(a + bi, \varepsilon)$. The minimal polynomial of $\alpha$ is the polynomial of
smallest degree in $\mathbb{Q}[t]$ such that $\alpha$ is a root of the polynomial and
the coefficient of the highest-degree term is 1. Any root of $f(t) \in \mathbb{A}[t]$
is algebraic. Moreover, given the representations of $a, b \in \mathbb{A}$, the
representations of $a \pm b$, $a/b$ and $a \cdot b$ can be computed in polynomial
time, as can equality checking [17].
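As an aside, computer algebra systems implement this kind of exact arithmetic on algebraic numbers via minimal polynomials. The SymPy snippet below (our illustration, not part of the paper) computes minimal polynomials of a few algebraic numbers and of their sums, products and quotients; note that SymPy returns an integer-coefficient primitive polynomial rather than the monic rational one of the definition above.

```python
# Exact arithmetic on algebraic numbers via minimal polynomials (SymPy).
from sympy import sqrt, symbols, minimal_polynomial

t = symbols('t')
a, b = sqrt(2), sqrt(3)

print(minimal_polynomial(a, t))      # t**2 - 2
print(minimal_polynomial(a + b, t))  # t**4 - 10*t**2 + 1
print(minimal_polynomial(a * b, t))  # t**2 - 6
print(minimal_polynomial(a / b, t))  # 3*t**2 - 2, i.e. sqrt(2/3)
```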
Furthermore, a complex number is called transcendental if it is not an algebraic
number. In general, it is challenging to verify relationships between transcendental
numbers [33]. On the other hand, one can use the Lindemann-Weierstrass theorem
to compare some transcendental numbers. The transcendence of $e$ and $\pi$ are
direct corollaries of this theorem.

Theorem 1 (Lindemann-Weierstrass theorem). Let $\eta_1, \ldots, \eta_n$ be
pairwise distinct algebraic complex numbers. Then
$\sum_k \lambda_k e^{\eta_k} \neq 0$ for any non-zero algebraic numbers
$\lambda_1, \ldots, \lambda_n$.
The following concepts are introduced to study general relations between
transcendental numbers.

Definition 1 (Algebraic independence). A set of complex numbers
$S = \{a_1, \ldots, a_n\}$ is algebraically independent over $\mathbb{Q}$ if the
elements of $S$ do not satisfy any nontrivial (non-constant) polynomial equation
with coefficients in $\mathbb{Q}$.

By the above definition, for any transcendental number $u$, $\{u\}$ is algebraically
independent over $\mathbb{Q}$, while $\{a\}$ for any algebraic number
$a \in \mathbb{A}$ is not. Thus, a set of complex numbers that is algebraically
independent over $\mathbb{Q}$ must consist of transcendental numbers.
$\{\pi, e^{\pi\sqrt{n}}\}$ is also algebraically independent over $\mathbb{Q}$ for any
positive integer $n$ [31]. Checking algebraic independence is challenging. For
example, it is still wide open whether $\{e, \pi\}$ is algebraically independent over
$\mathbb{Q}$.
Definition 2 (Extension field). Given two fields $E \subseteq F$, $F$ is an extension
field of $E$, denoted by $F/E$, if the operations of $E$ are those of $F$ restricted
to $E$.

For example, under the usual notions of addition and multiplication, the field of
complex numbers is an extension field of the real numbers.

Definition 3 (Transcendence degree). Let $L$ be an extension field of
$\mathbb{Q}$; the transcendence degree of $L$ over $\mathbb{Q}$ is defined as the
largest cardinality of an algebraically independent subset of $L$ over $\mathbb{Q}$.

For instance, let $\mathbb{Q}(e)/\mathbb{Q} = \{a + be \mid a, b \in \mathbb{Q}\}$ and
$\mathbb{Q}(\sqrt{2})/\mathbb{Q} = \{a + b\sqrt{2} \mid a, b \in \mathbb{Q}\}$ be two
extension fields of $\mathbb{Q}$. Then their transcendence degrees are 1 and 0,
respectively, noting that $e$ is a transcendental number and $\sqrt{2}$ is an
algebraic number.
Now, Schanuel's conjecture is ready to be presented.

Conjecture 1 (Schanuel's conjecture). Given any complex numbers
$z_1, \ldots, z_n$ that are linearly independent over $\mathbb{Q}$, the extension field
$\mathbb{Q}(z_1, \ldots, z_n, e^{z_1}, \ldots, e^{z_n})$ has transcendence degree at
least $n$ over $\mathbb{Q}$.

Stephen Schanuel proposed this conjecture during a course given by Serge Lang
at Columbia in the 1960s [27]. Schanuel's conjecture concerns the transcendence
degree of certain field extensions of the rational numbers. The conjecture, if
proven, would significantly generalize the most well-known results in
transcendental number theory [29,37]. For example, the algebraic independence of
$\{e, \pi\}$ would simply follow by setting $z_1 = 1$ and $z_2 = \pi i$, and using
Euler's identity $e^{\pi i} + 1 = 0$.
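To spell this example out (our elaboration of the argument just sketched, not text from the paper):

```latex
% 1 and \pi i are linearly independent over \mathbb{Q}: a rational relation
% q_1 \cdot 1 + q_2 \cdot \pi i = 0 forces q_1 = q_2 = 0, since \pi i is not real.
% Schanuel's conjecture with n = 2 then gives
\operatorname{trdeg}_{\mathbb{Q}} \mathbb{Q}\bigl(1,\, \pi i,\, e,\, e^{\pi i}\bigr) \;\geq\; 2.
% By Euler's identity, e^{\pi i} = -1 is algebraic, so
\mathbb{Q}\bigl(1,\, \pi i,\, e,\, e^{\pi i}\bigr) \;=\; \mathbb{Q}(\pi i,\, e),
% whence \pi i and e are algebraically independent over \mathbb{Q};
% since i is algebraic, so are \pi and e.
```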
3 Continuous-time Markov Chains as Distribution Transformers

We begin with the definition of continuous-time Markov chains (CTMCs). A
CTMC is a Markovian (memoryless) stochastic process that takes values in a finite
state set $S$ ($|S| = d < \infty$) and evolves in continuous time
$t \in \mathbb{R}^+$. Formally,

Definition 4. A CTMC is a pair $M = (S, Q)$, where $S$ ($|S| = d$) is a finite state
set and $Q \in \mathbb{Q}^{d \times d}$ is a transition rate matrix.

A transition rate matrix $Q$ is a matrix whose off-diagonal entries
$\{Q_{i,j}\}_{i \neq j}$ are nonnegative rational numbers, representing the transition
rate from state $s_j$ to state $s_i$, while the diagonal entries $Q_{j,j}$ are
constrained to be $-\sum_{i \neq j} Q_{i,j}$ for all $1 \leq j \leq d$. Consequently,
the column sums of $Q$ are all zero.
The evolution of a CTMC can be regarded as a distribution transformer. Given an
initial distribution $\mu \in \mathbb{Q}^{d \times 1} \cap \mathcal{D}(S)$, the
distribution at time $t \in \mathbb{R}^+$ is

$$\mu_t = e^{Qt}\mu,$$

where $\mathcal{D}(S)$ denotes the set of all probability distributions over $S$. We
call $\mathcal{D}(S)$ the probability distribution space of CTMCs. An execution
path of a CTMC is a continuous function indexed by the initial distribution
$\mu \in \mathcal{D}(S)$:

$$\sigma_\mu : \mathbb{R}^+ \to \mathcal{D}(S), \quad \sigma_\mu(t) = e^{Qt}\mu. \tag{1}$$
Example 1. We recall the illustrating example of a CTMC $M = (S, Q)$ from [8,
Figure 1] as the running example of our work. In particular, $M$ is a 5-dimensional
CTMC with initial distribution $\mu$, where $S = \{s_0, s_1, s_2, s_3, s_4\}$ and

$$Q = \begin{pmatrix}
-3 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
2 & 0 & -7 & 0 & 0 \\
0 & 0 & 3 & 0 & 0 \\
0 & 0 & 4 & 0 & 0
\end{pmatrix}, \qquad
\mu = \begin{pmatrix} 0.1 \\ 0.2 \\ 0.3 \\ 0.4 \\ 0 \end{pmatrix}.$$

(The diagonal entries $-3$ and $-7$ make every column of $Q$ sum to zero, as
required above.)
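To make the distribution-transformer view concrete, here is a small Python sketch (our illustration, not code from the paper; it assumes NumPy and SciPy are available) that computes $\mu_t = e^{Qt}\mu$ for the running example and checks that probability mass is preserved:

```python
import numpy as np
from scipy.linalg import expm

# Rate matrix Q and initial distribution mu of the running example (Example 1).
Q = np.array([
    [-3, 0,  0, 0, 0],
    [ 1, 0,  0, 0, 0],
    [ 2, 0, -7, 0, 0],
    [ 0, 0,  3, 0, 0],
    [ 0, 0,  4, 0, 0],
], dtype=float)
mu = np.array([0.1, 0.2, 0.3, 0.4, 0.0])

# A transition rate matrix has zero column sums, so e^{Qt} maps
# distributions to distributions.
assert np.allclose(Q.sum(axis=0), 0.0)

def distribution_at(t: float) -> np.ndarray:
    """sigma_mu(t) = e^{Qt} mu, the distribution of the chain at time t (Eq. (1))."""
    return expm(Q * t) @ mu

for t in (0.0, 1.0, 5.0):
    mu_t = distribution_at(t)
    print(f"t = {t}: mu_t = {np.round(mu_t, 4)}, total = {mu_t.sum():.4f}")
```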
4 Symbolic Dynamics of CTMCs

In this section, we introduce symbolic dynamics to characterize the properties of
the probability distribution space of CTMCs.

First, we fix a finite set of intervals $\mathcal{I} = \{I_k \subseteq [0,1]\}_{k \in K}$,
where the endpoints of each $I_k$ are rational numbers. With the states
$S = \{s_0, s_1, \ldots, s_{d-1}\}$, we define the symbolization of distributions as a
function

$$S : \mathcal{D}(S) \to 2^{S \times \mathcal{I}}, \qquad S(\mu) = \{\langle s, I \rangle \in S \times \mathcal{I} : \mu(s) \in I\}, \tag{2}$$

where $\times$ denotes the Cartesian product and $2^{S \times \mathcal{I}}$ is the
power set of $S \times \mathcal{I}$. $\langle s, I \rangle \in S(\mu)$ asserts that the
probability of state $s$ in distribution $\mu$ is in the interval $I$. The
symbolization of distributions is a generalization of the discretization of
distributions, with $I_k \cap I_m = \emptyset$ for all $k \neq m$, which was studied
in [2]. This generalization increases the expressiveness of our continuous
linear-time logic, introduced in the next section. Now, we can represent any given
probability distribution by finitely many symbols from $S \times \mathcal{I}$. For
example, suppose

$$\mathcal{I} = \{[0, 0.1], (0.1, 0.9), [0.9, 1], [1, 1], [0.4, 0.4]\}; \tag{3}$$

then the initial distribution $\mu$ in Example 1 is symbolized as

$$S(\mu) = \{\langle s_0, [0, 0.1] \rangle, \langle s_1, (0.1, 0.9) \rangle, \langle s_2, (0.1, 0.9) \rangle, \langle s_3, (0.1, 0.9) \rangle, \langle s_3, [0.4, 0.4] \rangle, \langle s_4, [0, 0.1] \rangle\}. \tag{4}$$
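A direct implementation of the symbolization function is straightforward; the following sketch (ours, not the paper's code) uses exact rational arithmetic and reproduces Eq. (4) for the interval set of Eq. (3):

```python
# A minimal sketch of the symbolization function S of Eq. (2).
from fractions import Fraction
from typing import NamedTuple

class Interval(NamedTuple):
    lo: Fraction
    hi: Fraction
    lo_closed: bool = True
    hi_closed: bool = True

    def contains(self, p: Fraction) -> bool:
        above = p > self.lo or (self.lo_closed and p == self.lo)
        below = p < self.hi or (self.hi_closed and p == self.hi)
        return above and below

    def __str__(self):
        left = '[' if self.lo_closed else '('
        right = ']' if self.hi_closed else ')'
        return f"{left}{self.lo}, {self.hi}{right}"

Fr = Fraction  # Fraction accepts decimal strings, e.g. Fr('0.1') == 1/10

# The interval set I of Eq. (3).
INTERVALS = [
    Interval(Fr('0'), Fr('0.1')),
    Interval(Fr('0.1'), Fr('0.9'), lo_closed=False, hi_closed=False),
    Interval(Fr('0.9'), Fr('1')),
    Interval(Fr('1'), Fr('1')),
    Interval(Fr('0.4'), Fr('0.4')),
]

def symbolize(mu):
    """S(mu) = { <s, I> in S x I : mu(s) in I }  (Eq. (2))."""
    return {(s, iv) for s, p in enumerate(mu)
            for iv in INTERVALS if iv.contains(Fr(p))}

mu = ['0.1', '0.2', '0.3', '0.4', '0']
for s, iv in sorted(symbolize(mu)):
    print(f"<s_{s}, {iv}>")  # prints the six symbols of Eq. (4)
```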
As we can see from the above example, the symbolization of distributions over
states captures both the exact probabilities (singleton intervals) of states and the
ranges of their possibilities.

Next, we introduce the symbolization of CTMCs.

Definition 5. A symbolized CTMC is a tuple $SM = (S, Q, \mathcal{I})$, where
$M = (S, Q)$ is a CTMC and $\mathcal{I}$ is a finite set of intervals in $[0, 1]$.

As we can see, the set of intervals is chosen depending on the CTMC. Then, we
extend this symbolization to the path $\sigma_\mu$:

$$S \circ \sigma_\mu : \mathbb{R}^+ \to 2^{S \times \mathcal{I}}. \tag{5}$$

Definition 6. Given a symbolized CTMC $SM = (S, Q, \mathcal{I})$,
$S \circ \sigma_\mu$ is a symbolic execution path of $M = (S, Q)$.

Given a symbolized CTMC $SM = (S, Q, \mathcal{I})$, the path $\sigma_\mu$ of the
CTMC $M = (S, Q)$ over the real numbers $\mathbb{R}^+$ generated by a
probability distribution $\mu$ induces a symbolic execution path
$S \circ \sigma_\mu$ over the finite symbol set $S \times \mathcal{I}$. Subsequently,
the dynamics of CTMCs can be studied in terms of a language over
$S \times \mathcal{I}$. In other words, we can study the temporal properties of
CTMCs in the context of symbolized CTMCs.
5 Continuous Linear-time Logic

In this section, we introduce continuous linear-time logic (CLL), a probabilistic
linear-time temporal logic, to specify the temporal properties of a symbolized
CTMC $SM = (S, Q, \mathcal{I})$.

CLL has two types of formulas: state formulas and path formulas. The state
formulas are constructed using propositional connectives. The path formulas are
obtained by propositional connectives and a temporal modal operator timed until
$U^T$ for a bounded time interval $T$, as in MTL and CSL. Furthermore,
multiphase timed until formulas
$\Phi_0 U^{T_1} \Phi_1 U^{T_2} \Phi_2 \ldots U^{T_n} \Phi_n$ are allowed to enrich the
expressiveness of CLL. More importantly, time reset is involved in these
multiphase formulas, so both absolutely and relatively temporal properties of
CTMCs can be studied.

Definition 7. The state formulas of CLL are described according to the following
syntax:

$$\Phi := \text{true} \mid a \in AP \mid \neg\Phi \mid \Phi_1 \wedge \Phi_2,$$

where $AP$ denotes $S \times \mathcal{I}$, the set of atomic propositions.
The path formulas of CLL are constructed by the following syntax:

$$\varphi := \text{true} \mid \Phi_0 U^{T_1} \Phi_1 U^{T_2} \Phi_2 \ldots U^{T_n} \Phi_n \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2,$$

where $n \in \mathbb{Z}^+$ is a positive integer, for all $0 \leq k \leq n$, $\Phi_k$ is a
state formula, and the $T_k$'s are time intervals with endpoints in
$\mathbb{Q}^+$, i.e., each $T_k$ has one of the following forms:

$$(a, b), \quad [a, b], \quad (a, b], \quad [a, b), \qquad a, b \in \mathbb{Q}^+.$$
The semantics of CLL state formulas is defined on the set D(S) of probability distributions over S with the symbolization function S in Eq. (2) of Section 4:
(1) µ |= true for all probability distributions µ ∈ D(S);
(2) µ |= a iff a ∈ S(µ);
(3) µ |= ¬Φ iff it is not the case that µ |= Φ (written µ ⊭ Φ);
(4) µ |= Φ1 ∧ Φ2 iff µ |= Φ1 and µ |= Φ2.
The semantics of CLL path formulas is defined on the execution paths {σ_µ}_{µ∈D(S)} of the CTMC M = (S, Q):
(1) σ_µ |= true for all probability distributions µ ∈ D(S);
(2) σ_µ |= Φ0 U^{T1} Φ1 U^{T2} Φ2 ... U^{Tn} Φn iff there is a time instant t ∈ T1 such that σ_{µ_t} |= Φ1 U^{T2} Φ2 ... U^{Tn} Φn, and for any t′ ∈ T1 ∩ [0, t), µ_{t′} |= Φ0, where σ_{µ_t} |= Φ iff µ_t |= Φ, and µ_t is the distribution of the chain at time instant t, i.e., µ_t = e^{Qt}µ for all t ∈ R+;
(3) σ_µ |= ¬φ iff it is not the case that σ_µ |= φ (written σ_µ ⊭ φ);
(4) σ_µ |= φ1 ∧ φ2 iff σ_µ |= φ1 and σ_µ |= φ2.
Not surprisingly, the other Boolean connectives are derived in the standard way, i.e., false = ¬true, Φ1 ∨ Φ2 = ¬(¬Φ1 ∧ ¬Φ2) and Φ1 → Φ2 = ¬Φ1 ∨ Φ2, and the path formulas follow the same way. Furthermore, we generalize the temporal operators ◊ (“eventually”) and □ (“always”) of discrete-time systems to their timed variants ◊^T and □^T, respectively:
$$\Diamond^T \Phi = \text{true}\; U^T\, \Phi, \qquad \Box^T \Phi = \neg\Diamond^T \neg\Phi.$$
For n = 1 in multiphase timed until formulas, the until operator U^{T1} is a timed variant of the until operator of LTL; the path formula Φ0 U^{T1} Φ1 asserts that Φ1 is satisfied at some time instant in the interval T1 and that at all preceding time instants in T1, Φ0 holds. For example,
$$\varphi = \langle s_1,[0,0.1]\rangle\; U^{[0,5]}\; \langle s_0,[0.9,1]\rangle,$$
as mentioned in the introduction.
For general n, the CLL path formula Φ0 U^{T1} Φ1 U^{T2} Φ2 ... U^{Tn} Φn is explained by induction on n. We first mention that U^T is right-associative, e.g., Φ0 U^{T1} Φ1 U^{T2} Φ2 stands for Φ0 U^{T1} (Φ1 U^{T2} Φ2). This enables time reset: T1 and T2 do not have to be disjoint, and the starting time point of T2 is relative to some time instant in T1. Recall the multiphase timed until formula from the introduction, which expresses a relative time property:
$$\varphi' = \langle s_0,[0.9,1]\rangle\; U^{[3,7]}\; (\langle s_1,[0,0.1]\rangle\; U^{[0,5]}\; \langle s_0,[0.9,1]\rangle),$$
which differs from the following CLL path formula representing an absolute temporal property of CTMCs:
$$\varphi'' = \Diamond^{[3,7]} \langle s_0,[0.9,1]\rangle \wedge (\langle s_1,[0,0.1]\rangle\; U^{[0,5]}\; \langle s_0,[0.9,1]\rangle).$$
As an example, we clarify the semantics of CLL by comparing the above two path formulas in their general forms:
$$\Phi_0\, U^{T_1}\, \Phi_1\, U^{T_2}\, \Phi_2 \qquad \text{and} \qquad \Phi_0\, U^{T_1}\, \Phi_1 \wedge \Phi_1\, U^{T_2}\, \Phi_2.$$
(1) σ_µ |= Φ0 U^{T1} Φ1 U^{T2} Φ2 asserts that there are time instants t1 ∈ T1 and t2 ∈ T2 such that µ_{t1+t2} |= Φ2 and, for any t′1 ∈ T1 ∩ [0, t1) and t′2 ∈ T2 ∩ [0, t2), µ_{t′1} |= Φ0 and µ_{t1+t′2} |= Φ1, where µ_t = e^{Qt}µ for all t ∈ R+. (The original depicts this on a timeline: Φ0 holds from time 0 up to t1 with inf T1 ≤ t1 ≤ sup T1, Φ1 holds from t1 up to t1+t2 with t1+t2 ≤ sup(T1+T2) and t2 ≥ inf T2, and Φ2 holds at t1+t2.)
(2) σ_µ |= Φ0 U^{T1} Φ1 ∧ Φ1 U^{T2} Φ2 asserts that there are time instants t1 ∈ T1 and t2 ∈ T2 such that µ_{t1} |= Φ1 and µ_{t2} |= Φ2 and, for any t′1 ∈ T1 ∩ [0, t1) and t′2 ∈ T2 ∩ [0, t2), µ_{t′1} |= Φ0 and µ_{t′2} |= Φ1, where µ_t = e^{Qt}µ for all t ∈ R+.
Before solving the model-checking problem of CTMCs against CLL formulas in the next section, we first discuss what can be specified in our logic CLL. Given a CTMC (S, Q), the CLL path formula ◊^{[0,1000]}⟨s, [1,1]⟩ expresses a liveness property: state s ∈ S is eventually reached with probability one before time instant 1000. In terms of safety properties, the formula □^{[100,1000]}⟨s, [0,0]⟩ represents that state s ∈ S is never reached (reached with probability zero) between time instants 100 and 1000. Furthermore, by choosing nontrivial intervals (neither [0,0] nor [1,1]), liveness and safety properties can be asserted with probabilities, such as ◊^{[0,1000]}⟨s, [0.5,1]⟩ and □^{[100,1000]}⟨s, [0,0.5]⟩. The multiphase timed until formula ⟨s, [0.7,1]⟩ U^{[2,3]} ⟨s, [0.7,1]⟩ ... U^{[2,3]} ⟨s, [0.7,1]⟩, where U^{[2,3]} occurs 100 times, asserts that the probability of state s is beyond 0.7 at time instants lying 2 to 3 time units after the previous one, and that this happens at least 100 times.
Next, we can classify members of I as representing “low” and “high” probabilities. For example, if I contains the 3 intervals {[0,0.1], (0.1,0.9), [0.9,1]}, we can declare the first interval as “low” and the last interval as “high”. In this case, □^{[10,1000)}(⟨s0,[0,0.1]⟩ → ⟨s1,[0.9,1]⟩) says that, in the time interval [10,1000), whenever the probability of state s0 is low, the probability of state s1 is high.
6 CLL Model Checking
In this section, we provide an algorithm to model check CTMCs against CLL formulas, i.e., we show that the following CLL model-checking problem (Problem 1) is decidable.
Problem 1 (CLL Model-checking Problem). Given a symbolized CTMC SM = (S, Q, I) with an initial distribution µ and a CLL path formula φ over AP = S×I, the goal is to decide whether σ_µ |= φ, where σ_µ(t) = e^{Qt}µ is the execution path defined in Eq. (1).
In particular, we show:
Theorem 2. Under the condition that Schanuel's conjecture holds, the CLL model-checking problem in Problem 1 is decidable.
In the following, we prove the above theorem, going from checking the basic formulas (atomic propositions) to the most complex ones (nontrivial multiphase timed until formulas). For readability, we put the proofs of all results in Appendix A of the extended version [21] of this paper.
We start with the simplest case, an atomic proposition ⟨s, I⟩. By the semantics of CLL, µ_t |= ⟨s, I⟩ if and only if µ_t(s) = (e^{Qt}µ)(s) ∈ I. To check this, we first observe that the execution path e^{Qt}µ of a CTMC is a system of polynomial-exponential functions (PEFs).
Definition 8. A function f : R → R is a polynomial-exponential function (PEF) if f has the following form:
$$f(t) = \sum_{k=0}^{K} f_k(t)\, e^{\lambda_k t} \tag{6}$$
where for all 0 ≤ k ≤ K < ∞, f_k(t) ∈ F1[t], f_k(t) ≠ 0, λ_k ∈ F2, and F1, F2 are fields. Without loss of generality, we assume that the λ_k are distinct.
Generally, for a PEF f(t) with range in the complex numbers C, g(t) = f(t) + \overline{f(t)} is a PEF with range in the real numbers R, where \overline{f(t)} is the complex conjugate of f(t). The argument t is omitted whenever convenient, i.e., f = f(t); t is called a root of a function f if f(t) = 0. PEFs often appear in transcendental number theory as auxiliary functions in proofs involving the exponential function [10].
Lemma 1. Given a CTMC M = (S, Q) with S = {s0, ..., s_{d−1}}, Q ∈ Q^{d×d}, and an initial distribution µ ∈ Q^{d×1}, for any 0 ≤ i ≤ d−1, (e^{Qt}µ)(s_i), the i-th entry of e^{Qt}µ, can be expressed as a PEF f : R+ → [0,1] as in Eq. (6) with F1 = F2 = A.
By the above lemma, for a given t in some bounded time interval T (made specific in the later discussion), (e^{Qt}µ)(s) ∈ I is determined by the algebraic structure of the PEF g(t) = (e^{Qt}µ)(s) on T, that is, by all maximum intervals T_max ⊆ T such that g(t) ∈ I for all t ∈ T_max; here an interval T_max ≠ ∅ is called maximum for g(t) ∈ I if there is no interval T′ with T_max ⊊ T′ ⊆ T for which the property still holds, i.e., g(t) ∈ I for all t ∈ T′. Then (e^{Qt}µ)(s) ∈ I if and only if t ∈ T_max for some maximum interval T_max. So we aim to compute the set T of all maximum intervals. By the continuity of the PEF g(t), this can be done by identifying a real root isolation of the following PEF f(t) in T: f(t) = (g(t) − inf I)(g(t) − sup I).
A (real) root isolation of a function f(t) in an interval T is a set of mutually disjoint intervals, denoted by Iso(f)_T = {(a_j, b_j) ⊆ T} with a_j, b_j ∈ Q, such that
– for any j, there is one and only one root of f(t) in (a_j, b_j);
– for any root t of f(t) in T, t ∈ (a_j, b_j) for some j.
Furthermore, if f has no root in T, then Iso(f)_T = ∅. Although there are infinitely many real root isolations of f(t) in T, the number of isolation intervals always equals the number of distinct roots of f(t) in T.
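For intuition only, root isolation can be approximated numerically by scanning for sign changes, as in the sketch below (assuming NumPy). This heuristic is not the paper's procedure: the algorithm behind Theorem 4 below works symbolically over algebraic numbers and is correct subject to Schanuel's conjecture, whereas a fixed grid may miss tangential roots or closely spaced root pairs.

import numpy as np

def isolate_roots_numeric(f, T=(0.0, 5.0), samples=100_000):
    """Approximate a real root isolation Iso(f)_T via sign changes on a grid."""
    ts = np.linspace(T[0], T[1], samples)
    vals = f(ts)
    flips = np.nonzero(np.sign(vals[:-1]) * np.sign(vals[1:]) < 0)[0]
    return [(float(ts[i]), float(ts[i + 1])) for i in flips]

# Example: a PEF with exactly one root in [0, 5] (it reappears in Sect. 7):
print(isolate_roots_numeric(lambda t: 0.3 * (1 - np.exp(-3 * t)) - 0.1))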
Finding real root isolations of PEFs is a long-standing problem that dates back at least to Ritt's paper [34] from 1929. Further results followed over the last century (e.g., [7,38]). The problem is essential in the reachability analysis of dynamical systems, an active field of symbolic and algebraic computation. For the case F1 = Q and F2 = N+, an algorithm named ISOL was proposed in [1] to isolate all real roots of f(t). Later, this algorithm was extended to the case F1 = Q and F2 = R [20]. A variant of the problem has also been studied in [28]. The correctness of these algorithms is based on Schanuel's conjecture. Other works use Schanuel's conjecture for root isolation of other classes of functions, such as exp-log functions [35] and tame elementary functions [36].
By Lemma 1, we pursue this problem in the context of CTMCs. The distinctive feature of solving real root isolation of PEFs in our paper is that we deal with complex numbers C, more specifically with algebraic numbers A, i.e., F1 = F2 = A; to the best of our knowledge, all previous works can only handle the case over R. Here, we develop a state-of-the-art real root isolation algorithm for PEFs over algebraic numbers. From now on, we always assume that PEFs are over A, i.e., F1 = F2 = A in Eq. (6). In this setting, it is worth noting that whether a PEF has a root in a given interval T ⊆ R+ is decidable subject to Schanuel's conjecture if T is bounded [16], which is the situation we consider in this paper.
Theorem 3 ([16]). Under the condition that Schanuel's conjecture holds, there is an algorithm to check whether a PEF f(t) has a root in an interval T, i.e., whether Iso(f)_T = ∅.
In this paper, we extend the above check of Iso(f)_T = ∅ to the computation of Iso(f)_T for a PEF f(t).
Theorem 4. Under the condition that Schanuel's conjecture holds, there is an algorithm to find a real root isolation Iso(f)_T for any PEF f(t) and interval T. Furthermore, the number of real roots is finite, i.e., |Iso(f)_T| < ∞.
With the above theorem we can compute the set T of all maximum intervals in order to check atomic propositions. Furthermore, we can compare the values of any real roots of PEFs, which is important for model checking general multiphase timed until formulas at the end of this section.
Lemma 2. Let f1(t) and f2(t) be two PEFs with domains T1 and T2, and let t1 ∈ T1 and t2 ∈ T2 be roots of them, respectively. Under the condition that Schanuel's conjecture holds, there is an efficient way to check whether or not t1 − t2 < g for any given rational number g ∈ Q.
For model checking a general state formula Φ, we can also use real root isolation of suitable PEFs to obtain the set of all maximum intervals T_max such that µ_t |= Φ for all t ∈ T_max. The reason is that Φ admits a conjunctive normal form consisting of atomic propositions. See the proof of the following lemma in Appendix A of the extended version [21] of this paper for the details.
Lemma 3. Under the condition that Schanuel's conjecture holds, given a time interval T, the set T of all maximum intervals in T satisfying µ_t |= Φ can be computed, where Φ is a state formula of CLL. Furthermore, the number of intervals in T is finite, and the left and right endpoints of each interval in T are roots of PEFs.
At last, we characterize the multiphase timed until formulas by a reachability analysis of time intervals (instants).
Lemma 4. σ_µ |= Φ0 U^{T1} Φ1 U^{T2} Φ2 ··· U^{Tn} Φn if and only if there exist time intervals {I_k ⊆ R+}_{k=0}^{n} with I_0 = [0,0] such that
– the satisfaction of intervals: for all 1 ≤ k ≤ n, µ_t |= Φ_{k−1} for all t ∈ I_k, and µ_{t*} |= Φ_n, where t* = sup I_n and µ_t = e^{Qt}µ for all t ∈ R+;
– the order of intervals: for all 1 ≤ k ≤ n, I_k ⊆ I_{k−1} + T_k and inf I_k = sup I_{k−1} + inf T_k.
By the above lemma, the problem of checking multiphase timed until formulas is reduced to verifying the existence of a sequence of time intervals.
Now we can show the proof of Theorem 2.
Proof. Recall that the nontrivial step is to model check a multiphase timed until formula Φ0 U^{T1} Φ1 U^{T2} Φ2 ··· U^{Tn} Φn, where {T_j}_{j=1}^{n} is a set of bounded rational intervals in R+ and, for 0 ≤ k ≤ n, Φ_k is a state formula.
By Lemma 4, for model checking the above formula we only need to check the existence of the time intervals {I_k}_{k=0}^{n} described in the lemma. The following procedure constructs such a set of intervals if it exists:
(1) Let ℐ_0 = {I_0 = [0,0]};
(2) For each 1 ≤ k ≤ n, obtain the set ℐ_k of all maximum intervals in [0, ∑_{j=1}^{k} sup T_j] such that µ_t |= Φ_{k−1} for all t ∈ I with I ∈ ℐ_k, where µ_t = e^{Qt}µ; this can be done by Lemma 3. Note that ℐ_k can be the empty set, i.e., ℐ_k = ∅;
(3) For k from 1 to n, update ℐ_k:
$$\mathcal{I}_k = \{ I \cap (I' + T_k) : I \in \mathcal{I}_k \text{ and } I' \in \mathcal{I}_{k-1} \}. \tag{7}$$
These updates can be carried out using Lemma 2. If ℐ_k = ∅, then the formula is not satisfied;
(4) Update ℐ_n: for each I ∈ ℐ_n, we replace I with [s − ε, s) for some constant ε > 0 if there is an s ∈ I with s − ε ∈ I such that µ_s |= Φ_n, where µ_s = e^{Qs}µ; otherwise, we remove this element from ℐ_n. Again, this can be done by Lemma 3. If ℐ_n = ∅, then the formula is not satisfied;
(5) Finally, for k from n−1 down to 1, update ℐ_k:
$$\mathcal{I}_k = \{ [s - \varepsilon - \inf T_{k+1},\ s - \inf T_{k+1}) : [s - \varepsilon, s) \in \mathcal{I}_{k+1} \}.$$
After the above procedure, we have non-empty sets {ℐ_k}_{k=0}^{n} with the following properties:
– for each 1 ≤ k ≤ n, µ_t |= Φ_{k−1} for all t ∈ I_k with I_k ∈ ℐ_k, and µ_{t*} |= Φ_n, where t* = sup I_n;
– for each 1 ≤ k ≤ n and I ∈ ℐ_k, there exists at least one I′ ∈ ℐ_{k−1} such that I ⊆ sup I′ + T_k and inf I = sup I′ + inf T_k.
Therefore, we can extract a set of intervals {I_k}_{k=0}^{n} satisfying the two conditions of Lemma 4 whenever one exists. On the other hand, it is easy to check that every such {I_k}_{k=0}^{n} must be contained in {ℐ_k}_{k=0}^{n}, i.e., for each k, I_k ⊆ I for some I ∈ ℐ_k. This ensures the correctness of the above procedure.
By the above constructive analysis, we obtain an algorithm for model checking CTMCs against CLL formulas. Focusing on the decidability problem, we do not provide pseudocode for the algorithm. Instead, we implement a numerical experiment illustrating the checking procedure in the next section.
7 Numerical Implementation
In this section, we implement a case study of checking CTMCs against CLL formulas. We consider a symbolized CTMC SM = (S, Q, I), where M = (S, Q) is the CTMC of Example 1 and the finite set I is the one considered in Eq. (3). We check the properties of M given by the following two CLL path formulas mentioned in the introduction, for different initial distributions:
$$\varphi = \langle s_1,[0,0.1]\rangle\; U^{[0,5]}\; \langle s_0,[0.9,1]\rangle,$$
$$\varphi' = \langle s_0,[0.9,1]\rangle\; U^{[3,7]}\; (\langle s_1,[0,0.1]\rangle\; U^{[0,5]}\; \langle s_0,[0.9,1]\rangle).$$
By Jordan decomposition, we have Q = SJS^{−1}, where
$$S = \begin{pmatrix} 0 & 6 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 & 1 \\ 7 & 3 & 0 & 0 & 0 \\ -3 & -3 & 0 & 1 & 0 \\ -4 & -4 & 1 & 0 & 0 \end{pmatrix}, \quad J = \begin{pmatrix} -7 & 0 & 0 & 0 & 0 \\ 0 & -3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad S^{-1} = \begin{pmatrix} -\frac{1}{14} & 0 & \frac{1}{7} & 0 & 0 \\ \frac{1}{6} & 0 & 0 & 0 & 0 \\ \frac{8}{21} & 0 & \frac{4}{7} & 0 & 1 \\ \frac{2}{7} & 0 & \frac{3}{7} & 1 & 0 \\ \frac{1}{3} & 1 & 0 & 0 & 0 \end{pmatrix}.$$
Then, we consider the initial distribution µ of Example 1 and obtain the value of e^{Qt}µ as follows:
$$e^{Qt}\mu = \begin{pmatrix} e^{-3t} & 0 & 0 & 0 & 0 \\ \frac{1}{3}(1 - e^{-3t}) & 1 & 0 & 0 & 0 \\ \frac{1}{2}(e^{-3t} - e^{-7t}) & 0 & e^{-7t} & 0 & 0 \\ \frac{3}{14}e^{-7t} - \frac{1}{2}e^{-3t} + \frac{2}{7} & 0 & -\frac{3}{7}e^{-7t} + \frac{3}{7} & 1 & 0 \\ \frac{2}{7}e^{-7t} - \frac{2}{3}e^{-3t} + \frac{8}{21} & 0 & -\frac{4}{7}e^{-7t} + \frac{4}{7} & 0 & 1 \end{pmatrix} \begin{pmatrix} 0.1 \\ 0.2 \\ 0.3 \\ 0.4 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{10}e^{-3t} \\ -\frac{1}{30}e^{-3t} + \frac{7}{30} \\ \frac{1}{20}e^{-3t} + \frac{1}{4}e^{-7t} \\ -\frac{1}{20}e^{-3t} - \frac{3}{28}e^{-7t} + \frac{39}{70} \\ -\frac{1}{15}e^{-3t} - \frac{1}{7}e^{-7t} + \frac{22}{105} \end{pmatrix}.$$
As we only consider states s0 and s1 in the formulas φ and φ′, we focus on the following PEFs: f0(t) = (1/10)e^{−3t} and f1(t) = −(1/30)e^{−3t} + 7/30.
Next, we initialize the model checking procedure introduced in the proof of Theorem 2. First, we compute the set T of all maximum intervals T ⊆ [0,5] such that e^{Qt}µ |= ⟨s0,[0.9,1]⟩ for t ∈ T, i.e., f0(t) ∈ [0.9,1] for t ∈ T. We obtain T = ∅ by the real root isolation algorithm of Theorem 4, and this indicates that σ_µ ⊭ φ, where σ_µ(t) = e^{Qt}µ is the path induced by µ as defined in Eq. (1).
To check whether σ_µ |= φ′, we compute the set T of all maximum intervals T ⊆ [0,12] such that e^{Qt}µ |= ⟨s0,[0.9,1]⟩ for t ∈ T, i.e., f0(t) ∈ [0.9,1] for t ∈ T. Again, we obtain T = ∅ by the real root isolation algorithm of Theorem 4. Therefore, σ_µ ⊭ φ′.
In the following, we consider a different initial distribution µ1:
$$e^{Qt}\mu_1 = e^{Qt}\begin{pmatrix} 0.9 \\ 0 \\ 0.1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{9}{10}e^{-3t} \\ \frac{3}{10}(1 - e^{-3t}) \\ \frac{9}{20}e^{-3t} - \frac{7}{20}e^{-7t} \\ -\frac{9}{20}e^{-3t} + \frac{3}{20}e^{-7t} + \frac{3}{10} \\ -\frac{3}{5}e^{-3t} + \frac{1}{5}e^{-7t} + \frac{2}{5} \end{pmatrix}.$$
The key PEFs are g0(t) = (9/10)e^{−3t} and g1(t) = (3/10)(1 − e^{−3t}).
Again, we initialize the model checking procedure from the proof of Theorem 2. We first compute the set T of all maximum intervals T ⊆ [0,5] such that e^{Qt}µ1 |= ⟨s1,[0,0.1]⟩ for t ∈ T, i.e., g1(t) ∈ [0,0.1] for t ∈ T. This can be done by finding a real root isolation of the PEF g′1(t) = (3/10)(1 − e^{−3t}) − 1/10.
By implementing the real root isolation algorithm of Theorem 4, we obtain Iso(g′1)_{[0,5]} = {(0.13, 0.14)} and thus T = {[0, t*]} for some t* ∈ (0.13, 0.14). In the same way, we compute the corresponding set of maximum intervals for e^{Qt}µ1 |= ⟨s0,[0.9,1]⟩. We then complete the model checking procedure from the proof of Theorem 2 and conclude σ_{µ1} |= φ. Repeating these steps for the second formula φ′ yields σ_{µ1} |= φ′.
8 Related Works
Agrawal et al. [2] introduced probabilistic linear-time temporal logic (PLTL) to reason about discrete-time Markov chains in the context of distribution transformers, as we did for CTMCs in this paper. Interestingly, the Skolem Problem can be reduced to the model checking problem for the logic PLTL [3]. The Skolem Problem asks whether a given linear recurrence sequence has a zero term; it plays a vital role in the reachability analysis of linear dynamical systems. Unfortunately, the decidability of the problem remains open [32]. Recently, the Continuous Skolem Problem has been proposed and shown to behave well (it is decidable, subject to Schanuel's conjecture); it forms a fundamental decision problem concerning reachability in continuous-time linear dynamical systems [16]. Not surprisingly, the Continuous Skolem Problem can be reduced to model checking CLL. The primary step in verifying CLL formulas is to find a real root isolation of a PEF in a given interval. Chonev, Ouaknine and Worrell reformulated the Continuous Skolem Problem in terms of whether a PEF has a root in a given interval, which is decidable subject to Schanuel's conjecture [16]. An algorithm for finding a root isolation can also answer the problem of checking the existence of roots of a PEF; the converse, however, does not hold in general. Therefore, the decidability of the Continuous Skolem Problem cannot be applied to establish that of our CLL model checking.
Remark 1. By adopting the method of this paper, we established the decidability of model checking quantum CTMCs against signal temporal logic [40]. Again, Schanuel's conjecture is needed to guarantee correctness. A quantum CTMC is governed by a Lindblad master equation and is a more general real-time probabilistic Markov model than a CTMC; indeed, a CTMC is an instance of a quantum CTMC. We converted the evolution of the Lindblad master equation into a distribution transformer that preserves the laws of quantum mechanics, and reduced the model-checking problem of quantum CTMCs to the real root isolation problem considered in this paper, so our method can be applied to it.
9 Conclusion
This paper revisited the study of temporal properties of finite-state CTMCs by symbolizing the probability value space [0,1] into a finite set of intervals. To specify relative and absolute temporal properties, we proposed a probabilistic logic for CTMCs, namely continuous linear-time logic (CLL), and considered the model checking problem in this setting. Our main result is a state-of-the-art real root isolation algorithm over the field of algebraic numbers, which establishes the decidability of the model checking problem under the condition that Schanuel's conjecture holds.
This paper aims to show decidability in as simple a fashion as possible, without paying much attention to complexity issues. Faster algorithms for our current constructions would bring significant improvements from a practical standpoint.
Acknowledgments
We want to thank Professor Joost-Pieter Katoen for his invaluable feedback and
for pointing out the references [14,15,30]. This work is supported by the National
Key R&D Program of China (Grant No: 2018YFA0306701), the National Natural
Science Foundation of China (Grant No: 61832015), ARC Discovery Program
(#DP210102449) and ARC DECRA (#DE180100156).
References
1. Achatz, M., McCallum, S., Weispfenning, V.: Deciding polynomial-exponential
problems. In: Proceedings of the Twenty-first International Symposium on Sym-
bolic and Algebraic Computation. pp. 215–222. ACM (2008)
2. Agrawal, M., Akshay, S., Genest, B., Thiagarajan, P.: Approximate verification of
the symbolic dynamics of Markov chains. Journal of the ACM (JACM) 62(1), 2
(2015)
3. Akshay, S., Antonopoulos, T., Ouaknine, J., Worrell, J.: Reachability problems for
Markov chains. Information Processing Letters 115(2), 155–158 (2015)
4. Almagor, S., Kelmendi, E., Ouaknine, J., Worrell, J.: Invariants for continuous
linear dynamical systems. arXiv preprint arXiv:2004.11661 (2020)
5. Alur, R., Dill, D.L.: A theory of timed automata. Theoretical Computer Science
126, 183–235 (1994)
6. Alur, R., Henzinger, T.A., Vardi, M.Y.: Parametric real-time reasoning. In: Pro-
ceedings of the Twenty-fifth Annual ACM Symposium on Theory of Computing.
pp. 592–601 (1993)
7. Avellar, C.E., Hale, J.K.: On the zeros of exponential polynomials. Journal of
Mathematical Analysis and Applications 73(2), 434–452 (1980)
8. Aziz, A., Sanwal, K., Singhal, V., Brayton, R.: Model-checking continuous-time
Markov chains. ACM Transactions on Computational Logic 1(1), 162–170 (2000)
9. Baier, C., Haverkort, B., Hermanns, H., Katoen, J.P.: Model-checking algorithms
for continuous-time Markov chains. IEEE Transactions on Software Engineering
29(6), 524–541 (2003)
10. Baker, A.: Transcendental number theory. Cambridge university press (1990)
11. Barbot, B., Chen, T., Han, T., Katoen, J.P., Mereacre, A.: Efficient CTMC model
checking of linear real-time objectives. In: International Conference on Tools and
Algorithms for the Construction and Analysis of Systems. pp. 128–142. Springer
(2011)
12. Chen, T., Diciolla, M., Kwiatkowska, M., Mereacre, A.: Time-bounded verification
of CTMCs against real-time specifications. In: International Conference on Formal
Modeling and Analysis of Timed Systems. pp. 26–42. Springer (2011)
13. Chen, T., Han, T., Katoen, J.P., Mereacre, A.: Quantitative model checking of
continuous-time Markov chains against timed automata specifications. In: 2009
24th Annual IEEE Symposium on Logic In Computer Science. pp. 309–318. IEEE
(2009)
14. Chen, T., Han, T., Katoen, J.P., Mereacre, A.: Model checking of continuous-
time Markov chains against timed automata specifications. Logical Methods in
Computer Science 7(1) (Mar 2011)
15. Chen, T., Han, T., Katoen, J.P., Mereacre, A.: Observing continuous-time MDPs
by 1-clock timed automata. In: International Workshop on Reachability Problems.
pp. 2–25. Springer (2011)
16. Chonev, V., Ouaknine, J., Worrell, J.: On the Skolem Problem for continuous linear dynamical systems. In: Chatzigiannakis, I., Mitzenmacher, M., Rabani, Y., Sangiorgi, D. (eds.) 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), vol. 55, pp. 100:1–100:13. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany (2016)
17. Cohen, H.: A course in computational algebraic number theory, vol. 138. Springer
Science & Business Media (2013)
18. Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A STORM is coming: A mod-
ern probabilistic model checker. In: International Conference on Computer Aided
Verification. pp. 592–600. Springer (2017)
19. Feng, Y., Katoen, J.P., Li, H., Xia, B., Zhan, N.: Monitoring CTMCs by multi-clock
timed automata. In: International Conference on Computer Aided Verification. pp.
507–526. Springer (2018)
20. Gan, T., Chen, M., Li, Y., Xia, B., Zhan, N.: Reachability analysis for solvable
dynamical systems. IEEE Transactions on Automatic Control 63(7), 2003–2018
(2017)
21. Guan, J., Yu, N.: A probabilistic logic for verifying continuous-time markov chains.
arXiv preprint arXiv:2004.08059 (2020)
22. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal
Aspects of Computing 6(5), 512–535 (1994)
23. Katoen, J.P.: The probabilistic model checking landscape. In: Proceedings of the
31st Annual ACM/IEEE Symposium on Logic in Computer Science. pp. 31–45.
ACM (2016)
24. Katoen, J.P., Zapreev, I.S., Hahn, E.M., Hermanns, H., Jansen, D.N.: The ins and
outs of the probabilistic model checker MRMC. Performance Evaluation 68(2),
90–104 (2011)
25. Kolmogoroff, A.: Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Mathematische Annalen 104(1), 415–458 (1931)
26. Kwiatkowska, M., Norman, G., Parker, D.: PRISM: Probabilistic symbolic model
checker. In: International Conference on Modelling Techniques and Tools for Com-
puter Performance Evaluation. pp. 200–204. Springer (2002)
27. Lang, S.: Introduction to transcendental numbers. Addison-Wesley Pub. Co. (1966)
28. Li, J.C., Huang, C.C., Xu, M., Li, Z.B.: Positive root isolation for poly-powers. In:
Proceedings of the ACM on International Symposium on Symbolic and Algebraic
Computation. pp. 325–332. ACM (2016)
29. Macintyre, A., Wilkie, A.J.: On the decidability of the real exponential field (1996)
30. Majumdar, R., Salamati, M., Soudjani, S.: On decidability of time-bounded reachability in CTMDPs. In: Czumaj, A., Dawar, A., Merelli, E. (eds.) 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), vol. 168, pp. 133:1–133:19. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany (2020)
31. Nesterenko, Y.: Modular functions and transcendence problems. Comptes rendus de l'Académie des sciences, Série I, Mathématique 322(10), 909–914 (1996)
32. Ouaknine, J., Worrell, J.: Decision problems for linear recurrence sequences. In:
International Workshop on Reachability Problems. pp. 21–28. Springer (2012)
33. Richardson, D.: How to recognize zero. Journal of Symbolic Computation 24(6),
627–645 (1997)
34. Ritt, J.F.: On the zeros of exponential polynomials. Transactions of the American
Mathematical Society 31(4), 680–686 (1929)
35. Strzebonski, A.: Real root isolation for exp-log functions. In: Proceedings of the
Twenty-first International Symposium on Symbolic and Algebraic Computation.
pp. 303–314 (2008)
36. Strzebonski, A.: Real root isolation for tame elementary functions. In: Proceedings
of the 2009 International Symposium on Symbolic and Algebraic Computation.
pp. 341–350 (2009)
37. Terzo, G.: Some consequences of Schanuel's conjecture in exponential rings. Communications in Algebra 36(3), 1171–1189 (2008)
38. Tijdeman, R.: On the number of zeros of general exponential polynomials. In:
Indagationes Mathematicae (Proceedings). vol. 74, pp. 1–7. North-Holland (1971)
39. Xu, M., Deng, Y.: Time-bounded termination analysis for probabilistic programs
with delays. Information and Computation 275, 104634 (2020)
40. Xu, M., Mei, J., Guan, J., Yu, N.: Model checking quantum continuous-time Markov chains. In: Haddad, S., Varacca, D. (eds.) 32nd International Conference on Concurrency Theory (CONCUR 2021). Leibniz International Proceedings in Informatics (LIPIcs), vol. 203, pp. 13:1–13:17. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany (2021)
41. Zhang, L., Jansen, D.N., Nielson, F., Hermanns, H.: Automata-based CSL model
checking. In: International Colloquium on Automata, Languages, and Program-
ming. pp. 271–282. Springer (2011)
Under-Approximating Expected Total Rewards in POMDPs⋆
Alexander Bork, Joost-Pieter Katoen, and Tim Quatmann
RWTH Aachen University, Aachen, Germany
alexander.bork@cs.rwth-aachen.de
⋆ This work is funded by the DFG RTG 2236 “UnRAVeL”.
Abstract. We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this (generally undecidable) problem by computing under-approximations of these total expected rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
1 Introduction
The relevance of POMDPs. Partially observable Markov decision processes (POMDPs) originated in operations research and nowadays are a pivotal model for planning in AI [40]. They inherit all features of classical MDPs: each state has a set of discrete probability distributions over the states, and rewards are earned when taking transitions. However, states are not fully observable. Intuitively, certain aspects of the states can be identified, such as a state's colour, but the states themselves cannot be observed. This partial observability reflects, for example, a robot's view of its environment while only having the limited perspective of its sensors at its disposal. The main goal is to obtain a policy, i.e. a plan for resolving the non-determinism in the model, for a given objective. The key problem here is that POMDP policies must base their decisions only on the observable aspects (e.g. colours) of states. This stands in contrast to policies for MDPs, which can make decisions dependent on the entire history of full state information.
Analysing POMDPs. Typical POMDP planning problems consider either finite-horizon objectives or infinite-horizon objectives under discounting. Finite-horizon objectives focus on reaching a certain goal state (such as “the robot has collected all items”) within a given number of steps. For infinite horizons, no step bound is provided, and typically rewards along a run are weighted by a discounting factor that indicates how much immediate rewards are favoured over more distant ones. Existing techniques to treat these objectives include variations of value iteration [46,36,20,18,52,53] and policy trees [29]. Point-based techniques [38,42] approximate a POMDP's value function using a finite subset of beliefs which is iteratively updated. Algorithms include PBVI [38], Perseus [48], SARSOP [30] and HSVI [45]. Point-based methods can treat large POMDPs for both finite- and discounted infinite-horizon objectives [42].
Problem statement. In this paper we consider the problem: is the maximal expected total reward to reach a given goal state in a POMDP below a given threshold? We thus consider an infinite-horizon objective without discounting, also called an indefinite-horizon objective. A specific instance of the considered problem is the reachability probability to eventually reach a given goal state in a POMDP. This problem is undecidable [33,34] in general. Intuitively, this is due to the fact that POMDP policies need to consider the entire (infinite) observation history to make optimal decisions. For a POMDP, this notion is captured by an infinite, fully observable MDP, its belief MDP. This MDP is obtained from observation sequences inducing probabilities of being in certain states of the POMDP.
Previously proposed methods to solve the problem include approximate value iteration [22], optimisation and search techniques [1,12], dynamic programming [6], Monte Carlo simulation [43], game-based abstraction [51], and machine learning [13,14,19]. Other approaches restrict the memory size of the policies [35]. The synthesis of (possibly randomised) finite-memory policies is ETR-complete¹ [28]. Techniques to obtain finite-memory policies use e.g. parameter synthesis [28] or satisfiability checking and SMT solving [15,50].
Our approach. We tackle the aforementioned problem by computing under-approximations of maximal total expected rewards. This is done by considering finite unfoldings of the infinite belief MDP of the POMDP, and then applying abstraction. The key issue here is to find a suitable under-approximation of the POMDP's value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs and can be applied on top of the simple approach. We use mixed-integer linear programming (MILP) to find such minimal probability shifts. Cut-off techniques for indefinite-horizon objectives have been used on computation trees, rather than on the belief MDP as used here, in Goal-HSVI [24]. Belief clipping amends the probabilities in a belief to be in a state of the POMDP, yielding discretised values, i.e. an abstraction of the probability range [0,1] is applied. Such grid-based approximations are inspired by Lovejoy's grid-based belief MDP discretisation method [32]. They have also been used in [7] in the context of dynamic programming for POMDPs, and to over-approximate the value function in model checking of POMDPs [8]. In fact, this paper on determining lower bounds for indefinite-horizon objectives can be seen as the dual counterpart of [8]. Our key challenge, compared to the approach of [8], is that the value at a certain belief cannot easily be under-approximated with a convex combination of values of nearby beliefs. On the other hand, an under-approximation can benefit from a “good” guess of some initial POMDP policy. In the context of [8], such a guessed policy is of limited use for over-approximating values in the POMDP induced by an optimal policy. Although our approach is applicable to all thresholds, the focus of our work is on determining under-approximations for quantitative objectives. Dedicated verification techniques for the qualitative setting (almost-sure reachability) are presented in [17,16,27].
¹ A decision problem is ETR-complete if it can be reduced to a polynomial-length sentence in the Existential Theory of the Reals (for which the satisfiability problem is decidable) in polynomial time, and there is such a reduction in the reverse direction.
Experimental results. We have implemented our cut-off and belief clipping approaches on top of the probabilistic model checker Storm [23] and applied them to a range of benchmarks. We provide a comparison with the model checking approach of [37], and determine the tightness of our under-approximations by comparing them to over-approximations obtained using the algorithm from [8]. Our main findings from the experimental validation are:
– Cut-offs often generate tight bounds while being computationally inexpensive.
– The clipping approach may further improve the accuracy of the approximation.
– Our implementation can deal with POMDPs with tens of thousands of states.
– Mostly, the obtained under-approximations are less than 10% off.
2 Preliminaries and Problem Statement
Let Dist(A) := {µ : A → [0,1] | ∑_{a∈A} µ(a) = 1} denote the set of probability distributions over a finite set A. The set supp(µ) := {a ∈ A | µ(a) > 0} is the support of µ ∈ Dist(A). Let R̄ := R ∪ {∞, −∞}. We use Iverson bracket notation, where [x] = 1 if the Boolean expression x is true and [x] = 0 otherwise.
2.1 Partially Observable MDPs
Definition 1 (MDP). A Markov decision process (MDP) is a tuple M = ⟨S, Act, P, s_init⟩ with a (finite or infinite) set of states S, a finite set of actions Act, a transition function P : S × Act × S → [0,1] with ∑_{s′∈S} P(s, α, s′) ∈ {0,1} for all s ∈ S and α ∈ Act, and an initial state s_init.
We fix an MDP M := ⟨S, Act, P, s_init⟩. For s ∈ S and α ∈ Act, let post^M(s, α) := {s′ ∈ S | P(s, α, s′) > 0} denote the set of α-successors of s in M. The set of enabled actions in s ∈ S is given by Act(s) := {α ∈ Act | post^M(s, α) ≠ ∅}.
Definition 2 (POMDP). A partially observable MDP (POMDP) is a tuple M = ⟨M, Z, O⟩, where M is the underlying MDP with |S| ∈ N, i.e. S is finite, Z is a finite set of observations, and O : S → Z is an observation function such that O(s) = O(s′) implies Act(s) = Act(s′) for all s, s′ ∈ S.
We fix a POMDP M := ⟨M, Z, O⟩ with underlying MDP M. We lift the notion of enabled actions to observations z ∈ Z by setting Act(z) := Act(s) for some s ∈ S with O(s) = z, which is valid since states with the same observation are required to have the same enabled actions. The notions defined for MDPs below also straightforwardly apply to POMDPs.
Remark 1. More general observation functions of the form O : S × Act → Dist(Z) can be encoded in this formalism by using a polynomially larger state space [16].
An infinite path through an MDP (and a POMDP) is a sequence π̃ = s0 α1 s1 α2 ... such that α_{i+1} ∈ Act(s_i) and s_{i+1} ∈ post^M(s_i, α_{i+1}) for all i ∈ N. A finite path is a finite prefix π̂ = s0 α1 ... α_n s_n of an infinite path π̃. For finite π̂, let last(π̂) := s_n and |π̂| := n. For infinite π̃, set |π̃| := ∞ and let π̃[i] denote the finite prefix of length i ∈ N. We denote the sets of finite and infinite paths in M by Paths^M_fin and Paths^M_inf, respectively, and let Paths^M := Paths^M_fin ∪ Paths^M_inf. Paths are lifted to the observation level by observation traces. The observation trace of a (finite or infinite) path π = s0 α1 s1 α2 ... ∈ Paths^M is O(π) := O(s0) α1 O(s1) α2 .... Two paths π, π′ ∈ Paths^M are observation-equivalent if O(π) = O(π′).
Policies resolve the non-determinism present in MDPs (and POMDPs). Given a finite path π̂, a policy determines the action to take at last(π̂).
Definition 3 (Policy). A policy for M is a function σ : Paths^M_fin → Dist(Act) such that for each path π̂ ∈ Paths^M_fin, supp(σ(π̂)) ⊆ Act(last(π̂)).
A policy σ is deterministic if |supp(σ(π̂))| = 1 for all π̂ ∈ Paths^M_fin; otherwise it is randomised. σ is memoryless if for all π̂, π̂′ ∈ Paths^M_fin we have last(π̂) = last(π̂′) implies σ(π̂) = σ(π̂′). σ is observation-based if for all π̂, π̂′ ∈ Paths^M_fin it holds that O(π̂) = O(π̂′) implies σ(π̂) = σ(π̂′). We denote the set of policies for M by Σ^M and the set of observation-based policies for M by Σ^M_obs. A finite-memory policy (fm-policy) can be represented by a finite automaton where the current memory state and the state of the MDP determine the actions to take [4].
The probability measure µ^{σ,s}_M for paths in M under policy σ and initial state s is the probability measure of the Markov chain induced by M, σ, and s [4].
We use reward structures to model quantities like time or energy consumption.
Definition 4 (Reward Structure). A reward structure for M is a function R : S × Act × S → R̄ such that either R(s, α, s′) ≥ 0 for all s, s′ ∈ S, α ∈ Act, or R(s, α, s′) ≤ 0 for all s, s′ ∈ S, α ∈ Act. In the former case, we call R positive, otherwise negative.
We fix a reward structure R for M. The total reward along a path π is defined as rew_{M,R}(π) := ∑_{i=1}^{|π|} R(s_{i−1}, α_i, s_i). The total reward is always well-defined, even if π is infinite, since all rewards are assumed to be either non-negative or non-positive. For an infinite path π̃, we define the total reward until reaching a set of goal states G ⊆ S by
$$\mathit{rew}_{M,R,\Diamond G}(\tilde\pi) := \begin{cases} \mathit{rew}_{M,R}(\hat\pi) & \text{if } \exists i \in \mathbb{N}:\ \hat\pi = \tilde\pi[i] \wedge \mathit{last}(\hat\pi) \in G \wedge \forall j < i:\ \mathit{last}(\tilde\pi[j]) \notin G, \\ \mathit{rew}_{M,R}(\tilde\pi) & \text{otherwise.} \end{cases}$$
Intuitively, rew_{M,R,◊G}(π̃) accumulates reward along π̃ until the first visit of a goal state s ∈ G. If no goal state is reached, reward is accumulated along the infinite path. The expected total reward until reaching G for policy σ and state s is
$$\mathit{ER}^{\sigma}_{M,R}(s \models \Diamond G) := \int_{\tilde\pi \in \mathit{Paths}^M_{\mathit{inf}}} \mathit{rew}_{M,R,\Diamond G}(\tilde\pi)\cdot \mu^{\sigma,s}_M(\mathrm{d}\tilde\pi).$$
Observation-based policies capture the notion that a decision procedure for a POMDP only accesses the observations and their history, not the entire state of the system. We are interested in reasoning about minimal and maximal values over all observation-based policies. For our explanations we focus on maximising (non-negative or non-positive) expected rewards; minimisation can be achieved by negating all rewards.
Definition 5 (Maximal Expected Total Reward). The maximal expected total reward until reaching G from s in POMDP M is
$$\mathit{ER}^{\max}_{\mathcal{M},R}(s \models \Diamond G) := \sup_{\sigma \in \Sigma^{\mathcal{M}}_{\mathit{obs}}} \mathit{ER}^{\sigma}_{\mathcal{M},R}(s \models \Diamond G).$$
We define ER^max_{M,R}(◊G) := ER^max_{M,R}(s_init |= ◊G).
The central problem of our work, the indefinite-horizon total reward problem, asks whether the maximal expected total reward until reaching a goal exceeds a given threshold.
Problem 1. Given a POMDP M, reward structure R, set of goal states G ⊆ S, and threshold λ ∈ R, decide whether ER^max_{M,R}(◊G) ≤ λ.
Example 1. Fig. 1 shows a POMDP M with three states and two observations: s0 and s1 share one observation, while s2 has the other. A reward of 1 is collected when transitioning from s1 to s2 via the β-action; all other rewards are zero. (Figure 1 depicts M: from s0, action α leads to s0 and s1 with probability 1/2 each, and β leads to s2 with reward 0; from s1, α is a self-loop and β leads to s2 with reward 1; s2 only has an α self-loop.)
The policy that always selects α at s0 and β at s1 maximizes the expected total reward to reach G = {s2}, but it is not observation-based. The observation-based policy that selects α for the first n ∈ N transition steps and selects β afterwards yields an expected total reward of 1 − (1/2)^n. With n → ∞ we obtain ER^max_{M,R}(◊{s2}) = 1.
As computing maximal expected rewards exactly in POMDPs is undecidable [34], we aim at under-approximating the actual value ER^max_{M,R}(◊G). This allows us to answer our problem negatively if the computed lower bound exceeds λ.
Remark 2. Expected rewards can be used to describe reachability probabilities by assigning reward 1 to all transitions entering G and reward 0 to all other transitions. Our approach can thus be used to obtain lower bounds on reachability probabilities in POMDPs. This also holds for almost-sure reachability (i.e. “is the reachability probability one?”), though dedicated methods like those presented in [17,16,27] are better suited for that setting.
2.2 Beliefs
The semantics of a POMDP M are captured by its (fully observable) belief MDP. The infinite state space of this MDP consists of beliefs [3,44]. A belief is a distribution over the states of the POMDP where each component describes the likelihood of being in a POMDP state given a history of observations. We denote the set of all beliefs for M by B_M := {b ∈ Dist(S) | ∀s, s′ ∈ supp(b) : O(s) = O(s′)} and write O(b) ∈ Z for the unique observation O(s) of all s ∈ supp(b).
The belief MDP of M is constructed by starting in the belief corresponding to the initial state and computing successor beliefs to unfold the MDP. Let P(s, α, z) := ∑_{s′∈S} [O(s′) = z] · P(s, α, s′) be the probability to observe z ∈ Z after taking action α in POMDP state s. Then, the probability to observe z after taking action α in belief b is P(b, α, z) := ∑_{s∈S} b(s) · P(s, α, z). We refer to ⟦b|α, z⟧ ∈ B_M, the belief after taking α in b conditioned on observing z, as the α-z-successor of b. If P(b, α, z) > 0, it is defined component-wise as
$$⟦b\,|\,α, z⟧(s) := \frac{[O(s) = z]\cdot \sum_{s'\in S} b(s')\cdot P(s', α, s)}{P(b, α, z)}$$
for all s ∈ S. Otherwise, ⟦b|α, z⟧ is undefined.
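The α-z-successor is easy to compute directly from its definition. The following is a small sketch (hypothetical signatures: P(s, α, s′) and O(s) are passed in as functions; beliefs are dictionaries mapping states to probabilities):

from typing import Callable, Dict, List, Optional

Belief = Dict[str, float]

def belief_successor(b: Belief, alpha: str, z: str,
                     P: Callable[[str, str, str], float],
                     O: Callable[[str], str],
                     states: List[str]) -> Optional[Belief]:
    """The alpha-z-successor [[b | alpha, z]]; None if it is undefined."""
    # P(b, alpha, z): probability of observing z after taking alpha in b.
    p_baz = sum(b[s] * P(s, alpha, s2)
                for s in b for s2 in states if O(s2) == z)
    if p_baz == 0:
        return None
    return {s2: sum(b[s] * P(s, alpha, s2) for s in b) / p_baz
            for s2 in states if O(s2) == z}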
Definition 6 (Belief MDP). The belief MDP of M is the MDP bel(M) = ⟨B_M, Act, P_B, b_init⟩, where B_M is the set of all beliefs in M, Act is as for M, b_init := {s_init ↦ 1} is the initial belief, and P_B : B_M × Act × B_M → [0,1] is the belief transition function with
$$P_B(b, α, b') := \begin{cases} P(b, α, z) & \text{if } b' = ⟦b\,|\,α, z⟧, \\ 0 & \text{otherwise.} \end{cases}$$
We lift a POMDP reward structure R to the belief MDP [25].
Definition 7 (Belief Reward Structure). For beliefs b, b′ ∈ B_M and action α ∈ Act, the belief reward structure R_B based on R associated with bel(M) is given by
$$R_B(b, α, b') := \frac{\sum_{s\in S} b(s)\cdot \sum_{s'\in S} [O(s') = O(b')]\cdot R(s, α, s')\cdot P(s, α, s')}{P(b, α, O(b'))}.$$
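The lifted reward can be computed alongside the successor belief; below is a sketch continuing the assumptions of the previous code fragment (the observation z of b′ is recovered from any state in its support):

def belief_reward(b: Belief, alpha: str, b_succ: Belief,
                  P: Callable[[str, str, str], float],
                  O: Callable[[str], str],
                  R: Callable[[str, str, str], float],
                  states: List[str]) -> float:
    """R_B(b, alpha, b') as in Def. 7: expected reward, conditioned on O(b')."""
    z = O(next(iter(b_succ)))  # the unique observation of b'
    p_baz = sum(b[s] * P(s, alpha, s2)
                for s in b for s2 in states if O(s2) == z)
    weighted = sum(b[s] * R(s, alpha, s2) * P(s, alpha, s2)
                   for s in b for s2 in states if O(s2) == z)
    return weighted / p_baz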
Given a set of goal states G ⊆ S, we assume, for simplicity, that there is a set of observations Z′ ⊆ Z such that s ∈ G iff O(s) ∈ Z′. This assumption can always be ensured by transforming the POMDP M; see the full technical report [10] for details. The set of goal beliefs for G is given by G_B := {b ∈ B_M | supp(b) ⊆ G}.
We now lift the computation of expected rewards to the belief level. Based on the well-known Bellman equations [5], the belief MDP induces a function that maps every belief to the expected total reward accumulated from that belief.
Definition 8 (POMDP Value Function). For b ∈ B_M, the n-step value function V_n : B_M → R of M is defined recursively as V_0(b) := 0 and
$$V_n(b) := [b \notin G_B]\cdot \max_{α\in Act} \sum_{b'\in post_{bel(\mathcal{M})}(b,α)} P_B(b, α, b')\cdot \big(R_B(b, α, b') + V_{n-1}(b')\big).$$
(Figure 2. Belief MDP bel(M) of the POMDP M from Fig. 1: starting from {s0 ↦ 1}, the α-action successively leads to the beliefs {s0 ↦ 1/2, s1 ↦ 1/2}, {s0 ↦ 1/4, s1 ↦ 3/4}, {s0 ↦ 1/8, s1 ↦ 7/8}, ..., while from each belief the β-action leads to {s2 ↦ 1} with rewards R_B = 0, 1/2, 3/4, 7/8, ..., respectively.)
The (optimal) value function V : B_M → R̄ is given by V(b) := lim_{n→∞} V_n(b). The n-step value function is piecewise linear and convex [44]. Thus, the optimal value function can be approximated arbitrarily closely by a piecewise linear convex function [47]. The value function yields expected total rewards in M and bel(M):
$$\mathit{ER}^{\max}_{\mathcal{M},R}(s \models \Diamond G) = \mathit{ER}^{\max}_{bel(\mathcal{M}),R_B}(\{s\mapsto 1\} \models \Diamond G_B) = V(\{s\mapsto 1\}).$$
Example 2. Fig. 2 shows a fragment of the belief MDP of the POMDP from Fig. 1. Observe that ER^max_{bel(M),R_B}(◊{s2 ↦ 1}) = 1.
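To make Def. 8 concrete on the running example, the n-step values can be computed directly on the reachable beliefs: writing p for the probability of s0, action α moves the belief from p to p/2 with reward 0, while β reaches the goal with expected reward 1 − p. This is a sketch assuming the transition structure described for Fig. 1:

def V(n: int, p: float) -> float:
    """n-step value V_n (Def. 8) at belief {s0 -> p, s1 -> 1-p}."""
    if n == 0:
        return 0.0
    return max(1.0 - p,          # beta: reach the goal, expected reward 1-p
               V(n - 1, p / 2))  # alpha: belief moves to p/2, reward 0

print([round(V(n, 1.0), 4) for n in range(1, 8)])
# [0.0, 0.5, 0.75, 0.875, ...] -> converges to V({s0 -> 1}) = 1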
We reformulate our problem statement to focus on the belief MDP.
Problem 2 (equivalent to Problem 1). For a POMDP M, reward structure R, goal states G ⊆ S, and threshold λ ∈ R, decide whether V({s_init ↦ 1}) ≤ λ.
As the belief MDP is fully observable, standard results for MDPs apply. However, an exhaustive analysis of bel(M) is intractable since the belief MDP is, in general, infinitely large².
3 Finite Exploration Under-Approximation
Instead of approximating values directly on the POMDP, we consider approximations of the corresponding belief MDP. The basic idea is to construct a finite abstraction of the belief MDP by unfolding parts of it and approximating values at beliefs we decide not to explore. In the resulting finite MDP, under-approximative expected reward values can be computed by standard model checking techniques. We present two approaches for abstraction: belief cut-offs and belief clipping. We incorporate those techniques into an algorithmic framework that yields arbitrarily tight under-approximations. The technical report [10] contains formal proofs of our claims.
² The set of all beliefs (i.e. the state space of bel(M)) is uncountable. The reachable fragment is countable, though, since each belief has at most |Z| many successors.
(Figure 3. Applying belief cut-offs to the belief MDP from Fig. 2: exploration stops at the cut-off belief b = {s0 ↦ 1/4, s1 ↦ 3/4}, which instead gets a cut-off transition with probability 1 and reward R′ = V̲(b) to the dedicated goal state b_cut.)
3.1 Belief Cut-Offs
The general idea of belief cut-offs is to stop exploring the belief MDP at certain beliefs, the cut-off beliefs, and to assume that a goal state is immediately reached while sub-optimal reward is collected. Similar techniques have been discussed in the context of fully observable MDPs and other model types [11,26,49,2]. Our work adapts the idea of cut-offs for POMDP over-approximations described in [8] to under-approximations. The main idea of belief cut-offs shares similarities with the SARSOP [30] and Goal-HSVI [24] approaches. While they apply cut-offs on the level of the computation tree, our approach directly manipulates the belief MDP to yield a finite model.
Let V̲ : B_M → R̄ with V̲(b) ≤ V(b) for all b ∈ B_M. We call V̲ an under-approximative value function and V̲(b) the cut-off value of b. In each of the cut-off beliefs b, instead of adding the regular transitions to its successors, we add a transition with probability 1 to a dedicated goal state b_cut. In the modified reward structure R′, this cut-off transition is assigned a reward³ of V̲(b), causing the value for a cut-off belief b in the modified MDP to coincide with V̲(b). Hence, the exact value of the cut-off belief, and thus the value of all other explored beliefs, is under-approximated.
Example 3. Fig. 3 shows the resulting finite MDP obtained when considering the belief MDP from Fig. 2 with the single cut-off belief b = {s0 ↦ 1/4, s1 ↦ 3/4}.
Computing cut-off values. The question of finding a suitable under-approximative value function V̲ is central to the cut-off approach. For an effective approximation, such a function should be easy to compute while still providing values close to the optimum. If we assume a positive reward structure, the constant value 0 is always a valid under-approximation. A more sophisticated approach is to compute suboptimal expected reward values for the states of the POMDP using some arbitrary, fixed observation-based policy σ ∈ Σ^M_obs. Let U^σ : S → R̄ be such that for all s ∈ S, U^σ(s) = ER^σ_{M,R}(s |= ◊G). Then, we define the function U^σ : B_M → R̄ as U^σ(b) := ∑_{s∈supp(b)} b(s) · U^σ(s).
³ We slightly deviate from Def. 4 by allowing transition rewards to be −∞ or +∞. Alternatively, we could introduce new sink states with a non-zero self-loop reward.
Lemma 1. U^σ is an under-approximative value function, i.e. for all b ∈ B_M:
$$U^\sigma(b) := \sum_{s\in supp(b)} b(s)\cdot U^\sigma(s) \le V(b).$$
Thus, finding a suitable under-approximative value function reduces to finding “good” policies for M, e.g. by using randomly guessed fm-policies, machine learning methods [13], or a transformation to a parametric model [28].
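Once state values U^σ(s) have been computed for some fixed observation-based policy σ (e.g. by standard Markov chain analysis), the cut-off value of Lemma 1 is just a belief-weighted sum, as this one-line sketch illustrates:

from typing import Dict

def cutoff_value(b: Dict[str, float], U_sigma: Dict[str, float]) -> float:
    """Under-approximative cut-off value U^sigma(b) from Lemma 1."""
    return sum(prob * U_sigma[s] for s, prob in b.items())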
3.2 Belief Clipping
The cut-off approach provides a universal way to construct an MDP which under-approximates the expected total reward value for a given POMDP. The quality of the approximation, however, is highly dependent on the under-approximative value function used. Furthermore, regions where the belief MDP only slowly converges towards a belief may pose problems in practice.
As a potential remedy for these problems, we propose a different concept called belief clipping. Intuitively, the procedure shifts some of the probability mass of a belief b in order to transform b into another belief b̃. We then connect b to b̃ in such a way that the accuracy of our approximation of the value V(b) depends only on the approximation of V(b̃) and the so-called clipping value, a notion of distance between b and b̃ that we discuss below. We can thus focus on exploring the successors of b̃ to obtain good approximations for both beliefs b and b̃.
Definition 9 (Belief Clip). For b ∈ B_M, we call µ : supp(b) → [0,1] a belief clip if ∀s ∈ supp(b) : µ(s) ≤ b(s) and |µ| := ∑_{s∈supp(b)} µ(s) < 1. The belief (b ⊖ µ) ∈ B_M induced by µ is defined by
$$\forall s \in supp(b):\quad (b \ominus µ)(s) := \frac{b(s) - µ(s)}{1 - |µ|}.$$
Intuitively, a belief clip µ for b describes for each s ∈ supp(b) the probability mass that is removed (“clipped away”) from b(s). The induced belief is obtained by normalising the resulting values so that they sum up to one.
Example 4. For the belief b = {s0 ↦ 1/4, s1 ↦ 3/4}, consider the two belief clips µ1 = {s0 ↦ 1/4, s1 ↦ 1/4} and µ2 = {s0 ↦ 1/4, s1 ↦ 0}. Both induce the same belief: (b ⊖ µ1) = (b ⊖ µ2) = {s0 ↦ 0, s1 ↦ 1}.
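Def. 9 translates directly into code. This sketch checks the belief-clip conditions and reproduces Example 4 (both clips inducing the same belief):

from typing import Dict

def induced_belief(b: Dict[str, float], mu: Dict[str, float]) -> Dict[str, float]:
    """The belief (b (-) mu) induced by a belief clip mu (Def. 9)."""
    assert all(0.0 <= mu.get(s, 0.0) <= b[s] for s in b), "mu(s) <= b(s) required"
    mass = sum(mu.values())
    assert mass < 1.0, "a belief clip must remove strictly less than all mass"
    return {s: (b[s] - mu.get(s, 0.0)) / (1.0 - mass) for s in b}

b = {"s0": 0.25, "s1": 0.75}
print(induced_belief(b, {"s0": 0.25, "s1": 0.25}))  # {'s0': 0.0, 's1': 1.0}
print(induced_belief(b, {"s0": 0.25}))              # the same belief (Example 4)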
We have supp((b ⊖ µ)) ⊆ supp(b), which also implies O((b ⊖ µ)) = O(b). Given some candidate belief b̃, consider the set of inducing belief clips:
$$C(b, \tilde b) := \{\, µ : supp(b) \to [0,1] \mid µ \text{ is a belief clip for } b \text{ with } \tilde b = (b \ominus µ) \,\}.$$
Belief b̃ is called an adequate clipping candidate for b iff C(b, b̃) ≠ ∅.
Definition 10 (Clipping Value). For b ∈ B_M and an adequate clipping candidate b̃, the clipping value is ∆_{b→b̃} := |δ_{b→b̃}|, where δ_{b→b̃} := arg min_{µ∈C(b,b̃)} |µ|. The values δ_{b→b̃}(s) for s ∈ supp(b) are the state clipping values.
(Figure 4. Applying belief clipping to the belief MDP from Fig. 2: the belief b = {s0 ↦ 1/4, s1 ↦ 3/4} is connected to the clipping candidate b̃ = {s0 ↦ 0, s1 ↦ 1} with probability 3/4 and reward 0, and to b_cut with probability 1/4; from b̃, the β-action reaches {s2 ↦ 1} with reward R′ = 1.)
Given a belief b and an adequate clipping candidate b̃, we outline how the notion of belief clipping is used to obtain valid under-approximations. We assume b ≠ b̃, implying 0 < ∆_{b→b̃} < 1. Instead of exploring all successors of b in bel(M), the approach is to add a transition from b to b̃. The newly added transition has probability 1 − ∆_{b→b̃} and is assigned a reward of 0. The remaining probability mass (i.e. ∆_{b→b̃}) leads to a designated goal state b_cut. To guarantee that, in general, the clipping procedure yields a valid under-approximation, we need to add a corrective reward value to the transition from b to b_cut. Let L : S → R̄ map each POMDP state to its minimal expected reward in the underlying, fully observable MDP M of M⁴, i.e. L(s) = ER^min_{M,R}(s |= ◊G). This function soundly under-approximates the state values which can be achieved by any observation-based policy; it can be generated using standard MDP analysis. Given the state clipping values δ_{b→b̃}(s) for s ∈ supp(b), the reward for the transition from b to b_cut is ∑_{s∈supp(b)} (δ_{b→b̃}(s)/∆_{b→b̃}) · L(s).
⁴ When rewards are negative, we might have L(s) = −∞ for many s ∈ S \ G, in which case the applicability of the clipping approach is very limited.
Example 5. For the belief MDP from Fig. 2, belief b = {s0 ↦ 1/4, s1 ↦ 3/4}, and clipping candidate b̃ = {s0 ↦ 0, s1 ↦ 1}, we get ∆_{b→b̃} = 1/4, as δ_{b→b̃} = µ2 = {s0 ↦ 1/4, s1 ↦ 0} with the belief clip µ2 as in Example 4. Furthermore, L(s0) = 0. The resulting MDP following our construction above is shown in Fig. 4.
The following lemma shows that the construction yields an under-approximation.
Lemma 2.
$$(1 - ∆_{b→\tilde b})\cdot V(\tilde b) + ∆_{b→\tilde b}\cdot \sum_{s\in supp(b)} \frac{δ_{b→\tilde b}(s)}{∆_{b→\tilde b}}\cdot L(s) \;\le\; V(b).$$
Proof (sketch). To gain some intuition, consider the special case where ∆_{b→b̃} = δ_{b→b̃}(s) = b(s) for some s ∈ supp(b). The clipping candidate b̃ can then be interpreted as the conditional probability distribution arising from distribution b given that s is not the current state. The value V(b) can be split into the sum of (i) the probability that s is not the current state times the reward accumulated from belief b̃, and (ii) the probability that s is the current state times the reward accumulated from s, i.e. from the belief {s ↦ 1}. However, for the two summands we must consider a policy that does not distinguish between the beliefs b, b̃, and {s ↦ 1} as well as their observation-equivalent successors. In other words, the same sequence of actions must be executed when the same observations are made. We consider such a policy that in addition is optimal at b̃, i.e. the reward accumulated from b̃ is equal to V(b̃). For the reward accumulated from {s ↦ 1}, L(s) provides a lower bound. Hence, (1 − b(s)) · V(b̃) + b(s) · L(s) is a lower bound for the reward accumulated from b. A formal proof is given in [10]. ⊓⊔
To find a suitable clipping candidate for a given belief b, we consider a finite candidate set B ⊆ B_M consisting of beliefs with observation O(b). These beliefs do not need to be reachable in the belief MDP. The set can be constructed, e.g., from already explored beliefs or from a fixed, discretised set of beliefs. We are interested in minimising the clipping value ∆_{b→b′} over all candidate beliefs b′ ∈ B. A naive approach is to explicitly compute the clipping values for all candidates. We instead use mixed-integer linear programming (MILP) [41]. An MILP is a system of linear inequalities (constraints) and a linear objective function over real-valued and integer variables. A feasible solution of the MILP is a variable assignment that satisfies all constraints; an optimal solution is a feasible solution that minimises the objective function.
Definition 11 (Belief Clipping MILP). The belief clipping MILP for belief b ∈ 𝓑_𝓜 and finite set of candidates B ⊆ {b′ ∈ 𝓑_𝓜 | O(b′) = O(b)} is given by:

minimise ∆ such that:
  Σ_{b′∈B} a_{b′} = 1                                       (select exactly one candidate b′)   (1)
  ∀ b′ ∈ B: a_{b′} ∈ {0, 1}                                                                     (2)
  Σ_{s∈supp(b)} δ_s = ∆                                     (compute clipping value for selected b′)   (3)
  ∀ s ∈ supp(b): δ_s ∈ [0, b(s)]                                                                (4)
  ∀ b′ ∈ B, s ∈ supp(b): δ_s ≥ b(s) − (1 − ∆) · b′(s) − (1 − a_{b′})                            (5)
The MILP consists of O(|supp(b)| + |B|) variables and O(|supp(b)| · |B|) constraints. For b′ ∈ B, the binary variable a_{b′} indicates whether b′ has been chosen as the clipping candidate. Moreover, we have variables δ_s for s ∈ supp(b) and a variable ∆ to represent the (state) clipping values for b and the chosen candidate b′. Constraints (1) and (2) enforce that exactly one of the a_{b′} variables is one, i.e. exactly one belief is chosen. Constraint (3) forces ∆ to be the sum of all state clipping values. The δ_s variables get a value between zero and b(s) (Constraint (4)). Constraint (5) only affects δ_s if the corresponding belief is chosen. Otherwise, a_{b′} is set to 0 and the value on the right-hand side becomes negative. If a belief b′ is chosen, the minimisation forces Constraint (5) to hold with equality as the right-hand side is greater or equal to 0. Assuming ∆ is set to a value below 1, we obtain valid clipping values as

  ∀ s ∈ supp(b): δ_s = b(s) − (1 − ∆) · b′(s)  ⟺  b′(s) = (b(s) − δ_s) / (1 − ∆).
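Since the implementation solves these MILPs with Gurobi (see Sect. 4), a sketch of how Def. 11 could be set up through Gurobi's C API may be helpful. The variable layout and names below are our own, error handling is omitted, and this is not the actual Storm code.

#include <gurobi_c.h>
#include <stdlib.h>

/* Belief clipping MILP of Def. 11: b[0..n-1] is the belief on supp(b),
   cand[j][0..n-1] for j < m are the candidates b'. Column layout: 0..m-1
   are the binary a_{b'}, m..m+n-1 the delta_s, column m+n is Delta. */
int solve_clipping_milp(int n, int m, const double *b, double *const *cand,
                        double *Delta, double *delta, int *chosen)
{
    GRBenv *env = NULL;
    GRBmodel *model = NULL;
    GRBloadenv(&env, NULL);
    GRBnewmodel(env, &model, "clip", 0, NULL, NULL, NULL, NULL, NULL);

    for (int j = 0; j < m; j++) /* a_{b'}: binary selectors, Constraint (2) */
        GRBaddvar(model, 0, NULL, NULL, 0.0, 0.0, 1.0, GRB_BINARY, NULL);
    for (int i = 0; i < n; i++) /* delta_s in [0, b(s)], Constraint (4) */
        GRBaddvar(model, 0, NULL, NULL, 0.0, 0.0, b[i], GRB_CONTINUOUS, NULL);
    /* Delta with objective coefficient 1; Gurobi minimises by default. */
    GRBaddvar(model, 0, NULL, NULL, 1.0, 0.0, 1.0, GRB_CONTINUOUS, NULL);
    GRBupdatemodel(model);

    int cap = (m > n + 1) ? m : n + 1;
    int *ind = malloc(cap * sizeof(int));
    double *val = malloc(cap * sizeof(double));

    for (int j = 0; j < m; j++) { ind[j] = j; val[j] = 1.0; }
    GRBaddconstr(model, m, ind, val, GRB_EQUAL, 1.0, NULL); /* (1) */

    for (int i = 0; i < n; i++) { ind[i] = m + i; val[i] = 1.0; }
    ind[n] = m + n; val[n] = -1.0; /* sum_s delta_s - Delta = 0, (3) */
    GRBaddconstr(model, n + 1, ind, val, GRB_EQUAL, 0.0, NULL);

    /* (5) rearranged: delta_s - b'(s)*Delta - a_{b'} >= b(s) - b'(s) - 1 */
    for (int j = 0; j < m; j++)
        for (int i = 0; i < n; i++) {
            ind[0] = m + i; val[0] = 1.0;
            ind[1] = m + n; val[1] = -cand[j][i];
            ind[2] = j;     val[2] = -1.0;
            GRBaddconstr(model, 3, ind, val, GRB_GREATER_EQUAL,
                         b[i] - cand[j][i] - 1.0, NULL);
        }

    GRBoptimize(model);
    GRBgetdblattr(model, GRB_DBL_ATTR_OBJVAL, Delta);
    GRBgetdblattrarray(model, GRB_DBL_ATTR_X, m, n, delta);
    *chosen = -1;
    for (int j = 0; j < m; j++) {
        double aj;
        GRBgetdblattrelement(model, GRB_DBL_ATTR_X, j, &aj);
        if (aj > 0.5) *chosen = j;
    }
    free(ind); free(val);
    GRBfreemodel(model);
    GRBfreeenv(env);
    return 0;
}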
Input: POMDP 𝓜 = ⟨M, Z, O⟩ with M = ⟨S, Act, P, s_init⟩, reward structure R, goal states G ⊆ S, under-approx. value function V, function L: S → ℝ with L(s) = ER^min_{M,R}(s ⊨ ◇G)
Output: Clipping belief MDP K_𝓜 and reward structure R_K

1   S_K ← {b_init, b_cut} with b_init = {s_init ↦ 1} and a new belief state b_cut
2   P_K(b_cut, cut, b_cut) ← 1, R_K(b_cut, cut, b_cut) ← 0              // add self-loop
3   Q ← {b_init}                                                       // initialise exploration set
4   while Q ≠ ∅ do
5       b ← chooseBelief(Q), Q ← Q \ {b}                               // pop next belief to explore from Q
6       if supp(b) ⊆ G then P_K(b, goal, b) ← 1, R_K(b, goal, b) ← 0   // add self-loop
7       else if exploreBelief(b) then                                  // expand b
8           foreach α ∈ Act(b) do                                      // using bel(𝓜) and R_B as in Defs. 6 and 7
9               foreach b′ ∈ post_{bel(𝓜)}(b, α) do
10                  P_K(b, α, b′) ← P_B(b, α, b′), R_K(b, α, b′) ← R_B(b, α, b′)
11                  if b′ ∉ S_K then S_K ← S_K ∪ {b′}, Q ← Q ∪ {b′}
12      else                                                           // apply cut-off and clipping to b
13          P_K(b, cut, b_cut) ← 1, R_K(b, cut, b_cut) ← V(b)          // add cut-off transition
14          choose a finite set B ⊆ 𝓑_𝓜 of clipping candidates for b
15          b̃, ∆_{b→b̃}, δ_{b→b̃} ← solveClippingMILP(b, B)
16          if b̃ ≠ b and b̃ is adequate then                           // clip b using b̃
17              P_K(b, clip, b̃) ← (1 − ∆_{b→b̃}), P_K(b, clip, b_cut) ← ∆_{b→b̃}
18              R_K(b, clip, b̃) ← 0, R_K(b, clip, b_cut) ← Σ_{s∈supp(b)} (δ_{b→b̃}(s)/∆_{b→b̃}) · L(s)
19              if b̃ ∉ S_K then S_K ← S_K ∪ {b̃}, Q ← Q ∪ {b̃}
20  return K_𝓜 = ⟨S_K, Act ⊎ {goal, cut, clip}, P_K, b_init⟩ and R_K

Algorithm 1: Belief exploration algorithm with cut-offs and clipping
A trivial solution of the MILP is always obtained by setting a_{b′} and ∆ to 1 and δ_s to b(s) for all s and an arbitrary b′ ∈ B. This corresponds to an invalid belief clip. However, as we minimise the value for ∆, we can conclude that no belief in the candidate set is adequate for clipping if ∆ is 1 in an optimal solution.
Theorem 1. An optimal solution to the belief clipping MILP for belief b and candidate set B sets a_{b̃} to 1 and ∆ to a value below 1 iff b̃ ∈ B is an adequate clipping candidate for b with minimal clipping value.
3.3 Algorithm

We incorporate belief cut-offs and belief clipping into an algorithmic framework outlined in Algorithm 1. As input, the algorithm takes an instance of Problems 1 and 2, i.e. a POMDP 𝓜 with reward structure R and goal states G. In addition, the algorithm considers an under-approximative value function V (Sect. 3.1) and a function L for the computation of corrective reward values (Sect. 3.2).
Lines 1 and 2 initialise the state set S_K of the under-approximative MDP K_𝓜 with the initial belief b_init and the designated goal state b_cut, which has only one transition to itself with reward 0. Furthermore, we initialise the exploration set Q by adding b_init (Line 3). During the computation, Q is used to keep track of all beliefs we still need to process. We then execute the exploration loop (Lines 4 to 19) until Q becomes empty. In each exploration step, a belief b is selected⁵
and removed from Q. There are three cases for the currently processed belief b.
If supp(b) ⊆ G, i.e. b is a goal belief, we add a self-loop with reward 0 to b and continue with the next belief (Line 6). b is not expanded as successors of goal beliefs will not influence the result of the computation.
If b is not a goal belief, we use a heuristic function⁶ exploreBelief to decide if b is expanded in Line 7. Lines 8 to 11 outline the expansion step. The transitions from b to its successor beliefs and the corresponding rewards as in the original belief MDP (see Sect. 2.2) are added. Furthermore, the successor beliefs that have not been encountered before are added to the set of states S_K and the exploration set Q.
If b is not expanded, we apply the cut-off approach and the clipping approach to b in Lines 12 to 19. In Line 13 we add a cut-off transition from b to b_cut with a new action cut. We use the given under-approximative value function V to compute the cut-off reward. Towards the clipping approach, a set of candidate beliefs is chosen and the belief clipping MILP for b and the candidate set is constructed as described in Def. 11 (Lines 14 and 15). If an adequate candidate b̃ with clipping values ∆_{b→b̃} and δ_{b→b̃}(s) for s ∈ supp(b) has been found, we add the transitions from b to b_cut and to b̃ using a new action clip and probabilities ∆_{b→b̃} and 1 − ∆_{b→b̃}, respectively. Furthermore, we equip the transitions with reward values as described in Sect. 3.2 using the given function L (Lines 16 to 18). If the clipping candidate b̃ has not been encountered before, we add it to the state space of the MDP and to the exploration set in Line 19.
The result of the algorithm is an MDP K_𝓜 with reward structure R_K. The set of states S_K of K_𝓜 contains all encountered beliefs. To guarantee termination of the algorithm, the decision heuristic exploreBelief has to stop exploring further beliefs at some point. Moreover, the handling of clipping candidates in Line 19 should not add new beliefs to Q infinitely often. We therefore fix a finite set of candidate beliefs B# ⊆ 𝓑_𝓜 and make sure that the candidate sets B in Line 14 satisfy (B \ S_K) ⊆ B#. To ensure a certain progress in the exploration, “clip-cycles”—i.e. paths of the form b₁ –clip→ ... –clip→ b_n –clip→ b₁—are avoided in K_𝓜. This can be done, e.g. by always expanding the candidate beliefs b ∈ B#.
Expected total rewards until reaching the extended set of goal beliefs G_cut := G_B ∪ {b_cut} in K_𝓜 under-approximate the values in the belief MDP:
Theorem 2. For all beliefs b ∈ S_K \ {b_cut} it holds that

  ER^max_{K_𝓜,R_K}(b ⊨ ◇G_cut) ≤ V(b) = ER^max_{bel(𝓜),R_B}(b ⊨ ◇G_B).

Corollary 1. ER^max_{K_𝓜,R_K}(◇G_cut) ≤ ER^max_{𝓜,R}(◇G).
⁵ For example, Q can be implemented as a FIFO queue.
⁶ The decision can be made for example by considering the size of the already explored state space such that the expansion is stopped if a size threshold has been reached. More involved decision heuristics are subject to further research.
Table 1. Results for benchmark POMDPs with maximisation objective. Each benchmark lists the property type φ and the numbers of states/state-action pairs/observations (S/Act/Z). The Prism entry gives the result, the computation time, and the resolution η; each Storm entry gives the obtained value, the computation time, and the number of states in the abstraction MDP K_𝓜. TO/MO indicate time-/memory-outs.

Drone 4-1 (Pmax; 1226/2954/384): Prism: TO/MO. Storm cut-off only: 0.79, <1 s, 3·10⁴; η=2: 0.79, 1360 s, 3·10⁴; η=3,4,6: TO. Over-approx.: 0.94.
Drone 4-2 (Pmax; 1226/2954/761): Prism: TO/MO. Cut-off only: 0.86, <1 s, 2·10⁴; η=2: 0.91, 249 s, 2·10⁴; η=3: 0.92, 1902 s, 2·10⁴; η=4,6: TO. Over-approx.: 0.97.
Grid-av 4-0 (Pmax; 17/59/4): Prism: [0.21, 1.0], 5.14 s, η=6. Cut-off only: 0.86, <1 s, 238; η=2: 0.93, <1 s, 312; η=3: 0.93, 1.77 s, 472; η=4: 0.93, 3.63 s, 663; η=6: 0.93, 13.9 s, 1300. Over-approx.: 0.98.
Grid-av 4-0.1 (Pmax; 17/59/4): Prism: [0.21, 1.0], 1.47 s, η=3. Cut-off only: 0.82, <1 s, 238; η=2: 0.85, 26.1 s, 317; η=3: 0.82, 198 s, 461; η=4: 0.85, 1913 s, 759; η=6: TO. Over-approx.: 0.99.
Netw-p 2-8-20 (Rmax; 2·10⁴/3·10⁴/4909): Prism: [557, 557], 2355 s, η=10. Cut-off only: 537, 2.3 s, 8·10⁴; η=2: 537, 98.5 s, 1·10⁵; η=3: 537, 320 s, 1·10⁵; η=4: 537, 651 s, 1·10⁵; η=6: 537, 2368 s, 1·10⁵. Over-approx.: 558.
Netw-p 3-8-20 (Rmax; 2·10⁵/3·10⁵/2·10⁴): Prism: TO/MO. Cut-off only: 769, 290 s, 1·10⁶; η=2: 769, 6640 s, 1·10⁶; η=3,4,6: TO. Over-approx.: 819.
Refuel 06 (Pmax; 208/565/50): Prism: [0.67, 0.72], 4625 s, η=3. Cut-off only: 0.67, <1 s, 4576; η=2: 0.67, 5.89 s, 4834; η=3: 0.67, 24.3 s, 5204; η=4: 0.67, 92 s, 5603; η=6: 0.67, 2076 s, 6135. Over-approx.: 0.69.
Refuel 08 (Pmax; 470/1431/66): Prism: TO/MO. Cut-off only: 0.45, <1 s, 2·10⁴; η=2: 0.45, 839 s, 2·10⁴; η=3,4,6: TO. Over-approx.: 0.51.
4 Experimental Evaluation
Implementation details. We integrated Algorithm 1 in the probabilistic model checker Storm [23] as an extension of the POMDP verification framework described in [8]. Inputs are a POMDP—encoded either explicitly or using an extension of the Prism language [37]—and a property specification. Internally, POMDPs and MDPs are represented using sparse matrices. The implementation supports minimisation⁷ and maximisation of reachability probabilities, reach-avoid probabilities (i.e. the probability to avoid a set of bad states until a set of goal states is reached), and expected total rewards. In a preprocessing step, functions V and L as considered in Algorithm 1 are generated. For V, we consider the function U^σ as in Lemma 1, where σ is a memoryless observation-based policy given by a heuristic⁸. For the function L, we apply standard MDP analysis on the underlying MDP. When exploring the abstraction MDP K_𝓜, our heuristic expands a belief iff |S_K| ≤ |S| · max_{z∈Z} |O⁻¹(z)|, where |S_K| is the number of already explored beliefs and |O⁻¹(z)| is the number of POMDP states with observation z. Belief clipping can either be disabled entirely, or we consider candidate sets B ⊆ B#_η, where B#_η := {b ∈ 𝓑_𝓜 | ∀ s ∈ S: b(s) ∈ {i/η | i ∈ ℕ, 0 ≤ i ≤ η}} forms a finite, regular grid of beliefs with resolution η ∈ ℕ \ {0}. Grid beliefs b ∈ B#_η are always expanded.
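Membership in the grid B#_η is a per-entry divisibility check; a small C sketch (the tolerance parameter and array convention are ours, not the tool's):

#include <math.h>
#include <stdbool.h>

/* Every entry of the belief must equal i/eta for some integer i, up to a
   tolerance that absorbs floating-point representation error. */
bool is_grid_belief(const double *b, int n, int eta, double tol)
{
    for (int i = 0; i < n; i++) {
        double scaled = b[i] * (double)eta;
        if (fabs(scaled - round(scaled)) > tol)
            return false;
    }
    return true;
}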
⁷ For minimisation, the under-approximation yields upper bounds.
⁸ The heuristic uses optimal values obtained on the fully observable underlying MDP.
Table 2. Results for benchmark POMDPs with minimisation objective (layout as in Table 1; for minimisation, the under-approximation yields upper bounds).

Grid 4-0.1 (Rmin; 17/62/3): Prism: [4.52, 4.7], 649 s, η=10. Cut-off only: 4.78, <1 s, 258; η=2: 4.78, 15.6 s, 255; η=3: 4.78, 148 s, 255; η=4: 4.78, 1940 s, 255; η=6: TO. Over-approx.: 4.52.
Grid 4-0.3 (Rmin; 17/62/3): Prism: [6.12, 6.31], 1077 s, η=10. Cut-off only: 6.56, <1 s, 255; η=2: 6.56, 15.8 s, 256; η=3: 6.56, 148 s, 256; η=4: 6.56, 1983 s, 256; η=6: TO. Over-approx.: 6.08.
Maze2 0.1 (Rmin; 15/54/8): Prism: [6.32, 6.32], 1.79 s, η=10. Cut-off only: 6.34, <1 s, 91; η=2: 6.34, <1 s, 90; η=3: 6.34, <1 s, 90; η=4: 6.34, <1 s, 90; η=6: 6.34, 2.02 s, 90. Over-approx.: 6.32.
Netw 2-8-20 (Rmin; 4589/6973/1173): Prism: [3.17, 3.2], 211 s, η=10. Cut-off only: 6.56, <1 s, 2·10⁴; η=2: 6.56, 5.31 s, 2·10⁴; η=3: 6.56, 17.2 s, 2·10⁴; η=4: 6.56, 42.3 s, 3·10⁴; η=6: 6.56, 167 s, 3·10⁴. Over-approx.: 3.14.
Netw 3-8-20 (Rmin; 2·10⁴/3·10⁴/2205): Prism: [5.61, 6.79], 7133 s, η=6. Cut-off only: 11.9, 3.51 s, 1·10⁵; η=2: 11.9, 214 s, 2·10⁵; η=3: 11.9, 1372 s, 2·10⁵; η=4: 11.9, 4910 s, 2·10⁵; η=6: TO. Over-approx.: 6.13.
Rocks 12 (Rmin; 6553/3·10⁴/1645): Prism: TO/MO. Cut-off only: 38, 1.39 s, 3·10⁴; η=2: 38, 61.1 s, 3·10⁴; η=3: 38, 138 s, 3·10⁴; η=4: 20, 230 s, 5·10⁴; η=6: 21, 532 s, 6·10⁴. Over-approx.: 20.
Rocks 16 (Rmin; 1·10⁴/5·10⁴/2761): Prism: TO/MO. Cut-off only: 44, 3.85 s, 4·10⁴; η=2: 44, 114 s, 4·10⁴; η=3: 44, 230 s, 4·10⁴; η=4: 26, 399 s, 6·10⁴; η=6: 27, 1062 s, 1·10⁵. Over-approx.: 26.
Furthermore, we exclude clipping candidates b̃ with δ_{b→b̃}(s) > 0 for some s with L(s) = −∞; clipping with such candidates is not useful as it induces a value of −∞. Expected total rewards on fully observable MDPs are computed using Sound Value Iteration [39] with relative precision 10⁻⁶. MILPs are solved using Gurobi [21].
Set-up. We evaluate our under-approximation approach with cut-offs only and with enabled belief clipping procedure using grid resolutions η = 2, 3, 4, 6. We consider the same POMDP benchmarks⁹ as in [37, 8]. The POMDPs are scalable versions of case studies stemming from various application domains. To establish an external baseline, we compare with the approach of [37] implemented in Prism [31]. Prism generates an under-approximation based on an optimal policy for an over-approximative MDP which—in contrast to Storm—means that always both under- and over-approximations have to be computed. We ran Prism with resolutions η = 2, 3, 4, 6, 8, 10 and report on the best approximation obtained. To provide a further reference for the tightness of our under-approximation, we compute over-approximative bounds as in [8] using the implementation in Storm with a resolution of η = 8. All experiments were run on an Intel® Xeon® Platinum 8160 CPU using 4 threads¹⁰, 64 GB RAM, and a time limit of 2 hours.
⁹ Instances with a finite belief MDP that would be fully explored by our algorithm are omitted since the exact value can be obtained without approximation techniques.
¹⁰ For our implementation, only Gurobi runs multi-threaded. Prism uses multiple threads for garbage collection.

[Figure 5. Accuracy for Drone 4-2 with different sizes of the approximation MDP K_𝓜: the under-approximation value Pr(◇G) (y-axis, 0.5 to 1) plotted against the number of explored beliefs |S_K| (x-axis, 0 to 30,000), for the cut-off approach and clipping with η = 2.]

Results. Tables 1 and 2 show our results for maximising and minimising properties, respectively. The first columns contain for each POMDP the benchmark name,
model parameters, property type (probabilities (P) or rewards (R)), and the numbers of states, state-action pairs, and observations. Column Prism gives the result with the smallest gap between over- and under-approximation computed with the approach of [37]. For maximising (minimising) properties, our approach competes with the lower (upper) bound of the provided interval. We also provide the computation time and the considered resolution η. For our implementation, we give results for the configuration with disabled clipping and for clipping with different resolutions η. In each cell, we give the obtained value, the computation time, and the number of states in the abstraction MDP K_𝓜. Time- and memory-outs are indicated by TO and MO. The right-most column indicates the over-approximation value computed via [8].
Discussion. The pure cut-off approach yields valid under-approximations in all benchmark instances—often exceeding the accuracy of the approach of [37] while being consistently faster. In some cases, the resulting values improve when clipping is enabled. However, larger candidate sets significantly increase the computation time, which stems from the fact that many clipping MILPs have to be solved.
For Drone 4-2, Fig. 5 plots the resulting under-approximation values (y-axis) for varying sizes of the explored MDP K_𝓜 (x-axis). The horizontal, dashed line indicates the computed over-approximation value. The quality of the approximation further improves with an increased number of explored beliefs.
5 Conclusion
We presented techniques to safely under-approximate expected total rewards in
POMDPs. The approach scales to large POMDPs and often produces tight lower
bounds. Belief clipping generally does not improve on the simpler cut-off approach
in terms of results and performance. However, considering—and optimising—the
approach for particular classes of POMDPs might prove beneficial. Future work
includes integrating the algorithm into a refinement loop that also considers
over-approximation techniques from [8]. Furthermore, lifting our approach to
partially observable stochastic games is promising.
Data Availability. The artifact [9] accompanying this paper contains source code,
benchmark files, and replication scripts for our experiments.
References

1. Amato, C., Bernstein, D.S., Zilberstein, S.: Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Auton. Agents Multi Agent Syst. 21(3), 293–320 (2010)
2. Ashok, P., Butkova, Y., Hermanns, H., Křetínský, J.: Continuous-time Markov decisions based on partial exploration. In: ATVA. Lecture Notes in Computer Science, vol. 11138, pp. 317–334. Springer (2018)
3. Åström, K.J.: Optimal control of Markov processes with incomplete state information. J. of Mathematical Analysis and Applications 10(1), 174–205 (1965)
4. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)
5. Bellman, R.: A Markovian decision process. Journal of Mathematics and Mechanics 6, 679–684 (1957)
6. Bonet, B.: Solving large POMDPs using real time dynamic programming. In: AAAI Fall Symp. on POMDPs (1998)
7. Bonet, B., Geffner, H.: Solving POMDPs: RTDP-Bel vs. point-based algorithms. In: IJCAI. pp. 1641–1646 (2009)
8. Bork, A., Junges, S., Katoen, J., Quatmann, T.: Verification of indefinite-horizon POMDPs. In: ATVA. Lecture Notes in Computer Science, vol. 12302, pp. 288–304. Springer (2020)
9. Bork, A., Katoen, J.P., Quatmann, T.: Artifact for Paper: Under-Approximating Expected Total Rewards in POMDPs. Zenodo (2022). https://doi.org/10.5281/zenodo.5643643
10. Bork, A., Katoen, J.P., Quatmann, T.: Under-Approximating Expected Total Rewards in POMDPs. arXiv e-print (2022), https://arxiv.org/abs/2201.08772
11. Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Křetínský, J., Kwiatkowska, M., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: ATVA. Lecture Notes in Computer Science, vol. 8837, pp. 98–114. Springer (2014)
12. Braziunas, D., Boutilier, C.: Stochastic local search for POMDP controllers. In: AAAI. pp. 690–696. AAAI Press / The MIT Press (2004)
13. Carr, S., Jansen, N., Topcu, U.: Verifiable RNN-based policies for POMDPs under temporal logic constraints. In: IJCAI. pp. 4121–4127. ijcai.org (2020)
14. Carr, S., Jansen, N., Wimmer, R., Serban, A.C., Becker, B., Topcu, U.: Counterexample-guided strategy improvement for POMDPs using recurrent neural networks. In: IJCAI. pp. 5532–5539. ijcai.org (2019)
15. Chatterjee, K., Chmelík, M., Davies, J.: A symbolic SAT-based algorithm for almost-sure reachability with small strategies in POMDPs. In: AAAI. pp. 3225–3232 (2016)
16. Chatterjee, K., Chmelík, M., Gupta, R., Kanodia, A.: Optimal cost almost-sure reachability in POMDPs. Artificial Intelligence 234, 26–48 (2016)
17. Chatterjee, K., Doyen, L., Henzinger, T.A.: Qualitative analysis of partially-observable Markov decision processes. In: MFCS. Lecture Notes in Computer Science, vol. 6281, pp. 258–269. Springer (2010)
18. Cheng, H.T.: Algorithms for partially observable Markov decision processes. Ph.D. thesis, University of British Columbia (1988)
19. Doshi, F., Pineau, J., Roy, N.: Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. In: ICML. pp. 256–263 (2008)
20. Eagle, J.N.: The optimal search for a moving target when the search path is constrained. Operations Research 32(5), 1107–1115 (1984)
21. Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2021), https://www.gurobi.com
22. Hauskrecht, M.: Value-function approximations for partially observable Markov decision processes. J. Artif. Intell. Res. 13, 33–94 (2000)
23. Hensel, C., Junges, S., Katoen, J., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. on Software Tools for Technology Transfer (2021). https://doi.org/10.1007/s10009-021-00633-z
24. Horák, K., Bošanský, B., Chatterjee, K.: Goal-HSVI: Heuristic search value iteration for goal POMDPs. In: IJCAI. pp. 4764–4770. ijcai.org (2018)
25. Itoh, H., Nakamura, K.: Partially observable Markov decision processes with imprecise parameters. Artificial Intelligence 171(8-9), 453–490 (2007)
26. Jansen, N., Dehnert, C., Kaminski, B.L., Katoen, J., Westhofen, L.: Bounded model checking for probabilistic programs. In: ATVA. Lecture Notes in Computer Science, vol. 9938, pp. 68–85 (2016)
27. Junges, S., Jansen, N., Seshia, S.A.: Enforcing almost-sure reachability in POMDPs. In: CAV (2). Lecture Notes in Computer Science, vol. 12760, pp. 602–625. Springer (2021)
28. Junges, S., Jansen, N., Wimmer, R., Quatmann, T., Winterer, L., Katoen, J.P., Becker, B.: Finite-state controllers of POMDPs via parameter synthesis. In: UAI. pp. 519–529. AUAI Press (2018)
29. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1-2), 99–134 (1998)
30. Kurniawati, H., Hsu, D., Lee, W.S.: SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In: Robotics: Science and Systems. vol. 2008 (2008)
31. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: CAV. Lecture Notes in Computer Science, vol. 6806, pp. 585–591. Springer (2011)
32. Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Operations Research 39(1), 162–175 (1991)
33. Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In: AAAI/IAAI. pp. 541–548 (1999)
34. Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence 147(1-2), 5–34 (2003)
35. Meuleau, N., Kim, K.E., Kaelbling, L.P., Cassandra, A.R.: Solving POMDPs by searching the space of finite policies. In: UAI. pp. 417–426 (1999)
36. Monahan, G.E.: State of the art—a survey of partially observable Markov decision processes: theory, models, and algorithms. Management Science 28(1), 1–16 (1982)
37. Norman, G., Parker, D., Zou, X.: Verification and control of partially observable probabilistic systems. Real-Time Systems 53(3), 354–402 (2017)
38. Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for POMDPs. In: IJCAI. vol. 3, pp. 1025–1032 (2003)
39. Quatmann, T., Katoen, J.: Sound value iteration. In: CAV (1). Lecture Notes in Computer Science, vol. 10981, pp. 643–661. Springer (2018)
40. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach (4th Edition). Pearson (2020)
41. Schrijver, A.: Theory of Linear and Integer Programming. John Wiley & Sons (1986)
42. Shani, G., Pineau, J., Kaplow, R.: A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems 27(1), 1–51 (2013)
43. Silver, D., Veness, J.: Monte-Carlo planning in large POMDPs. In: NIPS. pp. 2164–2172 (2010)
44. Smallwood, R.D., Sondik, E.J.: The optimal control of partially observable Markov processes over a finite horizon. Operations Research 21(5), 1071–1088 (1973)
45. Smith, T., Simmons, R.: Heuristic search value iteration for POMDPs. In: UAI. pp. 520–527 (2004)
46. Sondik, E.J.: The Optimal Control of Partially Observable Markov Processes. Ph.D. thesis, Stanford Univ Calif Stanford Electronics Labs (1971)
47. Sondik, E.J.: The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research 26(2), 282–304 (1978)
48. Spaan, M.T., Vlassis, N.: Perseus: Randomized point-based value iteration for POMDPs. J. of Artificial Intelligence Research 24, 195–220 (2005)
49. Volk, M., Junges, S., Katoen, J.P.: Fast dynamic fault tree analysis by model checking techniques. IEEE Transactions on Industrial Informatics 14(1), 370–379 (2017)
50. Wang, Y., Chaudhuri, S., Kavraki, L.E.: Bounded policy synthesis for POMDPs with safe-reachability objectives. In: AAMAS. pp. 238–246 (2018)
51. Winterer, L., Junges, S., Wimmer, R., Jansen, N., Topcu, U., Katoen, J.P., Becker, B.: Motion planning under partial observability using game-based abstraction. In: CDC. pp. 2201–2208. IEEE (2017)
52. Zhang, N.L., Lee, S.S.: Planning with partially observable Markov decision processes: advances in exact solution method. In: UAI. pp. 523–530 (1998)
53. Zhang, N.L., Zhang, W.: Speeding up the convergence of value iteration in partially observable Markov decision processes. Journal of Artificial Intelligence Research 14, 29–51 (2001)
Open Access
This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Correct Probabilistic Model Checking with Floating-Point Arithmetic⋆
Arnd Hartmanns
University of Twente, Enschede, The Netherlands
a.hartmanns@utwente.nl
Abstract. Probabilistic model checking computes probabilities and expected values related to designated behaviours of interest in Markov models. As a formal verification approach, it is applied to critical systems; thus we trust that probabilistic model checkers deliver correct results. To achieve scalability and performance, however, these tools use finite-precision floating-point numbers to represent and calculate probabilities and other values. As a consequence, their results are affected by rounding errors that may accumulate and interact in hard-to-predict ways. In this paper, we show how to implement fast and correct probabilistic model checking by exploiting the ability of current hardware to control the direction of rounding in floating-point calculations. We outline the complications in achieving correct rounding from higher-level programming languages, describe our implementation as part of the Modest Toolset’s mcsta model checker, and exemplify the trade-offs between performance and correctness in an extensive experimental evaluation across different operating systems and CPU architectures.
1 Introduction

Given a Markov chain or Markov decision process (MDP [25]) model of a safety- or performance-critical system, probabilistic model checking (PMC) calculates quantitative properties of interest: the probability of (rare or catastrophic) failures, the expected recovery time after service interruption, or the long-run average throughput. These properties involve probabilities or expected costs/rewards of sets of model behaviours, and are often specified in a temporal logic like PCTL [16]. As a formal verification approach, users place great trust in the results delivered by a PMC tool such as Prism [22], Storm [9], ePMC [15], or the Modest Toolset’s [18] mcsta. In contrast to classical model checkers for functional, Boolean-valued properties specified in e.g. LTL or CTL [2], a probabilistic model checker is inherently quantitative: the input model contains real-valued probabilities and costs/rewards; PCTL makes comparisons between real-valued constants and probabilities; the most efficient algorithms numerically iterate towards a fixpoint; and the final result itself may well be a real number.
⋆ This work was supported by NWO VENI grant no. 639.021.754 and the EU’s Horizon 2020 research and innovation programme under MSCA grant agreement 101008233.
© The Author(s) 2022. D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 41–59, 2022. https://doi.org/10.1007/978-3-030-99527-0_3
Often, we can restrict to rationals, which simplifies the theory and facilitates “exact” algorithms using arbitrary-precision rational number datatypes. These algorithms only work for small models (as shown in the most recent QComp 2020 competition of quantitative verification tools [6]). In this paper, we thus focus on the PMC techniques that scale to large problems: those building upon iterative numerical algorithms, in particular value iteration (VI) [8]. We restrict to probabilistic reachability, i.e. calculating the probability to eventually reach a goal state, as this is the core problem in PMC for MDP. Embedded in the usual recursive CTL algorithm, it allows us to check any (unbounded) PCTL formula.
Starting from a trivial underapproximation of the reachability probability for each state of the model, VI iteratively improves the value of each state based on its successors’ values. The true reachability probabilities are the least fixpoint of this procedure, towards which the algorithm converges. For roughly
a decade, PMC tools implemented VI by stopping once the relative or absolute difference between subsequent iterations was below a threshold ε. Haddad and Monmege [12] showed in 2014¹ that this does not guarantee a difference of ε between the reported and the true probability, putting in question the trust placed in PMC tools. Then variants of VI were developed that provide sound, i.e. ε-correct, results: interval iteration (II) [3,5,13], sound value iteration (SVI) [26], and optimistic value iteration (OVI) [19]. We focus on II as the prototypical sound algorithm. It additionally iterates on an overapproximation; its stopping criterion is the difference between over- and underapproximation being at most ε.
If all probabilities in an MDP are rational numbers, then the true reachability probability as well as all intermediate values in II are rational, too. Yet implementing II with arbitrary-precision rationals is impractical since the smaller-and-smaller differences between intermediate values end up using excessive computation time and memory. II is thus implemented with fixed-precision (usually 64-bit IEEE 754 double precision) floating-point numbers. These, however, cannot represent all rationals, so operations must round to nearby representable values. Although II is numerically benign, consisting only of multiplications and additions within [0,1], the default round to nearest, ties to even policy can cause II to deliver incorrect results. Wimmer et al. [29] show an example where PMC tools incorrectly state that a simple PCTL property is satisfied by a small Markov chain due to the underlying numeric difference having disappeared in rounding.
We confirmed with current versions of Prism, Storm, and mcsta that the problem persists to today, even when requesting a “sound” algorithm like II. Wimmer et al. propose interval arithmetic to avoid such problems, cautioning that
[...] the memory consumption will roughly double, since two numbers for
the interval bounds have to be stored [...]. The runtime will be higher by
a small factor, because we need to derive lower and upper bounds for the
intervals, requiring two model checking runs per sub-formula. [29, p. 5]
They did not provide an implementation, and we are not aware of any to date.
¹ Wimmer et al. [29] already in 2008 mention this problem in a more general setting, but neither give a concrete counterexample nor propose a solution tailored to PMC.
Our contribution. We present the first PMC implementation that computes correct lower and upper bounds on reachability probabilities despite using floating-point arithmetic. We benefit from two developments since Wimmer et al.’s paper of 2008: First, II (published 2014) already uses intervals (though not as Wimmer et al. envisioned), necessarily doubling memory consumption compared to VI (as do SVI and OVI, so it appears an unavoidable cost of soundness). In place of “two model checking runs per sub-formula”, we can make the two interleaved computations inside II safe w.r.t. rounding. Second, hardware and programming language support for controlling the rounding direction in floating-point operations has improved, in particular with the AVX-512 instruction set in the newest x86-64 CPUs and widespread compiler support for C99’s “floating-point environment” header fenv.h. Nevertheless, it is nontrivial to achieve runtime that is only “higher by a small factor”. For the analysis of probabilistic systems, the only related use of safe rounding we are aware of is in the SSMT tool SiSAT [27].
Structure. We recap PMC and II (Sect. 2) as well as problems and solutions related to rounding in floating-point arithmetic in Sect. 3. We then present our new approach in Sect. 4, including important implementation aspects. The performance of our approach is crucial to its adoption in tools; thus in Sect. 5 we report on extensive experiments across different software and hardware configurations on models from the Quantitative Verification Benchmark Set (QVBS) [20].
2 Probabilistic Model Checking

We write {x₁ ↦ y₁, ...} to denote the function that maps all xᵢ to yᵢ. Given a set S, its powerset is 2^S. A (discrete) probability distribution over S is a function µ: S → [0, 1] with countable support spt(µ) := {s ∈ S | µ(s) > 0} and Σ_{s∈spt(µ)} µ(s) = 1. Dist(S) is the set of all probability distributions over S. If µ(s) ∈ ℚ for all s ∈ S, we call µ a rational probability distribution, in Dist_ℚ(S).
Markov decision processes (MDP) [25] combine the nondeterminism of Kripke structures with the finite random choices of discrete-time Markov chains (DTMC).

Definition 1. A Markov decision process (MDP) is a triple M = ⟨S, s_I, T⟩ where S is a finite set of states with initial state s_I ∈ S and T: S → 2^{Dist_ℚ(S)} is the transition function. T(s) must be finite and non-empty for all s ∈ S.

For s ∈ S, an element µ of T(s) is a transition, and if s′ ∈ spt(µ), then the transition has a branch to successor state s′ with probability µ(s′). If |T(s)| = 1 for all s ∈ S, then M is a DTMC.
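For concreteness, one possible C encoding of Def. 1 stores transitions sparsely; the type and field names below are illustrative, not those of an existing tool.

/* Sparse representation of an MDP per Def. 1. */
typedef struct {
    int     num_branches; /* |spt(mu)| */
    int    *successors;   /* indices of the states s' in spt(mu) */
    double *probs;        /* the branch probabilities mu(s') */
} distribution_t;

typedef struct {
    int             num_transitions; /* |T(s)|, finite and non-empty */
    distribution_t *transitions;     /* the distributions mu in T(s) */
} state_t;

typedef struct {
    int      num_states; /* |S| */
    int      initial;    /* index of s_I */
    state_t *states;     /* T, indexed by state */
} mdp_t;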
Example 1. Fig. 1 shows our example MDP M^γ_n, which is actually a DTMC. It is a simplified and parametrised version of the counterexample of Wimmer et al. [29, Fig. 2]. It is parametrised in terms of n ∈ ℕ (determining the number of chained states with transitions labelled b) and γ ∈ (0, 0.5) (changing some probabilities). We draw transitions as lines to an intermediate node from which probability-labelled branches lead to successor states. We omit the intermediate node for transitions with a single branch, and label some transitions to easily refer to them. M^γ_n has 4 + n states and transitions, and 7 + 2n branches.

[Fig. 1. Example parametrised MDP M^γ_n, with states s_I, s₀, s₁, ..., s_n, s₊, and s₋, transitions labelled a, b, and c, and branch probabilities involving 1/2, γ, 1/2 − γ, 1 − γ, and 1.]
In practice, higher-level modelling languages like Modest [14] are used to specify MDP. The semantics of an MDP is captured by its paths. A path represents a concrete resolution of all nondeterministic and probabilistic choices. Formally:

Definition 2. A finite path is a sequence π_fin = s₀ µ₀ s₁ µ₁ ... µ_{n−1} s_n where sᵢ ∈ S for all i ∈ {0, ..., n} and µᵢ ∈ T(sᵢ) ∧ µᵢ(s_{i+1}) > 0 for all i ∈ {0, ..., n − 1}. Let |π_fin| := n and last(π_fin) := s_n. Π_fin(s) is the set of all finite paths starting in s. A path is an analogous infinite sequence π, and Π(s) is the set of all paths starting in s. We write s ∈ π if ∃ i: s = sᵢ.
A scheduler (or adversary, policy or strategy) only resolves the nondeterministic choices of M. For this paper, memoryless deterministic schedulers suffice [4].

Definition 3. A function σ: S → Dist(S) is a scheduler if, for all s ∈ S, we have σ(s) ∈ T(s). The set of all schedulers of M is S(M).

We are interested in reachability probabilities. Let M|_σ = ⟨S, s_I, T|_σ⟩ with T|_σ(s) = {σ(s)} be the DTMC induced by σ on M. Via the standard cylinder set construction [10, Sect. 2.2] on M|_σ, a scheduler induces probability measures P^{M,σ}_s on measurable sets of paths starting in s ∈ S.

Definition 4. For state s and goal state g ∈ S, the maximum and minimum probability of reaching g from s is defined as P^{M,s}_max(◇g) = sup_{σ∈S(M)} P^{M,σ}_s({π ∈ Π(s) | g ∈ π}) and P^{M,s}_min(◇g) = inf_{σ∈S(M)} P^{M,σ}_s({π ∈ Π(s) | g ∈ π}), respectively.

The definition extends to sets G of goal states. We omit the superscript for M when it is clear from the context, and if we omit that for s, then s = s_I. From now on, whenever we have an MDP with a set of goal states G, we assume w.l.o.g. that all g ∈ G are absorbing, i.e. every g only has one self-loop transition.

Definition 5. A maximal end component (MEC) of M is a maximal (sub-)MDP ⟨S′, T′, s′_I⟩ where S′ ⊆ S, T′(s) ⊆ T(s) for all s ∈ S′, and the directed graph with vertex set S′ and edge set {⟨s, s′⟩ | ∃ µ ∈ T′(s): µ(s′) > 0} is strongly connected.
1   function II(M = ⟨S, s_I, T⟩, G, opt, ε)
    // Preprocessing
2       if opt = max then M := CollapseMECs(M, G)          // collapse MECs
3       S₀ := Prob0(M, G, opt), S₁ := Prob1(M, G, opt)     // identify 0/1 states
4       l := {s ↦ 0 | s ∈ S \ S₁} ∪ {s ↦ 1 | s ∈ S₁}       // initialise lower vector
5       u := {s ↦ 0 | s ∈ S₀} ∪ {s ↦ 1 | s ∈ S \ S₀}       // initialise upper vector
    // Iteration
6       while (u(s_I) − l(s_I)) / l(s_I) > ε do            // while relative error > ε:
7           foreach s ∈ S \ (S₀ ∪ S₁) do                   // update non-0/1 states:
8               l(s) := opt_{µ∈T(s)} Σ_{s′∈spt(µ)} µ(s′) · l(s′)   // iterate lower vector
9               u(s) := opt_{µ∈T(s)} Σ_{s′∈spt(µ)} µ(s′) · u(s′)   // iterate upper vector
10      return ½ (u(s_I) + l(s_I))

Alg. 1: Interval iteration for probabilistic reachability
2.1 Algorithms

Interval iteration [3,5,12,13] computes reachability probabilities p(s) = P^s_opt(◇G), opt ∈ {max, min}. We show the basic algorithm as Alg. 1. It iteratively refines vectors l and u that map each state to a value in ℚ such that, at all times, we have l(s) ≤ p(s) ≤ u(s). In each iteration, the values in l and u are updated for all relevant states (line 7) via the classic Bellman equations of value iteration (lines 8-9). Their least fixpoint is p, towards which l converges from below. Some preprocessing is needed to ensure that the fixpoint is unique and also u converges towards p: for maximisation, we need to collapse MECs into single states (line 2). This can be done via graph-based algorithms (see e.g. [7]) that only consider the graph structure of the MDP as in Definition 1 but do not perform calculations with the concrete probability values. For both maximisation and minimisation, we need to identify the sets S₀ and S₁ such that ∀ s ∈ S₀: p(s) = 0 and ∀ s ∈ S₁: p(s) = 1 (line 3). This can equally be done via graph-based algorithms [10, Algs. 1-4]. We then initialise l and u to trivial under-/overapproximations of p (lines 4-5). Iteration stops when the relative difference between l and u at s_I is at most ε (which is often chosen as 10⁻³ or 10⁻⁶). The corresponding check in line 6 assumes that division by zero results in +∞, as is the default in IEEE 754. By convergence of l and u towards the fixpoint, II terminates, and we eventually return a value p̂ with the guarantee that p(s_I) ∈ [(1 − ε) · p̂, (1 + ε) · p̂]. This makes II sound.
PCTL. The temporal logic PCTL [16] allows us to construct complex branching-time properties. It takes standard CTL [2] and replaces the A(ψ) (“for all paths ψ holds”) and E(ψ) (“there exists a path for which ψ holds”) operators by the probabilistic operator P_{∼c}(ψ) for “under all schedulers, the probability of the measurable set of paths for which ψ holds is ∼ c” where ∼ ∈ {<, ≤, >, ≥} and c ∈ [0, 1]. To model-check a PCTL formula on MDP M, we follow the standard recursive CTL model checking algorithm [2, Sect. 6.4] except for the P operator, which can be reduced to computing reachability probabilities. For the “finally”/“eventually” case P_{∼c}(◇φ), we can directly use interval iteration: Let S_φ be the set of states recursively determined to satisfy φ. Call II(M, S_φ, opt, ε) of Alg. 1 with opt = max if ∼ ∈ {<, ≤} and opt = min otherwise, with two modifications: Change the stopping criterion of line 6 to check the difference for all states, and in line 10, return the set S_P := {s ∈ S | ∀ x ∈ [l(s), u(s)]: x ∼ c}. If for some state s the interval [l(s), u(s)] contains both values that satisfy and values that violate the comparison with c, however, we would need to either abort and report an “unknown” situation, or continue with a reduced ε until we can (hopefully eventually) decide the comparison. None of Prism, Storm, and mcsta appear to perform this extra check, though. In this paper, we only use PCTL for non-nested top-level P(◇...) operators; the results are then true if s_I ∈ S_P, should be unknown in case the “unknown” situation applies to s_I, and are false otherwise.
3 Floating-Point Arithmetic

The current implementations of II (in Prism, Storm, and mcsta) use IEEE 754 double-precision floating-point arithmetic to represent (a) the probabilities of the MDP’s branches and (b) the values in l and u. A floating-point number is stored as a significand d and an exponent e w.r.t. an agreed-upon base b such that it represents the value d · b^e. We fix b = 2. IEEE 754 double precision uses 64 bits in total, of which 1 is a sign bit, 52 are for d, and 11 are for e. Standard alternatives are 32-bit single precision (1 sign bit, 23 bits for d, and 8 for e) and the 80-bit x87 extended precision format (with 1 sign bit, 64 for d, and 15 for e). The subset of ℚ that can be represented in such a representation is determined by the numbers of bits for d and e. For example, 1/2 or 7/8 can be represented exactly in all formats, but 1/10 cannot. IEEE 754 prescribes that all basic operations (addition, multiplication, etc.) are performed at “infinite precision” with the result rounded to a representable number. The default rounding mode is to round to the nearest such number, choosing an even value in case of ties (round to nearest, ties to even). In single precision, 1/10 is thus by default rounded to 13421773 · 2⁻²⁷ = 0.100000001490116119384765625.
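This is easy to reproduce; the following C snippet prints the exact decimal expansion of the single-precision neighbour of 1/10 (assuming a libc, such as glibc, that prints exact expansions):

#include <stdio.h>

int main(void)
{
    float f = 0.1f; /* the nearest single-precision neighbour of 1/10 */
    /* Widening to double is exact, so this prints the represented value,
       0.100000001490116119384765625, digit for digit. */
    printf("%.27f\n", (double)f);
    return 0;
}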
A single rounded operation leads to an error of at most the distance between the two nearest representable numbers. In iterative computations, however, rounding may happen at every step. A striking example of the consequences is the failure of an American Patriot missile battery to intercept an incoming Iraqi Scud missile in February 1992 in Dharan, Saudi Arabia [28], which resulted in 28 fatalities. The Patriot system calculated time in seconds by multiplying its internal clock’s value by a rounded binary representation of 1/10. After 100 hours of continuous operation, this led to a cumulative rounding error large enough to miscalculate the incoming missile’s position by more than half a kilometre [1].
3.1 Errors in Probabilistic Model Checking
II accumulates and multiplies rounded floating-point values in the l and u vectors with potentially already-rounded values representing the rational probabilities of the model. Using the default rounding mode, how can we be sure that the final result does not miss the true probability by more than half a kilometre, too?
Following Wimmer et al. [29], let us consider MDP M^γ_n of Fig. 1 again, and determine whether P_{≥1/2}(◇{s₊}) holds. The model is acyclic, so it is easy to see that

  p := P_max(◇{s₊}) = 1/2 + γ^{n+2} > 1/2.

Let us fix n = 1 and γ = 10⁻⁶. Then p = 1/2 + 10⁻¹⁸. This value cannot be represented in double precision, and is by default rounded to 0.5.
We have encoded M^γ_n in the Modest and Prism languages, and checked the answers returned by Prism 4.7, Storm 1.6.4, and mcsta 3.1 for the property. The correct result would be false. Prism returns true in its default configuration, which uses an unsound algorithm, and false when requesting an algorithm with exact rational arithmetic, for which M^γ_n is small enough. If we explicitly request Prism to use II, then the result depends on the specified ε: for ε ≥ 10⁻¹¹, we get the correct result of false; for smaller ε ≤ 10⁻¹², i.e. higher precision, however, we incorrectly get true. Storm incorrectly returns true in its default configuration as well as when we request a sound algorithm via the --sound parameter. Only when using an exact rational algorithm via the --exact parameter does Storm correctly return false. mcsta, when using II (--alg IntervalIteration), incorrectly returns true, and additionally reports that it computed [l(s_I), u(s_I)] as [0.5, 0.5], thus not including the true value of p. Other algorithms are not immune to the problem, either; for example, mcsta also answers true when using SVI, OVI, and when solving the MDP as a linear programming problem via the Google OR Tools’ GLOP LP solver.
This example shows that using a sound algorithm does not guarantee correct results. The problem is not specific to cases of small probabilities like γ = 10⁻⁶ in the MDP; we can achieve the same effect using arbitrarily higher values of γ if we just increase n a little. Such bounded try-and-retry chains—where “normal” probabilities in the model result in very small values during iteration and on the final result—are not uncommon in the systems often modelled as MDPs, e.g. backoff schemes in communication protocols and randomised algorithms. In general, tiny differences in probabilities in one place may result in significant changes of the overall reachability probability; for example, in two-dimensional random walks, the long-run behaviour when the probabilities to move forward or backward are both 1/2 is vastly different from when they are 1/2 + δ and 1/2 − δ, respectively, for any δ > 0.
3.2 On Precision and Rounding Modes

In our concrete example, we may be able to avoid the problem by increasing precision: In the 80-bit extended format supported by all x86-64 CPUs, 1/2 + 10⁻¹⁸ is by default rounded to 5.000000000000000009... · 10⁻¹, so there is a chance of obtaining false unless other rounding during iterations would lose all the difference. Extended precision is used for C’s long double type by e.g. the GCC compiler; it is thus readily accessible to programmers. It is, however, the most precise format supported in common CPUs today; if we need more precision, we would have to resort to much slower software implementations using e.g. the GNU MPFR library. Any a-priori fixed precision, however, just shifts the problem to smaller differences, but does not eliminate it.
The more general solution that we propose in this paper is to control the rounding mode of the floating-point operations performed in the II algorithm. In addition to the default round to nearest, ties to even mode, the IEEE 754 standard defines three directed rounding modes: round towards zero (i.e. truncation), round towards +∞ (i.e. always round up), and round towards −∞ (i.e. always round down). As we will explain in Sect. 4, using the latter two gives us an easy way to make the computations inside II safe, i.e. guarantee the under- and overapproximation invariants for l and u, respectively. Control of the floating-point rounding mode however appears to be a rarely-used feature of IEEE 754 implementations; consequently the level and style of support for it in CPUs and high-level programming languages is diverse.
3.3 CPU Support for Rounding Modes

Storm and mcsta run exclusively on x86-64 systems (with the upcoming ARM-based systems so far only supported via their x86-64 emulation layers), while Prism additionally supports several other platforms via manual compilation. Thus we focus on x86-64 in this paper as the platform probabilistic model checkers overwhelmingly run on today.

X87 and SSE. All x86-64 CPUs support two instruction sets to perform floating-point operations in double precision: The x87 instruction set, originating from the 8087 floating-point coprocessor, and the SSE instruction set, which includes support for double precision since the Pentium 4’s SSE2 extension. Both implement operations according to the IEEE 754 standard. Aside from architectural particularities such as its stack-based approach to managing registers, the x87 instruction set notably includes support for 80-bit extended precision. In fact, by default, it performs all calculations in that extended precision, only rounding to double or single precision when storing values back to 64- or 32-bit memory locations. This has the advantage of reducing the error across sequences of operations, but for high-level languages makes the results depend on the compiler’s choices of when to load/store intermediate values in memory vs. keeping them in x87 registers. The SSE instructions only support single and double precision.
Both the x87 and SSE instruction sets support all four rounding modes mentioned above. The rounding mode of operations for x87 and SSE is determined by the current value of the x87 FPU control word stored in the x87 FPU control register or the current value of the SSE MXCSR control register, respectively. That is, to change rounding mode, we need to obtain the current control register value, change the two bits determining the rounding mode (with the other bits controlling other aspects of floating-point operations such as the treatment of NaNs), and apply the new value. This is done via the FNSTCW/FLDCW instruction pair on x87, and VSTMXCSR/VLDMXCSR for SSE. Rounding mode is thus part of the global (per-thread) state, and we must be careful to restore its original configuration when returning to code that does not expect rounding mode changes. Frequent changes of rounding mode thus incur a performance overhead due to the extra instructions that must be executed for every change and their effects on e.g. pipelining.
AVX-512. AVX-512 is the extension to 512 bits of the sequence of single instruction, multiple data (SIMD) instruction sets in x86-64 processors that started with SSE. It became available for general-purpose systems in high-end desktop (Skylake-X) and server (Xeon) CPUs in 2017, but it took until the 10th generation of Intel’s Core mobile CPUs in 2019 before it was more widely available in end-user systems. It is supposed to appear in AMD CPUs with the upcoming Zen 4 architecture. Aside from its 512-bit SIMD instructions, AVX-512 crucially also includes new instructions for single floating-point values where the operation’s rounding mode is specified as part of the instruction itself via the new “EVEX” encoding. Of particular note for implementing II are the new VFMADD(r₁r₂r₃)SD fused multiply-add instructions (the rᵢ determining how the operand registers are used) that can directly be used for the sums of products in the Bellman equations in lines 8-9 of Alg. 1. Overall, AVX-512 thus makes rounding mode independent of global state, and may improve performance by removing the need for extra instruction sequences to change rounding mode.
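For illustration, a minimal C sketch of such an EVEX-encoded operation via the corresponding intrinsic (requires an AVX-512F CPU and compilation with -mavx512f):

#include <immintrin.h>
#include <stdio.h>

/* Computes a*b + c on the low double lanes with round-towards-minus-infinity
   encoded in the instruction itself, leaving the MXCSR register untouched. */
static double fmadd_round_down(double a, double b, double c)
{
    __m128d va = _mm_set_sd(a), vb = _mm_set_sd(b), vc = _mm_set_sd(c);
    __m128d r = _mm_fmadd_round_sd(va, vb, vc,
                                   _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
    return _mm_cvtsd_f64(r);
}

int main(void)
{
    /* One fused product-and-sum with a single downward rounding. */
    printf("%.17g\n", fmadd_round_down(1.0 / 3.0, 3.0, -1.0));
    return 0;
}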
3.4 Rounding Modes in Programming Languages

Support for non-default rounding modes is lacking in most high-level programming languages. Java, C#, and Python, for example, do not support them at all. If II is implemented in such a language, there is consequently no hope for a high-performance solution to the rounding problems described earlier.
For C and C++, the C99 and C++11 standards introduced access to the floating-point environment. The fenv.h/cfenv headers include the fegetround and fesetround functions to query the current rounding mode and change it, respectively. Implementations of these functions on x86-64 read/change both the x87 and SSE control registers accordingly. In the remainder of this paper, we focus on a C implementation, but most statements hold for C++ analogously. The level of support for the C99 floating-point features varies significantly between compilers; it is in particular still incomplete in Clang² and GCC [11, Further notes]. Still, both compilers provide access to the fegetround/fesetround functions (via the associated standard libraries), but GCC in particular is not rounding mode-aware in optimisations. This means that, for example, subexpressions that are evaluated twice, with a change in rounding mode in between, may be compiled by GCC into a single evaluation before the change, with the resulting value stored in a register and reused after the rounding mode change. This can even happen when using the -frounding-math option³. Programmers thus need to inspect the generated assembly to ensure that no problematic transformations have been made, or try to make them impossible by declaring values volatile or inserting inline assembly “barriers”.

² The documentation as of October 2021 states that C99 support in Clang “is feature-complete except for the C99 floating-point pragmas”.

Overall, C thus provides a standardised way to change the x87/SSE rounding mode, but programmers need to be aware of compiler quirks when using these facilities. Support for AVX-512 instructions that include rounding mode bits in C, on the other hand, is only slightly more convenient than programming in assembly as we can use the intrinsics in the immintrin.h header; there is no standard higher-level abstraction of this feature in either C or C++.
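As a small example of these facilities—and of the volatile workaround against GCC caching values across mode changes—consider bracketing a single division from below and above (compile with, e.g., gcc -std=c99 -frounding-math):

#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON

/* Divides num by den under an explicitly chosen rounding direction. The
   volatile qualifier serves as a barrier against the compiler caching the
   quotient across the rounding mode change (cf. the GCC caveat above). */
static double div_rounded(double num, double den, int mode)
{
    int old = fegetround();
    fesetround(mode);
    volatile double q = num / den;
    fesetround(old);
    return q;
}

int main(void)
{
    double lo = div_rounded(1.0, 10.0, FE_DOWNWARD);
    double hi = div_rounded(1.0, 10.0, FE_UPWARD);
    printf("lo = %.17g\nhi = %.17g\n", lo, hi); /* lo < 1/10 < hi, 1 ulp apart */
    return 0;
}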
4 Correctly Rounding Interval Iteration

Let us now change II as in Alg. 1 to consistently round in safe directions at every numeric operation. Given that we can change or specify the rounding mode of all basic floating-point operations on current hardware, we expect that a high-performance implementation can be achieved. First, the preprocessing steps require no changes as they are purely graph-based. The changes to the iteration part of the algorithm are straightforward: In line 6,

  while (u(s_I) − l(s_I)) / l(s_I) > ε do ...,

we round the results of the subtraction and of the division towards +∞ to avoid stopping too early. In line 8,

  l(s) := opt_{µ∈T(s)} Σ_{s′∈spt(µ)} µ(s′) · l(s′),

the multiplications and additions round towards −∞ while the corresponding operations on the upper bound in line 9 round towards +∞. Recall that all probabilities in the MDP are rational numbers, i.e. representable as num/den with num, den ∈ ℕ. We assume that num and den can be represented exactly in the implementation. Then, in line 8, we calculate the floating-point values for the µ(s′) = num/den by rounding towards −∞. In line 9, we round the result of the corresponding division towards +∞. Finally, instead of returning the middle of the interval in line 10, we return [l(s_I), u(s_I)] so as not to lose any information (e.g. in case the result is compared to a constant as in the example of Sect. 3.1).
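A C sketch of the resulting safely rounded update of l—using a hypothetical sparse transition structure, and mirrored with FE_UPWARD (and probabilities rounded up) for u—could look as follows; it illustrates the scheme, not the literal mcsta code.

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

/* Hypothetical sparse storage for the transitions T(s) of a single state. */
typedef struct {
    int           count; /* number of branches of this transition */
    const int    *succ;  /* successor state indices s' */
    const double *prob;  /* mu(s'), themselves pre-rounded towards -inf */
} transition_t;

/* One Bellman update of l(s), with every product and sum rounded towards
   minus infinity. */
static double update_lower(const transition_t *trans, int num_trans,
                           const double *l, int maximise)
{
    fesetround(FE_DOWNWARD);
    double best = maximise ? -1.0 : 2.0; /* values live in [0, 1] */
    for (int t = 0; t < num_trans; t++) {
        volatile double sum = 0.0; /* barrier against caching (Sect. 3.4) */
        for (int i = 0; i < trans[t].count; i++)
            sum += trans[t].prob[i] * l[trans[t].succ[i]];
        if (maximise ? (sum > best) : (sum < best))
            best = sum;
    }
    fesetround(FE_TONEAREST); /* restore the default for surrounding code */
    return best;
}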
With these changes, we obtain an interval guaranteed to contain the true reachability probability if the algorithm terminates. However, rounding away from the theoretical fixpoint in the updates of l and u means that we may reach an effective fixpoint—where l and u no longer change because all newly computed values round down/up to the values from the previous iteration—at a point where the relative difference of l(s_I) and u(s_I) is still above ε. This will happen in practice: In QComp 2020 [6], mcsta participated in the floating-point correct track by letting VI run until it reached a fixpoint under the default rounding mode with double precision. In 9 of the 44 benchmark instances that mcsta attempted to solve in this way, the difference between this fixpoint and the true value was more than the specified ε. With safe rounding away from the true fixpoint, this would likely have happened in even more cases.
To ensure termination, we thus need to make one further change to the II of Alg. 1: In each iteration of the while loop, we additionally keep track of whether any of the updates to l and u changes the previous value. If not, we end the loop and return the current interval, which will be wider than the requested relative difference. We refer to II with all of these modifications as safely rounding interleaved II (SR-III) in the remainder of this paper.

³ The documentation as of Oct. 2021 states that -frounding-math “does not currently guarantee to disable all GCC optimizations that are affected by rounding mode.”

1   function SR-SII(M = ⟨S, s_I, T⟩, G, opt, ε)
2       ... (preprocessing as in Alg. 1) ...
3       repeat
4           chg := false
5           fesetround(towards −∞)
6           foreach s ∈ S \ (S₀ ∪ S₁) do
7               l_new := opt_{µ∈T(s)} Σ_{s′∈spt(µ)} µ(s′) · l(s′)   // iterate lower vector
8               if l_new ≠ l(s) then chg := true
9               l(s) := l_new
10          fesetround(towards +∞)
11          foreach s ∈ S \ (S₀ ∪ S₁) do
12              u_new := opt_{µ∈T(s)} Σ_{s′∈spt(µ)} µ(s′) · u(s′)   // iterate upper vector
13              if u_new ≠ u(s) then chg := true
14              u(s) := u_new
15      until ¬chg ∨ (u(s_I) − l(s_I)) / l(s_I) ≤ ε
16      return [l(s_I), u(s_I)]

Alg. 2: Safely rounding sequential interval iteration (SR-SII) for x87 or SSE
4.1 Sequential Interval Iteration

When using the x87 or SSE instruction sets to implement SR-III, we need to insert a call to fesetround just before line 8, and another just before line 9. If, for an MDP with n states, we need m iterations of the while loop, we will make 2·n·m calls to fesetround. This might significantly impact performance for models with many states, or that need many iterations (such as the haddad-monmege model of the QVBS, which requires 7 million iterations with ε = 10⁻⁶ despite only having 41 states). As an alternative, we can rearrange the iteration phase of II as shown in Alg. 2: We first update l for all states (lines 6-9), then u for all states (lines 11-14), with the rounding mode changes in between (lines 5 and 10). We call this variant of II safely rounding sequential II (SR-SII). It only needs 2·m calls to fesetround, which should improve its performance. However, it also changes the memory access pattern of II with an a priori unknown effect on performance. We write III for II to stress that it is interleaved, and SII for Alg. 2 without the safe rounding, in the remainder of this paper.
4.2 Implementation Aspects
We have implemented III, SII, SR-III, and SR-SII in mcsta. While mcsta is writ-
ten in C#, the new algorithms are (necessarily) written in C, called from the
main tool via the P/Invoke mechanism. We used GCC 10.3.0 to compile our
implementations on both 64-bit Linux and Windows 10. We manually inspected
the disassembly of the generated code to ensure that GCC’s optimisations did
not interfere with rounding mode changes as described in Sect. 3.4. In a sig-
nificant architectural change, we modified mcsta’s state space exploration and
representation code to preserve the exact rational values for the probabilities
specified in the model, so that safely-rounded floating-point representations for
the µ(s0)can be computed during iteration as described above.
Of each algorithm, we implemented four variants: a default one that leaves the
choice of instruction set to the compiler and uses fesetround to change round-
ing mode; an x87 variant that forces floating-point operations to use the x87
instructions by attributing the relevant functions with target("fpmath=387")
and that changes rounding mode via inline assembly using FNSTCW/FLDCW;
an SSE variant that forces the SSE instruction set via target("fpmath=sse")
and uses VSTMXCSR/VLDMXCSR in inline assembly for rounding mode chan-
ges; and an AVX-512 variant that implements all floating-point operations re-
quiring non-default rounding modes via AVX-512 intrinsics, in particular using
_mm_fmadd_round_sd in the Bellman equations. All variants use double pre-
cision; default and SSE additionally have a single-precision version (which we
omit for x87 since the reduced precision does not speed up the operations we
use); and x87 also provides an 80-bit extended-precision version (however we
currently return its results as safely-rounded double-precision values due to the
unavailability of a long double equivalent in C#, which limits its use outside of
performance testing for now). All in all, we thus provide 28 variants of interval
iteration for comparison, out of which 14 provide guaranteed correct results.
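The AVX-512 variant sidesteps rounding mode changes entirely, because the rounding can be embedded in each instruction. As a hedged illustration (not the mcsta source), a single multiply-add of the Bellman equations with static round-towards-−∞ looks as follows using the documented Intel intrinsic; it needs a CPU and compiler with AVX-512F support (e.g., gcc -mavx512f):

    #include <immintrin.h>

    /* computes acc + p*v rounded towards -infinity, without touching MXCSR */
    static inline double fmadd_down(double p, double v, double acc) {
      __m128d r = _mm_fmadd_round_sd(
          _mm_set_sd(p), _mm_set_sd(v), _mm_set_sd(acc),
          _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
      return _mm_cvtsd_f64(r);
    }

Since the mode is part of the instruction, the lower and upper sweeps can freely mix roundings, which is consistent with the near-zero safe-rounding overhead observed for AVX-512 below.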
In particular, the safe rounding makes PMC feasible at 32-bit single precision,
which would otherwise be too likely to produce incorrect results. While we expect
that this may deliver many results with low precision (but which are correct) due
to a rounded fixpoint being reached long before the relative width reaches ε, it
also halves the memory needed to store l and u, and may speed up computations.
At the opposite end, mcsta is now also the first PMC tool that can use 80-bit
extended precision, which however doubles the memory needed for l and u since
80-bit long double values occupy 16 bytes in memory (with GCC).
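The size claims are easy to check; on x86-64 with GCC, the following prints 4 8 16, confirming that the 80-bit x87 format is padded to 16 bytes:

    #include <stdio.h>

    int main(void) {
      /* with GCC on x86-64: float = 4, double = 8, long double = 16 bytes */
      printf("%zu %zu %zu\n", sizeof(float), sizeof(double), sizeof(long double));
      return 0;
    }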
5 Experiments
Using our implementation in mcsta, we first tested all variants of the algorithms
on M^γ_n in the setting of Sect. 3.1. As expected, and validating the correctness of
the approach and its implementation, all SR variants return unknown.
We then assembled a set of 31 benchmark instances—combinations of a
model, values for its configurable parameters, and a property to check—from
the QVBS covering DTMC, MDP, and probabilistic timed automata (PTA) [24]
transformed to MDP by mcsta using the digital clocks approach [23]. These are
all the models and probabilistic reachability probabilities from the QVBS sup-
ported by mcsta for which the result was not 0 or 1 (then it can be computed via
graph-based algorithms) and for which a parameter configuration was available
where PMC terminated within our timeout of 120 s but II needed enough time for
it to be measured reliably (≳ 0.2 s). We checked each of these benchmarks with
all 28 variants of our algorithms using ε = 10⁻⁶ on different x86-64 systems:
I11w: an Intel Core i5-1135G7 (up to 4.2GHz) laptop running Windows 10,
this being the only system we had access to with AVX-512 support; AMDw:
an AMD Ryzen 9 5900X (3.7-4.8GHz) workstation running Windows 10, repre-
senting current AMD CPUs in our evaluation; I4x: an Intel Core i7-4790 (3.6-
4.0GHz) workstation running Ubuntu Linux 18.04, representing older-generation
Intel desktop hardware; and IPx: an Intel Pentium Silver J5005 (1.5-2.8GHz)
compact PC running Ubuntu Linux 18.04, representing a non-Core low-power
Intel system. We show a selection of our experimental results in the remainder
of this section, mainly from I11w and AMDw. We remark on cases where the
other systems (all with Intel CPUs) showed different patterns from I11w.
We present results graphically as scatter plots like in Fig. 2. Each such plot
compares two algorithm variants in terms of runtime for the iteration phase of the
algorithm only (i.e. we exclude the time for state space exploration and prepro-
cessing). Every point ⟨x, y⟩ corresponds to a benchmark instance and indicates
that the variant noted on the x-axis took x seconds to solve this instance while
the one noted on the y-axis took y seconds. Thus points above the solid diagonal
line correspond to instances where the x-axis method was faster; points above
(below) the upper (lower) dotted diagonal line are where the x-axis method took
less than half (more than twice) as long.
Fig. 2 first shows the performance impact of enabling safe rounding for the
standard interleaved algorithm using double precision. The top row shows the
behaviour on I11w. We see that runtime is drastically longer in the default variant
that uses fesetround, but only increases by a factor of around 2 if we use
the specific inline assembly instructions. We note that GCC includes the code
for fesetround in the generated .dll file on Windows, but in contrast to the
assembly methods does not inline it into the callers. Some of the difference
may thus be function call overhead. The middle row shows the behaviour on
AMDw. Here, default is affected just as badly, but the effect on SSE is worse
while that on x87 is much lower than on the Intel I11w system. In the bottom
row, we show the impact on default on the Linux systems (bottom left and
bottom middle), which is much lower than on Windows. This is despite GCC
implementing fesetround as an external library call here. The overhead still
markedly differs between the two Intel CPUs, though. Finally, as expected, we
see on the bottom right that safe rounding has almost no performance impact
when using the AVX-512 instructions.
[Fig. 2. Performance impact of safe rounding across instruction sets and systems: nine scatter plots (runtimes from 0.2 s to timeout, log-log scale; DTMC, MDP and PTA instances) comparing SR-III against III for the default, SSE and x87 variants on I11w and AMDw, the default variant on I4x and IPx, and the AVX-512 variant on I11w.]

Seeing the significant impact enabling safe rounding can have, we next show
what the sequential algorithm brings to the table, in Fig. 3. On the top left, we
compare the base algorithms without safe rounding, where SII takes up to twice
as long in the worst case. This is likely due to the more cache-friendly memory
access pattern of III: we store l and u interleaved for III, so it always operates
on two adjacent values at a time. The bottom-left plot confirms that reducing
the number of rounding mode changes reduces the overhead of safe rounding to
essentially zero. The remaining four plots show the differences between SR-III
and SR-SII. In all cases except x87 on AMDw, SR-III is slower. We thus have
that III is fastest but unsafe, SII and SR-SII are equally fast but the latter is
safe, and SR-III is safe but tends to be slower on the Intel systems. On the AMD
system, SR-III surprisingly wins over SR-SII with x87, highlighting that the x87
instruction set in Zen 3 must be implemented very differently from SSE.
[Fig. 3. Performance of interleaved compared to sequential II: six scatter plots comparing SII against III (I11w, SSE), SR-SII against SR-III (I11w, SSE and x87; AMDw, SSE and x87), and SR-SII against SII (I11w, SSE).]
We further investigate the impact of the instruction set in Fig. 4. Confirming
the patterns we saw so far, SSE is slightly faster than x87 on I11w (and we see
similar behaviour on the other Intel systems) but slower by a factor of more
than 2 on the AMD CPU. The rightmost plot highlights that AVX-512 is the
fastest alternative on the most recent Intel CPUs, which may in part be due to
the availability of the fused multiply-add instruction that fits II so well.
[Fig. 4. Performance with different instruction sets: scatter plots comparing SR-III with x87 against SSE on I11w and AMDw, and SR-III with AVX-512 against SSE on I11w.]

All results so far were for double-precision computations. To conclude our
evaluation, we show in Fig. 5 that reducing to single precision does not bring
the expected performance benefits. We see in the leftmost plot that the overhead
of safe rounding has a much higher variance compared to Fig. 2. The detailed tool
outputs hint at the reason being that rounding away from the fixpoint occurs in
much larger steps with single precision, which significantly slows down or stops
the convergence in several instances. The middle plot shows that, aside from the
slowly converging outliers, using single precision does not provide a speedup over
using doubles. Finally, on the right, we show that the impact of enabling 80-bit
extended precision on x87 is minimal.

[Fig. 5. Performance with different precision settings (on I11w): scatter plots comparing SR-III (single) against III (SSE, single), SR-III (double) against SR-III (SSE, single), and SR-III (ext.) against SR-III (x87, double).]
6 Conclusion
There has been ample research into sound PMC algorithms over the past years,
but the problem of errors introduced by naive implementations using default
floating-point rounding has been all but ignored. We showed that a solution ex-
ists that, while perhaps conceptually simple, faces a number of implementation
and performance obstacles. In particular, hardware support for rounding modes
is arguably essential to achieve acceptable performance, but difficult to use from
C/C++ and impossible to access from most other programming languages. We
extensively explored the space of implementation variants, highlighting that per-
formance crucially depends on the combination of the variant, the CPU, and the
operating system. Nevertheless, our results show that truly correct PMC is pos-
sible today at a small cost in performance, which should all but disappear as
AVX-512 is more widely adopted. With our implementation in mcsta, we provide
the first PMC tool that is at once fast, scalable, and correct.
Acknowledgments. This work was triggered by Masahide Kashiwagi’s excellent
overview of the different ways to change rounding mode as used by his kv library
for verified numerical computations [21]. The author thanks Anke and Ursula
Hartmanns for contributing to the diversity of hardware on which the experi-
ments were performed by providing access to the AMDw and I11w systems.
Data availability. A dataset to replicate the experimental evaluation, including
the exact versions of the tools and models used, is archived and available at DOI
10.4121/19074047 [17].
References
1. Arnold, D.N.: Some disasters attributable to bad numerical computing: The Patriot
missile failure (2000), https://www-users.cse.umn.edu/~arnold/disasters/patriot.
html, last accessed 2021-10-14.
2. Baier, C., Katoen, J.P.: Principles of model checking. MIT Press (2008)
3. Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the re-
liability of your model checker: Interval iteration for Markov decision processes.
In: Majumdar, R., Kuncak, V. (eds.) 29th International Conference on Computer
Aided Verification (CAV). Lecture Notes in Computer Science, vol. 10426, pp.
160–180. Springer (2017). https://doi.org/10.1007/978-3-319-63387-9_8
4. Bianco, A., de Alfaro, L.: Model checking of probabilistic and nondeterministic
systems. In: 15th Conference on Foundations of Software Technology and Theoret-
ical Computer Science (FSTTCS). Lecture Notes in Computer Science, vol. 1026,
pp. 499–513. Springer (1995). https://doi.org/10.1007/3-540-60692-0_70
5. Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska,
M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes us-
ing learning algorithms. In: Cassez, F., Raskin, J.F. (eds.) 12th International
Symposium on Automated Technology for Verification and Analysis (ATVA).
Lecture Notes in Computer Science, vol. 8837, pp. 98–114. Springer (2014).
https://doi.org/10.1007/978-3-319-11936-6_8
6. Budde, C.E., Hartmanns, A., Klauck, M., Kretínský, J., Parker, D., Quatmann,
T., Turrini, A., Zhang, Z.: On correctness, precision, and performance in quantitative
verification: QComp 2020 competition report. In: Margaria, T., Steffen, B.
(eds.) 9th International Symposium on Leveraging Applications of Formal Meth-
ods (ISoLA). Lecture Notes in Computer Science, vol. 12479, pp. 216–241. Springer
(2020). https://doi.org/10.1007/978-3-030-83723-5_15
7. Chatterjee, K., Henzinger, M.: Faster and dynamic algorithms for maxi-
mal end-component decomposition and related graph problems in proba-
bilistic verification. In: Randall, D. (ed.) Twenty-Second Annual ACM-SIAM
Symposium on Discrete Algorithms (SODA). pp. 1318–1336. SIAM (2011).
https://doi.org/10.1137/1.9781611973082.101
8. Chatterjee, K., Henzinger, T.A.: Value iteration. In: Grumberg, O., Veith,
H. (eds.) 25 Years of Model Checking - History, Achievements, Perspectives.
Lecture Notes in Computer Science, vol. 5000, pp. 107–138. Springer (2008).
https://doi.org/10.1007/978-3-540-69850-0_7
9. Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A Storm is coming: A modern prob-
abilistic model checker. In: Majumdar, R., Kuncak, V. (eds.) 29th International
Conference on Computer Aided Verification (CAV). Lecture Notes in Computer
Science, vol. 10427, pp. 592–600. Springer (2017). https://doi.org/10.1007/978-3-
319-63390-9_31
10. Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D.: Automated verification
techniques for probabilistic systems. In: Bernardo, M., Issarny, V. (eds.) 11th In-
ternational School on Formal Methods for the Design of Computer, Communication
and Software Systems (SFM). Lecture Notes in Computer Science, vol. 6659, pp.
53–113. Springer (2011). https://doi.org/10.1007/978-3-642-21455-4_3
11. Free Software Foundation: Status of C99 features in GCC (2021), https://gcc.gnu.
org/c99status.html, as accessed on 2021-10-14.
12. Haddad, S., Monmege, B.: Reachability in MDPs: Refining convergence of value it-
eration. In: Ouaknine, J., Potapov, I., Worrell, J. (eds.) 8th International Workshop
on Reachability Problems (RP). Lecture Notes in Computer Science, vol. 8762, pp.
125–137. Springer (2014). https://doi.org/10.1007/978-3-319-11439-2_10
13. Haddad, S., Monmege, B.: Interval iteration algorithm for
MDPs and IMDPs. Theor. Comput. Sci. 735, 111–131 (2018).
https://doi.org/10.1016/j.tcs.2016.12.003
14. Hahn, E.M., Hartmanns, A., Hermanns, H., Katoen, J.P.: A compositional mod-
elling and analysis framework for stochastic hybrid systems. Formal Methods Syst.
Des. 43(2), 191–232 (2013). https://doi.org/10.1007/s10703-012-0167-z
15. Hahn, E.M., Li, Y., Schewe, S., Turrini, A., Zhang, L.: iscasMc: A web-based
probabilistic model checker. In: Jones, C.B., Pihlajasaari, P., Sun, J. (eds.) 19th
International Symposium on Formal Methods (FM). Lecture Notes in Computer
Science, vol. 8442, pp. 312–317. Springer (2014). https://doi.org/10.1007/978-3-
319-06410-9_22
16. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal
Aspects Comput. 6(5), 512–535 (1994). https://doi.org/10.1007/BF01211866
17. Hartmanns, A.: Correct probabilistic model checking with floating-
point arithmetic (artifact). 4TU.Centre for Research Data (2022).
https://doi.org/10.4121/19074047
18. Hartmanns, A., Hermanns, H.: The Modest Toolset: An integrated environment
for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.)
20th International Conference on Tools and Algorithms for the Construction and
Analysis of Systems (TACAS). Lecture Notes in Computer Science, vol. 8413, pp.
593–598. Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_51
19. Hartmanns, A., Kaminski, B.L.: Optimistic value iteration. In: Lahiri, S.K., Wang,
C. (eds.) 32nd International Conference on Computer Aided Verification (CAV).
Lecture Notes in Computer Science, vol. 12225, pp. 488–511. Springer (2020).
https://doi.org/10.1007/978-3-030-53291-8_26
20. Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The quantita-
tive verification benchmark set. In: Vojnar, T., Zhang, L. (eds.) 25th International
Conference on Tools and Algorithms for the Construction and Analysis of Systems
(TACAS). Lecture Notes in Computer Science, vol. 11427, pp. 344–350. Springer
(2019). https://doi.org/10.1007/978-3-030-17462-0_20
21. Kashiwagi, M.: kv – a C++ library for verified numerical computation, http://
verifiedby.me/kv/index-e.html, last accessed 2021-10-13.
22. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilis-
tic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) 23rd International
Conference on Computer Aided Verification (CAV). Lecture Notes in Computer
Science, vol. 6806, pp. 585–591. Springer (2011). https://doi.org/10.1007/978-3-
642-22110-1_47
23. Kwiatkowska, M.Z., Norman, G., Parker, D., Sproston, J.: Performance analysis
of probabilistic timed automata using digital clocks. Formal Methods Syst. Des.
29(1), 33–78 (2006). https://doi.org/10.1007/s10703-006-0005-2
24. Kwiatkowska, M.Z., Norman, G., Segala, R., Sproston, J.: Automatic verification
of real-time systems with discrete probability distributions. Theor. Comput. Sci.
282(1), 101–150 (2002). https://doi.org/10.1016/S0304-3975(01)00046-9
25. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic
Programming. Wiley Series in Probability and Statistics, Wiley (1994).
https://doi.org/10.1002/9780470316887
26. Quatmann, T., Katoen, J.P.: Sound value iteration. In: Chockler, H., Weis-
senbacher, G. (eds.) 30th International Conference on Computer Aided Verifica-
tion (CAV). Lecture Notes in Computer Science, vol. 10981, pp. 643–661. Springer
(2018). https://doi.org/10.1007/978-3-319-96145-3_37
27. Teige, T., Fränzle, M.: Constraint-based analysis of probabilistic hybrid systems.
In: Giua, A., Mahulea, C., Silva, M., Zaytoon, J. (eds.) 3rd IFAC Conference
on Analysis and Design of Hybrid Systems (ADHS). IFAC Proceedings Vol-
umes, vol. 42, pp. 162–167. Elsevier (2009). https://doi.org/10.3182/20090916-3-
ES-3003.00029
28. United States General Accounting Office: Software problem led to system failure
at Dhahran, Saudi Arabia. Report GAO/IMTEC-92-26 (February 1992), https:
//www-users.cse.umn.edu/~arnold/disasters/GAO-IMTEC-92-96.pdf
29. Wimmer, R., Kortus, A., Herbstritt, M., Becker, B.: Probabilistic model checking
and reliability of results. In: Straube, B., Drutarovský, M., Renovell, M., Gramata,
P., Fischerová, M. (eds.) 11th IEEE Workshop on Design & Diagnostics of Elec-
tronic Circuits & Systems (DDECS). pp. 207–212. IEEE Computer Society (2008).
https://doi.org/10.1109/DDECS.2008.4538787
Correlated Equilibria and Fairness in
Concurrent Stochastic Games
Marta Kwiatkowska1, Gethin Norman2, David Parker3, and Gabriel Santos1
1Department of Computer Science, University of Oxford, Oxford, UK
{marta.kwiatkowska,gabriel.santos}@cs.ox.ac.uk
2School of Computing Science, University of Glasgow, Glasgow, UK
gethin.norman@glasgow.ac.uk
3School of Computer Science, University of Birmingham, Birmingham, UK
d.a.parker@cs.bham.ac.uk
Abstract. Game-theoretic techniques and equilibria analysis facilitate
the design and verification of competitive systems. While algorithmic
complexity of equilibria computation has been extensively studied, prac-
tical implementation and application of game-theoretic methods is more
recent. Tools such as PRISM-games support automated verification and
synthesis of zero-sum and (ε-optimal subgame-perfect) social welfare
Nash equilibria properties for concurrent stochastic games. However,
these methods become inefficient as the number of agents grows and may
also generate equilibria that yield significant variations in the outcomes
for individual agents. We extend the functionality of PRISM-games to
support correlated equilibria, in which players can coordinate through
public signals, and introduce a novel optimality criterion of social fair-
ness, which can be applied to both Nash and correlated equilibria. We
show that correlated equilibria are easier to compute, are more equitable,
and can also improve joint outcomes. We implement algorithms for both
normal form games and the more complex case of multi-player concur-
rent stochastic games with temporal logic specifications. On a range of
case studies, we demonstrate the benefits of our methods.
1 Introduction
Game-theoretic verification techniques can support the modelling and design of
systems that comprise multiple agents operating in either a cooperative or com-
petitive manner. In many cases, to effectively analyse these systems we also need
to adopt a probabilistic approach to modelling, for example because agents oper-
ate in uncertain environments, use faulty hardware or unreliable communication
mechanisms, or explicitly employ randomisation for coordination.
In these cases, probabilistic model checking provides a convenient unified
framework for both formally modelling probabilistic multi-agent systems and
specifying their required behaviour. In recent years, progress has been made in
this direction for several models, including turn-based and concurrent stochastic
games (TSGs and CSGs), and for multiple temporal logics, such as rPATL [10]
and its extensions [24]. Tool support has been developed, in the form of PRISM-
games [22], and successfully applied to case studies across a broad range of areas.
Initially, the focus was on zero-sum specifications [24], which can be natural
for systems whose participants have directly opposing goals, such as the defender
and attacker in a security protocol minimising or maximising the probability of
a successful attack, respectively. However, agents often have objectives that are
distinct but not directly opposing, and may also want to cooperate to achieve
these objectives. Examples include network protocols and multi-robot systems.
For these purposes, Nash equilibria (NE) have also been integrated into prob-
abilistic model checking of CSGs [24], together with the social welfare (SW)
optimality criterion, resulting in social welfare Nash equilibria (SWNE). An SWNE
comprises a strategy for each player in the game where no player has an incen-
tive to deviate unilaterally from their strategy and the sum of the individual
objectives over all players is maximised.
One key limitation of SWNE, however, is that, as these techniques are ex-
tended to support larger numbers of players [21], the efficiency and scalability
of synthesising SWNE is significantly reduced. In addition, simply aiming to
maximise the sum of individual objectives may not produce the best perform-
ing equilibrium, either collectively or individually; for example, they can offer
higher gains for specific players, reducing the incentive of the other players to
collaborate and instead motivating them to deviate from the equilibrium.
In this paper, we adopt a different approach and introduce, for the first time
within formal verification, both social fairness as an optimality criterion and
correlated equilibria, and the insights required to make these usable in practical
applications. Social fairness (SF) is particularly novel, as it is inspired by similar
concepts used in economics and distinct from the fairness notions employed in
verification. Correlated equilibria (CE) [3], in which players are able to coordi-
nate through public signals, are easier to compute than NE and can yield better
outcomes. Social fairness, which minimises the differences between the objectives
of individual players, can be considered for both CE and NE.
We rst investigate these concepts for the simpler case of normal form games,
illustrating their dierences and benets. We then extend the approach to the
more powerful modelling formalism of CSGs and extend the temporal logic
rPATL to formally specify agent objectives. We present algorithms to synthesise
equilibria, using linear programming to nd CE and a combination of back-
wards induction or value iteration for CSGs. We implement our approach in
the PRISM-games tool [22] and demonstrate signicant gains in computation
time and that quantiably more fair and useful strategies can by synthesised
for a range of application domains. An extended version of this paper, with the
complete model checking algorithm, is available [23].
Related work. Nash equilibria have been considered for concurrent systems
in [18], where a temporal logic is proposed whose key operator is a novel path
quantifier which asserts that a property holds on all Nash equilibrium computa-
tions of the system. There is no stochasticity and correlated equilibria are not
considered. In [2], a probabilistic logic that can express equilibria is formulated,
along with complexity results, but no implementation has been provided.
The notion of fairness studied here is inspired by fairness of equilibria from
economics [33,34] and aims to minimise the difference between the payoffs, as
opposed to maximising the lowest payoff among the players in an NE [25]. Our
notion of fairness can be thought of as a constraint applied to equilibria strate-
gies, similar in style to social welfare, and used to select certain equilibria based
on optimality. This is distinct from fairness used in verification of concurrent
processes, where (strong) fairness refers to a property stating that, whenever a
process is enabled infinitely often, it is executed infinitely often. This notion is
typically defined as a constraint on infinite execution paths expressible in the
logics LTL and CTL* and needed to prove liveness properties. For probabilistic
models, verification under fairness constraints has been formulated for Markov
decision processes and the logic PCTL* [5,4]. For games on graphs, fairness
conditions expressed as ω-regular winning conditions can be used to synthesise
reactive processes [8]. Algorithms for strong transition fairness for ω-regular
games have been recently studied in [6]. Both qualitative and quantitative
approaches have been considered for verification under fairness constraints, but
not equilibria.
2 Normal Form Games
We start by considering normal form games (NFGs), then define our equilibria
concepts for these games, present algorithms and an implementation for com-
puting them, and finally summarise some experimental results.
We rst require the following notation. Let Dist(X)denote the set of prob-
ability distributions over set X. For any vector vRn, we use v(i)to refer
to the ith entry of the vector. For any tuple x= (x1, . . . , xn)Xn, element
xXand in, we dene the tuples xi
def
= (x1, . . . , xi1, xi+1, . . . , xn)and
xi[x]def
= (x1, . . . , xi1, x, xi+1, . . . , xn).
Denition 1 (Normal form game). A (nite, n-person) normal form game
(NFG) is a tuple N= (N, A, u)where: N={1, . . . , n}is a nite set of players;
A=A1× · · · ×Anand Aiis a nite set of actions available to player iN;
u= (u1, . . . , un)and ui:ARis a utility function for player iN.
We x an NFG N= (N, A, u)for the remainder of this section. In a play of N,
each player iNchooses an action from the set Aiat the same time. If each
player ichooses ai, then the utility received by player jequals uj(a1, . . . , an).
We next dene the strategies for players of Nand strategy proles comprising
a strategy for each player. We also dene correlated proles, which allow the
players to coordinate their choices through a (probabilistic) public signal.
Denition 2 (Strategy and prole). Astrategy σifor player iis an element
of Σi=Dist(Ai)and a strategy prole σis an element of ΣN=Σ1× · · · ×Σn.
For strategy σiof player i, the support is the set of actions {aiAi|σi(ai)>0}
and the support of a prole is the product of the supports of the strategies.
Denition 3 (Correlated prole). Acorrelated prole is a tuple (τ , ς)com-
prising τDist(D), where D=D1× · · · ×Dn,Diis a nite set of signals for
player i, and ς= (ς1, . . . , ςn), where ςi:DiAi.
For a correlated prole (τ, ς ), the public signal τis a joint distribution over
signals Difor each player isuch that, if player ireceives the signal diDi, then
it chooses action ςi(di). We can consider any correlated prole (τ, ς )as a joint
strategy, i.e., a distribution over A1× · · · ×Anwhere:
(τ, ς )(a1, . . . , an) = {τ(d1, . . . , dn)|diDiς(di) = aifor all iN}.
Conversely, any joint strategy τ ∈ Dist(A_1 × · · · × A_n) can be considered as a
correlated profile (τ, ς) where D_i = A_i and ς_i is the identity function for i ∈ N.
Any strategy profile σ can be mapped to an equivalent correlated profile (in
which τ is the joint distribution σ_1 × · · · × σ_n and ς_i is the identity function). On
the other hand, there are correlated profiles with no equivalent strategy profile.
Under prole σand correlated prole (τ, ς )the expected utilities of player iare:
ui(σ)def
=(a1,...,an)Aui(a1, . . . , an)·(n
j=1 σj(aj))
ui(τ, ς )def
=(d1,...,dn)Dτ(d1, . . . , dn)·ui(ς1(d1), . . . , ςn(dn)) .
Example 1. Consider the two-player NFG where A_i = {a^i_1, a^i_2} and a corre-
lated profile corresponding to the joint distribution τ ∈ Dist(A_1 × A_2) where
τ(a^1_1, a^2_1) = τ(a^1_2, a^2_2) = 0.5. Under this correlated profile the players share a fair
coin and both choose their first action if the coin is heads and their second action
otherwise. This has no equivalent strategy profile.
Optimal equilibria of NFGs. We now introduce the notions of Nash equilib-
rium [27] and correlated equilibrium [3], as well as different definitions of opti-
mality for these equilibria: social welfare and social fairness. Using the notation
introduced above for tuples, for any profile σ and strategy σ′_i, the strategy tuple
σ_{−i} corresponds to σ with the strategy of player i removed and σ_{−i}[σ′_i] to the
profile σ after replacing player i's strategy with σ′_i.
Denition 4 (Best response). For a prole σand correlated prole (τ , ς), a
best response for player ito σiand (τ, ςi)are, respectively:
a strategy σ
ifor player isuch that ui(σi[σ
i]) ui(σi[σi]) for all σiΣi;
a function ς
i:DiAifor player isuch that ui(τ, ςi[ς
i]) ui(τ, ςi[ςi])
for all functions ςi:DiAi.
Denition 5 (NE and CE). A strategy prole σis a Nash equilibrium (NE)
and a correlated prole (τ, ς )is a correlated equilibrium (CE) if:
σ
iis a best response to σ
ifor all iN;
ς
iis a best response to (τ, ς
i)for all iN;
respectively. We denote by ΣNand ΣCthe set of NE and CE, respectively.
  α                   u1(α)    u2(α)    u3(α)
  (pro1,pro2,pro3)    −1000    −1000    −100
  (pro1,pro2,yld3)    −1000    −100     −5
  (pro1,yld2,pro3)     5       −5        5
  (pro1,yld2,yld3)     5       −5       −5
  (yld1,pro2,pro3)    −5       −1000    −100
  (yld1,pro2,yld3)    −5        5       −5
  (yld1,yld2,pro3)    −5       −5        5
  (yld1,yld2,yld3)    −10      −10      −10

Fig. 1: Example: Cars at an intersection and the corresponding NFG.
Any NE of N is also a CE, while there can exist CEs that cannot be represented
by a strategy profile and therefore are not NEs. For each class of equilibria,
NE and CE, we introduce two optimality criteria, the first maximising social
welfare (SW), defined as the sum of the utilities, and the second maximising
social fairness (SF), which minimises the difference between the players' utilities.
Other variants of fairness have been considered for NE, such as in [25], where
the authors seek to maximise the lowest utility among the players.
Denition 6 (SW and SF). An equilibrium σis a social welfare (SW) equi-
librium if the sum of the utilities of the players under σis maximal over all
equilibria, while σis a social fair (SF) equilibrium if the dierence between the
player’s utilities under σis minimised over all equilibria.
We can also dene the dual concept of cost equilibria [24], where players try to
minimise, rather than maximise, their expected utilities by considering equilibria
of the game N= (N, A, u)in which the utilities of Nare negated.
Example 2. Consider the scenario, based on an example from [32], where three
cars meet at an intersection and want to proceed as indicated by the arrows
in Figure 1. Each car can either proceed or yield. If two cars with intersecting
paths proceed, then there is an accident. If an accident occurs, the car having
the right of way, i.e., the other car is to its right, has a utility of −100 and the
car that should yield has a utility of −1000. If a car proceeds without causing an
accident, then its utility is 5 and the cars that yield have a utility of −5. If all
cars yield, then, since this delays all cars, all have utility −10. The 3-player NFG
is given in Figure 1. Considering the different optimal equilibria of the NFG:
– the SWNE and SWCE are the same: for c2 to yield and c1 and c3 to proceed,
  with the expected utilities (5, −5, 5);
– the SFNE is for c1 to yield with probability 1, c2 to yield with probability
  0.863636 and c3 to yield with probability 0.985148, with the expected utilities
  (−9.254050, −9.925742, −9.318182);
– the SFCE gives a joint distribution where the probability of c2 yielding and
  of c1 and c3 yielding are both 0.5, with the expected utilities (0, 0, 0).
Modifying u2 such that u2(pro1, pro2, pro3) = −4.5 to, e.g., represent a reckless
driver, the SWNE becomes for c1 and c3 to yield and c2 to proceed, with the
expected utilities (−5, 5, −5), while the SWCE is still for c2 to yield and c1 and
c3 to proceed. The SFNE and SFCE also do not change.
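As a quick check of the SFCE values: the joint distribution puts probability 0.5 on the outcome (pro1, yld2, pro3), with utilities (5, −5, 5), and 0.5 on (yld1, pro2, yld3), with utilities (−5, 5, −5), so

\[
  \tfrac{1}{2}\,(5,-5,5) + \tfrac{1}{2}\,(-5,5,-5) = (0,0,0),
\]

matching the reported expected utilities and making the difference between the players' utilities zero.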
Algorithms for computing equilibria. Before we give our algorithm to com-
pute correlated equilibria, we briefly describe the approach of [21,24] for Nash
equilibria computation that this paper builds upon. Finding NE in two-player
NFGs is in the class of linear complementarity problems (LCPs) and we follow
the algorithm presented in [24], which reduces the problem to SMT via labelled
polytopes [28] by considering the regions of the strategy profile space, itera-
tively reducing the search space as positive probability assignments are found
and added as restrictions on this space. To find SWNE and SFNE, we can enu-
merate all NE and then find the optimal NE.
When there are more than two players, computing NE values becomes a more
complex task, as finding NE within a given support no longer reduces to a linear
programming (LP) problem. In [21] we presented an algorithm using support
enumeration [31], which exhaustively examines all sub-regions, i.e., supports,
of the strategy profile space, one at a time, checking whether that sub-region
contains NEs. For each support, finding SWNE can be reduced to a nonlinear
programming problem [21]. This nonlinear programming problem can be modified
to find SFNE in each support, similarly to how the LP problem for SWCEs is
modified to find SFCEs below.
In the case of CE we can first find a joint strategy for the players, i.e.,
a distribution over the action tuples, which, as explained above, can then be
mapped to a correlated profile. A SWCE can be found by solving the following
LP problem. Maximise Σ_{i∈N} Σ_{α∈A} u_i(α) · p_α subject to:

  Σ_{α_{−i}∈A_{−i}} (u_i(α_{−i}[a_i]) − u_i(α_{−i}[a′_i])) · p_{α_{−i}[a_i]} ≥ 0    (1)
  0 ≤ p_α ≤ 1                                                      (2)
  Σ_{α∈A} p_α = 1                                                  (3)

for all i ∈ N, α ∈ A, a_i, a′_i ∈ A_i and α_{−i} ∈ A_{−i}, where A_{−i} := {α_{−i} | α ∈ A}.
The variables p_α represent the probability of the joint strategy corresponding
to the correlated profile selecting the action-tuple α. The above LP has |A|
variables, one for each action-tuple, and Σ_{i∈N}(|A_i|² − |A_i|) + |A| + 1 constraints.
Computation of SFCE can be reduced to the following optimisation problem.
Minimise p_max − p_min subject to (1), (2) and (3), together with:

  p_i = Σ_{α∈A} p_α · u_i(α)                      (4)
  (∧_{m∈N} p_i ≥ p_m) → (p_max = p_i)             (5)
  (∧_{m∈N} p_i ≤ p_m) → (p_min = p_i)             (6)

for all i ∈ N, m ≠ i, α ∈ A, a_j, a_l ∈ A_i and α_{−i} ∈ A_{−i}. Again, the variables p_α in
the program represent the probability of the players playing the joint action α.
The constraint (4) requires p_i to equal the utility of player i. The constraints
(5) and (6) set p_max and p_min as the maximum and minimum values within the
utilities of the players, respectively. Given we use the constraints (1), (2) and
(3), we start with the same number of variables and constraints as needed to
compute SWCEs and incur an additional |N| + 2 variables and 3·|N| constraints.
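To make the LP concrete, the following self-contained C sketch instantiates constraints (1)-(3) for the textbook two-player "chicken" game (the payoffs are the standard ones from the game theory literature, not from this paper) and maximises social welfare; it uses the open-source GLPK library purely for illustration, whereas the implementation described below relies on Gurobi or Z3. Build with, e.g., gcc swce.c -lglpk.

    #include <stdio.h>
    #include <glpk.h>

    /* payoffs of a 2x2 "chicken" game: action 0 = dare, 1 = chicken */
    static const double u[2][2][2] = {
      {{0, 7}, {2, 6}},   /* u[0][a1][a2]: player 1 */
      {{0, 2}, {7, 6}},   /* u[1][a1][a2]: player 2 */
    };

    static int col(int a1, int a2) { return 1 + 2 * a1 + a2; } /* GLPK is 1-based */

    int main(void) {
      glp_prob *lp = glp_create_prob();
      glp_term_out(GLP_OFF);
      glp_set_obj_dir(lp, GLP_MAX);
      glp_add_cols(lp, 4);               /* one variable p_alpha per joint action */
      for (int a1 = 0; a1 < 2; a1++)
        for (int a2 = 0; a2 < 2; a2++) {
          glp_set_col_bnds(lp, col(a1, a2), GLP_DB, 0.0, 1.0);            /* (2) */
          glp_set_obj_coef(lp, col(a1, a2), u[0][a1][a2] + u[1][a1][a2]);
        }
      glp_add_rows(lp, 5);               /* 4 incentive constraints (1) plus (3) */
      int ia[13], ja[13]; double ar[13]; int k = 0, row = 0;
      for (int i = 0; i < 2; i++)        /* player i */
        for (int a = 0; a < 2; a++) {    /* recommended action a, deviation 1-a */
          glp_set_row_bnds(lp, ++row, GLP_LO, 0.0, 0.0);
          for (int b = 0; b < 2; b++) {  /* the other player's action */
            int a1 = (i == 0) ? a : b, a2 = (i == 0) ? b : a;
            int d1 = (i == 0) ? 1 - a : b, d2 = (i == 0) ? b : 1 - a;
            ++k; ia[k] = row; ja[k] = col(a1, a2);
            ar[k] = u[i][a1][a2] - u[i][d1][d2]; /* gain of following the signal */
          }
        }
      glp_set_row_bnds(lp, ++row, GLP_FX, 1.0, 1.0);  /* (3): a distribution */
      for (int j = 1; j <= 4; j++) { ++k; ia[k] = row; ja[k] = j; ar[k] = 1.0; }
      glp_load_matrix(lp, k, ia, ja, ar);
      glp_simplex(lp, NULL);
      printf("social welfare = %g\n", glp_get_obj_val(lp));
      for (int j = 1; j <= 4; j++) printf("p[%d] = %g\n", j, glp_get_col_prim(lp, j));
      glp_delete_prob(lp);
      return 0;
    }

For these payoffs the unique optimum is p = (0, 1/4, 1/4, 1/2) over the joint actions (dare,dare), (dare,chicken), (chicken,dare), (chicken,chicken), with social welfare 10.5; this CE is not a product distribution, so no NE achieves it.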
  Game             Players  |Ai|  |A|    NE supports  NE SW (s)  CE SW (s)  CE SF (s)
  Majority voting  2        4     16     225          0.07       0.02       0.08
  games            2        6     36     3,969        0.1        0.02       0.1
                   2        8     64     65,025       0.4        0.03       0.3
                   2        10    100    1,046,529    5.8        0.07       0.7
                   3        3     27     343          1.2        0.07       0.1
                   3        4     81     3,375        25.8       0.08       0.3
  Covariant        3        3     27     343          8.7        0.08       1.7
  games            3        4     81     3,375        598.5      0.08       2.9
                   8        2     256    6,561        TO         0.3        TO
                   8        3     6,561  5,764,801    TO         22.8       TO
                   10       2     1,024  59,049       TO         1.2        TO

Table 1: Times (s) for synthesis of equilibria in NFGs (timeout 30 mins).
Implementation. To find SWNE or SFNE of two-player NFGs, we adopt a
similar approach to [24], using labelled polytopes to characterise and find NE
values through a reduction to SMT in both Z3 [13] and Yices [14]. As an op-
timised precomputation step, when possible we also search for and filter out
dominated strategies, which speeds up the computation and reduces solver calls.
For NFGs with more than two players, solving the nonlinear programming
problem based on support enumeration has been implemented in [21] using a
combination of the SMT solver Z3 [13] and the nonlinear optimisation suite
Ipopt [38]. To mitigate the inefficiencies of an SMT solver for such problems,
we used Z3 to filter out unsatisfiable support assignments with a timeout and
then Ipopt is called to find SWNE values using an interior-point filter line-search
algorithm [39]. To speed up the overall computation, the support assignments are
analysed in parallel. Computing SFNE increases the complexity of the nonlinear
program and, due to the inefficiency in this approach [21], we have not extended
the implementation to compute SFNE.
As shown above, computing SWCE for NFGs reduces to solving an LP, and
we implement this using either the optimisation solver Gurobi [17] or the SMT
solver Z3 [13]. In the case of SFCE, the constraints (5) and (6) include impli-
cations, and therefore the problem does not reduce directly to an LP. When
using Z3, we can encode these constraints directly as it supports assertions that
combine inequalities with logical implications, a feature that linear solvers such
as Gurobi do not have. Section 5 discusses implementing SFCE computation in
Gurobi. Both solvers support the specification of lower-priority or soft objectives,
which makes it possible to have a consistent ordering for the players' payoffs in
cases where multiple equilibria exist.
Eciency and scalability. Table 1presents experimental results for solving
a selection of NFGs randomly generated with GAMUT [29], using Gurobi for
SWCE and NE of two-player NFGs, Z3 for SFCE and both Ipopt and Z3 for
NFGs of more than two players, and running on a 2.10GHz Intel Xeon Gold with
32GB of JVM memory. For each instance, Table 1lists the number of players,
actions for each player, joint actions and supports that need to be enumerated
when nding NE, as well as the time to nd SWNEs, SWCEs and SFCEs (the
time for nding SFNEs of two-player games is the same as for SWNEs). As the
results demonstrate, due to a simpler problem being solved and the fact that we
66 Marta Kwiatkowska, Gethin Norman, David Parker, Gabriel Santos
do not need to enumerate the solutions, computing CEs scales far better than
NEs as the number of players and actions increases. Finding NEs in games with
more than two players is particularly hard as the constraints are nonlinear. We
also see that SFCE computation is slower than SWCE, which is caused by the
additional variables and constraints required when nding SFCE and using Z3
rather than Gurobi for the solver.
3 Concurrent Stochastic Games
We now further develop our approach to support concurrent stochastic games
(CSGs) [36], in which players repeatedly make simultaneous action choices that
cause the game's state to be updated probabilistically. We extend the previously
introduced definitions of optimal equilibria to such games, focusing on subgame-
perfect equilibria, which are equilibria in every state of a CSG. We then present
algorithms to reason about and synthesise such equilibria.
Denition 7 (Concurrent stochastic game). Aconcurrent stochastic multi-
player game (CSG) is a tuple G= (N, S, ¯
S, A, ∆, δ, AP ,L)where:
N={1, . . . , n}is a nite set of players;
Sis a nite set of states and ¯
SSis a set of initial states;
A= (A1 {⊥})× · · · ×(An {⊥})and Aiis a nite set of actions available
to player iNand is an idle action disjoint from the set n
i=1Ai;
:S2n
i=1Aiis an action assignment function;
δ: (S×A)Dist(S)is a (partial) probabilistic transition function;
AP is a set of atomic propositions and L:S2AP is a labelling function.
For the remainder of this section we fix a CSG G as in Definition 7. The game
G starts in one of its initial states s̄ ∈ S̄ and, supposing G is in a state s, each
player i of G chooses an action from the set of available actions, defined
as A_i(s) := Δ(s) ∩ A_i if Δ(s) ∩ A_i is non-empty and A_i(s) := {⊥} otherwise.
Supposing each player chooses a_i, the game transitions to state s′ with
probability δ(s, (a_1, . . . , a_n))(s′). To enable quantitative analysis of G we augment it
with reward structures, which are tuples r = (r_A, r_S) of an action reward function
r_A : S × A → R and a state reward function r_S : S → R.
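For illustration only, a CSG in the sense of Definition 7, together with a reward structure, could be laid out in memory along the following lines; this is a hypothetical C sketch with invented field names (the PRISM-games engine itself is a sparse-matrix-based Java implementation, as described in Section 4):

    #include <stddef.h>

    typedef struct { int state; double prob; } Successor;

    typedef struct {
      int *joint_action;    /* one action index per player; -1 encodes idle (⊥) */
      Successor *succ;      /* support of the distribution δ(s, α) */
      size_t n_succ;
      double action_reward; /* r_A(s, α) */
    } Transition;

    typedef struct {
      size_t n_players, n_states;
      int *initial;          /* indices of the initial states */
      size_t n_initial;
      Transition **trans;    /* trans[s]: joint actions enabled in s (via Δ) */
      size_t *n_trans;
      double *state_reward;  /* r_S(s) */
      unsigned long *labels; /* per-state bitset over atomic propositions (L) */
    } CSG;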
A path of G is a sequence π = s_0 −α_0→ s_1 −α_1→ · · · where s_k ∈ S, α_k =
(a^k_1, . . . , a^k_n) ∈ A, a^k_i ∈ A_i(s_k) for i ∈ N and δ(s_k, α_k)(s_{k+1}) > 0 for all k ≥ 0.
We denote by FPaths_{G,s} and IPaths_{G,s} the sets of finite and infinite paths
starting in state s of G respectively, and drop the subscript s when considering
all finite and infinite paths of G. As for NFGs, we can define strategies of G
that resolve the choices of the players. Here, a strategy for player i is a function
σ_i : FPaths_G → Dist(A_i ∪ {⊥}) such that, if σ_i(π)(a_i) > 0, then a_i ∈ A_i(last(π)),
where last(π) is the final state of π. Furthermore, we can define strategy profiles,
correlated profiles and joint strategies analogously to Definitions 2 and 3.
The utility of a player i of G is defined by a random variable X_i : IPaths_G → R
over infinite paths. For a profile⁴ σ and state s, using standard techniques [20],
we can construct a probability measure Prob^σ_{G,s} over the paths with initial state s
corresponding to σ, denoted IPaths^σ_{G,s}, and the expected value E^σ_{G,s}(X_i) of player
i's utility from s under σ. Given utilities X_1, . . . , X_n for all the players of G, we
can then define NE and CE (see Definition 5) as well as the restricted classes of
SW and SF equilibria as for NFGs (see Definition 6). Following [24,21], we focus
on subgame-perfect equilibria [30], which are equilibria in every state of G.
Nonzero-sum properties. As in [24] (for two-player CSGs) and [21] (for n-
player CSGs) we can specify equilibria-based properties using temporal logic.
For simplicity, we restrict attention to nonzero-sum properties without nesting,
allowing for the specification of NE and CE against either SW or SF optimality.
Denition 8 (Nonzero-sum specications). The syntax of nonzero-sum spec-
ications θfor CSGs is given by the grammar:
ϕ:=C(1, 2)optx(θ)
θ:=P[ψ]+· · ·+P[ψ]|Rr[ρ]+· · ·+Rr[ρ]
ψ:=Xa|aUka|aUa
ρ:=I=k|Ck|Fa
where C=C1:· · · :Cm,C1, . . . , Cmare coalitions of players such that CiCj=
for all 1i=jmand m
i=1Ci=N,(1, 2) {ne,ce}×{sw,sf},opt
{min,max}, {<, ,, >},xQ,ris a reward structure, kNand ais
an atomic proposition.
The nonzero-sum formulae of Definition 8 extend the logic of [24,21] in that
we can now specify the type of equilibria, NE or CE, and the optimality criterion,
SW or SF. A probabilistic formula ⟨⟨C_1:· · ·:C_m⟩⟩(⋆1,⋆2)max∼x(P[ψ_1]+· · ·+P[ψ_m]) is
true in a state if, when the players form the coalitions C_1, . . . , C_m, there is a
subgame-perfect equilibrium of type ⋆1 meeting the optimality criterion ⋆2 for
which the sum of the values of the objectives P[ψ_1], . . . , P[ψ_m] for the coalitions
C_1, . . . , C_m satisfies ∼x. The objective ψ_i of coalition C_i is either a next (X a),
bounded until (a_1 U^{≤k} a_2) or until (a_1 U a_2) formula, with the usual equivalences,
e.g., F a ≡ true U a.
For a reward formula ⟨⟨C_1:· · ·:C_m⟩⟩(⋆1,⋆2)opt∼x(R^{r_1}[ρ_1]+· · ·+R^{r_m}[ρ_m]) the
meaning is similar; however, here the objective of coalition C_i refers to a re-
ward formula ρ_i with respect to reward structure r_i, and this formula is either
a bounded instantaneous reward (I^{=k}), bounded accumulated reward (C^{≤k}) or
reachability reward (F a).
For formulae of the form ⟨⟨C_1:· · ·:C_m⟩⟩(⋆1,⋆2)min∼x(θ), the dual notions of
cost equilibria are considered. We also allow numerical queries of the form
⟨⟨C_1:· · ·:C_m⟩⟩(⋆1,⋆2)opt=?(θ), which return the sum of the optimal subgame-
perfect equilibrium's values.
⁴ We can also construct such a probability measure and expected value given a
correlated profile or joint strategy.
Model checking nonzero-sum specifications. Similarly to [24,21], to allow
model checking of nonzero-sum properties we consider a restricted class of CSGs.
We make the following assumption, which can be checked using graph algorithms
with time complexity quadratic in the size of the state space [1].
Assumption 1. For each subformula P[a_1 U a_2], a state labelled ¬a_1 ∨ a_2 is
reached with probability 1 from all states under all strategy profiles and correlated
profiles. For each subformula R^r[F a], a state labelled a is reached with probability
1 from all states under all strategy profiles and correlated profiles.
We now show how to compute the optimal values of a nonzero-sum formula
ϕ = ⟨⟨C_1:· · ·:C_m⟩⟩(⋆1,⋆2)opt∼x(θ) when opt = max. The case when opt = min
can be computed by negating all utilities and maximising.
The model checking algorithm broadly follows those presented in [24,21], with
the differences described below. The problem is reduced to solving an m-player
coalition game G^C where C = {C_1, . . . , C_m} and the choices of each player i in G^C
correspond to the choices of the players in coalition C_i in G. Formally, we have
the following definition in which, without loss of generality, we assume C is of
the form {{1, . . . , n_1}, {n_1+1, . . . , n_2}, . . . , {n_{m−1}+1, . . . , n_m}} and let j_C denote
player j's position in its coalition.
Denition 9 (Coalition game). For CSG G= (N, S, ¯
S, A, ∆, δ, AP ,L)and
partition C={C1, . . . , Cm}of the players into mcoalitions, we dene the coali-
tion game GC= ({1, . . . , m}, S, ¯
S, AC, C, δ C,AP,L)as an m-player CSG where:
AC= (AC
1 {⊥})× · · · ×(AC
m {⊥});
AC
i= (jCi(Aj {⊥})\ {(, . . . , )})for all 1im;
for any sSand 1im:aC
iC(s)if and only if either (s)Aj=
and aC
i(jC) = or aC
i(jC)(s)for all jCi;
for any sSand (aC
1, . . . , aC
m)AC:δC(s, (aC
1, . . . , aC
m)) = δ(s, (a1, . . . , an))
where for iMand jCiif aC
i=, then aj=and otherwise aj=aC
i(jC).
If all the objectives in θ are finite-horizon, backward induction [35,27] can be ap-
plied to compute (precise) optimal equilibria values with respect to the criterion
⋆2 and equilibria type ⋆1. On the other hand, if all the objectives are infinite-
horizon, value iteration [9] can be used to approximate optimal equilibria values
and, when there is a combination of objectives, the game under study is modified
in a standard manner to make all objectives infinite-horizon.
Backward induction and value iteration over the CSG G^C both work by iter-
atively computing new values for each state s of G^C. The values for each state,
in each iteration, are found by computing optimal equilibria values of an NFG N
whose utility function is derived from the outgoing transition probabilities from
s in the CSG and the values computed for successor states of s in the previous
iteration. The difference here, with respect to [21], is that the NFGs are solved
for the additional equilibria and optimality conditions considered in this paper,
which we compute using the algorithms presented in Section 2.
Algorithm for probabilistic until. Because of space limitations, we only
present here the details of value iteration for (unbounded) probabilistic until, i.e.,
for ϕ = ⟨⟨C_1:· · ·:C_m⟩⟩(⋆1,⋆2)max∼x(θ) where θ = P[a^1_1 U a^1_2]+· · ·+P[a^m_1 U a^m_2].
The complete model checking algorithm can be found in [23].
Following [21], we use V_{G^C}(s, ⋆1, ⋆2, θ, n) to denote the vector of computed
values, at iteration n, in state s of G^C for optimality criterion ⋆2 (SW or SF),
equilibria type ⋆1 (NE or CE) and (until) objectives θ. We also use 1_m and 0_m
to denote a vector of size m whose entries all equal 1 or 0, respectively. For
any set of states S′, atomic proposition a and state s, we let η_{S′}(s) equal 1 if
s ∈ S′ and 0 otherwise, and η_a(s) equal 1 if a ∈ L(s) and 0 otherwise.
Each step of value iteration also keeps track of two sets D, E ⊆ M, where
M = {1, . . . , m} are the players of G^C. We use D for the subset of players that
have already reached their goal (by satisfying a^i_2) and E for the players who
can no longer satisfy their goal (having reached a state that fails to satisfy
a^i_1). It can then be ensured that their payoffs no longer change and are set to 1
or 0, respectively. In these cases, we effectively consider a modified game where,
although the payoffs for these players are set, we still need to take their strategies
into account in order to guarantee an optimal equilibrium.
Optimal values for all states s in the CSG G^C can be computed as the follow-
ing limit: V_{G^C}(s, ⋆1, ⋆2, θ) = lim_{n→∞} V_{G^C}(s, ⋆1, ⋆2, θ, n), where V_{G^C}(s, ⋆1, ⋆2, θ, n) =
V_{G^C}(s, ⋆1, ⋆2, ∅, ∅, θ, n) and, for any D, E ⊆ M such that D ∩ E = ∅:

V_{G^C}(s, ⋆1, ⋆2, D, E, θ, n) =
  (η_D(1), . . . , η_D(m))                  if D ∪ E = M
  (η_{a^1_2}(s), . . . , η_{a^m_2}(s))      else if n = 0
  V_{G^C}(s, ⋆1, ⋆2, D ∪ D′, E, θ, n)       else if D′ ≠ ∅
  V_{G^C}(s, ⋆1, ⋆2, D, E ∪ E′, θ, n)       else if E′ ≠ ∅
  val(N, ⋆1, ⋆2)                            otherwise

where D′ = {l ∈ M \ (D ∪ E) | a^l_2 ∈ L(s)}, E′ = {l ∈ M \ (D ∪ E) | a^l_1 ∉
L(s) and a^l_2 ∉ L(s)} and val(N, ⋆1, ⋆2) equals the optimal values of the NFG N =
(M, A^C, u) with respect to the criterion ⋆2 and equilibria type ⋆1, in which for
any 1 ≤ l ≤ m and α ∈ A^C:

u_l(α) =
  1                                           if l ∈ D
  0                                           else if l ∈ E
  Σ_{s′∈S} δ^C(s, α)(s′) · v^{s′,l}_{n−1}     otherwise

and (v^{s′,1}_{n−1}, v^{s′,2}_{n−1}, . . . , v^{s′,m}_{n−1}) = V_{G^C}(s′, ⋆1, ⋆2, D, E, θ, n−1) for all s′ ∈ S.
Since this paper considers equilibria for any number of coalitions (in par-
ticular, for more than two), the above follows the algorithm of [21] in the way
that it keeps track of the coalitions that have satisfied their objective (D) or can
no longer do so (E). By contrast, the CSG algorithm of [24] was limited to two
coalitions, which enabled the exploitation of efficient MDP analysis techniques
for such coalitions. As explained in [21], in such a scenario we cannot reduce the
analysis from an n-coalition game to an (n−1)-coalition game, as otherwise we
would give one of the remaining coalitions additional power (the action choices
of the coalition that has satisfied their objective or can no longer do so), which
would therefore give this coalition an advantage over the other coalitions.
Strategy synthesis. As in [24,21] we can extend the model checking algorithm
to perform strategy synthesis, generating a witness (i.e., a profile or joint strat-
egy) representing the corresponding optimal equilibrium. This is achieved by
storing the profile or joint strategy for the NFG solved in each state. Both the
profiles and joint strategies require finite memory and are probabilistic. Memory
is required as choices change after a path formula becomes true or a target is
reached, and to keep track of the step bound in finite-horizon properties. Ran-
domisation is required for both NE and CE of NFGs.
Correctness and complexity. The correctness of the algorithm follows directly
from [24,21], as changing the class of equilibria or optimality criterion does not
change the proof. The complexity of the algorithm is linear in the formula size,
and value iteration requires finding optimal NE or CE for an NFG in each state
of the model. Computing NEs of an NFG with two (or more) players is PPAD-
complete [12,11], while finding optimal CEs of an NFG is in P [15].
4 Case Studies and Experimental Results
We have developed an implementation of our techniques for equilibria synthe-
sis on CSGs, described above, building on top of the PRISM-games [22] model
checker. Our implementation extends the tool's existing support for construction
and analysis of CSGs, which is contained within its sparse matrix based "explicit"
engine written in Java. We have considered a range of CSG case studies (supple-
mentary material can be found at [40]). Below, we summarise the efficiency and
scalability of our approach, again running on a 2.10GHz Intel Xeon Gold with
32GB JVM memory, and then describe our findings on individual case studies.
Eciency and scalability. Table 2summarises the performance of our imple-
mentation on the case studies that we have considered. It shows the statistics for
each CSG, and the time taken to build it and perform equilibria synthesis, for
several dierent variants (NE vs. CE, SW vs. SF). Comparing the eciency of
synthesising SWNE and SWCE, we see that the latter is typically much faster.
For two-player NE, the social fairness variant is no more expensive to compute as
we enumerate all NEs. For CE, which uses Z3 rather than Gurobi for nding SF,
we note that, although Z3 is able to nd optimal equilibria, it is not primarily
developed as an optimisation suite, and therefore generally performs poorly in
comparison with Gurobi. The benets of the social fair equilibria, in terms of
the values yielded for individual players, are discussed in the in-depth coverage
of the dierent case studies below.
Aloha. In this case study, introduced in [24], a number of users try to send
packets using the slotted Aloha protocol. We suppose that each user has one
packet to send and, in a time slot, if k users try and send their packet, then
the probability that each packet is successfully sent is q/k where q ∈ [0, 1]. If a
user fails to send a packet, then the number of slots it waits before resending
the packet is set according to Aloha's exponential backoff scheme. The scheme
requires that each user maintains a backoff counter, which it increases each time
Correlated Equilibria and Fairness in Concurrent Stochastic Games 71
Case study & property [parameters]           Players  Eq.,opt.  Param. values  States     Trans.     Constr. (s)  Verif. (s)
Aloha                                           2      ne,sw     4,0.8          2,778      6,285      0.1          2.2
⟨⟨usr1:···:usrm⟩⟩min=?(R^{time}[F s_i])         2      ce,sw                                                       2.1
[bmax, q]                                       2      ne,sf                                                       2.1
                                                2      ce,sf                                                       23.3
                                                3      ce,sw     4,0.8          107,799    355,734    3.0          80.1
                                                3      ce,sf                                                       114.6
                                                4      ne,sw     2,0.8          68,689     161,904    1.9          1042.9
                                                4      ce,sw                                                       58.8
Aloha (deadline)                                4      ne,sw     2,0.8,8        159,892    388,133    3.9          1027.5
⟨⟨usr1:···:usrm⟩⟩max=?(P[F (s_i ∧ t ≤ D)])      4      ce,sw                                                       224.5
[bmax, q, D]                                    5      ce,sw     2,0.8,8        1,797,742  5,236,655  54.5         4,936.8
                                                5      ce,sf                                                       TO
Power control                                   2      ne,sw     8,40,0.2       32,812     260,924    1.2          564.5
⟨⟨p1:···:pm⟩⟩max=?(R^{r_i}[F e_i])              2      ne,sf                                                       566.3
[powmax, emax, qfail]                           2      ce,sw                                                       177.9
                                                3      ce,sw     5,15,0.2       42,156     740,758    3.5          147.0
                                                3      ce,sf                                                       TO
Public good                                     3      ne,sw     2.5,3          16,202     35,884     0.8          27.5
⟨⟨p1:···:pm⟩⟩max=?(R^{c_i}[I=rmax])             3      ce,sw                                                       1.9
[f, rmax]                                       4      ne,sw     3,3            391,961    923,401    13.0         71.9
                                                4      ce,sw                                                       35.3
                                                5      ce,sw     4,2            59,294     118,342    3.1          5.2
Investors                                       2      ce,sw     0.2,8          71,731     315,804    2.4          47.5
⟨⟨inv1:···:invm⟩⟩max=?(R^{pf_i}[F cin_i])       2      ce,sf                                                       2,401.9
[pbar, months]                                  3      ce,sw     0.2,5          83,081     462,920    3.6          79.3
                                                3      ce,sf                                                       861.2

Table 2: Statistics for a set of CSG verification instances (timeout 2 hours). Blank
model-statistics cells repeat the values of the row above for the same CSG.
there is a packet failure (up to bmax) and, if the counter equals k and a failure
occurs, randomly chooses the number of slots to wait from {0, 1, . . . , 2^k − 1}.
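For illustration, a minimal sketch of this backoff scheme (names and structure are ours, not the PRISM-games model):

# Sketch of one user's slotted-Aloha exponential backoff, as described above.
# Illustrative names; not taken from the PRISM-games model.
import random

def backoff_slots(counter: int) -> int:
    """After a failure with backoff counter k, wait a uniform number of
    slots from {0, 1, ..., 2**k - 1}."""
    return random.randrange(2 ** counter)

def on_failure(counter: int, bmax: int) -> int:
    """Increase the backoff counter on a packet failure, capped at bmax."""
    return min(counter + 1, bmax)

# Example: a user failing twice with bmax = 4.
k = 0
for _ in range(2):
    k = on_failure(k, bmax=4)
    print(f"counter={k}, wait={backoff_slots(k)} slots")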
We suppose that the objective of each user is to minimise the expected
time to send their packet, which is represented by the nonzero-sum formula
⟨⟨usr1:···:usrm⟩⟩min=?(R^{time}[F s1]+···+R^{time}[F sm]). Synthesising opti-
mal strategies for this specification, we find that the cases for SWNE and SWCE
coincide (although SWCE returns a joint strategy for the players, this joint strat-
egy can be separated to form a strategy profile). This profile requires one user
to try and send first, and then for the remaining users to take turns to try and
send afterwards. If a user fails to send, then they enter backoff and allow all
remaining users to try and send before trying to send again. There is no gain to
a user in trying to send at the same time as another, as this will increase the
probability of a sending failure, and therefore the user having to spend time in
backoff before getting to try again. For SFNE, which has only been implemented
for the two-player case, the two users follow identical strategies, which involve
randomly deciding whether to wait or transmit, unless they are the only user
that has not transmitted, and then they always try to send when not in backoff.
In the case of SFCE, users can employ a shared probabilistic signal to coordinate
which user sends next. Initially, this is a uniform choice over the users, but as
time progresses the signal favours the users with lower backoff counters as these
users have had fewer opportunities to send their packet previously.
In Figure 2 we have plotted the optimal values for the players, where SWi
corresponds to the optimal value (expected time to send their packets) for player
[Three plots of expected time against q for two users (curves SFNEi, SW1, SW2,
SFCEi), three users (SW1–SW3, SFi) and four users (SW1–SW4, SFi).]
Fig. 2: Aloha: ⟨⟨usr1:···:usrm⟩⟩min=?(R^{time}[F s1]+···+R^{time}[F sm])
i for both SWNE and SWCE for the cases of two, three and four users. We see
that the optimal values for the different users under SFNE and SFCE coincide,
while under SWNE and SWCE they are different for each user (with the user
sending first having the lowest and the user sending last the highest). Comparing
the sum of the SWNE (and SWCE) values and that of the SFCE values, we see
a small decrease in the sum of less than 2% of the total, while for SFNE there
is a greater difference as the players cannot coordinate, and hence try and send
at the same time.
Power control. This case study is based on a model of power control in cel-
lular networks from [7]. In the network there are a number of users that each
have a mobile phone. The phones emit signals that the users can strengthen by
increasing the phone's power level up to a bound (powmax). A stronger signal
can improve transmission quality, but uses more energy and lowers the qual-
ity of the transmissions of other phones due to interference. We use the ex-
tended model from [22], which adds a probability of failure (qfail) when a power
level is increased and assumes each phone has a limited battery capacity (emax).
There is a reward structure associated with each phone representing transmis-
sion quality, which is dependent on both the phone’s power level and the power
levels of other phones due to interference. We consider the nonzero-sum property
⟨⟨p1:···:pm⟩⟩max=?(R^{r1}[F e1]+···+R^{rm}[F em]), where each user tries
to maximise their expected reward before their phone's battery is depleted.
In Figure 3 we have presented the expected rewards of the players under
the synthesised SWCE and SFCE joint strategies. When performing strategy
synthesis, in the case of two users the SWNE and SWCE yield the same profile
in which, when the users' batteries are almost depleted, one user tries to increase
their phone's power level and, if successful, in the next step, the second user then
tries to increase their phone's power level. Since the first user's phone battery
is depleted when the second tries to increase, this increase does not cause any
interference. On the other hand, if the first user fails to increase their power
level, then both users increase their battery levels. For the SFCE, the users
can coordinate and flip a coin as to which user goes first: as demonstrated by
Figure 3 this yields equal rewards for the users, unlike the SWCE. In the case of
three users, the SWNE and SWCE differ (we were only able to synthesise SWNE
for powmax = 2 as for larger values the computation had not completed within
[Two plots of rewards against powmax: two users (curves SW1, SW2, FRCEi,
FRNEi) and three users (SWCE1–SWCE3, FRCEi).]
Fig. 3: Power control: ⟨⟨p1:···:pm⟩⟩max=?(R^{r1}[F e1]+···+R^{rm}[F em])
[Two plots against f for the three-player model: expected capital per player
(p1–p3 under SWCE and SWNE, pi under FRCE) and the sum of the expected
capital (SWNE, SWCE, SFCE).]
Fig. 4: Public good: ⟨⟨p1:···:pm⟩⟩max=?(R^{c1}[I=rmax]+···+R^{cm}[I=rmax])
the timeout), again users take turns to try and increase their phone’s power
level. However, here, if the users are unsuccessful, the SWCE can coordinate as to
which user next tries to increase their phone's battery level. Through this
coordination, the users’ rewards can be increased as the battery level of at most
one phone increases at a time, which limits interference. On the other hand, for
the SWNE users must decide independently whether to increase their phone’s
battery level and they each randomly decide whether to do so or not.
Public good. We next consider a variant of a public good game [19], based
on the one presented in [22] for the two-player case. In this game a number
of players each receive an initial amount of capital (einit) and, in each of rmax
months, can invest none, half or all of their current capital. The total invested
by the players in a month is multiplied by a factor f and distributed equally
among the players before the start of the next month. The aim of the players
is to maximise their expected capital, which is represented by the formula:
⟨⟨p1:···:pm⟩⟩max=?(R^{c1}[I=rmax]+···+R^{cm}[I=rmax]).
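For concreteness, a minimal sketch of one month of this game (our own illustrative code, following the investment and redistribution rule described above):

# Sketch of one month of the public good game described above; the action
# encoding (invest none, half or all) and names are ours, for illustration.
def play_month(capital, actions, f):
    """actions[i] in {0.0, 0.5, 1.0}: fraction of player i's capital invested.
    The total investment is multiplied by f and shared equally."""
    invested = [c * a for c, a in zip(capital, actions)]
    share = f * sum(invested) / len(capital)
    return [c - i + share for c, i in zip(capital, invested)]

# Three players with capital 10 each; one invests all, one half, one none.
print(play_month([10.0, 10.0, 10.0], [1.0, 0.5, 0.0], f=2.5))
# -> [12.5, 17.5, 22.5]: free-riding pays individually, though the total grows.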
Figure 4 plots, for the three-player model, both the expected capital of indi-
vidual players and the total expected capital after three months for the SWNE,
SWCE and SFCE as the parameter f varies. As the results demonstrate, the
players benefit, both as individuals and as a population, by coordinating through
a correlated strategy. In addition, under the SFCE, all players receive the same
expected capital with only a small decrease in the sum from that of the SWCE.
Investors. The final case study concerns a concurrent multi-player version of the
futures market investor model of [26], in which a number of investors (the players)
[Two plots of expected reward against the number of months: two-player CE
(solid) and NE (dashed) values (SW1, SW2, SFi), and three-player SW (solid)
and SF (dashed) values (CE1–CE3).]
Fig. 5: Investors: ⟨⟨inv1:···:invm⟩⟩max=?(R^{pf1}[F cin1]+···+R^{pfm}[F cinm])
interact with a probabilistic stock market. In successive months, the investors
choose whether to invest, wait or cash in their shares, while at the same time the
market decides with probability pbar to bar each investor, with the restriction
that an investor cannot be barred two months in a row or in the first month,
and then the values of shares and cap on values are updated probabilistically.
We consider both two- and three-player models, where each investor tries to
maximise its individual profit, represented by the following nonzero-sum prop-
erty: ⟨⟨inv1:···:invm⟩⟩max=?(R^{pf1}[F cin1]+···+R^{pfm}[F cinm]). In Figure 5
we have plotted the different optimal values for NE and CE of the two-player
game and the different optimal values for CE of the three-player game (the
computation of NE values timed out for the three-player case). As the results
demonstrate, again we see that the coordination that CEs offer can improve the
returns of the players and that, although considering social fairness does decrease
the returns of some players, this is limited, particularly for CEs.
5 Conclusions
We have presented novel techniques for game-theoretic verification of proba-
bilistic multi-agent systems, focusing on correlated equilibria and a notion of
social fairness. We began with the simpler case of normal form games and then
extended this to concurrent stochastic games, and used temporal logic to for-
mally specify equilibria. We proposed algorithms for equilibrium synthesis, im-
plemented them and illustrated their benefits, in terms of efficiency and fairness,
on case studies from a range of application domains.
Future work includes exploring the use of further game-theoretic topics within
this area, such as techniques for mechanism design or other concepts such as
Stackelberg equilibria. We plan to implement SFCE computation in Gurobi using
the big-M method [16] to encode implications and techniques from [37] to encode
conjunctions, which should yield a significant speed-up in their computation.
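As an illustration of the intended encoding (ours, not the planned implementation), a binary guard b ∈ {0,1} implying a linear constraint can be written, with a sufficiently large constant M, as the single linear constraint

    aᵀx ≤ c + M(1 − b),  where M ≥ sup_x (aᵀx − c),

so that for b = 1 it reduces to aᵀx ≤ c, and for b = 0 it is vacuously satisfied.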
Acknowledgements. This project was funded by the ERC under the European
Union’s Horizon 2020 research and innovation programme (FUN2MODEL, grant
agreement No. 834115).
References
1. de Alfaro, L.: Formal Verification of Probabilistic Systems. Ph.D. thesis, Stanford
University (1997)
2. Aminof, B., Kwiatkowska, M., Maubert, B., Murano, A., Rubin, S.: Probabilistic
strategy logic. In: Proc. IJCAI’19. pp. 32–38 (2019)
3. Aumann, R.: Subjectivity and correlation in randomized strategies. Journal of
Mathematical Economics 1(1), 67–96 (1974)
4. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)
5. Baier, C., Kwiatkowska, M.: Model checking for a probabilistic branching time
logic with fairness. Distributed Computing 11(3), 125–155 (1998)
6. Banerjee, T., Majumdar, R., Mallik, K., Schmuck, A.K., Soudjani, S.: Fast symbolic
algorithms for omega-regular games under strong transition fairness. Tech. Rep.
MPI-SWS-2020-007r, Max Planck Institute (2021)
7. Brenguier, R.: PRALINE: A tool for computing Nash equilibria in concurrent
games. In: Sharygina, N., Veith, H. (eds.) Proc. CAV’13. LNCS, vol. 8044, pp.
890–895. Springer (2013), lsv.fr/Software/praline/
8. Chatterjee, K., Fijalkow, N.: A reduction from parity games to simple stochastic
games. EPTCS 54, 74–86 (2011)
9. Chatterjee, K., Henzinger, T.: Value iteration. In: 25 Years of Model Checking.
LNCS, vol. 5000, pp. 107–138. Springer (2008)
10. Chen, T., Forejt, V., Kwiatkowska, M., Parker, D., Simaitis, A.: Automatic verifi-
cation of competitive stochastic systems. Formal Methods in System Design 43(1),
61–92 (2013)
11. Chen, X., Deng, X., Teng, S.H.: Settling the complexity of computing two-player
Nash equilibria. J. ACM 56(3) (2009)
12. Daskalakis, C., Goldberg, P., Papadimitriou, C.: The complexity of computing a
Nash equilibrium. Communications of the ACM 52(2), 89–97 (2009)
13. De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Proc. TACAS'08.
LNCS, vol. 4963, pp. 337–340. Springer (2008), github.com/Z3Prover/z3
14. Dutertre, B.: Yices 2.2. In: Biere, A., Bloem, R. (eds.) Proc. CAV'14. LNCS,
vol. 8559, pp. 737–744. Springer (2014), yices.csl.sri.com
15. Gilboa, I., Zemel, E.: Nash and correlated equilibria: Some complexity considera-
tions. Games and Economic Behavior 1(1), 80–93 (1989)
16. Griva, I., Nash, S., Sofer, A.: Linear and Nonlinear Optimization: Second Edition.
CUP (2009)
17. Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2021),
www.gurobi.com
18. Gutierrez, J., Harrenstein, P., Wooldridge, M.J.: Reasoning about equilibria in
game-like concurrent systems. In: Proc. 14th International Conference on Princi-
ples of Knowledge Representation and Reasoning (KR’14) (2014)
19. Hauser, O., Hilbe, C., Chatterjee, K., Nowak, M.: Social dilemmas among unequals.
Nature 572, 524–527 (2019)
20. Kemeny, J., Snell, J., Knapp, A.: Denumerable Markov Chains. Springer (1976)
21. Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Multi-player equilibria veri-
fication for concurrent stochastic games. In: Gribaudo, M., Jansen, D., Remke, A.
(eds.) Proc. QEST’20. LNCS, Springer (2020)
22. Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: PRISM-games 3.0: Stochas-
tic game verification with concurrency, equilibria and time. In: Proc. CAV'20. pp.
475–487. LNCS, Springer (2020)
23. Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Correlated equilibria and
fairness in concurrent stochastic games (2022), arXiv:2201.09702
24. Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Automatic verification of
concurrent stochastic systems. Formal Methods in System Design pp. 1–63 (2021)
25. Littman, M., Ravi, N., Talwar, A., Zinkevich, M.: An efficient optimal-equilibrium
algorithm for two-player game trees. In: Proc. UAI’06. pp. 298–305. AUAI Press
(2006)
26. McIver, A., Morgan, C.: Results on the quantitative mu-calculus qMu. ACM Trans.
Computational Logic 8(1) (2007)
27. von Neumann, J., Morgenstern, O., Kuhn, H., Rubinstein, A.: Theory of Games
and Economic Behavior. Princeton University Press (1944)
28. Nisan, N., Roughgarden, T., Tardos, E., Vazirani, V.: Algorithmic Game Theory.
CUP (2007)
29. Nudelman, E., Wortman, J., Shoham, Y., Leyton-Brown, K.: Run the GAMUT:
A comprehensive approach to evaluating game-theoretic algorithms. In: Proc. AA-
MAS’04. pp. 880–887. ACM (2004), gamut.stanford.edu
30. Osborne, M., Rubinstein, A.: An Introduction to Game Theory. OUP (2004)
31. Porter, R., Nudelman, E., Shoham, Y.: Simple search methods for finding a Nash
equilibrium. In: Proc. AAAI’04. pp. 664–669. AAAI Press (2004)
32. Prisner, E.: Game Theory Through Examples. Mathematical Association of Amer-
ica, 1 edn. (2014)
33. Rabin, M.: Incorporating fairness into game theory and economics. The American
Economic Review 83(5), 1281–1302 (1993)
34. Rabin, M.: Fairness in repeated games. Working paper 97–252, University of Cali-
fornia at Berkeley (1997)
35. Schwalbe, U., Walker, P.: Zermelo and the early history of game theory. Games
and Economic Behavior 34(1), 123–137 (2001)
36. Shapley, L.: Stochastic games. PNAS 39, 1095–1100 (1953)
37. Stevens, S., Palocsay, S.: Teaching use of binary variables in integer linear pro-
grams: Formulating logical conditions. INFORMS Transactions on Education
18(1), 28–36 (2017)
38. Wächter, A.: Short tutorial: Getting started with ipopt in 90 minutes. In: Com-
binatorial Scientic Computing. No. 09061 in Dagstuhl Seminar Proceedings,
Leibniz-Zentrum für Informatik (2009), github.com/coin-or/Ipopt
39. Wächter, A., Biegler, L.: On the implementation of an interior-point lter line-
search algorithm for large-scale nonlinear programming. Mathematical Program-
ming 106(1), 25–57 (2006)
40. Supporting material, www.prismmodelchecker.org/files/tacas22equ/
Omega Automata
A Direct Symbolic Algorithm for Solving
Stochastic Rabin Games
Tamajit Banerjee1, Rupak Majumdar2, Kaushik Mallik2,
Anne-Kathrin Schmuck2, and Sadegh Soudjani3
1IIT Delhi, New Delhi, India
2MPI-SWS, Kaiserslautern, Germany
3Newcastle University, Newcastle upon Tyne, UK
Abstract. We consider turn-based stochastic 2-player games on graphs
with ω-regular winning conditions. We provide a direct symbolic algo-
rithm for solving such games when the winning condition is formulated
as a Rabin condition. For a stochastic Rabin game with k pairs over a
game graph with n vertices, our algorithm runs in O(n^{k+2} k!) symbolic
steps, which improves the state of the art.
We have implemented our symbolic algorithm, along with performance
optimizations including parallelization and acceleration, in a BDD-based
synthesis tool called Fairsyn. We demonstrate the superiority of Fairsyn
compared to the state of the art on a set of synthetic benchmarks derived
from the VLTS benchmark suite and on a control system benchmark from
the literature. In our experiments, Fairsyn performed significantly faster
with up to two orders of magnitude improvement in computation time.
1 Introduction
Symbolic algorithms for 2-player graph games are at the heart of many prob-
lems in the automatic synthesis of correct-by-construction hardware, software,
and cyber-physical systems from logical specifications. The problem has a
rich pedigree, going back to Church [10] and a sequence of seminal results
[6,31,17,30,13,14,34,21]. A chain of reductions can be used to reduce the syn-
thesis problem for ω-regular specifications to finding winning strategies in
2-player games on graphs, for which (symbolic) algorithms are known (see, e.g.,
[29,14,34,27]). These algorithms form the basis for algorithmic reactive synthesis.
For systems under uncertainty, it is also essential to capture non-determinism
quantitatively using probability distributions [5,18,22,25]. Turn-based stochas-
tic 2-player games [3,9], also known as 2½-player games, generalize 2-player
graph games with an additional category of “random” vertices: Whenever the
game reaches a random vertex, a random process picks one of the outgoing
edges according to a probability distribution. The qualitative winning problem
asks whether a vertex of the game graph is almost surely winning for Player 0.
Stochastic Rabin games were studied by Chatterjee et al. [7], who showed that
the problem is NP-complete and that winning strategies can be restricted to
be pure (non-randomized) and memoryless. Moreover, they showed a reduc-
tion from qualitative winning in an n-vertex k-pair stochastic Rabin game to
an O(n(k+1))-vertex (k+1)-pair (deterministic) Rabin game, resulting in an
O((n(k+1))^{k+2} (k+1)!) algorithm. In contrast, we provide a direct O(n^{k+2} k!)
symbolic algorithm for the problem.
Our new direct symbolic algorithm is obtained in the following way. We
replace the probabilistic transitions with transitions of the environment con-
strained by extreme fairness as described by Pnueli [28]. Extreme fairness is
specified via a special set of Player 1 vertices, called live vertices. A run is ex-
tremely fair if whenever a live vertex is visited infinitely often, every outgoing
edge from this vertex is taken infinitely often. As our first contribution, we show
that to solve a qualitative stochastic Rabin game, we can equivalently solve a
(deterministic) Rabin game over the same game graph by interpreting random
vertices of the stochastic game as live vertices.
As our second contribution we prove a direct symbolic algorithm to solve
(deterministic) Rabin games with live vertices, which we call extremely fair ad-
versarial Rabin games. In particular, we show a surprisingly simple syntactic
transformation that modifies the well-known symbolic fixpoint algorithm for solving
2-player Rabin games on graphs (without live vertices), such that the modified
fixpoint solves the extremely fair adversarial version of the game.
To appreciate the simplicity of our modification, let us consider the well-
known fixpoint algorithms for Büchi and co-Büchi games—particular classes of
Rabin games—given by the following µ-calculus formulas:

    Büchi:    νY. µX. (G ∩ Cpre(Y)) ∪ (Cpre(X)),
    co-Büchi: µX. νY. (G ∪ Cpre(X)) ∩ (Cpre(Y)),

where Cpre(·) denotes the controllable predecessor operator and G denotes the
set of goal states that should be visited recurrently. In the presence of strong
transition fairness, the new algorithm becomes
    Büchi:    νY. µX. (G ∩ Cpre(Y)) ∪ (Apre(Y, X)),
    co-Büchi: νW. µX. νY. (G ∪ Apre(W, X)) ∩ (Cpre(Y)).
The only syntactic change we make is to substitute the controllable predecessor
for the µ-variable X by a new almost sure predecessor operator Apre(Y, X)
incorporating also the preceding ν-variable Y; if the fixpoint starts with a
µ-variable (with no preceding ν-variable), as for co-Büchi games, we introduce
one additional ν-variable in front. For the general class of Rabin specifications,
with a more involved fixpoint and with arbitrarily high nesting depth depending
on the number of Rabin pairs, we need to perform this substitution for every
such Cpre(·) operator for every µ-variable.
We prove the correctness of this syntactic fixpoint transformation for solv-
ing Rabin games [31,27] in this paper. It can be shown that the same syntactic
transformation may be used to obtain fixpoint algorithms for qualitative solution
of stochastic games with other popular ω-regular objectives, namely Reachabil-
ity, Safety, (generalized) Büchi, (generalized) co-Büchi, Rabin-chain, parity, and
GR(1). Owing to page constraints, these additional fixpoints are only discussed
in the extended version [4] of this paper, where we also generalize all results
presented in this paper to a weaker notion of fairness, called transition fairness.
In a nutshell, these results show that one can solve games with live vertices
while retaining the algorithmic characteristics and implementability of known
symbolic fixpoint algorithms that do not consider fairness assumptions.
We have implemented our symbolic algorithm for solving stochastic Rabin
games in a symbolic BDD-based reactive synthesis tool called Fairsyn. Fairsyn
additionally uses parallelization and a fixpoint acceleration technique [23] to
boost performance. We evaluate our tool on two case studies, one using synthetic
benchmarks derived from the VLTS benchmark suite [15] and the other from
controller synthesis for stochastic control systems [12]. We show that Fairsyn
scales well on these case studies, and outperforms the state-of-the-art methods
by up to two orders of magnitude.
All the technical proofs, the fixpoints for various other specifications, and an
additional benchmark taken from the software engineering literature [8] can be
found in the extended version of this paper under a slightly more relaxed setting
of the problem (transition fairness instead of extreme fairness) [4].
2 Preliminaries
Notation: We write N0 to denote the set of natural numbers including zero.
Given a, b ∈ N0, we write [a; b] to denote the set {n ∈ N0 | a ≤ n ≤ b}. By
definition, [a; b] is the empty set if a > b. For any set A ⊆ U defined on the
universe U, we write A̅ to denote the complement of A. Given an alphabet A,
we use the notation A* and A^ω to denote respectively the set of all finite words
and the set of all infinite words formed using the letters of the alphabet A. Let
A and B be two sets and R ⊆ A × B be a relation. For any element a ∈ A, we
use the notation R(a) to denote the set {b ∈ B | (a, b) ∈ R}.
2½-player game graph: We consider usual turn-based stochastic games, also
known as 2½-player games, played between Player 0, Player 1, and a third player
representing environmental randomness, which is treated as a "half player." For-
mally, a 2½-player game graph is a tuple G = ⟨V, V0, V1, Vr, E⟩ where (i) V is a
finite set of vertices, (ii) V0, V1, and Vr are subsets of V which form a partition of
V, and (iii) E ⊆ V × V is the set of directed edges. The vertices in Vr are called
random vertices, and the edges originating in a random vertex are called random
edges, denoted Er. A 2½-player game graph with no random vertices (i.e.,
Vr = ∅) is called a 2-player game graph. A 2½-player game graph with V1 = ∅
is called a 1½-player game graph (also known as a Markov decision process or
MDP). A 2½-player game graph with V = Vr is known as a Markov chain.
Strategies: A (deterministic) strategy of Player 0 is a function ρ0 : V*V0 → V
with ρ0(wv) ∈ E(v) for every wv ∈ V*V0. Likewise, a strategy of Player 1 is a
function ρ1 : V*V1 → V with ρ1(wv) ∈ E(v) for every wv ∈ V*V1. We denote
the set of strategies of Player i by Πi. A strategy ρi of Player i (i ∈ {0, 1}) is
memoryless if for every w1v, w2v ∈ V*Vi, we have ρi(w1v) = ρi(w2v). In this
paper we restrict attention to deterministic strategies, as randomized strategies
are no more powerful than deterministic ones for 2½-player Rabin games [7].
Plays: Consider an infinite sequence of vertices4 π = v⁰v¹v² . . . ∈ V^ω. The
sequence π is called a play over G starting at the vertex v⁰ if for every i ∈ N0, we
have v^i ∈ V and (v^i, v^{i+1}) ∈ E. A play is finite if it is of the form v⁰v¹ . . . v^n for
some finite n ∈ N0. Let ρ0 ∈ Π0 and ρ1 ∈ Π1 be a pair of strategies for the two
players, and v⁰ ∈ V be a given initial vertex. For every finite play π = v⁰v¹ . . . v^n,
the next vertex v^{n+1} is obtained as follows: If v^n ∈ V0 then v^{n+1} = ρ0(v⁰ . . . v^n);
if v^n ∈ V1 then v^{n+1} = ρ1(v⁰ . . . v^n); and if v^n ∈ Vr then v^{n+1} is chosen uniformly
at random from the set Er(v^n). The uniform probability distribution over the
random edges is without loss of generality for the problem considered in this
paper; we will come back to this after setting up the problem statement. Every
play generated in this way by fixing ρ0, ρ1, and v⁰ is called a play compliant with
ρ0 and ρ1 that starts at vertex v⁰. The random choice in the random vertices
induces a probability measure P_{v⁰}^{ρ0,ρ1} on the sample space of plays.5 This is in
contrast to 2-player games, where for any choice of ρ0 ∈ Π0, ρ1 ∈ Π1, and
v⁰ ∈ V, the resulting compliant play is unique.
Winning Conditions: A winning condition ϕ is a set of infinite plays over G,
i.e., ϕ ⊆ V^ω, where the game graph G will always be clear from the context. We
adopt Linear Temporal Logic (LTL) notation for describing winning conditions.
The atomic propositions for the LTL formulas are sets of vertices, i.e., elements
of the set 2^V. We use the standard symbols for the Boolean and the temporal
operators: ¬ for negation, ∧ for conjunction, ∨ for disjunction, → for
implication, U for until (A U B means "the play remains inside the set A until
it moves to the set B"), ◯ for next (◯A means "the next vertex is in the set
A"), ◇ for eventually (◇A means "the play will eventually visit a vertex from
the set A"), and □ for always (□A means "the play will only visit vertices
from the set A"). The syntax and semantics of LTL can be found in standard
textbooks [3]. By slightly abusing notation, we use ϕ interchangeably to denote
both the LTL formula and the set of plays satisfying ϕ. Hence, we write π ⊨ ϕ
to denote the satisfaction of the formula ϕ by the play π.
Rabin Winning Conditions: A Rabin winning condition is expressed using a
set of k Rabin pairs R = {⟨G1, R1⟩, . . . , ⟨Gk, Rk⟩}, where k is any positive integer
and Gi, Ri ⊆ V for all i ∈ [1; k]. We say that R has the index set P = [1; k]. A
play π satisfies the Rabin condition R if π satisfies the LTL formula

    ϕ := ⋁_{i∈P} (◇□R̅i ∧ □◇Gi).    (2)
Almost Sure Winning: Let G be a 2½-player game graph, ρ0 ∈ Π0 and ρ1 ∈ Π1
be a pair of strategies, v⁰ ∈ V be an initial vertex, and ϕ be an ω-regular
specification over the vertices of G. Then Pρ01
v0(ϕ) denotes the probability of
satisfaction of ϕby the plays compliant with ρ0and ρ1and starting at v0.
The set of almost sure winning states of Player 0 for the specification ϕis
defined as the set Wa.s.Vsuch that for every v0 Wa.s.the following
holds: supρ0Π0infρ1Π1Pρ01
v0(ϕ) = 1.It is known [7, Thm. 4] that there is
an optimal (deterministic) memoryless strategy ρ
0Π0—called the optimal
almost sure winning strategy—such that for every v0 Wa.s.it holds that
infρ1Π1Pρ
01
v0(ϕ) = 1.
We extend the notion of winning to 2-player games as follows. Fix a 2-player
game graph G = ⟨V, V0, V1, ∅, E⟩ and an ω-regular specification ϕ over V. Player 0
wins the game from a vertex v⁰ ∈ V if Player 0 has a strategy ρ0 such that for
every Player 1 strategy ρ1, the unique resulting play starting at v⁰ is in ϕ. The
winning region W ⊆ V is the set of vertices from which Player 0 wins the game.
It is known that Player 0 has a memoryless strategy ρ0*—called the optimal
winning strategy—such that for every Player 1 strategy ρ1 ∈ Π1 and for every
initial vertex v⁰ ∈ W, the resulting unique compliant play is in ϕ [19].
3 Problem Statement and Outline
Given a 2½-player game graph G and a Rabin specification ϕ as in (2), we
consider the problem of solving the induced qualitative reactive synthesis prob-
lem. That is, we want to compute the set of almost sure winning states W^{a.s.}
of G w.r.t. ϕ and the corresponding optimal memoryless winning strategy ρ0* of
Player 0. This problem was solved by Chatterjee et al. [7] via a reduction from
qualitative winning in the original 2½-player Rabin game to winning in a larger
(deterministic) 2-player Rabin game with an additional Rabin pair.
Instead of inflating the game graph and introducing an extra Rabin pair at
the cost of more expensive computation, we propose a direct and computationally
more efficient symbolic algorithm over the original game graph G. We get this
algorithm by interpreting the random vertices of G as special Player 1 vertices,
called live vertices, which are subject to an extreme fairness assumption: along
every play, if a live vertex v is visited infinitely often, then all outgoing transitions
of v are also taken infinitely often. This re-interpretation results in a 2-player
Rabin game with special live Player 1 vertices that are subjected to extreme
fairness assumptions on Player 1's behavior. We call such games extremely fair
adversarial (2-player) Rabin games. The correctness of our symbolic algorithm
then follows from the two main results of our paper.
(I) We show that qualitative winning in a 2½-player Rabin game G is equiv-
alent to winning in the extremely fair adversarial (2-player) Rabin game Gℓ
obtained from G. Moreover, the winning strategy ρ0 of Player 0 in Gℓ is also the
optimal almost sure winning strategy in G for ϕ (see Thm. 1 in Sec. 4).
(II) We give a direct symbolic algorithm to compute the set of winning states,
along with the Player 0 winning strategy, for extremely fair adversarial (2-player)
Rabin games (see Thm. 2 in Sec. 5).
Both contributions are discussed in detail in Sec. 4 and Sec. 5, respectively.
Even though, for convenience, we have assumed a uniform probability distribu-
tion over the random edges, our contributions are valid for any arbitrary prob-
ability distribution. This follows from the established fact that the qualitative
analysis of 2½-player games does not depend on the precise probability values
but only on the supports of the distributions [7].
We conclude the paper with an experimental evaluation in Sec. 6.
4 From Randomness to Extreme Fairness
In this section, we show that qualitative winning in 2½-player Rabin games
is equivalent to winning in extremely fair adversarial (2-player) Rabin games
over the same underlying game graph. While it is known [16, Thm. 11.1] that
the reduction of random vertices to extreme fairness is sound and complete
for liveness winning conditions,6 we extend this connection to arbitrary Rabin
winning conditions in this section, and therefore to the entire class of ω-regular
specifications. We start with a formal definition of extremely fair adversarial
games and the connection between randomness and extreme fairness, before
stating our main result in Thm. 1.
Extremely Fair Adversarial Games: Let G = ⟨V, V0, V1, ∅, E⟩ be a 2-player
game graph with live vertices Vℓ ⊆ V1, denoted using the tuple Gℓ = ⟨G, Vℓ⟩.
The edges originating from the live vertices are called the live edges, denoted
Eℓ := (Vℓ × V) ∩ E. A play π over Gℓ is extremely fair with respect
to Vℓ if it satisfies the following LTL formula:

    α := ⋀_{(v,v′)∈Eℓ} (□◇v → □◇(v ∧ ◯v′)).    (3)

Given Gℓ and an ω-regular winning condition ϕ over V, Player 0 wins the ex-
tremely fair adversarial game over Gℓ for ϕ from a vertex v⁰ ∈ V if Player 0
wins the game over Gℓ for the winning condition α → ϕ from v⁰.
Randomness as Extreme Fairness: Let G = ⟨V, V0, V1, Vr, E⟩ be a 2½-player
game graph. Then we say that G induces the 2-player game graph with live
vertices Gℓ := ⟨⟨V, V0, V1 ∪ Vr, ∅, E⟩, Vr⟩. Intuitively, we interpret every random
vertex of G as a live Player 1 vertex in Gℓ. Obviously, this reinterpretation does
not change the structure of the underlying graph specified by V and E.
Soundness of the Reduction: It remains to show that the almost sure winning
set and the optimal almost sure winning strategy of Player 0 in G for ϕ are the same
as the winning state set and the winning strategy of Player 0 in Gℓ for ϕ. This is
formalized in the following theorem when ϕ is given as a Rabin condition. The
proof essentially shows that the random vertices of G simulate the live vertices
of Gℓ, and vice versa; details are in the extended version [4, App. B.6, pp. 61].
6 An LTL formula ϕ over V describes a liveness property if every finite play π over G
allows for a continuation π′ s.t. ππ′ ⊨ ϕ.
Theorem 1. Let G be a 2½-player game graph with vertex set V, ϕ ⊆ V^ω be a
Rabin winning condition as in (2), and Gℓ be the 2-player game graph with live
vertices induced by G. Let W ⊆ V be the set of vertices from which Player 0 wins
the extremely fair adversarial game over Gℓ with respect to ϕ, and W^{a.s.} be the
almost sure winning set of Player 0 in the 2½-player game G with respect to ϕ.
Then W = W^{a.s.}. Moreover, an optimal almost sure winning strategy in Gℓ is
also an optimal winning strategy in G, and vice versa.
5 Extremely Fair Adversarial Rabin Games
This section presents our main result, which is a symbolic fixpoint algorithm that
computes the winning region of Player 0 in the extremely fair adversarial game
over Gℓ with respect to any ω-regular property formalized as a Rabin winning
condition. This new symbolic fixpoint algorithm has multiple unique features.
(I) It works directly over Gℓ, without requiring any pre-processing step to reduce
Gℓ to a "normal" 2-player game with a larger set of vertices.
(II) Our new fixpoint algorithm is obtained from the algorithm of Piterman et al.
[27] by a simple syntactic change. We simply replace all controllable predecessor
operators over least fixpoint variables by a new almost sure predecessor operator
invoking the preceding maximal fixpoint variable. This makes the proof of our
new fixpoint algorithm conceptually simple (see Sec. 5.3).
At a higher level, we make a simple yet efficient syntactic transformation of
the fixpoint to incorporate the fairness assumption on the live vertices, without
introducing any extra computational complexity. Most remarkably, this transfor-
mation also works directly for fixpoint algorithms for reachability, safety, Büchi,
(generalized) co-Büchi, Rabin-chain, and parity games, as these can be formal-
ized as particular instances of a Rabin game. Moreover, it also works for gener-
alized Rabin, generalized Büchi, and GR(1) games. Owing to page constraints,
these additional cases are described in the extended version [4].
5.1 Preliminaries on Symbolic Computations over Game Graphs
Set Transformers: Our goal is to develop symbolic fixpoint algorithms to char-
acterize the winning region of an extremely fair adversarial game over a game
graph with live edges. As a first step, given Gℓ, we define the required symbolic
transformers of sets of states. We define the existential, universal, and control-
lable predecessor operators as follows. For S ⊆ V, we have

    Pre∃₀(S) := {v ∈ V0 | E(v) ∩ S ≠ ∅},        (4a)
    Pre∀₁(S) := {v ∈ V1 | E(v) ⊆ S}, and        (4b)
    Cpre(S)  := Pre∃₀(S) ∪ Pre∀₁(S).            (4c)

Intuitively, the controllable predecessor operator Cpre(S) computes the set of all
states that can be controlled by Player 0 to stay in S after one step, regardless
of the strategy of Player 1. Additionally, we define two operators which take
advantage of the fairness assumption on the live vertices. Given two sets S, T ⊆
V, we define the live-existential and almost sure predecessor operators:

    Lpre∃(S)   := {v ∈ Vℓ | E(v) ∩ S ≠ ∅}, and          (5a)
    Apre(S, T) := Cpre(T) ∪ (Lpre∃(T) ∩ Pre∀₁(S)).      (5b)

Intuitively, the almost sure predecessor operator7 Apre(S, T) computes the set
of all states that can be controlled by Player 0 to stay in T (via Cpre(T)) as well
as all Player 1 states in Vℓ that (a) will eventually make progress towards T if
Player 1 obeys its fairness assumptions encoded in α (via Lpre∃(T)) and (b) will
never leave S in the "meantime" (via Pre∀₁(S)). All the used set transformers are
monotonic with respect to set inclusion. Further, Cpre(T) ⊆ Apre(S, T) always
holds, Cpre(T) = Apre(S, T) if Vℓ = ∅, and Apre(S, T) ⊆ Cpre(S) if T ⊆ S.
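To make the operators in (4) and (5) concrete, the following is a minimal explicit-state sketch in Python; the set-based graph encoding is our illustration and stands in for the BDD-based symbolic representation used by the tool:

# Explicit-state sketch of the predecessor operators (4a)-(4c) and (5a)-(5b).
# Stand-in for the BDD-based implementation; names/encoding are illustrative.
def pre_exists_0(E, V0, S):
    """Pre∃₀(S): Player 0 vertices with some successor in S."""
    return {v for v in V0 if E[v] & S}

def pre_forall_1(E, V1, S):
    """Pre∀₁(S): Player 1 vertices with all successors in S."""
    return {v for v in V1 if E[v] <= S}

def cpre(E, V0, V1, S):
    """Cpre(S): states Player 0 can force into S in one step."""
    return pre_exists_0(E, V0, S) | pre_forall_1(E, V1, S)

def lpre_exists(E, Vl, S):
    """Lpre∃(S): live vertices with some successor in S."""
    return {v for v in Vl if E[v] & S}

def apre(E, V0, V1, Vl, S, T):
    """Apre(S, T) = Cpre(T) ∪ (Lpre∃(T) ∩ Pre∀₁(S))."""
    return cpre(E, V0, V1, T) | (lpre_exists(E, Vl, T) & pre_forall_1(E, V1, S))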
Fixpoint Algorithms in the µ-calculus: We use the µ-calculus [20] as a con-
venient logical notation to define a symbolic algorithm (i.e., an algorithm that
manipulates sets of states rather than individual states) for computing a set of
states with a particular property over a given game graph G. The formulas of the
µ-calculus, interpreted over a 2-player game graph G, are given by the grammar

    ϕ ::= p | X | ϕ ∪ ϕ | ϕ ∩ ϕ | pre(ϕ) | µX.ϕ | νX.ϕ

where p ranges over subsets of V, X ranges over a set of formal variables, pre
ranges over monotone set transformers in {Pre∃₀, Pre∀₁, Cpre, Lpre∃, Apre}, and µ
and ν denote, respectively, the least and the greatest fixed point of the functional
defined as X ↦ ϕ(X). Since the operations ∪, ∩, and the set transformers pre
are all monotonic, the fixed points are guaranteed to exist. A µ-calculus formula
evaluates to a set of states over G, and the set can be computed by induction over
the structure of the formula, where the fixed points are evaluated by iteration.
We omit the (standard) semantics of formulas (see [20]).
5.2 The Symbolic Algorithm
We now present our new symbolic fixpoint algorithm to compute the winning
region of Player 0 in the extremely fair adversarial game over Gℓ with respect to
a Rabin winning condition R. A detailed correctness proof can be found in the
extended version [4, App. B.3, pp. 40].
Theorem 2. Let Gℓ = ⟨G, Vℓ⟩ be a game graph with live edges and R be a Rabin
condition over G with index set P = [1; k]. Further, let Z denote the fixed point
of the following µ-calculus expression:

    νY_{p0}. µX_{p0}. ⋃_{p1∈P} νY_{p1}. µX_{p1}. ⋃_{p2∈P\1} νY_{p2}. µX_{p2}. ··· ⋃_{pk∈P\k−1} νY_{pk}. µX_{pk}. ⋃_{j=0}^{k} C_{pj},    (6a)
where

    C_{pj} := (⋂_{i=0}^{j} R̅_{pi}) ∩ [(G_{pj} ∩ Cpre(Y_{pj})) ∪ Apre(Y_{pj}, X_{pj})],    (6b)

with8 p0 = 0, G_{p0} := ∅ and R_{p0} := ∅, as well as P\i := P \ {p1, . . . , pi}. Then Z
is equivalent to the winning region W of Player 0 in the extremely fair adver-
sarial game over Gℓ for the winning condition ϕ in (2). Moreover, the fixpoint
algorithm runs in O(n^{k+2} k!) symbolic steps, and a memoryless winning strategy
for Player 0 can be extracted from it.
5.3 Proof Outline
Given a Rabin winning condition over a "normal" 2-player game, [27] provided a
symbolic fixpoint algorithm which computes the winning region for Player 0. The
fixpoint algorithm in their paper is almost identical to our fixpoint algorithm
in (6): it only differs in the last term of the constructed C-terms in (6b). [27]
defines the term C_{pj} as

    C_{pj} := (⋂_{i=0}^{j} R̅_{pi}) ∩ [(G_{pj} ∩ Cpre(Y_{pj})) ∪ Cpre(X_{pj})].

Intuitively, a single term C_{pj} computes the set of states that always remain within
Q_{pj} := ⋂_{i=0}^{j} R̅_{pi} while always re-visiting G_{pj}. That is, given the simpler (local)
winning condition

    ψ := □Q ∧ □◇G    (7)

for two sets Q, G ⊆ V, the set

    νY. µX. Q ∩ [(G ∩ Cpre(Y)) ∪ (Cpre(X))]    (8)

is known to define exactly the states of a "normal" 2-player game G from which
Player 0 has a strategy to win the game with winning condition ψ [26]. Such
games are typically called safe Büchi games. The key insight in the proof of
Thm. 2 is to show that the new definition of C-terms in (6b) via the new al-
most sure predecessor operator Apre actually computes the winning state sets
of extremely fair adversarial safe Büchi games. Subsequently, we generalize this
intuition to the fixpoint for the Rabin games.
Fair Adversarial Safe Büchi Games: The following theorem characterizes
the winning states in an extremely fair adversarial safe Büchi game.
Theorem 3. Let Gℓ = ⟨G, Vℓ⟩ be a game graph with live vertices and Q, G ⊆ V
be two state sets over G. Further, let

    Z := νY. µX. Q ∩ [(G ∩ Cpre(Y)) ∪ (Apre(Y, X))].    (9)

Then Z is equivalent to the winning region of Player 0 in the extremely fair ad-
versarial game over Gℓ for the winning condition ψ in (7). Moreover, the fixpoint
algorithm runs in O(n²) symbolic steps, and a memoryless winning strategy for
Player 0 can be extracted from it.
8 The Rabin pair ⟨G_{p0}, R_{p0}⟩ = ⟨∅, ∅⟩ in (6) is artificially introduced to make the
fixpoint representation more compact. It is not part of R.
Intuitively, the fixpoints in (8) and (9) consist of two parts: (a) a minimal
fixpoint over X which computes (for any fixed value of Y) the set of states that
can reach the "target state set" T := Q ∩ G ∩ Cpre(Y) while staying inside the
safe set Q, and (b) a maximal fixpoint over Y which ensures that the only states
considered in the target T are those that allow to re-visit a state in T while
staying in Q.
By comparing (8) and (9) we see that our syntactic transformation only
changes part (a). Hence, in order to prove Thm. 3 it essentially remains to show
that this transformation works for the even simpler safe reachability games.
Extremely Fair Adversarial Safe Reachability Games: A safe reachabil-
ity condition is a tuple ⟨T, Q⟩ with T, Q ⊆ V, and a play π satisfies the safe
reachability condition ⟨T, Q⟩ if π satisfies the LTL formula

    ψ := Q U T.    (10)

A safe reachability game is often called a reach-while-avoid game, where the
safe set is specified by an unsafe set R := Q̅ that needs to be avoided. Their
extremely fair adversarial version is formalized in the following theorem and
proved in the extended version [4, Thm. 3.3].
Theorem 4. Let Gℓ = ⟨G, Vℓ⟩ be a game graph with live edges and ⟨T, Q⟩ be a
safe reachability winning condition. Further, let

    Z := νY. µX. T ∪ (Q ∩ Apre(Y, X)).    (11)

Then Z is equivalent to the winning region of Player 0 in the extremely fair
adversarial game over Gℓ for the winning condition ψ in (10). Moreover, the fix-
point algorithm runs in O(n²) symbolic steps, and a memoryless winning strategy
for Player 0 can be extracted from it.
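To illustrate how (11) is evaluated by iteration, here is a self-contained explicit-state sketch; the toy game at the bottom is our own hypothetical example, not the game of Fig. 1:

# Explicit-state evaluation of Z = νY. µX. T ∪ (Q ∩ Apre(Y, X)) from (11).
# Illustrative only; the toy game below is hypothetical.
def apre(E, V0, V1, Vl, S, T):
    """Apre(S, T) = Cpre(T) ∪ (Lpre∃(T) ∩ Pre∀₁(S)) as in (5b)."""
    cpre = {v for v in V0 if E[v] & T} | {v for v in V1 if E[v] <= T}
    return cpre | {v for v in Vl if E[v] & T and E[v] <= S}

def safe_reach(E, V0, V1, Vl, T, Q):
    Y = set(E)                        # ν-iteration: start from all vertices
    while True:
        X = set()                     # µ-iteration: start from the empty set
        while True:
            X_new = T | (Q & apre(E, V0, V1, Vl, Y, X))
            if X_new == X: break
            X = X_new
        if X == Y: return Y           # outer fixpoint reached
        Y = X

# Toy game: vertex 0 is a Player 0 vertex, vertex 1 a live Player 1 vertex,
# vertex 2 the target, vertex 3 unsafe. Vertex 1 can escape to 3, so only
# the target itself is winning, mirroring the role of Pre∀₁(Y) in Apre.
E = {0: {1}, 1: {0, 2, 3}, 2: {2}, 3: {3}}
print(safe_reach(E, V0={0}, V1={1, 3}, Vl={1}, T={2}, Q={0, 1, 2}))  # -> {2}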
To gain some intuition on the correctness of Thm. 4, let us recall that the
fixpoint for safe reachability games without live edges is given by:

    µX. T ∪ (Q ∩ Cpre(X)).    (12)

Intuitively, the fixpoint computation in (12) is initialized with X⁰ = ∅ and
computes a sequence X⁰, X¹, . . . , X^k of increasing sets until X^k = X^{k+1}. We
say that v has rank r if v ∈ X^r \ X^{r−1}. All states contained in X^r allow Player 0
to force the play to reach T in at most r − 1 steps while staying in Q. The
corresponding Player 0 strategy ρ0 is known to be winning w.r.t. (10), and along
every play π compliant with ρ0, the path π remains in Q and the rank is always
decreasing.
To see why the same strategy is also sound in the extremely fair adversarial
safe reachability game Gℓ, first recall that for vertices v ∉ Vℓ of Gℓ, the operator
Apre(Y, X) simplifies to Cpre(X). With this, we see that for every v ∉ Vℓ a
Player 0 winning strategy ρ̃0 in Gℓ can always force plays to stay in Q and to
decrease their rank, similar to ρ0. Then every play π compliant with such a
strategy ρ̃0 and visiting a vertex in Vℓ only finitely often satisfies (10).
[Game graph over vertices 1–9; see the caption for the vertex classification.]
Fig. 1. Fair adversarial game graph discussed in Ex. 1 and Ex. 2, with Player 0 and
Player 1 vertices indicated by circles and squares, respectively. The live vertices
are Vℓ = {2, 3, 5} (double square, blue), the target vertices are G = {6, 9} (double
circle, green), and the unsafe vertices are Q̅ = {1} (red, dotted).
The only interesting case for soundness of Thm. 4 is therefore every play π
that visits states in Vℓ infinitely often. However, as the number of vertices is
finite, we only have a finite number of ranks, and hence a certain vertex v ∈ Vℓ
with a finite rank r needs to get visited by π infinitely often. From the definition
of Apre, we know that a state v ∈ Vℓ is only contained in X^r if v has an outgoing
edge reaching X^k with k < r. Because of the extreme fairness condition, reaching
v infinitely often implies that also a state with rank k s.t. k < r will get visited
infinitely often. As X¹ = T, we can show by induction that T is eventually visited
along π while π always remains in Q until then.
In order to prove completeness of Thm. 4 we need to show that all states
in V \ Z are losing for Player 0. Here, again, the reasoning is equivalent to the
"normal" safe reachability game for v ∉ Vℓ. For live vertices v ∈ Vℓ, we see
that v is not added to Z via Apre if v ∉ T and either (i) none of its outgoing
edges make progress towards T or (ii) some of its outgoing edges leave Z. One
can therefore construct a Player 1 strategy that for (i)-vertices always chooses
an arbitrary transition and thereby never makes progress towards T (also if v
is visited infinitely often), and for (ii)-vertices ensures that they are only visited
once on plays which remain in Q. This ensures that (ii)-vertices never make
progress towards T via their possibly existing rank-decreasing edges.
In the extended version [4], we have provided a detailed soundness and com-
pleteness proof of Thm. 4 along with the respective Player 0 and Player 1 strat-
egy constructions. In addition, there we also proved Thm. 3 using a reduction to
Thm. 4 for every iteration over Y.
Example 1 (Extremely fair adversarial safe reachability game). We consider an
extremely fair adversarial safe reachability game over the game graph depicted
in Fig. 1 with target vertex set T = G = {6, 9} and safe vertex set Q = V \ {1}.
We denote by Y^m the m-th iteration over the fixpoint variable Y in (11),
where Y⁰ = V. Further, we denote by X^{m,i} the set computed in the i-th iteration
over the fixpoint variable X in (11) during the computation of Y^m, where
X^{m,0} = ∅. We further have X^{m,1} = T = {6, 9}, as Apre(·, ∅) = ∅. Now we compute

    X^{1,2} = T ∪ (Q ∩ Apre(Y⁰, X^{1,1}))
            = {6,9} ∪ ((V \ {1}) ∩ [Cpre(X^{1,1}) ∪ (Lpre∃(X^{1,1}) ∩ Pre∀₁(V))])    (13)
            = {6,9} ∪ ((V \ {1}) ∩ [{7,8} ∪ {3,5}]) = {3,5,6,7,8,9},

where Cpre(X^{1,1}) = {7,8} and Lpre∃(X^{1,1}) ∩ Pre∀₁(V) = {3,5}.
We observe that the only vertices added to X via the Cpre term are 7 and
8. The live vertices 3 and 5 are added due to their outgoing edges leading to
the target vertex 6. The additional requirement Pre∀₁(V) in Apre(Y⁰, X^{1,1}) is
trivially satisfied for all vertices at this point, as Y⁰ = V, and can therefore be
ignored. Doing one more iteration over X, we see that now vertex 4 gets added
via the Cpre term (as it is a Player 0 vertex that allows progress towards 5) and
vertex 2 is added via the Apre term (as it is live and allows progress to 3). The
iteration over X terminates with Y¹ = X^{1,∞} = V \ {1}.
Re-iterating over X for Y¹ gives X^{2,2} = X^{1,2} = {3,5,6,7,8,9} as before.
However, now vertex 2 does not get added to X^{2,3} because vertex 2 has an
edge leading to V \ Y¹ = {1}. Therefore the iteration over X terminates with
Y² = X^{2,∞} = V \ {1,2}. When we now re-iterate over X for Y², we see that vertex
3 is not added to X^{3,2} any more, as vertex 3 has a transition to V \ Y² = {1,2}.
Therefore the iteration over X now terminates with Y³ = X^{3,∞} = V \ {1,2,3}.
Now re-iterating over X does not change the vertex set anymore and the fixed
point terminates with Y^∞ = Y³ = V \ {1,2,3}.
We note that the fixpoint expression (12) for "normal" safe reachability
games terminates after two iterations over X with X^∞ = {6,7,8,9}, as ver-
tices 7 and 8 are the only vertices added via the Cpre operator in (13). Due to
the stricter notion of Cpre, requiring that all outgoing edges of Player 1 vertices
make progress towards the target, (12) does not require an outer largest fixed
point over Y to "trap" the play in a set of vertices which allow progress when
"waiting long enough". This "trapping" required in (11) via the outer fixpoint
over Y actually fails for vertices 2 and 3 (as they are excluded from the winning
set of (11)). Here, Player 1 can enforce to "escape" to the unsafe vertex 1 in
two steps before 2 and 3 are visited infinitely often (which would imply progress
towards 6 via the existing live edges).
We see that the winning region in the "normal" game is much smaller than the
winning region for the extremely fair adversarial game, as adding live transitions
restricts the strategy choices of Player 1, making it easier for Player 0 to win.
Example 2 (Extremely fair adversarial safe Büchi game). We now consider an
extremely fair adversarial safe Büchi game over the game graph depicted in Fig. 1
with target set G = {6, 9} and safe set Q = V \ {1}.
We first observe that we can rewrite the fixpoint in (9) as

    νY. µX. [Q ∩ G ∩ Cpre(Y)] ∪ [Q ∩ Apre(Y, X)].    (14)

Using (14), we see that for Y⁰ = V we can define T⁰ := Q ∩ G ∩ Cpre(V) = G =
{6, 9}. Therefore the first iteration over X is equivalent to (13) and terminates
with Y¹ = X^{1,∞} = V \ {1}.
Now, however, we need to re-compute T for the next iteration over X and
obtain T¹ = Q ∩ G ∩ Cpre(Y¹) = (V \ {1}) ∩ {6,9} ∩ (V \ {1,2,9}) = {6}. This
re-computation of T¹ checks which target vertices are repeatedly reachable, as
required by the Büchi condition. As vertex 9 has no outgoing edge, it trivially
cannot be reached repeatedly.
With this, we see that for the next iteration over X we only have one target
vertex: T¹ = {6}. Unlike in the safe reachability case in Ex. 1, vertex 7 cannot
be added to X^{2,2}, since Player 1 can always decide to take the edge towards 9
from 7, and therefore prevents a repeated visit to a target state. Vertices 2 and 3
get eliminated, for the same reason as in the safe reachability game, within the
second and third iterations over Y. The overall fixpoint computation therefore
terminates with Y^∞ = Y³ = {4, 5, 6, 8}.
Proof of Thm. 2: The proof of Thm. 2 essentially follows from the same
arguments as in the soundness proof of the Rabin fixpoint for 2-player games by
Piterman et al. [27], utilizing Thm. 4 and Thm. 3 at all suitable places. In
[4, App. A, pp. 29], we illustrate the steps of the Rabin fixpoint in (6) using a
simple extremely fair adversarial Rabin game with two Rabin pairs.
Remark 1. We remark that the fixpoint (11), as well as the Apre operator, is
similar in structure to the characterization of almost sure winning states in concurrent
reachability games [1]. In concurrent games, the fixpoint captures the largest
set of states in which the game can be trapped while maintaining a positive
probability of reaching the target. In our case, the fixpoint captures the largest
set of states in which Player 0 can keep the game while ensuring a visit to the
target either directly or through some of the edges from the live vertices. The
commonality justifies our notation and terminology for Apre.
Remark 2. [2] studied fair CTL and LTL model checking, where the fairness con-
dition is given by extreme fairness with all vertices of the transition system being
live. They show that CTL model checking under this all-live fairness condition
can be syntactically transformed to non-fair CTL model checking. A similar
transformation is possible for fair model checking of Büchi, Rabin, and Streett
formulas. The correctness of their transformation is based on reasoning similar
to our Apre operator. For example, a state satisfies the CTL formula ∀◇p under
fairness iff all paths starting from the state either eventually visit p or always
visit states from which a visit to p is possible.
Complexity Analysis of (6): For Rabin games with k Rabin pairs, Piterman et
al. [27] proposed a fixpoint formula with alternation depth 2k + 1. Using the ac-
celerated fixpoint computation technique of Long et al. [23], they deduce a bound
of O(n^{k+1} k!) symbolic steps. We can apply the same acceleration technique to
our fixpoint (6), yielding a complexity upper bound of O(n^{k+2} k!) symbolic steps.
(The additional complexity is due to the additional outermost ν-fixpoint.)
6 Experimental Evaluation
We developed a C++-based tool Fairsyn9, which implements the symbolic fair
adversarial Rabin fixpoint from Eq. (6) using Binary Decision Diagrams (BDDs).
9Repository URL: https://gitlab.mpi-sws.org/kmallik/synthesis-with-edge-fairness
Fairsyn has a single-threaded and a multi-threaded version, which respectively
use the CUDD BDD library [32] and the Sylvan BDD library [11]. In both, we
used a fixpoint acceleration procedure that “warm-starts” the inner fixpoints by
exploiting a monotonicity property (detailed in the extended version [4]).
We demonstrate the effectiveness of our proposed symbolic algorithm for 2½-
player Rabin games using a set of synthetic benchmark experiments derived from
the VLTS benchmark suite (Sec. 6.1) and a controller synthesis experiment for
a stochastic dynamical system (Sec. 6.2); in the extended version [4], we include
an additional software engineering benchmark example from the literature. In
all of these examples, Fairsyn significantly outperformed the state-of-the-art.
The experiments in Sec. 6.1 were performed using the multi-threaded Fairsyn
on a computer equipped with a 3 GHz Intel Xeon E7 v2 processor with 48 CPU
cores and 1.5 TiB RAM. The experiments in Sec. 6.2 were performed using the
single-threaded Fairsyn on a Macbook Pro (2015) laptop equipped with a 2.7 GHz
Dual-Core Intel Core i5 processor with 16 GiB RAM.
6.1 The VLTS Benchmark Experiments
We present a collection of synthetic benchmarks for empirical evaluation of the
merits of our direct symbolic algorithm compared to the one using the reduction
to 2-player games [7]; in the following, we refer to the latter as the indirect approach.
Like our direct algorithm, the indirect approach has been implemented in Fairsyn
and benefits from the same Sylvan-based parallel BDD library and accelerated
fixpoint solution technique. We collect the first 20 transition systems from the
Very Large Transition Systems (VLTS) benchmark suite [15]; their descriptions
can be found on the VLTS benchmark website. For each of them, we randomly
generated instances of 2½-player Rabin games with up to 3 Rabin pairs using
the following procedure (sketched below): (i) we labeled a given fraction of the
vertices as random vertices, (ii) we equally partitioned the remaining vertices into
system and environment vertices, and (iii) for every set in R = {⟨G1, R1⟩, . . . ,
⟨Gk, Rk⟩}, we randomly selected up to 5% of all vertices to be contained in the
set. All the vertices in (i), (ii), and (iii) were selected randomly. In these examples,
the number of vertices ranged from 289–164,865, the number of BDD variables
ranged from 9–18, and the number of transitions from 1,224–2,621,480.
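A minimal sketch of this generation procedure (our illustration; the actual benchmark scripts may differ):

# Sketch of the random benchmark-generation steps (i)-(iii) described above;
# data layout and names are ours, not Fairsyn's benchmark scripts.
import random

def make_random_rabin_game(vertices, frac_random, k):
    """Partition 'vertices' into (V0, V1, Vr) and draw k random Rabin pairs."""
    vs = list(vertices)
    random.shuffle(vs)
    n_r = int(frac_random * len(vs))           # (i) random vertices
    Vr = set(vs[:n_r])
    rest = vs[n_r:]
    half = len(rest) // 2                      # (ii) split the rest equally
    V0, V1 = set(rest[:half]), set(rest[half:])
    pairs = []                                 # (iii) up to 5% per Rabin set
    for _ in range(k):
        G = set(random.sample(vs, random.randint(1, max(1, len(vs) // 20))))
        R = set(random.sample(vs, random.randint(1, max(1, len(vs) // 20))))
        pairs.append((G, R))
    return V0, V1, Vr, pairs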
In Fig. 2, we compare the running times of Fairsyn and the indirect approach.
On the left scatter plot, every point corresponds to one instance of the randomly
generated benchmarks, where the X and the Y coordinates represent the run-
ning time for Fairsyn and the indirect approach respectively. The solid red line
indicates the exact same performance for both methods, whereas the dashed
red line indicates an order of magnitude performance improvement for Fairsyn
compared to the indirect approach. Observe that Fairsyn was faster by up to
two orders of magnitude for the majority of the cases. In the experiments, the
memory footprint of Fairsyn and the indirect approach was similar.
In the right plot, the X-axis corresponds to the proportion of random vertices
within the set of vertices in percentage: 0% corresponds to a 2-player game and
100% corresponds to a Markov chain. The Y-axis corresponds to the running
94 T. Banerjee et al.
time normalized with respect to the running time for the 0% case. We observe
that Fairsyn was insensitive to the change of proportion of the random vertices.
On the other hand, the indirect approach took longer for larger proportions of random vertices, because for every random vertex it adds 3k + 2 additional vertices, thus causing a linear blowup in the size of the game graph. The large variations in the time differences of the two approaches are due to the varying sizes of the experiments: the larger a game graph is, the larger the difference.
Interestingly, for both Fairsyn and the indirect method, there is a dip in the
running time when all the vertices are random (i.e. the 100% case), which is
possibly due to faster computation of the Cpre and Apre operators and faster
convergence of the fixpoint algorithm, owing to the absence of Player 0 and
Player 1 vertices.
Fig. 2. LEFT: Comparison of running time of Fairsyn and the indirect approach on
the VLTS benchmarks. All axes are in log-scale. RIGHT: Sensitivity of normalized
running time w.r.t. variation of the proportion of random vertices. The blue and the red
lines correspond to different instances of Fairsyn and the indirect approach respectively.
6.2 Synthesis for Stochastically Perturbed Dynamical Systems
Synthesizing verified symbolic controllers for continuous dynamical systems is an
active area in cyber-physical systems research [33]. We consider a stochastically perturbed dynamical system model, called the bistable switch [12], which is an important model studied in molecular biology. The system model, call it Σ, has a continuous and compact two-dimensional state space X = [0, 4] × [0, 4] ⊆ R² and a finite input space U = {−0.5, 0, 0.5} × {−0.5, 0, 0.5}. Suppose that for any given time k ∈ N, x1(k), x2(k) are the two states, u1(k), u2(k) are the two inputs, and w1(k), w2(k) are a pair of statistically independent noise samples drawn from a pair of distributions with bounded supports W1 = [−0.4, 0.2] and W2 = [−0.4, 0.2], respectively. Then the states of Σ at the next time instant are:

x1(k+1) = x1(k) + 0.05 (−1.3 x1(k) + x2(k)) + u1(k) + w1(k),                (15)
x2(k+1) = x2(k) + 0.05 ((x1(k))² / ((x1(k))² + 1) − 0.25 x2(k)) + u2(k) + w2(k).

A controller C for Σ is a function C : X → U mapping the state x(k) at any time instant k to a suitable control input u(k). Then applying (15) repeatedly
Table 1. Performance comparison between Fairsyn and StochasticSynthesis (abbreviated as SS) [12] on a comparable implementation of the abstraction (uniform grid-based abstraction). Col. 1 shows the size of the resulting 2½-player game graph (computed using the algorithm given in [24]), Cols. 2 and 3 compare the total synthesis times, and Cols. 4 and 5 compare the peak memory footprint (as measured using the "time" command) for Fairsyn and SS respectively. "OoM" stands for out-of-memory.

# vertices in 2½-game abstraction | Synthesis time (Fairsyn) | Synthesis time (SS) | Peak memory (Fairsyn) | Peak memory (SS)
3.8 × 10³                         | 0.4 s                    | 30 s                | 66 MiB                | 156 MiB
2.2 × 10⁴                         | 8.2 s                    | 55 s                | 72 MiB                | 1 GiB
1.1 × 10⁵                         | 1 min 23 s               | 16 min 1 s          | 108 MiB               | 81 GiB
6.6 × 10⁵                         | 5 min 27 s               | OoM                 | 166 MiB               | 126 GiB
4.3 × 10⁶                         | 41 min 7 s               | OoM                 | 517 MiB               | 127 GiB
with u(k) = C(x(k)), starting with an initial state (x1(0), x2(0)) = x(0) = x_init, gives us an infinite sequence of states (x(0), x(1), x(2), ...) called a path. For a fixed controller C and a given initial state x_init, we obtain a probability measure P^C_{x_init} on the sample space of paths of Σ, in a way similar to how we obtained the probability measure P^{ρ0,ρ1}_{v0} over infinite plays of 2½-player games. Let ϕ ⊆ X^ω be a Rabin specification, defined using finitely many predicates over X.
Fig. 3. Predicates over X.
We extend the notion of almost sure winning to control systems in the obvious way: a state x ∈ X of Σ is almost sure winning if there is a controller C such that P^C_x(ϕ) = 1. The controller synthesis problem asks to compute an optimal controller C such that for every almost sure winning state x, P^C_x(ϕ) = 1.
Majumdar et al. [24] show that this synthesis problem can be approximately solved by lifting the system Σ to a finite 2½-player game. We used Fairsyn to solve the resulting 2½-player Rabin games obtained for the controller synthesis problem for Σ in (15) and for the following specification given in LTL using the predicates A, B, C, D as shown in Fig. 3: ϕ := (♦B → □C) ∧ □(A → ¬C).
In Table 1, we compare the performance of Fairsyn against the state-of-the-
art algorithm for solving this problem, which is implemented in the tool called
StochasticSynthesis (SS) [12]. It can be observed that Fairsyn significantly out-
performs SS for every abstraction of different coarseness considered here.
Acknowledgments:
R. Majumdar and K. Mallik are funded through the DFG project 389792660
TRR 248–CPEC, A.-K. Schmuck is funded through the DFG project (SCHM
3541/1-1), and S. Soudjani is funded through the EPSRC New Investigator
Award CodeCPS (EP/V043676/1).
References
1. de Alfaro, L., Henzinger, T.A., Kupferman, O.: Concurrent reachability games. In:
39th Annual Symposium on Foundations of Computer Science, FOCS. pp. 564–575.
IEEE Computer Society (1998)
2. Aminof, B., Ball, T., Kupferman, O.: Reasoning about systems with transition
fairness. In: 11th International Conference on Logic for Programming, Artificial
Intelligence, and Reasoning. LNCS, vol. 3452, pp. 194–208. Springer (2004)
3. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)
4. Banerjee, T., Majumdar, R., Mallik, K., Schmuck, A.K., Soudjani, S.: Fast sym-
bolic algorithms for omega-regular games under strong transition fairness (2021),
https://www.mpi-sws.org/tr/2020-007.pdf
5. Belta, C., Yordanov, B., Gol, E.A.: Formal methods for discrete-time dynamical
systems, vol. 15. Springer (2017)
6. Büchi, J.R., Landweber, L.H.: Solving sequential conditions by finite-state strate-
gies. Transactions of the American Mathematical Society 138, 295–311 (1969)
7. Chatterjee, K., de Alfaro, L., Henzinger, T.A.: The complexity of stochastic Ra-
bin and Streett games. In: Proceedings of the 32nd International Colloquium on
Automata, Languages and Programming (ICALP). Lecture Notes in Computer
Science, vol. 3580, pp. 878–890. Springer (2005)
8. Chatterjee, K., De Alfaro, L., Faella, M., Majumdar, R., Raman, V.: Code aware
resource management. Formal Methods in System Design 42(2), 146–174 (2013)
9. Chatterjee, K., Jurdziński, M., Henzinger, T.A.: Quantitative stochastic parity
games. In: Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete
algorithms. pp. 121–130. Society for Industrial and Applied Mathematics (2004)
10. Church, A.: Logic, arithmetic, and automata. Proceedings of the International
Congress of Mathematicians, 1962 pp. 23–35 (1963)
11. van Dijk, T., van de Pol, J.: Sylvan: Multi-core decision diagrams. In: International
Conference on Tools and Algorithms for the Construction and Analysis of Systems.
pp. 677–691. Springer (2015)
12. Dutreix, M., Huh, J., Coogan, S.: Abstraction-based synthesis for stochastic sys-
tems with omega-regular objectives. arXiv preprint arXiv:2001.09236 (2020)
13. Emerson, E.A., Jutla, C.S.: The complexity of tree automata and logics of pro-
grams. In: FoCS. vol. 88, pp. 328–337 (1988)
14. Emerson, E.A., Jutla, C.S.: Tree automata, mu-calculus and determinacy. In: FoCS.
vol. 91, pp. 368–377 (1991)
15. Garavel, H., Descoubes, N.: Very large transition systems (2003), http://cadp.inria.fr/resources/vlts/
16. van Glabbeek, R., Höfner, P.: Progress, justness, and fairness. ACM Comput. Surv.
52(4) (2019)
17. Gurevich, Y., Harrington, L.: Trees, automata, and games. In: Proceedings of the
fourteenth annual ACM symposium on Theory of computing. pp. 60–65 (1982)
18. Kamgarpour, M., Summers, S., Lygeros, J.: Control design for property specifica-
tions on stochastic hybrid systems. Hybrid Systems: Computation and Control pp.
303–312 (April 2013)
19. Klarlund, N.: Progress measures, immediate determinacy, and a subset construc-
tion for tree automata. Annals of Pure and Applied Logic 69(2-3), 243–268 (1994)
20. Kozen, D.: Results on the propositional µ-calculus. Theoretical Computer Science 27(3), 333–354 (1983); International Colloquium on Automata, Languages and Programming (ICALP)
21. Kupferman, O., Vardi, M.Y.: Safraless decision procedures. In: 46th Annual IEEE
Symposium on Foundations of Computer Science (FOCS’05). pp. 531–540. IEEE
(2005)
22. Laurenti, L., Lahijanian, M., Abate, A., Cardelli, L., Kwiatkowska, M.: Formal
and efficient synthesis for continuous-time linear stochastic hybrid processes. IEEE
Transactions on Automatic Control (2020)
23. Long, D.E., Browne, A., Clarke, E.M., Jha, S., Marrero, W.R.: An improved al-
gorithm for the evaluation of fixpoint expressions. In: International Conference on
Computer Aided Verification. pp. 338–350. Springer (1994)
24. Majumdar, R., Mallik, K., Schmuck, A.K., Soudjani, S.: Symbolic qualitative con-
trol for stochastic systems via finite parity games. In: ADHS 2021 (2021)
25. Majumdar, R., Mallik, K., Soudjani, S.: Symbolic controller synthesis for Büchi
specifications on stochastic systems. In: Proceedings of the 23rd International Con-
ference on Hybrid Systems: Computation and Control. pp. 1–11 (2020)
26. Maler, O., Pnueli, A., Sifakis, J.: On the synthesis of discrete controllers for timed
systems. In: Annual Symposium on Theoretical Aspects of Computer Science. pp.
229–242. Springer Berlin Heidelberg (1995)
27. Piterman, N., Pnueli, A.: Faster solutions of Rabin and Streett games. In: 21st
Annual IEEE Symposium on Logic in Computer Science (LICS’06). pp. 275–284
(2006)
28. Pnueli, A.: On the extremely fair treatment of probabilistic algorithms. In: Pro-
ceedings of the fifteenth annual ACM symposium on Theory of computing. pp.
278–290 (1983)
29. Pnueli, A., Rosner, R.: A framework for the synthesis of reactive modules. In: Vogt,
F.H. (ed.) International Conference on Concurrency, Proceedings. LNCS, vol. 335,
pp. 4–17. Springer (1988)
30. Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: Annual ACM
Symposium on Principles of Programming Languages. pp. 179–190. ACM Press
(1989)
31. Rabin, M.O.: Decidability of second-order theories and automata on infinite trees.
Transactions of the American Mathematical Society 141, 1–35 (1969)
32. Somenzi, F.: Cudd 3.0.0 (2019), https://github.com/ivmai/cudd
33. Tabuada, P.: Verification and control of hybrid systems: a symbolic approach.
Springer Science & Business Media (2009)
34. Zielonka, W.: Infinite games on finitely coloured graphs with applications to au-
tomata on infinite trees. Theor. Comput. Sci. 200(1-2), 135–183 (1998)
Practical Applications of the
Alternating Cycle Decomposition
Antonio Casares¹, Alexandre Duret-Lutz², Klara J. Meyer³,
Florian Renkin², and Salomon Sickert⁴
¹ LaBRI, Université de Bordeaux, France, antonio.casares-santos@labri.fr
² LRDE, EPITA, France, adl@lrde.epita.fr, frenkin@lrde.epita.fr
³ Independent Researcher, email@klarameyer.de
⁴ School of Computer Science and Engineering, The Hebrew University, Israel, salomon.sickert@mail.huji.ac.il
Abstract. In 2021, Casares, Colcombet, and Fijalkow introduced the
Alternating Cycle Decomposition (ACD) to study properties and trans-
formations of Muller automata. We present the first practical implemen-
tation of the ACD in two different tools, Owl and Spot, and adapt it
to the framework of Emerson-Lei automata, i.e., ω-automata whose ac-
ceptance conditions are defined by Boolean formulas. The ACD provides
a transformation of Emerson-Lei automata into parity automata with
strong optimality guarantees: the resulting parity automaton is minimal
among those automata that can be obtained by duplication of states.
Our empirical results show that this transformation is usable in practice.
Further, we show how the ACD can generalize many other specialized
constructions such as deciding typeness of automata and degeneraliza-
tion of generalized Büchi automata, providing a framework of practical
algorithms for ω-automata.
1 Introduction
Automata over infinite words have many applications, including verification and
synthesis of reactive systems with specifications given in formalisms such as Lin-
ear Temporal Logic (LTL) [27,23,11,12,2,29]. The synthesis problem from
LTL specifications asks, given an LTL formula φ, to build a controller that pro-
cesses an input word letter by letter, producing an output word, such that the
combined input-output-word satisfies φ. The automata-theoretic approach to
this problem (first introduced by Pnueli and Rosner [27]) consists of building a
deterministic ω-automaton Aequivalent to the LTL specification φ, then con-
struct a game from Ain which the opponent chooses the input letters for the
automaton, and finally solve this game and obtain a controller from a winning
strategy (whenever such a strategy exists). The automaton Acan use differ-
ent kinds of acceptance conditions (Rabin, Emerson-Lei, Muller, parity...) and
Salomon Sickert is supported in part by the Deutsche Forschungsgemeinschaft (DFG)
under project number 436811179, and in part funded by the European Research
Council (ERC) under the European Union’s Horizon 2020 research and innovation
programme under grant agreement No. 787367 (PaVeS)
thus we obtain games with different winning conditions. Among these games, parity games are the easiest to solve and there are highly developed techniques for parity game solvers. Thus it is common practice to transform the automaton A into a parity one (for which we might need to augment the state space of the automaton). The top-ranked tools in the SyntComp competitions [17], Strix [23] (winner of the 2018, 2019, 2020, and 2021 editions) and ltlsynt [26], use this approach, producing a transition-based Emerson-Lei automaton (TELA) as an intermediate step before constructing the parity automaton. For this reason, optimal and efficient procedures to transform Emerson-Lei automata into parity automata are of great importance.
Emerson-Lei (EL) acceptance conditions (first defined by Emerson and Lei [10], and reinvented in the HOA format [3]) are arbitrary positive Boolean formulas over the primitives Inf(c) and Fin(c), where the c's are colors from a set Γ. A run is accepting if the set of colors F ∈ 2^Γ seen infinitely often is a satisfying assignment of the EL acceptance condition (see Section 2 for a formal definition). Note that an explicit representation of all satisfying assignments is comparable to the Muller condition [15, Section 1.3.2]. Since the Boolean structure of LTL formulas can be mimicked by Emerson-Lei acceptance conditions, a translation of LTL formulas to Emerson-Lei automata is particularly convenient.
Many algorithms to transform Emerson-Lei and Muller automata to parity automata have been proposed. In essence they all transform an automaton by turning each original state q into multiple states of the form (q, r), where r records some information about the current run, and transitions leaving (q, r) otherwise have a one-to-one mapping with those leaving q. Definition 3 calls this a locally bijective morphism, and we like to refer to those as algorithms that duplicate states. For instance, in the Later Appearance Record (LAR) [16], r is a list of all colors ordered by most recent appearance, producing therefore a blow-up of |Γ|! in the state space of the automaton. The State Appearance Record (SAR) [24,22] is a variation of this idea for state-based conditions, and the Color Appearance Record (CAR) [28] is a variation for the Emerson-Lei condition. The Index Appearance Record (IAR) [24,22,20] is a specialized construction for Rabin and Streett conditions, where r is now an ordering of pair indices. These algorithms have no particular insight into the input acceptance condition, such as inclusions or redundancies between colors (or pairs). In the Zielonka-tree transformation [31], r is a reference to a branch in a tree representation of a Muller condition. That tree representation is tailored to the condition and allows such simplifications compared to previous methods (it can be proven to be always better [6,25]). While none of these algorithms use the structure of the input automaton to optimize the produced automata, some heuristics have been proposed [28,25,21].
In 2021, inspired by the Zielonka tree, Casares et al. introduced the Alternating Cycle Decomposition (ACD) of a Muller automaton [6]. Simply put, the ACD is a forest, i.e., a list of trees, that captures how accepting and rejecting cycles interleave in the automaton. They use the ACD to transform Muller automata into parity automata, and they prove a strong optimality result: the resulting automaton uses an optimal number of colors and has a minimal number of states among those parity automata that can be obtained by duplicating states of the original one (see Theorem 1 for a formal statement). The main novelty of this transformation is that it does not only take into account the structure of both the acceptance condition and the automaton, but it exactly captures how they interact with each other. Moreover, Casares et al. [6] show that we can obtain some other valuable information about a Muller automaton from its ACD: for example, the ACD can be used to decide typeness, i.e., whether we can relabel it with another acceptance condition (parity, Rabin, Streett...). Their approach is primarily theoretical and puts the emphasis on how the ACD can be useful to obtain new results concerning Muller automata, but little is said about the costs of computing the ACD or the applicability of the transformation in practice.
Contributions. In this paper, we show that the ACD is practical. We adapt the
definition of the ACD to Emerson-Lei automata and the HOA format [3]. We
implement the ACD and the associated transformation in two tools: Owl [18]
and Spot [9], providing baselines for efficient implementations of these struc-
tures. We show that the ACD gives a usable and useful method to transform
Emerson-Lei automata into parity ones, improving upon any previous transfor-
mation in terms of the size of the output parity automaton. We extend the ACD
to produce state-based automata, and show that the ACD generally beats tradi-
tional degeneralization-based procedures. Our implementation can also use the
ACD to check typeness of deterministic automata.
Structure of the paper. We begin by providing some common definitions in Section 2. In Section 3, we define the Alternating Cycle Decomposition, adapting the definition of Casares et al. [6] to Emerson-Lei automata, and in Section 4 we provide an algorithm to compute it. In Section 5, we study the transformation of Emerson-Lei automata into parity ones using the ACD and we show experimental results obtained by comparing the ACD-transform implemented in Spot and Owl with other commonly used transformations. In Section 6 we show experimental results for the particular case of degeneralization of generalized Büchi automata. In Section 7 we discuss the utility of the ACD to decide typeness of automata.
2 Preliminaries
We denote by |A| the cardinality of a set A and by 2^A its power set. For a finite alphabet Σ, we write Σ* and Σω for the sets of finite and infinite words, respectively, over Σ. The empty word is denoted by ε. Given v ∈ Σ*, w ∈ Σω, we denote their concatenation by v·w, and we write v ⊑ w if v is a prefix of w. We write inf(w) for the set of letters that occur infinitely often in w. Given a map σ : A → B and a subset A′ ⊆ A, we denote by σ|A′ the restriction of σ to A′. We extend σ to A* and Aω component-wise and we denote these extensions by σ whenever no confusion arises.
A (directed, edge-colored) graph is a pair G = (V, E) where V is a finite set of vertices and E ⊆ V × Γ × V is a finite set of Γ-colored edges. Note that with
Table 1: Encoding of common acceptance conditions into Emerson-Lei conditions. The variables c, c0, c1, ... stand for arbitrary colors from the set Γ.

(B)  Büchi                 Inf(c)
(GB) generalized Büchi     ⋀_i Inf(c_i)
(C)  co-Büchi              Fin(c)
(GC) generalized co-Büchi  ⋁_i Fin(c_i)
(R)  Rabin                 ⋁_i (Fin(c_2i) ∧ Inf(c_2i+1))
(S)  Streett               ⋀_i (Inf(c_2i) ∨ Fin(c_2i+1))
(P)  parity min even       Inf(0) ∨ (Fin(1) ∧ (Inf(2) ∨ (Fin(3) ∧ ...)))
     parity min odd        Fin(0) ∧ (Inf(1) ∨ (Fin(2) ∧ (Inf(3) ∨ ...)))
this definition one can have multiple differently colored edges from a vertex v to a vertex u. A graph G′ = (V′, E′) is a subgraph of G (written G′ ⊑ G) if V′ ⊆ V and E′ ⊆ E. A graph G = (V, E) is strongly connected if for every pair of vertices (v, u) ∈ V² there is a path from v to u. A strongly connected component (SCC) of a graph G is a maximal strongly connected subgraph of G.
Emerson-Lei acceptance conditions. Let Γ = {0, ..., n−1} be a finite set of n integers called colors, from now on also written Γ = {0, 1, ...} in our examples. We define the set EL(Γ) of acceptance conditions according to the following grammar, where c stands for any color in Γ:

α ::= ⊤ | ⊥ | Inf(c) | Fin(c) | (α ∧ α) | (α ∨ α)

Acceptance conditions are interpreted over subsets of Γ. For C ⊆ Γ we define the satisfaction relation C ⊨ α inductively according to the following semantics:

C ⊨ ⊤      C ⊨ Inf(c) iff c ∈ C      C ⊨ α1 ∧ α2 iff C ⊨ α1 and C ⊨ α2
C ⊭ ⊥      C ⊨ Fin(c) iff c ∉ C      C ⊨ α1 ∨ α2 iff C ⊨ α1 or C ⊨ α2

We denote by ¬α the negation of the acceptance condition α, i.e., Fin(m) becomes Inf(m) and vice-versa, ⊤ becomes ⊥, etc. We assume that constants are propagated, i.e., a formula is either ⊤, ⊥, or does not contain ⊤ and ⊥.
Table 1 shows how common acceptance conditions can be encoded into Emerson-Lei conditions. Note that colors may appear multiple times; for instance (Fin(0) ∧ Inf(1)) ∨ (Fin(1) ∧ Inf(0)) is a Rabin condition.
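As an illustration of the semantics above, the following Python sketch evaluates the satisfaction relation C ⊨ α for EL conditions represented as nested tuples. The tuple encoding is our own and not a format used by Owl or Spot.

def sat(C, alpha):
    """Return True iff the set of colors C satisfies the EL condition alpha."""
    kind = alpha[0]
    if kind == 'true':  return True
    if kind == 'false': return False
    if kind == 'Inf':   return alpha[1] in C      # Inf(c): c occurs infinitely often
    if kind == 'Fin':   return alpha[1] not in C  # Fin(c): c occurs finitely often
    if kind == 'and':   return sat(C, alpha[1]) and sat(C, alpha[2])
    if kind == 'or':    return sat(C, alpha[1]) or sat(C, alpha[2])
    raise ValueError(kind)

# The Rabin condition (Fin(0) ∧ Inf(1)) ∨ (Fin(1) ∧ Inf(0)) from the text:
rabin = ('or', ('and', ('Fin', 0), ('Inf', 1)),
               ('and', ('Fin', 1), ('Inf', 0)))
assert sat({0}, rabin) and sat({1}, rabin) and not sat({0, 1}, rabin)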
Emerson-Lei automata. A transition-based Emerson-Lei automaton (TELA) is a tuple A = (Q, Σ, Q0, ∆, Γ, α), where Q is a finite set of states, Σ is a finite input alphabet, Q0 ⊆ Q is a non-empty set of initial states, Γ is a set of colors, ∆ ⊆ Q × Σ × 2^Γ × Q is a finite set of transitions, and α ∈ EL(Γ) is an Emerson-Lei condition. The graph of A is the directed edge-colored graph G_A = (Q, E) where the edges E = {(q, C, q′) : ∃a ∈ Σ. (q, a, C, q′) ∈ ∆} are obtained from ∆ by removing Σ. We denote the transition (q, a, C, q′) ∈ ∆ and the edge (q, C, q′) ∈ E by q –a:C→ q′ and q –C→ q′, respectively. Further, we might omit a or C if they are clear from the context. We denote by γ the projection of ∆ or E to the set of colors Γ. Given a word w = a0·a1·a2··· ∈ Σω, a run over w in A is a sequence ϱ = (q0, a0, C0, q1)·(q1, a1, C1, q2)··· ∈ ∆ω such that q0 ∈ Q0. The output of the run ϱ is the word γ(ϱ) ∈ (2^Γ)ω. A run ϱ is accepting if inf(γ(ϱ)) ⊨ α. A word w ∈ Σω is accepted (or recognized) by A if there exists an accepting run over w in A. We denote by L(A) the set of words accepted by A. Two automata A, A′ are equivalent if L(A) = L(A′). The size of an automaton, written |A|, is the cardinality of its set of states. A state q ∈ Q is reachable if there is a path from some state in Q0 to q in G_A.
An automaton A is deterministic if Q0 is a singleton and for every q ∈ Q and a ∈ Σ there is at most one transition q –a:C→ q′ leaving q labeled with a.
We will use automata with acceptance defined over transitions (instead of state-based acceptance) by default. However, in Sections 5 and 6 we will also discuss transformations towards automata with state-based acceptance.
If the acceptance condition of an automaton is represented as a condition of kind X (cf. Table 1), we call it an X-automaton. We assume that each transition of a parity automaton is colored with exactly one color; this can be achieved by substituting the set C in a transition q –a:C→ q′ by min C (if C ≠ ∅) or by {|Γ|+1} if C = ∅. (If C is a singleton we will omit the brackets in the notation.)
Labeled trees. A tree is a non-empty prefix-closed set T ⊆ N* whose elements are called nodes. It is partially ordered by the prefix relation; if x ⊑ y we say that x is an ancestor of y and y is a descendant of x (we add the adjective "strict" if moreover x ≠ y). The empty string ε is the root of the tree. The set of children of a node x ∈ T is Children_T(x) = {x·i ∈ T : i ∈ N}. The set of leaves of T is Leaves(T) = {x ∈ T : Children_T(x) = ∅}. Nodes belonging to a same set Children_T(x) are called siblings, and they are ordered from left to right by increasing value of their last component. If A is a set of labels, an A-labeled tree is a pair ⟨T, η⟩ of a tree T and a map η : T → A. The depth of a node x is Depth(x) = |x|. The height of T is Height(T) = 1 + max_{x∈T} Depth(x).
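The following Python helpers, reused by later sketches, mirror this vocabulary on trees represented as prefix-closed sets of integer tuples, the empty tuple playing the role of the root ε; the representation is our own choice for illustration.

def children(T, x):
    """Children_T(x), ordered left to right by the last component."""
    return sorted(y for y in T if len(y) == len(x) + 1 and y[:len(x)] == x)

def leaves(T):
    return sorted(x for x in T if not children(T, x))

def depth(x):
    return len(x)

def height(T):
    return 1 + max(depth(x) for x in T)

T = {(), (0,), (1,), (0, 0)}   # a root with two children; the left child has one child
assert leaves(T) == [(0, 0), (1,)] and height(T) == 3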
3 The Alternating Cycle Decomposition
The Alternating Cycle Decomposition (ACD), proposed by Casares et al. [6], is a generalization of the Zielonka tree. The ACD of an automaton A is a forest, a collection of trees, labeled with accepting and rejecting cycles of the automaton. For each SCC of A we have a unique tree, and the labeling of each tree alternates between accepting and rejecting cycles. Thus the ACD captures the complexity of the cycle structure of each SCC. We now present the definition of the ACD adapted to TELA.
For the rest of this section, let A = (Q, Σ, Q0, ∆, Γ, α) be a TELA and let G_A = (Q, E) be the associated graph with edges colored by γ : E → 2^Γ. We lift γ to sets and define γ(E′) = ⋃_{e∈E′} γ(e) for every subset E′ ⊆ E.
Definition 1. A cycle of A is a subset of edges ℓ ⊆ E forming a closed path in G_A. A cycle ℓ is accepting (resp. rejecting) if γ(ℓ) ⊨ α (resp. γ(ℓ) ⊭ α). The set of states of a cycle ℓ is States(ℓ) = {q ∈ Q : some e ∈ ℓ passes through q}. The set of cycles of A is denoted Cycles(A). It is (partially) ordered by set inclusion.

Definition 2 ([6]). Let S1, ..., Sk be an enumeration of the strongly connected components of G_A. The Alternating Cycle Decomposition of A, denoted ACD(A), is a collection of k Cycles(A)-labeled trees ⟨T1, ..., Tk⟩ with Ti = ⟨Ti, ηi⟩ such that:
– ηi(ε) is the set of edges of Si, for i = 1, ..., k.
– If x ∈ Ti and ηi(x) is an accepting cycle, then x has a child in Ti for each maximal element in {ℓ ∈ Cycles(A) : ℓ ⊆ ηi(x) and ℓ is rejecting}. In this case, we say that x is a round node.
– If x ∈ Ti and ηi(x) is a rejecting cycle, then x has a child in Ti for each maximal element in {ℓ ∈ Cycles(A) : ℓ ⊆ ηi(x) and ℓ is accepting}. In this case, we say that x is a square node.

If q ∈ Q is a state belonging to the SCC Si of A, we define the tree associated to q as the subtree Tq = ⟨Tq, ηq⟩ given by:

Tq = {ε} ∪ {x ∈ Ti : q ∈ States(ηi(x))},    ηq = ηi|Tq.
Remark 1. We provide examples online at https://spot.lrde.epita.fr/ipynb/zlktree.html; an executable copy of this notebook is included in the artifact [8].
4 An Efficient Computation of the ACD
In this section we give an algorithm to compute the Alternating Cycle Decomposition of an Emerson-Lei automaton A, implemented in Owl [18] and Spot [9]. This can be done by first computing an SCC decomposition of G_A, which gives us the labels of the roots of the trees ⟨T1, ..., Tk⟩, and then recursively computing the children of the nodes of each tree, following the definition of ACD(A). Algorithm 1 shows how to compute the children of a given node and uses notation we introduce now.
Let C ⊆ Γ be a subset of colors and let S = (QS, ES) ⊑ G_A be a subgraph. We define the projection of S on C, denoted S|C = (QS, E′S), as the subgraph of S obtained by removing the edges e ∈ ES such that γ(e) ⊈ C, that is, E′S = {(q, D, q′) ∈ ES : D ⊆ C}. We write Colors(S) = ⋃_{e∈ES} γ(e). We say that S′ ⊑ S is a C-strongly connected component in S (C-SCC) if it is an SCC of S|C and Colors(S′) = C. Further, max⊆ X denotes the set of all maximal elements of a set X of cycles according to the partial order defined by ⊆.
Note that Algorithm 1 uses Algorithm 2, which simplifies the Emerson-Lei condition before passing the formula to a Max-SAT function (a SAT solver that computes maximal satisfying assignments, e.g., by clause blocking) [4]. This preprocessing ensures that the ACD for Rabin or Streett acceptance conditions can be constructed without making use of the general-purpose algorithm for computing maximal satisfying assignments.
Algorithm 1 Computing the children of a node.
1: Input: A cycle S = ηi(x) corresponding to the label of a node x of ACD(A).
2: Output: The set of labels for the children of x, (S1, ..., Sk).
3: function Compute-Children(S)
4:   children ← ∅, C ← Colors(S)
5:   if C ⊨ α then                                   ▷ Maximal subsets D ⊆ C such that D ⊨ α iff C ⊭ α
6:     {C1, ..., Ck} ← Max-Satisfying-Subsets(C, ¬α)
7:   else
8:     {C1, ..., Ck} ← Max-Satisfying-Subsets(C, α)
9:   for D ∈ {C1, ..., Ck} do
10:    for S′ ∈ SCCs of S|D do                       ▷ These might not be D-SCCs in S
11:      if Colors(S′) ⊨ α ⟺ D ⊨ α then
12:        children ← children ∪ {S′}
13:      else
14:        children ← children ∪ Compute-Children(S′)
15:  return max⊆ children                            ▷ Remove from children non-maximal cycles
Algorithm 2 The subprocedure Max-Satisfying-Subsets.
1: Input: A subset of colors C ⊆ Γ and an EL condition α ∈ EL(Γ).
2: Output: max⊆ {D ⊆ C : D ⊨ α}.
3: function Max-Satisfying-Subsets(C, α)
4:   if C ⊨ α then
5:     return {C}
6:   α ← α[if c ∈ C then c else ⊥]        ▷ Replace colors not in C by false
7:   L ← {c ∈ C : ¬c does not occur in α}
8:   if L ≠ ∅ then
9:     α ← α[if c ∈ L then ⊤ else c]      ▷ Replace colors in L by true
10:    {C1, ..., Ck} ← Max-Satisfying-Subsets(C \ L, α)
11:    return {C1 ∪ L, ..., Ck ∪ L}
12:  if α = ¬c1 ∨ ··· ∨ ¬cn then
13:    return {{c1, ..., cn} \ {ci} : 1 ≤ i ≤ n}
14:  return Max-SAT(α)
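To make the specification of Algorithm 2 concrete, here is a brute-force Python reference for max⊆{D ⊆ C : D ⊨ α}, reusing the evaluator sat and the condition rabin sketched in Section 2. It enumerates all subsets and is exponential in |C|; Algorithm 2 avoids this blow-up through its recursion and a final Max-SAT call.

from itertools import combinations

def max_satisfying_subsets(C, alpha):
    """All maximal D ⊆ C with D |= alpha (brute force, for reference only)."""
    found = []
    for size in range(len(C), -1, -1):       # from largest to smallest subsets
        for D in map(frozenset, combinations(sorted(C), size)):
            # D is maximal iff it satisfies alpha and no larger satisfying
            # subset (necessarily found earlier) strictly contains it.
            if sat(D, alpha) and not any(D < E for E in found):
                found.append(D)
    return found

# The maximal subsets of {0, 1} satisfying the Rabin condition are {0} and {1}:
assert set(max_satisfying_subsets({0, 1}, rabin)) == {frozenset({0}), frozenset({1})}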
Memoization. To optimize the construction of the ACD and to avoid duplicated recursive calls, we perform two kinds of memoization: first, we memoize the results of calling Algorithm 2 from Algorithm 1 (thus we implicitly construct a Zielonka DAG for α); second, we memoize the recursive calls to Algorithm 1: this is useful, as distinct nodes in the ACD can be labeled by the same cycles.
5 From Emerson-Lei to Parity Automata
In this section we describe the transformation from TELA to parity automata using the Alternating Cycle Decomposition [6]. This transformation provides strong optimality guarantees: the resulting parity automaton has minimal size among those that can be produced without merging states of the TELA, and it uses an optimal number of colors (Theorem 1). We also show that this transformation can be adapted to produce state-based automata. Note that in this case we lose the first optimality guarantee.
5.1 The ACD Transformation
Let A = (Q, Σ, Q0, ∆, Γ, α) be a TELA and let ACD(A) = ⟨T1, ..., Tk⟩. We introduce the following notation that will allow us to move in the ACD.
Given a transition e = q –a:C→ q′ such that both q and q′ belong to the i-th SCC of A and a node x ∈ Ti, we define Support(x, e) to be the least ancestor z of x in Ti such that e ∈ ηi(z). If Support(x, e) ≠ x and it is not a leaf in Tq, let z be the only child of Support(x, e) that is an ancestor of x, and let y1, ..., ys be an enumeration from left to right of the nodes in Children_Tq(Support(x, e)). We define NextBranch(x, e) as:
– Support(x, e), if Support(x, e) = x or if Support(x, e) is a leaf in Tq,
– y1, if z = ys,
– y_{j+1}, if z = yj, 1 ≤ j < s.
We define a parity automaton P_ACD(A) = (P, Σ, P0, ∆P, ΓP, β) (the ACD transform of A) equivalent to A as follows:
States. The states of P_ACD(A) are of the form (q, x), for q ∈ Q and x a leaf of the tree associated to q. Initial states are of the form (q0, x) where q0 ∈ Q0 is an initial state of A and x is the leftmost leaf of its corresponding tree:

P = ⋃_{q∈Q} {q} × Leaves(Tq),    P0 = {(q0, x) : q0 ∈ Q0, x the leftmost leaf in Tq0}.

Transitions. For each transition e = q –a:C→ q′ in ∆ and each state (q, x) ∈ P, we define a transition (q, x) –a:p→ (q′, y) in ∆P as follows: first, q′ is the destination state of the original transition. If q and q′ are not in the same SCC, then y is the leftmost leaf in Tq′ and p = 1 (except if all Ti have height 1 and a round root: in that case p = 0). Otherwise, if both q and q′ belong to the i-th SCC of A, then the destination leaf y is the leftmost descendant of NextBranch(x, e) in Tq′.
We define the color p of the transition as Depth(Support(x, e)) if the root of Ti is a round node (ηi(ε) ⊨ α), or as Depth(Support(x, e)) + 1 otherwise. We remark that in this way, p is even if and only if ηi(Support(x, e)) ⊨ α.
Parity condition. The condition β is a parity min even condition (cf. Table 1).
Remark 2. If the color 0 does not appear on any transition, then we shift all colors by 1 and replace β by a parity min odd condition.

Proposition 1 ([6]). The automaton P_ACD(A) recognizes L(A).
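The navigation primitives above can be sketched in a few lines of Python on top of the tuple-based trees of Section 2. Here eta maps each node of Ti to its cycle (a set of edges) and root_is_round records whether ηi(ε) ⊨ α; both are assumed to be precomputed (e.g., by Algorithm 1), and for brevity we navigate in Ti, eliding the restriction to the subtree Tq.

def support(eta, x, e):
    """Least ancestor z of x with e in eta(z)."""
    for d in range(len(x), -1, -1):          # x itself, then ever shorter prefixes
        if e in eta[x[:d]]:
            return x[:d]
    raise ValueError("edge outside the root cycle")

def next_branch(T, eta, x, e):
    s = support(eta, x, e)
    kids = children(T, s)
    if s == x or not kids:                   # Support(x, e) = x, or a leaf
        return s
    z = x[:len(s) + 1]                       # the child of s that is an ancestor of x
    return kids[(kids.index(z) + 1) % len(kids)]   # next sibling, wrapping to y1

def color(eta, x, e, root_is_round):
    """Priority emitted by the transition leaving (q, x) along edge e."""
    s = support(eta, x, e)
    return depth(s) if root_is_round else depth(s) + 1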
Remark 3. The ACD transformation preserves many properties (determinism,
completeness, good-for-gameness, unambiguity...) of the automaton A, see [6].
Remark 4. Since the number of colors used by P_ACD(A) is at most the height of a tree in ACD(A), we obtain that P_ACD(A) never uses more than |Γ| + 1 colors. Furthermore, since the TELA format does not require all transitions to have a color, we can omit the maximal one and produce an automaton with at most |Γ| colors.
In order to state the optimality of this transformation we introduce the notion of locally bijective morphisms of automata. Given an automaton A = (Q, Σ, Q0, ∆, Γ, α) and q ∈ Q, we denote by Out_A(q) the set of outgoing transitions of q, i.e., Out_A(q) = {q –a:C→ q′ ∈ ∆ : a ∈ Σ, C ⊆ Γ, q′ ∈ Q}.

Definition 3 ([6]). Let A = (Q, Σ, Q0, ∆, Γ, α) and A′ = (Q′, Σ, Q′0, ∆′, Γ′, α′) be two EL automata over Σ. A locally bijective morphism from A to A′ (denoted φ : A → A′) is a pair of maps φQ : Q → Q′, φ∆ : ∆ → ∆′ such that:
– φQ|Q0 is a bijection between Q0 and Q′0.
– φ∆(q1 –a:C→ q2) = φQ(q1) –a:C′→ φQ(q2) for some C′ ⊆ Γ′.
– For every q ∈ Q, φ∆|Out_A(q) is a bijection between Out_A(q) and Out_A′(φQ(q)).
– For every run ϱ ∈ ∆ω in A, ϱ is accepting iff φ∆(ϱ) is accepting in A′.
Theorem 1 ([6]). Let A be an Emerson-Lei automaton, and let P_ACD(A) be the parity automaton obtained by applying the ACD transformation. Then:
– There is a locally bijective morphism φ : P_ACD(A) → A.
– If P is a parity automaton admitting a locally bijective morphism to A, then |P_ACD(A)| ≤ |P|.
– If P is a parity automaton recognizing L(A), then P uses at least as many colors as P_ACD(A).

Note that all state-duplicating constructions mentioned in the introduction create locally bijective morphisms. Thus the above theorem shows that the ACD transformation duplicates the least number of states.
5.2 Experimental Results
Figures 1 and 2 compare four different paritization procedures applied to 1065 TELA generated⁵ from LTL formulas from the Synthesis Competition. These automata have between 2 and 55 colors (mean 5.92, median 5) and between 1 and 245,761 states (mean 2023.20, median 20). Automata with fewer than 2 colors have been ignored since they are trivial to paritize.
The procedures are Owl's and Spot's implementations of the ACD transform, as well as Spot's implementation of the Zielonka Tree transform [6], and Spot's previous paritization function (called to_parity) [28]. We refer the reader to Section 8 for information about the versions used. Two dotted lines on the sides
⁵ We used ltl2tgba -G -D from Spot, and ltl2dela from Owl.
Fig. 1: Comparison of the output size of the four paritization procedures. (Scatter plots with log-scale axes, Spot's ACD transform on the y-axis: vs. Owl's ACD transform, 9 cases above the diagonal and 14 below; vs. Spot's Zielonka Tree transform, 4 above and 877 below; vs. Spot's to_parity, 1 above and 123 below.)
Fig. 2: Time spent performing these four paritization procedures. (Scatter plots with log-scale axes, Spot's ACD transform on the y-axis: vs. Owl's ACD transform, 180 cases above the diagonal and 884 below; vs. Spot's Zielonka Tree transform, 552 above and 508 below; vs. Spot's to_parity, 37 above and 1020 below.)
of the plots hold cases that did not finish within 500 seconds (red, inner line), or where the tool reported an error⁶ (orange, outer line). Pink dots represent input automata that already have parity acceptance: for those, running the ACD transform still makes sense, as it will produce an output with a minimal number of colors. However, Owl's implementation, which mostly cares about reducing the number of states, uses a shortcut and returns the input automaton unmodified in this case: this explains the pink cloud on the left of Figure 2.
Owl's and Spot's implementations of the ACD transform produce automata of the same size, as expected. The cases that are not on the diagonal all correspond to timeouts or tool errors. The Zielonka Tree transform, which does not take the automaton structure into consideration, produces automata that are on average 2.11 times bigger (median 1.60), while its runtime is on average 6.55 times slower (median 0.97). Lastly, Spot's to_parity function is not far from the optimal size given by the ACD transform: on average its output is 3.28 times larger, but the median of that size ratio is 1.00. Similarly, it is on average 15.94 times slower, but with a median of 1.04.
⁶ Either "out-of-memory" or "too many colors", as Spot is restricted to 32 colors.
5.3 ACD Transformation Towards State-Based Parity Automata
Sometimes it is desirable to obtain an automaton with the acceptance defined over states. A state-based parity automaton is a tuple A = (Q, Σ, Q0, ∆, ϕ : Q → N) where (Q, Σ, Q0, ∆) is the underlying structure defined as for transition-based automata in Section 2 (with the only difference that ∆ ⊆ Q × Σ × Q now), and ϕ : Q → N is a map associating colors to states. A run of A is accepting if the minimal color visited infinitely often is even.
Let A be a TELA with ACD(A) = ⟨T1, ..., Tk⟩. We define an equivalent state-based parity automaton P_sb-ACD(A) = (P, Σ, P0, ∆P, ϕ : P → N) as follows:
States. States are of the form (q, x), for q ∈ Q and x ∈ Tq (now the second component corresponds to a node of the ACD that is not necessarily a leaf). The set of initial states is the same as for P_ACD(A):

P = ⋃_{q∈Q} {q} × Tq,    P0 = {(q0, x) : q0 ∈ Q0, x the leftmost leaf in Tq0}.
Transitions. For each transition e = q –a:C→ q′ ∈ ∆ and (q, x) ∈ P we define one transition (q, x) –a→ (q′, y) ∈ ∆P. To specify the destination node y, we distinguish two cases:
– Suppose that x is a leaf in Tq. If NextBranch(x, e) is not the leftmost child of Support(x, e) in Tq, then y is the leftmost leaf below NextBranch(x, e) in Tq (as in the transition-based case). If NextBranch(x, e) is the leftmost child (a "lap" around Support(x, e) is finished), then we set y = Support(x, e).
– If x is not a leaf in Tq, the destination y is determined exactly as if the transition started in (q, x′) for x′ the leftmost leaf in Tq under x.
Parity condition. ϕ((q, x)) = Depth(x) if the root of Tq is a round node, and ϕ((q, x)) = Depth(x) + 1 otherwise.
Note that we do not have the same optimality guarantee as in the transition-based case: if x is not a leaf of its corresponding tree, then the states of the form (q, x) ∈ P are not necessarily reachable in P_sb-ACD(A). We only need to add those that can be reached from the initial states. However, the set of reachable states does depend on the ordering of the children in the trees of the ACD, and therefore the size of the final automaton depends on this ordering.
We propose a heuristic to order the children of nodes in ACD(A). Let Ti be a tree in ACD(A) and x ∈ Ti. We define:

Di(x) = {q ∈ Q : q –a→ q′ ∉ ηi(x), for some q ∈ States(ηi(x)), a ∈ Σ}.

The heuristic consists in ordering the children x of a node of Ti by decreasing |Di(x)|. Experiments involving transformations towards state-based automata and testing this heuristic can be found in Section 6.2.
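A sketch of this heuristic in Python, reusing the tree helpers of Section 2; edges stands for the edge set of G_A (with hashable color sets) and eta for the labeling of Ti. Both representations are assumptions of the sketch.

def states_of(cycle):
    """States(ℓ): all states a cycle passes through."""
    return {q for (q, _c, _q2) in cycle} | {q2 for (_q, _c, q2) in cycle}

def escape_count(eta, x, edges):
    """|D_i(x)|: states of eta(x)'s cycle with an outgoing edge not in the cycle."""
    cycle, sts = eta[x], states_of(eta[x])
    return len({q for (q, c, q2) in edges
                if q in sts and (q, c, q2) not in cycle})

def order_children(T, eta, x, edges):
    """Children of x sorted by decreasing |D_i(child)|."""
    return sorted(children(T, x),
                  key=lambda y: escape_count(eta, y, edges),
                  reverse=True)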
6 Degeneralization of Generalized Büchi Automata
The transformation of generalized Büchi automata with n colors into Büchi automata (with a single color) is known as degeneralization and has been a very common processing step between algorithms that translate temporal-logic formulas into generalized Büchi automata, and model-checking algorithms that (used to) only work with Büchi automata. While it initially consisted in making 2^n copies of the GBA [30, Appendix B] to remember the set of colors that had yet to be seen, degeneralization to state-based Büchi acceptance can be done using only n + 1 copies once an arbitrary order of colors has been selected [13]. A similar construction to transition-based Büchi acceptance requires only n copies of the original automaton. Different orders of colors may lead to different numbers of reachable states in the Büchi automaton. Some tools even attempted to start the degeneralization in different copies to reduce the number of reachable states [14]. Nowadays, an implementation such as the degeneralization of Spot implements several SCC-based optimizations [2] to reduce the number of output states, but is still sensitive to the arbitrary order selected for colors. A sketch of the basic n-copy construction follows.
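For contrast with the ACD-based view of the next subsection, here is a Python sketch of the classical n-copy transition-based degeneralization under one fixed color order 0 < 1 < ... < n−1. The dictionary encoding of automata is our own; real implementations such as Spot's add the SCC-based optimizations mentioned above.

def degeneralize(delta, q0, n):
    """delta: state -> list of (letter, color_set, successor); colors are 0..n-1.
    Returns the transition map of a transition-based Büchi automaton whose
    states are pairs (original state, level), plus its initial state."""
    new_delta, todo, seen = {}, [(q0, 0)], set()
    while todo:
        q, lvl = todo.pop()
        if (q, lvl) in seen:
            continue
        seen.add((q, lvl))
        out = []
        for (a, C, q2) in delta[q]:
            lvl2 = lvl
            while lvl2 < n and lvl2 in C:   # advance past the awaited colors
                lvl2 += 1
            accepting = (lvl2 == n)         # all colors seen since the last reset
            lvl2 = 0 if accepting else lvl2
            out.append((a, accepting, (q2, lvl2)))
            todo.append((q2, lvl2))
        new_delta[(q, lvl)] = out
    return new_delta, (q0, 0)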
6.1 Transition-based Degeneralization
This order-sensitivity of the degeneralization, even in its transition-based variant, makes a striking difference with the ACD. When applied to a generalized Büchi automaton that has some accepting and rejecting paths, the ACD-transform produces an automaton with acceptance Inf(0) ∨ Fin(1). Since all transitions are either labeled by 0 or 1, color 1 is superfluous⁷ and the condition can be reduced to Inf(0). In this context, the ACD-transform therefore gives us a transition-based Büchi automaton by duplicating the fewest number of states (Theorem 1(2)).
It can be seen that the cycling around the different children of the ACD (whose ordering is arbitrary) performed during the ACD-transform is similar to the process used in traditional degeneralization. What makes the latter sensitive to color ordering is that it only "sees" one transition at a time, while the ACD provides a view of the cycles. For instance, a degeneralization would process the sequence x –0→ y –1→ z differently from the sequence x –1→ y –0→ z, depending on the order in which colors are expected to be encountered. However, if there is no other transition reaching or leaving y, the two colors will always be seen together, so their order should not matter: the two transitions belong to the same node of the ACD. The propagation of colors [28] is a related preprocessing step that can improve the degeneralization by propagating all colors common to the incoming transitions of a state to its outgoing transitions and vice-versa. It would turn the previous situation into x –0,1→ y –0,1→ z, making the color order selected by the degeneralization irrelevant (in this case).
A comparison of the output size of the traditional degeneralization imple-
mented in Spot (which includes several optimizations learned over the years)
⁷ In an automaton with "parity min" acceptance where all transitions are colored, the maximal color can always be omitted and replaced by the empty set.
Fig. 3: Two-dimensional histogram of the sizes of 1000 automata, degeneralized to transition-based Büchi automata, using Spot's degeneralization function (with or without propagation of colors), or using the ACD-transform. (Left, ACD vs. plain degeneralization: 0 cases above the diagonal, 419 below, 581 on it; right, ACD vs. degeneralization with propagation of colors: 0 above, 235 below, 765 on it.)
against that of the ACD-transform is given in the left plot of Figure 3. Unsurprisingly, because of the ACD-transform's optimality, there are no cases where the ACD loses to Spot's transition-based degeneralization. The use of the propagation of colors (right plot) is an improvement (the number of non-optimal cases dropped from 419 to 235) but not a cure.
Remark 5. The input automata used in this section and the next one are a set of 1000 randomly generated, minimal, deterministic, transition-based generalized Büchi automata, with 3 or 4 states and 2 or 3 colors. The reason for using such small minimal automata is to be able to use SAT-based minimization [1] on the degeneralized state-based output in the next section, to estimate how large the gap between an optimal automaton and our procedure is.
6.2 State-based Degeneralization
If the ACD is used to produce a state-based output, as explained in Subsection 5.3, the obtained automaton is not guaranteed to be minimal with respect to locally bijective morphisms. In this case we can obtain a weaker optimality result:

Proposition 2. Let A be a generalized Büchi automaton, and let B_sb-ACD(A) be the state-based Büchi automaton obtained by applying the ACD state-based transformation. If B is a state-based Büchi automaton admitting a locally bijective morphism to A, then |B_sb-ACD(A)| ≤ |B| + |A|.

Proof. Let B be a state-based Büchi automaton admitting a locally bijective morphism to A. We can transform it into a transition-based Büchi automaton B_trans by setting the transitions leaving accepting states to be accepting. This automaton has the same size as B and it also admits a locally bijective morphism to A. Therefore, by Theorem 1, we have that |B_ACD(A)| ≤ |B_trans| = |B|, where B_ACD(A) is the transition-based automaton obtained by applying the ACD transformation. We claim that |B_sb-ACD(A)| ≤ |B_ACD(A)| + |A| (therefore implying that |B_sb-ACD(A)| ≤ |B| + |A|). Indeed, the set of states of B_sb-ACD(A) is the union of the set of states of B_ACD(A) and a subset of nodes of the form (q, ε), where ε is the root of Tq. There are at most |A| nodes of this form. □
Fig. 4: Comparison of three ways to degeneralize to state-based Büchi: (acd, acd.heuristic) using the state-based version of the ACD-transform with or without the heuristic, and (degen) classical degeneralization. (Left, SBA.degen vs. SBA.acd: 402 cases above the diagonal, 96 below, 502 on it; right, SBA.degen vs. SBA.acd.heuristic: 498 above, 9 below, 493 on it.)
Fig. 5: Effect of the heuristic for ordering children of the ACD, and comparison to the minimal degeneralized automata (when known). (Left, SBA.acd.heuristic vs. SBA.acd: 3 cases above the diagonal, 241 below, 756 on it; right, SBA.acd.heuristic vs. SBA.minimal: 94 above, 0 below, 555 on it.)
Figure 4 compares three ways to perform state-based degeneralization. The ACD comes in two variants, with or without the heuristic of Section 5.3, and it is compared against the state-based degeneralization of Spot.
Figure 5 shows how the heuristic variant compares to the one without, and how it compares with the size of a minimal DBA, when that size could be computed in reasonable time (in 649 cases). Note that there might not be a locally bijective morphism between the input automaton and the minimal DBA computed this way; nonetheless these minimal-size automata can serve as a reference point to estimate the quality of a degeneralization. Compared to this subset of minimal DBA, the average number of additional states produced by the state-based ACD is 0.17 with the heuristic, and 0.33 without. Comparatively, Spot's degeneralization has an average of 1.21 extra states.
7 Deciding Typeness
We highlight now how the ACD can be used to decide typeness of deterministic TELA. This problem, first introduced by Krishnan and Brayton [19], consists of deciding whether we can replace the acceptance condition of a given automaton by another (hopefully simpler) one without changing the transition structure and preserving the language (see Table 1 for a list of common acceptance conditions).
Let A = (Q, Σ, Q0, ∆, Γ, α) be a TELA. We say that A is X-type, for X ∈ {B, C, GB, GC, P, R, S}, if there is an X-automaton over the same structure, A′ = (Q, Σ, Q0, ∆′, Γ′, β) (where ∆ and ∆′ only differ in the coloring of the transitions), such that L(A) = L(A′) and β belongs to X. We emphasize that we permit the use of a different set of colors Γ′ in A′. Some conditions can always be rewritten as conditions of other kinds (for example, Büchi conditions can be expressed as parity ones, so being B-type implies being P-type). We should not confuse this notion with the expressive power of deterministic automata using these conditions. For example, both deterministic parity automata and Rabin automata recognize all ω-regular languages, but there are Rabin automata that are not parity-type. Further, we say that an automaton A is weak if for every SCC S of A, all cycles in S are accepting or all of them are rejecting.
The following result shows that the ACD is a sufficient data structure for deciding typeness for many common acceptance conditions. We remark that the second item adds to the results of Casares et al. [7] (this statement only holds if transitions of automata are labeled with subsets of colors, which is not allowed in their model).
Proposition 3 ([7, Section 5.2]). Let A be a deterministic TELA such that all its states q ∈ Q are reachable, and let ACD(A) = ⟨T1, ..., Tk⟩ be its Alternating Cycle Decomposition. Then the following statements hold:
1. A is Rabin-type (resp. Streett-type) if and only if for every q ∈ Q, every round node (resp. square node) of Tq has at most one child in Tq. It is parity-type if and only if it is both Rabin-type and Streett-type.
2. A is generalized-Büchi-type (resp. generalized-co-Büchi-type) if and only if for every 1 ≤ i ≤ k, Height(Ti) ≤ 2 and, in case of equality, the root of Ti is a round node (resp. square node).
3. A is weak if and only if for every 1 ≤ i ≤ k, Height(Ti) = 1.
Also, the least number of colors used by a deterministic parity automaton recognizing L(A) is max_{1≤i≤k} Height(Ti) + ν, where ν = 0 if the roots of all trees of maximal height have the same shape (round or square), and ν = 1 otherwise.
If one of the previous conditions holds, then ACD(A) also provides an effective procedure to relabel A with the corresponding acceptance condition.
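The shape conditions of Proposition 3 are easy to state over the tuple-based trees of Section 2. The following Python sketch assumes the ACD is given as a list of (T, eta, round_root) triples, where round_root tells whether the root cycle is accepting; as before, the per-state restriction to the subtrees Tq is elided.

def is_round(x, round_root):
    """Roundness alternates with depth, starting from the root's shape."""
    return (depth(x) % 2 == 0) == round_root

def rabin_type(forest):
    return all(len(children(T, x)) <= 1
               for (T, eta, rr) in forest for x in T if is_round(x, rr))

def streett_type(forest):
    return all(len(children(T, x)) <= 1
               for (T, eta, rr) in forest for x in T if not is_round(x, rr))

def parity_type(forest):
    return rabin_type(forest) and streett_type(forest)

def generalized_buchi_type(forest):
    return all(height(T) <= 2 and (height(T) < 2 or rr)
               for (T, _eta, rr) in forest)

def weak(forest):
    return all(height(T) == 1 for (T, _eta, _rr) in forest)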
Remark 6. The ACD gives a typeness result for each SCC of the automaton, which allows us to simplify the acceptance condition of each of them independently. Further, the implications from right to left in Proposition 3 also hold for non-deterministic automata.
Proposition 3 provides an effective procedure to check typeness of TELA: we just have to build the ACD and verify that it has the appropriate shape. Spot's implementation of the ACD has options to abort the construction as soon as it detects that the shape is wrong. Moreover, if an automaton is parity-type, the ACD provides a method to relabel the automaton with a minimal number of colors. Finally, if the automaton already has parity acceptance, the ACD transformation boils down to the algorithm of Carton and Maceiras [5].
8 Availability
The ACD and the transformations based on it are currently implemented in two
open-source tools: Spot 2.10 [9] and Owl 21.0 [18]. (The original developments
were independent before the authors met and worked on this joint paper.)
In Spot 2.10, the ACD can be played with using the Python bindings. The acd
class implements the decomposition, and will render it as an interactive forest of
nodes that can be clicked to highlight the relevant cycles in the input automaton.
The acd transform() and acd transform sbacc() implements the transition-
based and state-based variant of the paritization procedure. Additionally, the
acd class has options to heuristically order the children to favor the state-based
construction, or to abort the construction as soon as it is clear that the ACD
does not have Rabin or Street shape (in case one wants to use it to establish
typeness of automata). All these features are illustrated at https://spot.lrde.ep
ita.fr/ipynb/zlktree.html. In the future, ACD will be used more by the rest of
Spot, and will be one option of the ltlsynt tool (for LTL synthesis).
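A hypothetical mini-session with these bindings, based on the names given above; the exact translate() options shown are incidental, and other option strings would serve as well.

import spot  # assumes Spot 2.10+ with its Python bindings installed

aut = spot.translate('GFa & GFb', 'deterministic', 'generic')  # a deterministic TELA
par = spot.acd_transform(aut)        # transition-based paritization via the ACD
print(par.get_acceptance())          # parity condition with a minimal number of colors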
In Owl, the ACD transformation is available through the aut2parity command. This command reads an automaton in the HOA format [3] using arbitrary acceptance, and produces a parity automaton in the same format. The tool Strix [23], which builds upon Owl, gained in version 21.0.0 the option to use the ACD-construction as an intermediate step.
Instructions to reproduce all experiments are included in the artifact [8].
9 Conclusion
We have shown that the ACD is more than a theoretically appealing construction: our two implementations show that the construction is very usable in practice, and provide a baseline for further improvements. We have also shown that the ACD is a Swiss-army knife for ω-automata, in the sense that it can generalize and replace several specific constructions (paritization, degeneralization, typeness checks).
References
1. Baarir, S., Duret-Lutz, A.: Mechanizing the minimization of deterministic generalized Büchi automata. In: Proceedings of the 34th IFIP International Conference on Formal Techniques for Distributed Objects, Components and Systems (FORTE'14), Lecture Notes in Computer Science, vol. 8461, pp. 266–283, Springer (Jun 2014), https://doi.org/10.1007/978-3-662-43613-4_17
2. Babiak, T., Badie, T., Duret-Lutz, A., Křetínský, M., Strejček, J.: Compositional
approach to suspension and other improvements to LTL translation. In: Proceed-
ings of the 20th International SPIN Symposium on Model Checking of Software
(SPIN’13), Lecture Notes in Computer Science, vol. 7976, pp. 81–98, Springer (Jul
2013), https://doi.org/10.1007/978-3-642-39176-7_6
3. Babiak, T., Blahoudek, F., Duret-Lutz, A., Klein, J., Křetínský, J., Müller, D., Parker, D., Strejček, J.: The Hanoi omega-automata format. In: Kroening, D., Păsăreanu, C.S. (eds.) Computer Aided Verification, pp. 479–486, Springer Inter-
national Publishing (2015)
4. Battiti, R., Protasi, M.: Handbook of Combinatorial Optimization: Volume 1–3, chap. Approximate Algorithms and Heuristics for MAX-SAT, pp. 77–148. Springer US (1998), ISBN 978-1-4613-0303-9, https://doi.org/10.1007/978-1-4613-0303-9_2
5. Carton, O., Maceiras, R.: Computing the Rabin index of a parity automaton. Informatique théorique et applications 33(6), 495–505 (1999), URL http://www.numdam.org/item/ITA_1999__33_6_495_0/
6. Casares, A., Colcombet, T., Fijalkow, N.: Optimal transformations of games and
automata using Muller conditions. In: Bansal, N., Merelli, E., Worrell, J. (eds.) Pro-
ceedings of the 48th International Colloquium on Automata, Languages, and Pro-
gramming (ICALP’21), Leibniz International Proceedings in Informatics (LIPIcs),
vol. 198, pp. 123:1–123:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik,
Dagstuhl, Germany (2021), https://doi.org/10.4230/LIPIcs.ICALP.2021.123
7. Casares, A., Colcombet, T., Fijalkow, N.: Optimal transformations of Muller conditions. Extended version of [6], on arXiv (2021), https://arxiv.org/abs/2011.13041
8. Casares, A., Duret-Lutz, A., Meyer, K.J., Renkin, F., Sickert, S.: Artifact for the
paper “Practical applications of the alternating cycle decomposition”. https://do
i.org/10.5281/zenodo.5572613 (2021)
9. Duret-Lutz, A., Lewkowicz, A., Fauchille, A., Michaud, T., Renault, E., Xu, L.:
Spot 2.0 a framework for LTL and ω-automata manipulation. In: Proceedings of
the 14th International Symposium on Automated Technology for Verification and
Analysis (ATVA’16), Lecture Notes in Computer Science, vol. 9938, pp. 122–129,
Springer (Oct 2016), https://doi.org/10.1007/978-3-319-46520-3 8
10. Emerson, E.A., Lei, C.L.: Modalities for model checking (extended abstract):
Branching time strikes back. In: Proceedings of the 12th ACM symposium on
Principles of Programming Languages (POPL’85), pp. 84–96, ACM (1985), https:
//doi.org/10.1145/318593.318620
11. Esparza, J., ret´ınsk´y, J., Raskin, J.F., Sickert, S.: From LTL and limit-
deterministic uchi automata to deterministic parity automata. In: Proceedings of
the 23rd International Conference on Tools and Algorithms for the Construction
and Analysis of Systems (TACAS’17), Lecture Notes in Computer Science, vol.
10205, pp. 426–442, Springer-Verlag (2017), https://doi.org/10.1007/978-3-662-
54577-5 25
12. Esparza, J., ret´ınsk´y, J., Sickert, S.: A unified translation of linear temporal logic
to ω-automata. J. ACM 67(6) (Oct 2020), https://doi.org/10.1145/3417995
13. Gastin, P., Oddoux, D.: Fast LTL to Büchi automata translation. In: Berry, G., Comon, H., Finkel, A. (eds.) Proceedings of the 13th International Conference on Computer Aided Verification (CAV'01), Lecture Notes in Computer Science, vol. 2102, pp. 53–65, Springer-Verlag (2001), https://doi.org/10.1007/3-540-44585-4_6
14. Giannakopoulou, D., Lerda, F.: From states to transitions: Improving translation of LTL formulæ to Büchi automata. In: Peled, D., Vardi, M. (eds.) Proceedings of the 22nd IFIP WG 6.1 International Conference on Formal Techniques for Networked and Distributed Systems (FORTE'02), Lecture Notes in Computer Science, vol. 2529, pp. 308–326, Springer-Verlag, Houston, Texas (Nov 2002)
15. Grädel, E., Thomas, W., Wilke, T. (eds.): Automata, Logics, and Infinite Games. Springer, Berlin, Heidelberg (2002), https://doi.org/10.1007/3-540-36387-4
16. Gurevich, Y., Harrington, L.: Trees, automata, and games. In: Proceedings of the 14th Annual ACM Symposium on Theory of Computing (STOC'82), pp. 60–65 (1982), https://doi.org/10.1145/800070.802177
17. Jacobs, S., Bloem, R., Colange, M., Faymonville, P., Finkbeiner, B., Khalimov, A., Klein, F., Luttenberger, M., Meyer, P.J., Michaud, T., Sakr, M., Sickert, S., Tentrup, L., Walker, A.: The 5th reactive synthesis competition (SYNTCOMP 2018): Benchmarks, participants & results. CoRR abs/1904.07736 (2019), URL http://arxiv.org/abs/1904.07736
18. Křetínský, J., Meggendorfer, T., Sickert, S.: Owl: A library for ω-words, automata, and LTL. In: Proceedings of the 16th International Symposium on Automated Technology for Verification and Analysis (ATVA'18), Lecture Notes in Computer Science, vol. 11138, pp. 543–550, Springer (2018), https://doi.org/10.1007/978-3-030-01090-4_34
19. Krishnan, S.C., Puri, A., Brayton, R.K.: Deterministic ω-automata vis-a-vis deterministic Büchi automata. In: Algorithms and Computation, pp. 378–386, Springer Berlin Heidelberg, Berlin, Heidelberg (1994)
20. Křetínský, J., Meggendorfer, T., Waldmann, C., Weininger, M.: Index appearance record for transforming Rabin automata into parity automata. In: Legay, A., Margaria, T. (eds.) Proceedings of the 23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'17), Lecture Notes in Computer Science, vol. 10205, pp. 443–460 (2017), https://doi.org/10.1007/978-3-662-54577-5_26
21. Křetínský, J., Meggendorfer, T., Waldmann, C., Weininger, M.: Index appearance record with preorders. Acta Informatica (2021), https://doi.org/10.1007/s00236-021-00412-y
22. Löding, C.: Optimal bounds for transformations of ω-automata. In: Proceedings of the 19th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'99), Lecture Notes in Computer Science, vol. 1738, pp. 97–109, Springer (1999), https://doi.org/10.1007/3-540-46691-6_8
23. Luttenberger, M., Meyer, P.J., Sickert, S.: Practical synthesis of reactive systems from LTL specifications via parity games. Acta Informatica pp. 3–36 (2020), https://doi.org/10.1007/s00236-019-00349-3
24. Löding, C.: Methods for the Transformation of ω-Automata: Complexity and Connection to Second Order Logic. Master's thesis, Institute of Computer Science and Applied Mathematics, Christian-Albrechts-University of Kiel (1998), URL https://old.automata.rwth-aachen.de/users/loeding/diploma_loeding.pdf
25. Meyer, P., Sickert, S.: On the optimal and practical conversion of Emerson-Lei automata into parity automata. Unpublished manuscript, obsoleted by the work of Casares et al. [6] (2021)
26. Michaud, T., Colange, M.: Reactive synthesis from LTL specification with Spot. In: Proceedings of the 7th Workshop on Synthesis, SYNT@CAV 2018, Electronic Proceedings in Theoretical Computer Science (2018)
27. Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'89), pp. 179–190 (1989), https://doi.org/10.1145/75277.75293
28. Renkin, F., Duret-Lutz, A., Pommellet, A.: Practical "paritizing" of Emerson-Lei automata. In: Proceedings of the 18th International Symposium on Automated Technology for Verification and Analysis (ATVA'20), Lecture Notes in Computer Science, vol. 12302, pp. 127–143, Springer (Oct 2020), https://doi.org/10.1007/978-3-030-59152-6_7
29. Vardi, M.Y.: An automata-theoretic approach to linear temporal logic. In: Logics for Concurrency: Structure versus Automata, volume 1043 of Lecture Notes in Computer Science, pp. 238–266, Springer-Verlag (1996)
30. Vardi, M.Y., Wolper, P.: An automata-theoretic approach to automatic program verification. In: Proceedings of the 1st Symposium on Logic in Computer Science (LICS'86), pp. 332–344, IEEE Computer Society Press (Jun 1986)
31. Zielonka, W.: Infinite games on finitely coloured graphs with applications to automata on infinite trees. Theoretical Computer Science 200(1), 135–183 (1998), https://doi.org/10.1016/S0304-3975(98)00009-7
Sky Is Not the Limit
Tighter Rank Bounds for Elevator Automata in Büchi Automata Complementation
Vojtěch Havlena, Ondřej Lengál, and Barbora Šmahlíková
ihavlena@fit.vut.cz, lengal@vut.cz, xsmahl00@vut.cz
Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Abstract. We propose several heuristics for mitigating one of the main causes of combinatorial explosion in rank-based complementation of Büchi automata (BAs): unnecessarily high bounds on the ranks of states. First, we identify elevator automata, a large class of BAs (generalizing semi-deterministic BAs) occurring often in practice, where ranks of states are bounded according to the structure of strongly connected components. The bounds for elevator automata also carry over to general BAs that contain elevator automata as a sub-structure. Second, we introduce two techniques for refining bounds on the ranks of BA states using data-flow analysis of the automaton. We implement our techniques as an extension of the tool Ranker for BA complementation and show that they indeed greatly prune the generated state space, obtaining significantly better results and outperforming other state-of-the-art tools on a large set of benchmarks.
1 Introduction
Büchi automata (BA) complementation has been a fundamental problem underlying many applications since it was introduced in 1962 by Büchi [8,17] as an essential part of a decision procedure for a fragment of second-order arithmetic. BA complementation has been used as a crucial part of, e.g., termination analysis of programs [13,20,10] or decision procedures for various logics, such as S1S [8], the first-order logic of Sturmian words [33], or the temporal logics ETL and QPTL [38]. Moreover, BA complementation also underlies BA inclusion and equivalence testing, which are essential instruments in the BA toolbox. Optimal algorithms, whose output asymptotically matches the lower bound of (0.76n)^n [43] (potentially modulo a polynomial factor), have been developed [37,1]. For a successful real-world use, asymptotic optimality is, however, not enough, and these algorithms need to be equipped with a range of optimizations to make them behave better than the worst case on BAs occurring in practice.
In this paper, we focus on the so-called rank-based approach to complementation, introduced by Kupferman and Vardi [24], further improved with the help of Friedgut [14], and finally made optimal by Schewe [37]. The construction stores in a macrostate partial information about all runs of a BA A over some word α. In addition to tracking states that A can be in (which is sufficient, e.g., in the determinization of NFAs), a macrostate also stores a guess of the rank of each of the tracked states in the run DAG that captures all these runs. The guessed ranks impose restrictions on how the future of a state might look (i.e., when A may accept). The number of macrostates in the complement depends combinatorially on the maximum rank that occurs in the macrostates. The constructions in [24,14,37] provide only coarse bounds on the maximum ranks.
A way of decreasing the maximum rank was suggested in [15] using a PSpace (and, therefore, not really practically applicable) algorithm (the problem of finding the optimal rank is PSpace-complete). In our previous paper [19], we identified several basic optimizations of the construction that can be used to refine the tight-rank upper bound (TRUB) on the maximum ranks of states. In this paper, we push the applicability of rank-based techniques much further by introducing two novel lightweight techniques for refining the TRUB, thus significantly reducing the generated state space.
Firstly, we introduce a new class of the so-called elevator automata, which occur quite often in practice (e.g., as outputs of natural algorithms for translating LTL to BAs). Intuitively, an elevator automaton is a BA whose strongly connected components (SCCs) are all either inherently weak¹ or deterministic. Clearly, the class substantially generalizes the popular inherently weak [6] and semi-deterministic BAs [11,3,4]. The structure of elevator automata allows us to provide tighter estimates of the TRUBs, not only for elevator automata per se, but also for BAs where elevator automata occur as a sub-structure (which is even more common). Secondly, we propose a lightweight technique, inspired by data flow analysis, allowing to propagate rank restrictions along the skeleton of the complemented automaton, obtaining even tighter TRUBs. We also extended the optimal rank-based algorithm to transition-based BAs (TBAs).
We implemented our optimizations within the Ranker tool [18] and evaluated our approach on thousands of hard automata from the literature (15 % of them were elevator automata that were not semi-deterministic, and many more contained an elevator sub-structure). Our techniques drastically reduce the generated state space; in many cases we even achieved an exponential improvement compared to the optimal procedure of Schewe and our previous heuristics. The new version of Ranker gives a smaller complement in the majority of cases of hard automata than other state-of-the-art tools.
2 Preliminaries
Words, functions. We fix a finite nonempty alphabet Σ and the first infinite ordinal ω = {0, 1, . . .}. For n ∈ ω, by [n] we denote the set {0, . . . , n}. For i ∈ ω we use ⌊⌊i⌋⌋ to denote the largest even number smaller than or equal to i, e.g., ⌊⌊42⌋⌋ = ⌊⌊43⌋⌋ = 42. An (infinite) word α is represented as a function α: ω → Σ where the i-th symbol is denoted as α_i. We abuse notation and sometimes also represent α as an infinite sequence α = α_0 α_1 . . . We use Σ^ω to denote the set of all infinite words over Σ. For a (partial) function f: X → Y and a set S ⊆ X, we define f(S) = {f(x) | x ∈ S}. Moreover, for x ∈ X and y ∈ Y, we use f ◁ {x ↦ y} to denote the function (f \ {x ↦ f(x)}) ∪ {x ↦ y}.
Büchi automata. A (nondeterministic transition/state-based) Büchi automaton (BA) over Σ is a quadruple A = (Q, δ, I, Q_F ∪ δ_F) where Q is a finite set of states, δ: Q × Σ → 2^Q is a transition function, I ⊆ Q is the set of initial states, and Q_F ⊆ Q and δ_F ⊆ δ are the sets of accepting states and accepting transitions, respectively. We sometimes treat δ as a set of transitions p →^a q; for instance, we use p →^a q ∈ δ to denote that q ∈ δ(p, a).
¹ An SCC is inherently weak if it either contains no accepting states or, on the other hand, all cycles of the SCC contain an accepting state.
Moreover, we extend δ to sets of states P ⊆ Q as δ(P, a) = ⋃_{p∈P} δ(p, a), and to sets of symbols Γ ⊆ Σ as δ(P, Γ) = ⋃_{a∈Γ} δ(P, a). We define the inverse transition function as δ^{-1} = {p →^a q | q →^a p ∈ δ}. The notation δ|_S for S ⊆ Q is used to denote the restriction of the transition function δ ∩ (S × Σ × S). Moreover, for q ∈ Q, we use A[q] to denote the BA (Q, δ, {q}, Q_F ∪ δ_F).
A run of A from q ∈ Q on an input word α is an infinite sequence ρ: ω → Q that starts in q and respects δ, i.e., ρ_0 = q and ∀i ≥ 0: ρ_i →^{α_i} ρ_{i+1} ∈ δ. Let inf_Q(ρ) denote the states occurring in ρ infinitely often and inf_δ(ρ) denote the transitions occurring in ρ infinitely often. The run ρ is called accepting iff inf_Q(ρ) ∩ Q_F ≠ ∅ or inf_δ(ρ) ∩ δ_F ≠ ∅.
A word α is accepted by A from a state q ∈ Q if there is an accepting run ρ of A from q, i.e., ρ_0 = q. The set L_A(q) = {α ∈ Σ^ω | A accepts α from q} is called the language of q (in A). Given a set of states R ⊆ Q, we define the language of R as L_A(R) = ⋃_{q∈R} L_A(q) and the language of A as L(A) = L_A(I). We say that a state q ∈ Q is useless iff L_A(q) = ∅. If δ_F = ∅, we call A state-based and if Q_F = ∅, we call A transition-based. In this paper, we fix a BA A = (Q, δ, I, Q_F ∪ δ_F).
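To make the definitions concrete (and to give the later sketches in this paper something to build on), the following minimal Python sketch models a transition/state-based BA as the quadruple above. All names (BA, succ) are our own illustration; this is not code from Ranker.

from dataclasses import dataclass

@dataclass
class BA:
    """A sketch of a transition/state-based BA (Q, delta, I, QF ∪ deltaF)."""
    states: set       # Q
    delta: dict       # (q, a) -> set of successor states
    initial: set      # I
    acc_states: set   # QF
    acc_trans: set    # deltaF, stored as triples (q, a, r)

    def succ(self, P, a):
        """delta extended to a set of states: delta(P, a)."""
        return set().union(*(self.delta.get((p, a), set()) for p in P))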
3 Complementing Büchi automata
In this section, we describe a generalization of the rank-based complementation of state-
based BAs presented by Schewe in [37] to our notion of transition/state-based BAs.
Proofs can be found in [16].
3.1 Run DAGs
First, we recall the terminology from [37] (which is a minor modification of the one in [24]), which we use in the paper. Let the run DAG of A over a word α be a DAG (directed acyclic graph) G_α = (V, E) containing vertices V and edges E such that
– V ⊆ Q × ω s.t. (q, i) ∈ V iff there is a run ρ of A from I over α with ρ_i = q,
– E ⊆ V × V s.t. ((q, i), (q′, i′)) ∈ E iff i′ = i + 1 and q′ ∈ δ(q, α_i).
Given G_α as above, we will write (p, i) ∈ G_α to denote that (p, i) ∈ V. A vertex (p, i) ∈ V is called accepting if p is an accepting state, and an edge ((q, i), (q′, i′)) ∈ E is called accepting if q →^{α_i} q′ is an accepting transition. A vertex v ∈ G_α is finite if the set of vertices reachable from v is finite, infinite if it is not finite, and endangered if it cannot reach an accepting vertex or an accepting edge.
We assign ranks to vertices of run DAGs as follows: Let G^0_α = G_α and j = 0. Repeat the following steps until the fixpoint or for at most 2n + 1 steps, where n = |Q|:
– Set rank_α(v) ← j for all finite vertices v of G^j_α and let G^{j+1}_α be G^j_α minus the vertices with rank j.
– Set rank_α(v) ← j + 1 for all endangered vertices v of G^{j+1}_α and let G^{j+2}_α be G^{j+1}_α minus the vertices with rank j + 1.
– Set j ← j + 2.
For all vertices v that have not been assigned a rank yet, we assign rank_α(v) ← ω. We define the rank of α, denoted as rank(α), as max{rank_α(v) | v ∈ G_α} and the rank of A, denoted as rank(A), as max{rank(w) | w ∈ Σ^ω \ L(A)}.
Lemma 1. If α ∉ L(A), then rank(α) ≤ 2|Q|.
3.2 Rank-Based Complementation
In this section, we describe a construction for complementing BAs developed in the work of Kupferman and Vardi [24], later improved by Friedgut, Kupferman, and Vardi [14] and by Schewe [37], and extended here to our definition of BAs with accepting states and transitions (see [19] for a step-by-step introduction). The construction is based on the notion of tight level rankings, storing information about levels in run DAGs. For a BA A and n = |Q|, a (level) ranking is a function f: Q → [2n] such that f(Q_F) ⊆ {0, 2, . . . , 2n}, i.e., f assigns even ranks to accepting states of A. For two rankings f and f′ we define f →^a_S f′ iff for each q ∈ S and q′ ∈ δ(q, a) we have f′(q′) ≤ f(q), and for each q′′ ∈ δ_F(q, a) it holds that f′(q′′) ≤ ⌊⌊f(q)⌋⌋. The set of all rankings is denoted by R. For a ranking f, the rank of f is defined as rank(f) = max{f(q) | q ∈ Q}. We write f ≤ f′ iff for every state q ∈ Q we have f(q) ≤ f′(q), and f < f′ iff f ≤ f′ and there is a state q ∈ Q with f(q) < f′(q). For a set of states S ⊆ Q, we call f S-tight if (i) it has an odd rank r, (ii) f(S) ⊇ {1, 3, . . . , r}, and (iii) f(Q \ S) = {0}. A ranking is tight if it is Q-tight; we use T to denote the set of all tight rankings.
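The following Python helpers, a sketch under our own naming, implement ⌊⌊·⌋⌋, rank(f), and the S-tightness test exactly as defined above (rankings are represented as dicts over Q):

def even_floor(i):
    """⌊⌊i⌋⌋: the largest even number smaller than or equal to i."""
    return i if i % 2 == 0 else i - 1

def rank_of(f):
    """rank(f) = max{f(q) | q ∈ Q} for a ranking f given as a dict."""
    return max(f.values())

def is_tight(f, S):
    """f is S-tight iff (i) rank(f) is odd, (ii) f(S) ⊇ {1, 3, ..., rank(f)},
    and (iii) f assigns 0 to every state outside S."""
    r = rank_of(f)
    on_S = {f[q] for q in S}
    off_S = {f[q] for q in f if q not in S}
    return r % 2 == 1 and set(range(1, r + 1, 2)) <= on_S and off_S <= {0}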
The original rank-based construction [24] uses macrostates of the form (S, O, f) to track all runs of A over α. The f-component contains guesses of the ranks of states in S (which is obtained by the classical subset construction) in the run DAG, and the O-set is used to check whether all runs contain only a finite number of accepting states. Friedgut, Kupferman, and Vardi [14] improved the construction by having f consider only tight rankings. Schewe's construction [37] extends the macrostates to (S, O, f, i) with i ∈ ω representing a particular even rank such that O tracks states with rank i. At the cut-point (a macrostate with O = ∅) the value of i is changed to i + 2 modulo the rank of f. Macrostates in an accepting run hence iterate over all possible values of i.
Formally, the complement of A = (Q, δ, I, Q_F ∪ δ_F) is given as the (state-based) BA Schewe(A) = (Q′, δ′, I′, Q′_F ∪ ∅), whose components are defined as follows:
– Q′ = Q_1 ∪ Q_2 where
  • Q_1 = 2^Q and
  • Q_2 = {(S, O, f, i) ∈ 2^Q × 2^Q × T × {0, 2, . . . , 2n − 2} | f is S-tight, O ⊆ S ∩ f^{-1}(i)},
– I′ = {I},
– δ′ = δ_1 ∪ δ_2 ∪ δ_3 where
  • δ_1: Q_1 × Σ → 2^{Q_1} such that δ_1(S, a) = {δ(S, a)},
  • δ_2: Q_1 × Σ → 2^{Q_2} such that δ_2(S, a) = {(S′, ∅, f, 0) | S′ = δ(S, a), f is S′-tight}, and
  • δ_3: Q_2 × Σ → 2^{Q_2} such that (S′, O′, f′, i′) ∈ δ_3((S, O, f, i), a) iff
    ∗ S′ = δ(S, a),
    ∗ f →^a_S f′,
    ∗ rank(f) = rank(f′), and
    ∗ if O = ∅ then i′ = (i + 2) mod (rank(f′) + 1) and O′ = f′^{-1}(i′), and
    ∗ if O ≠ ∅ then i′ = i and O′ = δ(O, a) ∩ f′^{-1}(i′); and
– Q′_F = {∅} ∪ ((2^Q × {∅} × T × ω) ∩ Q_2).
We call the part of the automaton with states from Q_1 the waiting part (denoted as Waiting), and the part corresponding to Q_2 the tight part (denoted as Tight).
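The distinctive part of δ_3 is the update of the O-set and of the even rank i at cut-points. The following sketch (our own illustration, building on the BA and rank_of sketches above; the enumeration of all admissible successor rankings f′ is deliberately omitted) shows this update for one fixed f′:

def delta3_update(ba, S, O, f, i, a, f_prime):
    """One delta_3 successor (S', O', f', i') of the macrostate (S, O, f, i)
    over letter a, assuming f ->a_S f_prime with equal rank."""
    S_prime = ba.succ(S, a)
    if not O:                                   # cut-point reached
        i_prime = (i + 2) % (rank_of(f_prime) + 1)
        O_prime = {q for q in S_prime if f_prime[q] == i_prime}
    else:
        i_prime = i
        O_prime = ba.succ(O, a) & {q for q in S_prime if f_prime[q] == i_prime}
    return S_prime, O_prime, f_prime, i_prime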
Theorem 2. Let A be a BA. Then L(Schewe(A)) = Σ^ω \ L(A).
The space complexity of Schewe's construction for BAs matches the theoretical lower bound O((0.76n)^n) given by Yan [43] modulo a quadratic factor O(n^2). Note that our extension to BAs with accepting transitions does not increase the space complexity of the construction.
[Fig. 1: Schewe's complementation: (a) a BA A over {a, b}; (b) a part of Schewe(A).]
Example 3. Consider the BA A over {a, b} given in Fig. 1a. A part of Schewe(A) is shown in Fig. 1b (we use ({s:0, t:1}, ∅) to denote the macrostate ({s, t}, ∅, {s ↦ 0, t ↦ 1}, 0)). We omit the i-part of each macrostate since the corresponding values are 0 for all macrostates in the figure. Useless states are covered by grey stripes. The full automaton contains even more transitions from {r} to useless macrostates of the form ({r:·, s:·, t:·}, ∅). ⊓⊔
From the construction of Schewe(A), we can see that the number of states is affected mainly by the sizes of macrostates and by the maximum rank of A. In particular, the upper bound on the number of states of the complement with the maximum rank r is given in the following lemma.
Lemma 4. For a BA A with sufficiently many states n such that rank(A) = r, the number of states of the complemented automaton is bounded by 2^n + (r + m)^n / (r + m)! where m = max{0, 3 − ⌈r/2⌉}.
From Lemma 1 we have that the rank of A is bounded by 2|Q|. Such a bound is often too coarse, and hence Schewe(A) may contain many redundant states. Decreasing the bound on the ranks is essential for a practical algorithm, but finding an optimal solution is PSpace-complete [15]. The rest of this paper therefore proposes a framework of lightweight techniques for decreasing the maximum rank bound and, in this way, significantly reducing the size of the complemented BA.
3.3 Tight Rank Upper Bounds
Let α ∉ L(A). For ℓ ∈ ω, we define the ℓ-th level of G_α as level_α(ℓ) = {q | (q, ℓ) ∈ G_α}. Furthermore, we use f^α_ℓ to denote the ranking of level ℓ of G_α. Formally,

    f^α_ℓ(q) = rank_α((q, ℓ)) if q ∈ level_α(ℓ), and f^α_ℓ(q) = 0 otherwise.    (1)

We say that the ℓ-th level of G_α is tight if for all k ≥ ℓ it holds that (i) f^α_k is tight, and (ii) rank(f^α_k) = rank(f^α_ℓ). Let ρ = S_0 S_1 . . . S_{ℓ−1} (S_ℓ, O_ℓ, f_ℓ, i_ℓ) . . . be a run on a word α in Schewe(A). We say that ρ is a super-tight run [19] if f_k = f^α_k for each k ≥ ℓ. Finally, we say that a mapping μ: 2^Q → R is a tight rank upper bound (TRUB) wrt α iff

    ∀ℓ ∈ ω: level_α(ℓ) is tight ⇒ (∀k ≥ ℓ: μ(level_α(k)) ≥ f^α_k).    (2)

Informally, a TRUB is a ranking that gives a conservative (i.e., larger) estimate on the necessary ranks of states in a super-tight run. We say that μ is a TRUB iff μ is a TRUB wrt all α ∉ L(A). We abuse notation and use the term TRUB also for a mapping μ′: 2^Q → ω if the mapping inner(μ′) is a TRUB, where inner(μ′)(S) = {q ↦ m | m = μ′(S) −̇ 1 if q ∈ Q_F, else m = μ′(S)} for all S ∈ 2^Q. (−̇ is the monus operator, i.e., minus with negative results saturated to zero.) Note that the mappings μ_t = {S ↦ (2|S \ Q_F| −̇ 1)}_{S ∈ 2^Q} and inner(μ_t) are trivial TRUBs.
The following lemma shows that we can remove from Schewe(A) macrostates whose ranking is not covered by a TRUB (in particular, we show that the reduced automaton preserves super-tight runs).
Lemma 5. Let μ be a TRUB and B be the BA obtained from Schewe(A) by replacing all occurrences of Q_2 by Q′_2 = {(S, O, f, i) | f ≤ μ(S)}. Then, L(B) = Σ^ω \ L(A).
4 Elevator Automata
In this section, we introduce elevator automata, which are BAs having a particular structure that can be exploited for complementation and semi-determinization; elevator automata can be complemented in O(16^n) space (cf. Lemma 10) instead of 2^{O(n log n)}, which is the lower bound for unrestricted BAs, and semi-determinized in O(2^n) instead of O(4^n) (cf. [16]). The class of elevator automata is quite general: it can be seen as a substantial generalization of semi-deterministic BAs (SDBAs) [11,5]. Intuitively, an elevator automaton is a BA whose strongly connected components are all either deterministic or inherently weak.
Let A = (Q, δ, I, Q_F ∪ δ_F). A set C ⊆ Q is a strongly connected component (SCC) of A if for any pair of states q, q′ ∈ C it holds that q is reachable from q′ and q′ is reachable from q. C is maximal (MSCC) if it is not a proper subset of another SCC. An MSCC C is trivial iff |C| = 1 and δ|_C = ∅. The condensation of A is the DAG cond(A) = (M, E) where M is the set of A's MSCCs and E = {(C_1, C_2) | ∃q_1 ∈ C_1, ∃q_2 ∈ C_2, ∃a ∈ Σ: q_1 →^a q_2 ∈ δ}. An MSCC is non-accepting if it contains no accepting state and no accepting transition, i.e., C ∩ Q_F = ∅ and δ|_C ∩ δ_F = ∅. The depth of (M, E) is defined as the number of MSCCs on the longest path in (M, E).
We say that an SCC C is inherently weak accepting (IWA) iff every cycle in the transition diagram of A restricted to C contains an accepting state or an accepting transition. C is inherently weak if it is either non-accepting or IWA, and A is inherently weak if all of its MSCCs are inherently weak. A is deterministic iff |I| ≤ 1 and |δ(q, a)| ≤ 1 for all q ∈ Q and a ∈ Σ. An SCC C ⊆ Q is deterministic iff (C, δ|_C, ∅, ∅) is deterministic. A is a semi-deterministic BA (SDBA) if A[q] is deterministic for every q ∈ Q_F ∪ {p ∈ Q | s →^a p ∈ δ_F, s ∈ Q, a ∈ Σ}, i.e., whenever a run in A reaches an accepting state or an accepting transition, it can only continue deterministically.
[Fig. 2: The BA for the LTL formula GF(a ∧ GF(b ∧ GFc)) is elevator; it consists of three connected deterministic components.]
A is an elevator (Büchi) automaton iff for every MSCC C of A it holds that C is (i) deterministic, (ii) IWA, or (iii) non-accepting. In other words, a BA is an elevator automaton iff every nondeterministic SCC of A that contains an accepting state or transition is inherently weak. An example of an elevator automaton obtained from the LTL formula GF(a ∧ GF(b ∧ GFc)) is shown in Fig. 2. The BA consists of three connected deterministic components. Note that the automaton is neither semi-deterministic nor unambiguous.
The rank of an elevator automaton A does not depend on the number of states (as in general BAs), but only on the number of MSCCs and the depth of cond(A). In the worst case, A consists of a chain of deterministic components, yielding the upper bound on the rank of elevator automata given in the following lemma.
Lemma 6. Let A be an elevator automaton such that its condensation has the depth d. Then rank(A) ≤ 2d.
4.1 Refined Ranks for Elevator Automata
Notice that the upper bound on ranks provided by Lemma 6 can still be too coarse. For instance, for an SDBA with three linearly ordered MSCCs such that the first two MSCCs are non-accepting and the last one is deterministic accepting, the lemma gives us an upper bound on the rank of 6, while it is known that every SDBA has rank at most 3 (cf. [5]). Another example might be two deterministic non-trivial MSCCs connected by a path of trivial MSCCs, which can be assigned the same rank.
Instead of refining the definition of elevator automata into some quite complex list of constraints, we rather provide an algorithm that performs a traversal through cond(A) and assigns each MSCC a label of the form type:rank that contains (i) a type and (ii) a bound on the maximum rank of states in the component. The types of MSCCs that we consider are the following:
– T: trivial components,
– IWA: inherently weak accepting components,
– D: deterministic (potentially accepting) components, and
– N: non-accepting components.
Note that the type of an MSCC is not given a priori but is determined by the algorithm (this is because for deterministic non-accepting components, it is sometimes better to treat them as D and sometimes as N, depending on their neighbourhood).
In the following, we assume that A is an elevator automaton without useless states and, moreover, all accepting conditions on states and transitions not inside non-trivial MSCCs are removed (any BA can be easily transformed into this form).
We start with terminal MSCCs C, i.e., MSCCs that cannot reach any other MSCC:
T1: If C is IWA, then we label it with IWA:0.
T2: Else if C is deterministic accepting, we label it with D:2.
[Fig. 3: Rules for assigning types and rank bounds to MSCCs, based on the maximum ranks D, N, and W of the already processed children of types D, N, and IWA, respectively:
(a) C is IWA: ℓ = max{D, N + 1, W};
(b) C is D: ℓ = max{D + ⟨2⟩, N + 1, W + ⟨2⟩, 2};
(c) C is N: ℓ = max{D + 1, N, W + 1}.
The symbols ⟨2⟩ are interpreted as 0 if all the corresponding edges from the components having rank D and W, respectively, are deterministic; otherwise they are interpreted as 2. Transitions between two components C_1 and C_2 are deterministic if the BA (C, δ|_C, ∅, ∅) is deterministic for C = δ(C_1, Σ) ∩ (C_1 ∪ C_2).]
[Fig. 4: Structure of the elevator ranking rules: a component C with aggregated children D:D, N:N, and IWA:W.]
(Note that the previous two options are complete due to our requirements on the structure of A.) When all terminal MSCCs are labelled, we proceed through cond(A), inductively on its structure, and label non-terminal components C based on the rules defined below.
The rules are of the form that uses the structure depicted in Fig. 4, where children nodes denote already processed MSCCs. In particular, a child node of the form k:ℓ_k denotes an aggregate node of all siblings of the type k with ℓ_k being the maximum rank of these siblings. Moreover, we use typemax{e_D, e_N, e_W} to denote the type j ∈ {D, N, IWA} for which e_j = max{e_D, e_N, e_W}, where e_i is an expression containing ℓ_i (if there are more such types, j is chosen arbitrarily). The rules for assigning a type t and a rank ℓ to C are the following:
I1: If C is trivial, we set t = typemax{D, N, W} and ℓ = max{D, N, W}.
I2: Else if C is IWA, we use the rule in Fig. 3a.
I3: Else if C is deterministic accepting, we use the rule in Fig. 3b.
I4: Else if C is deterministic and non-accepting, we try both rules from Figs. 3b and 3c and pick the rule that gives us the smaller rank.
I5: Else if C is nondeterministic and non-accepting, we use the rule in Fig. 3c.
[Fig. 5: A part of Schewe(A). The TRUB computed by the elevator rules is used to prune states outside the yellow area.]
Then, for every MSCC C of A, we assign each of its states the rank of C. We use χ: Q → ω to denote the rank bounds computed by the procedure above (a sketch of the traversal is given below).
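For concreteness, the following Python sketch implements rules T1–T2 and I1–I5 under our own encoding of the condensation: msccs is a list of MSCC identifiers, edges the condensation edges, and classify(C) returns one of 'trivial', 'iwa', 'det-acc', 'det-nonacc', or 'nondet-nonacc'. For simplicity, the sketch always interprets the ⟨2⟩ symbols of Fig. 3 as 2, which is sound but may overestimate the rank; it is an illustration, not the Ranker implementation.

import collections

def elevator_trub(msccs, edges, classify):
    """Label each MSCC of cond(A) with (type, rank), traversing from terminal
    MSCCs backwards; every state of an MSCC C then gets the rank of C."""
    succs = collections.defaultdict(set)
    for c1, c2 in edges:
        succs[c1].add(c2)
    label = {}

    def agg(cs, ty):
        # maximum rank among the already processed children of type ty (0 if none)
        return max([label[c][1] for c in cs if label[c][0] == ty] or [0])

    def process(c):
        if c in label:
            return
        for child in succs[c]:
            process(child)
        kind = classify(c)
        if not succs[c]:
            # terminal MSCC: T1/T2 (complete under the paper's assumptions)
            label[c] = ('IWA', 0) if kind == 'iwa' else ('D', 2)
            return
        D = agg(succs[c], 'D'); N = agg(succs[c], 'N'); W = agg(succs[c], 'IWA')
        if kind == 'trivial':                                # I1: typemax
            label[c] = max([('D', D), ('N', N), ('IWA', W)], key=lambda p: p[1])
        elif kind == 'iwa':                                  # I2 (Fig. 3a)
            label[c] = ('IWA', max(D, N + 1, W))
        else:
            det = ('D', max(D + 2, N + 1, W + 2, 2))         # Fig. 3b, <2> = 2
            non = ('N', max(D + 1, N, W + 1))                # Fig. 3c
            if kind == 'det-acc':                            # I3
                label[c] = det
            elif kind == 'det-nonacc':                       # I4: smaller rank
                label[c] = min(det, non, key=lambda p: p[1])
            else:                                            # I5
                label[c] = non

    for c in msccs:
        process(c)
    return label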
Lemma 7. χ is a TRUB.
Using Lemma 5, we can now use χ to prune states during the construction of Schewe(A), as shown in the following example.
Example 8. As an example, consider the BA A in Fig. 1a. The set of MSCCs with their types is given as {{r}: N, {s, t}: IWA}, showing that the BA A is an elevator automaton. Using the rules T1 and I4 we get the TRUB χ = {r:1, s:0, t:0}. The TRUB can be used to prune the generated states as shown in Fig. 5. ⊓⊔
4.2 Efficient Complementation of Elevator Automata
In Section 4.1 we proposed an algorithm for assigning ranks to MSCCs of an elevator automaton A. The drawback of the algorithm is that the maximum obtained rank is not bounded by a constant but by the depth of the condensation of A. We will, however, show that it is actually possible to change A by at most doubling the number of states and obtain an elevator BA with rank at most 3.
Intuitively, the construction copies every non-trivial MSCC C with an accepting state or transition into a component Ĉ, copies all transitions going into states in C to also go into the corresponding states in Ĉ, and, finally, removes all accepting conditions from C. Formally, let A = (Q, δ, I, Q_F ∪ δ_F) be a BA. For C ⊆ Q, we use Ĉ to denote a unique copy of C, i.e., Ĉ = {q̂ | q ∈ C} s.t. Ĉ ∩ Q = ∅. Let M be the set of MSCCs of A. Then, the deelevated BA DeElev(A) = (Q′, δ′, I′, Q′_F ∪ δ′_F) is given as follows:
– Q′ = Q ∪ Q̂,
– δ′: Q′ × Σ → 2^{Q′} where for q ∈ Q
  • δ′(q, a) = δ(q, a) ∪ {r̂ | r ∈ δ(q, a)} and
  • δ′(q̂, a) = {r̂ | r ∈ δ(q, a) ∩ C} for q ∈ C ∈ M;
– I′ = I, and
– Q′_F = Q̂_F and δ′_F = {q̂ →^a r̂ | q →^a r ∈ δ_F} ∩ δ′.
It is easy to see that the number of states of the deelevated automaton is bounded by 2|Q|. Moreover, if A is an elevator automaton, so is DeElev(A). The construction preserves the language of A, as shown by the following lemma.
Lemma 9. Let A be a BA. Then, L(A) = L(DeElev(A)).
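Under the assumptions of the BA sketch from Section 2, the deelevation translates into the following Python sketch, which follows the formal definition and relies on a later trimming of useless states (mscc_of is assumed to map each state to an identifier of its MSCC; this is an illustration, not Ranker code):

def deelevate(ba, mscc_of):
    """A sketch of DeElev(A): acceptance survives only inside the copies."""
    hat = lambda q: ('hat', q)                   # q^, the unique copy of q
    states = set(ba.states) | {hat(q) for q in ba.states}
    delta, acc_trans = {}, set()
    for (q, a), succs in ba.delta.items():
        # original part: keep the transition and branch into the copies as well
        delta[(q, a)] = set(succs) | {hat(r) for r in succs}
        # copied part: stay inside the copy of q's own MSCC
        inside = {r for r in succs if mscc_of[r] == mscc_of[q]}
        delta[(hat(q), a)] = {hat(r) for r in inside}
        # accepting transitions are kept only between copied states
        acc_trans |= {(hat(q), a, hat(r)) for r in inside
                      if (q, a, r) in ba.acc_trans}
    acc_states = {hat(q) for q in ba.acc_states}
    return BA(states, delta, set(ba.initial), acc_states, acc_trans)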
Moreover, for an elevator automaton A, the structure of DeElev(A) consists of (after trimming useless states) several non-accepting MSCCs with copied terminal deterministic or IWA MSCCs. Therefore, if we apply the algorithm from Section 4.1 to DeElev(A), we get that its rank is bounded by 3, which gives the following upper bound for the complementation of elevator automata.
Lemma 10. Let A be an elevator automaton with sufficiently many states n. Then the language Σ^ω \ L(A) can be represented by a BA with at most O(16^n) states.
The complementation through DeElev(A) gives a better upper bound than the rank refinement from Section 4.1 applied directly to A; however, based on our experience, complementation through DeElev(A) behaves worse on many real-world instances. This poor behaviour is caused by the fact that the complement of DeElev(A) can have a larger Waiting part, and macrostates in Tight can have larger S-components, which can yield more generated states (despite the rank bound 3). It seems that the most promising approach would be a combination of the two approaches, which we leave for future work.
[Fig. 6: Rules assigning types and rank bounds for non-elevator automata, extending Fig. 3 with the maximum rank G of the children of type G:
(a) C is IWA: ℓ = max{D, N + 1, W, G};
(b) C is D: ℓ = max{D + ⟨2⟩, N + 1, W + ⟨2⟩, G + 2, 2};
(c) C is N: ℓ = max{D + 1, N, W + 1, G + 1}.]
4.3 Refined Ranks for Non-Elevator Automata
The algorithm from Section 4.1 computing a TRUB for elevator automata can be extended to compute TRUBs even for general non-elevator automata (i.e., BAs with nondeterministic accepting components that are not inherently weak). To achieve this generalization, we extend the rules for assigning types and ranks to MSCCs of elevator automata from Section 4.1 to take into account general nondeterministic components. For this, we add into our collection of MSCC types general components (denoted as G). Further, we need to extend the rules for terminal components with the following rule:
T3: Otherwise, we label C with G:2|C \ Q_F|.
[Fig. 7: The rule for C of type G: ℓ = max{D, N + 1, W, G} + 2|C \ Q_F|.]
Moreover, we adjust the rules for assigning a type t and a rank ℓ to C as follows (rule I1 is the same as for the case of elevator automata):
I2–I5: We replace the corresponding rules by their counterparts including general components from Fig. 6.
I6: Otherwise, we use the rule in Fig. 7.
Then, for every MSCC C of a BA A, we assign each of its states the rank of C. Again, we use χ: Q → ω to denote the rank bounds computed by the adjusted procedure above.
Lemma 11. χ is a TRUB.
5 Rank Propagation
[Fig. 8: Rank propagation flow: the new estimate μ′(S) is computed from the estimates μ(R_1), . . . , μ(R_m) of the predecessors of S.]
In the previous section, we proposed a way to obtain a TRUB for elevator automata (with a generalization to general automata). In this section, we propose a way of using the structure of A to refine a TRUB by propagating values, and thus reduce the size of Tight. Our approach uses data flow analysis [32] to reason about how ranks and rankings of macrostates of Schewe(A) can be decreased based on the ranks and rankings of the local neighbourhood of the macrostates. In particular, we use a special case of forward analysis working on the skeleton of Schewe(A), which is defined as the BA K_A = (2^Q, δ′, ∅, ∅) where δ′ = {R →^a S | S = δ(R, a)} (note that we are only interested in the structure of K_A and not in its language; also notice the similarity of K_A with Waiting). Our analysis refines a rank/ranking estimate μ(S) for a macrostate S of K_A based on the estimates for its predecessors R_1, . . . , R_m (see Fig. 8). The new estimate is denoted as μ′(S).
More precisely, μ: 2^Q → V is a function giving each macrostate of K_A a value from the domain V. We will use the following two value domains: (i) V = ω, which is used for estimating ranks of macrostates (in the outer macrostate analysis), and (ii) V = R, which is used for estimating rankings within macrostates (in the inner macrostate analysis). For each of the analyses, we will give an update function up: (2^Q → V) × (2^Q)^{m+1} → V, which defines how the value of μ(S) is updated based on the values of μ(R_1), . . . , μ(R_m). We then construct a system with the following equation for every S ∈ 2^Q:

    μ(S) = up(μ, S, R_1, . . . , R_m) where {R_1, . . . , R_m} = δ′^{-1}(S, Σ).    (3)

We then solve the system of equations using standard algorithms for data flow analysis (see, e.g., [32, Chapter 2]) to obtain the fixpoint μ*. Our analyses have the important property that if they start with μ_0 being a TRUB, then μ* will also be a TRUB.
As the initial TRUB, we can use the trivial TRUB or any other TRUB (e.g., the output of the elevator state analysis from Section 4).
5.1 Outer Macrostate Analysis
We start with the simpler analysis, the outer macrostate analysis, which only looks at sizes of macrostates. Recall that the rank r of every super-tight run in Schewe(A) does not change, i.e., a super-tight run stays in Waiting as long as needed so that when it jumps to Tight, it takes the rank r and never needs to decrease it. We can use this fact to decrease the maximum rank of a macrostate S in K_A. In particular, let us consider all cycles going through S. For each of the cycles c, we can bound the maximum rank of a super-tight run going through c by 2m − 1 where m is the smallest number of non-accepting states occurring in any macrostate on c (from the definition, the rank of a tight ranking does not depend on accepting states). Then we can infer that the maximum rank of any super-tight run going through S is bounded by the maximum rank of any of the cycles going through S (since S can never assume a higher rank in any super-tight run). Moreover, the rank of each cycle can also be estimated in a more precise way, e.g., using our elevator analysis.
Since the number of cycles in K_A can be large², instead of enumerating them, we employ data flow analysis with the value domain V = ω (i.e., for every macrostate S of K_A, we remember a bound on the maximum rank of S) and the following update function:

    up_out(μ, S, R_1, . . . , R_m) = min{μ(S), max{μ(R_1), . . . , μ(R_m)}}.    (4)

Intuitively, the new bound on the maximum rank of S is taken as the smaller of the previous bound μ(S) and the largest of the bounds of all predecessors of S, and the new value is propagated forward by the data flow analysis.
² K_A can be exponentially larger than A and the number of cycles in K_A can be exponential in the size of K_A, so the total number of cycles can be double-exponential.
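A worklist-based sketch of the analysis (our own encoding: skeleton_preds maps each macrostate of K_A to its set of predecessors, mu0 gives the initial TRUB) may look as follows; termination is guaranteed since the bounds only decrease and are bounded from below by 0:

def outer_analysis(skeleton_preds, mu0):
    """Solve the equation system (3) with up_out (Eq. (4)) by chaotic iteration."""
    mu = dict(mu0)
    succs = {}
    for S, preds in skeleton_preds.items():
        for R in preds:
            succs.setdefault(R, set()).add(S)
    worklist = set(skeleton_preds)
    while worklist:
        S = worklist.pop()
        preds = skeleton_preds.get(S, set())
        if not preds:
            continue                                  # no predecessors: keep mu(S)
        new = min(mu[S], max(mu[R] for R in preds))   # up_out from Eq. (4)
        if new < mu[S]:
            mu[S] = new
            worklist |= succs.get(S, set())           # re-process successors
    return mu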
[Fig. 9: Example of outer macrostate analysis: (a) the BA A_ex (accepting transitions are marked); the initial TRUB μ_0 in (b) is refined to μ*_out in (c).]
Example 12. Consider the BA A_ex in Fig. 9a. When started from the initial TRUB μ_0 = {{p} ↦ 1, {p, q} ↦ 3, {p, q, r, s} ↦ 7} (Fig. 9b), the outer macrostate analysis decreases the maximum rank estimate for {p, q} to 1, since min{μ_0({p, q}), max{μ_0({p})}} = min{3, 1} = 1. The estimate for {p, q, r, s} is not affected, because min{7, max{1, 7}} = 7 (Fig. 9c). ⊓⊔
Lemma 13. If μ is a TRUB, then μ ◁ {S ↦ up_out(μ, S, R_1, . . . , R_m)} is a TRUB.
Corollary 14. When started with a TRUB μ_0, the outer macrostate analysis terminates and returns a TRUB μ*_out.
5.2 Inner Macrostate Analysis
Our second analysis, called inner macrostate analysis, looks deeper into super-tight runs in Schewe(A). In particular, compared with the outer macrostate analysis from the previous section (which only looks at the ranks, i.e., the bounds on the numbers in the rankings), inner macrostate analysis looks at how the rankings assign concrete values to the states of A inside the macrostates.
Inner macrostate analysis is based on the following. Let ρ be a super-tight run of Schewe(A) on α ∉ L(A) and (S, O, f, i) be a macrostate from Tight. Because ρ is super-tight, we know that the rank f(q) of a state q ∈ S is bounded by the ranks of the predecessors of q. This holds because in super-tight runs, the ranks are only as high as necessary; if the rank of q were higher than the ranks of its predecessors, this would mean that we may wait in Waiting longer and only jump to q with a lower rank later.
Let us introduce some necessary notation. Let f, f′ ∈ R be rankings (i.e., f, f′: Q → ω). We use f ⊔ f′ to denote the ranking {q ↦ max{f(q), f′(q)} | q ∈ Q}, and f ⊓ f′ to denote the ranking {q ↦ min{f(q), f′(q)} | q ∈ Q}. Moreover, we define max-succ-rank^a_S(f) = max{f′ ∈ R | f →^a_S f′} and a function dec: R → R such that dec(θ) is the ranking θ′ for which

    θ′(q) = θ(q) −̇ 1          if θ(q) = rank(θ) and q ∉ Q_F,
    θ′(q) = ⌊⌊θ(q) −̇ 1⌋⌋     if θ(q) = rank(θ) and q ∈ Q_F,
    θ′(q) = θ(q)               otherwise.                        (5)

Intuitively, max-succ-rank^a_S(f) is the (pointwise) maximum ranking that can be reached from macrostate S with ranking f over a (it is easy to see that there is a unique such maximum ranking), and dec(θ) decreases the maximum ranks in a ranking θ by one (or by two for even maximum ranks and accepting states).
The analysis uses the value domain V = R (i.e., each macrostate of K_A is assigned a ranking giving an upper bound on the rank of each state in the macrostate) and the update function up_in given below:

    1  up_in(μ, S, R_1, . . . , R_m):
    2    foreach 1 ≤ i ≤ m and a ∈ Σ do
    3      if δ(R_i, a) = S then
    4        g^a_i ← max-succ-rank^a_{R_i}(μ(R_i))
    5    θ ← μ(S) ⊓ ⨆{g^a_i | g^a_i is defined};
    6    if rank(θ) is even then θ ← dec(θ);
    7    return θ;

Intuitively, up_in updates μ(S) to hold, for every q ∈ S, the maximum rank compatible with the ranks of the predecessors of q. We note line 6, which makes use of the fact that we can only consider tight rankings (whose rank is odd), so we can decrease the estimate using the function dec defined above.
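The two ingredients dec and max-succ-rank translate into Python as follows (a sketch reusing monus and even_floor from the earlier sketches; not Ranker code):

def dec(theta, acc_states):
    """The function dec from Eq. (5): decrease every occurrence of the maximal
    rank by one, rounding down to the next even value for accepting states."""
    r = max(theta.values())
    return {q: (even_floor(monus(v, 1)) if q in acc_states else monus(v, 1))
               if v == r else v
            for q, v in theta.items()}

def max_succ_rank(ba, R, a, f):
    """max-succ-rank^a_R(f): the pointwise maximal ranking f' on S = delta(R, a)
    with f ->a_R f'; accepting states are capped to even ranks."""
    S = ba.succ(R, a)
    f_prime = {}
    for q2 in S:
        # every predecessor of q2 bounds f'(q2); accepting transitions
        # additionally cap the bound to the even floor
        caps = [even_floor(f[q]) if (q, a, q2) in ba.acc_trans else f[q]
                for q in R if q2 in ba.delta.get((q, a), set())]
        v = min(caps)
        f_prime[q2] = even_floor(v) if q2 in ba.acc_states else v
    return f_prime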
Example 15. Let us continue with the example from Section 5.1 and perform the inner macrostate analysis starting with the TRUB {{p:1}, {p:1, q:1}, {p:7, q:7, r:7, s:7}} obtained from μ*_out. The first three iterations of the algorithm for {p, q, r, s} proceed as follows (we do not show {p, q} after the first iteration since it does not affect the intermediate steps):

    {p:7, q:7, r:7, s:7} → {p:6, q:7, r:7, s:7} → {p:6, q:6, r:7, s:7} → {p:6, q:6, r:6, s:6}.

In these three iterations, the maximum rank estimate decreases to {p:6, q:6, r:6, s:6} due to the accepting transitions from r and s. In the last of the three iterations, when all states have the even rank 6, the condition on line 6 becomes true and the rank of all states is decremented to {p:5, q:5, r:5, s:5} using dec. Then, again, the accepting transitions from r and s decrease the rank of p to 4, which is propagated to q, and so on. Eventually, we arrive at the TRUB {p:1, q:1, r:1, s:1}, which cannot be decreased any more, since {p:1, q:1} forces the ranks of r and s to stay at 1. ⊓⊔
Lemma 16. If μ is a TRUB, then μ ◁ {S ↦ up_in(μ, S, R_1, . . . , R_m)} is a TRUB.
Corollary 17. When started with a TRUB μ_0, the inner macrostate analysis terminates and returns a TRUB μ*_in.
6 Experimental Evaluation
Used tools and evaluation environment. We implemented the techniques described in the previous sections as an extension of the tool Ranker [18] (written in C++). Using the terminology of [19], the heuristics were implemented on top of the RankerMaxR configuration (we refer to this previous version as RankerOld). We tested the correctness of our implementation using Spot's autcross on all BAs in our benchmark. We compared the modified Ranker with other state-of-the-art tools, namely, Goal [41] (implementing Piterman [34], Schewe [37], Safra [36], and Fribourg [1]), Spot 2.9.3 [12] (implementing Redziejowski's algorithm [35]), Seminator 2 [4], LTL2dstar 0.5.4 [23], and Roll [26]. All tools were set to the mode where they output an automaton with the standard state-based Büchi acceptance condition. The experimental evaluation was performed on a 64-bit GNU/Linux Debian workstation with an Intel(R) Xeon(R) CPU E5-2620 running at 2.40 GHz with 32 GiB of RAM, using a timeout of 5 minutes.
Datasets. As the source of our benchmark, we use the two following datasets: (i) random, containing 11,000 BAs over a two-letter alphabet used in [40], which were randomly generated via the Tabakov–Vardi approach [39], starting from 15 states and with various different parameters; and (ii) LTL, with 1,721 BAs over larger alphabets (up to 128 symbols) used in [4], which were obtained from LTL formulae from the literature (221) or randomly generated (1,500). We preprocessed the automata using Rabit [30] and Spot's autfilt (using the --high simplification level), transformed them to state-based acceptance BAs (if they were not already), and converted them to the HOA format [2]. From this set, we removed automata that were (i) semi-deterministic, (ii) inherently weak, (iii) unambiguous, or (iv) had an empty language, since for these automata types there exist more efficient complementation procedures than for unrestricted BAs [5,4,6,28]. In the end, we were left with 2,592 (random) and 414 (LTL) hard automata. We use all to denote their union (3,006 BAs). Of these hard automata, 458 were elevator automata.
[Fig. 10: Comparison of the state space generated by our optimizations and other rank-based procedures: (a) Ranker vs Schewe; (b) Ranker vs RankerOld. Horizontal and vertical dashed lines represent timeouts. Blue data points are from random and red data points are from LTL. Axes are logarithmic.]
6.1 Generated State Space
In our first experiment, we evaluated the effectiveness of our heuristics for pruning the generated state space by comparing the sizes of complemented BAs without postprocessing. This use case is directed towards applications where postprocessing is irrelevant, such as inclusion or equivalence checking of BAs.
We focused on a comparison with two less optimized versions of the rank-based complementation procedure: Schewe (the version "Reduced Average Outdegree" from [37] implemented in Goal under -m rank -tr -ro) and its optimization RankerOld. The scatter plots in Fig. 10 compare the numbers of states of automata generated by Ranker and the other algorithms, and the upper part of Table 1 gives summary statistics. Observe that our optimizations from this paper drastically reduced the generated search space compared with both Schewe and RankerOld (the mean for Schewe is lower than for RankerOld due to its much higher number of timeouts); from Fig. 10b we can see that the improvement was in many cases exponential, even when compared with our previous optimizations in RankerOld. The median (which is a more meaningful indicator in the presence of timeouts) decreased by 44 % w.r.t. RankerOld, and we also reduced the number of timeouts by 23 %.
Table 1: Statistics for our experiments. The upper part compares various optimizations of the rank-based procedure (no postprocessing). The lower part compares Ranker to other approaches (with postprocessing). The left-hand side compares sizes of complement BAs and the right-hand side runtimes of the tools. The wins and losses columns give the number of times when Ranker was strictly better and worse. The values are given for the three datasets as "all (random : LTL)". Approaches implemented in Goal are labelled with G.

method       | mean            | median       | wins            | losses        | mean runtime [s]   | median runtime [s] | timeouts
Ranker       | 3812 (4452:207) | 79 (93:26)   |                 |               | 7.83 (8.99:1.30)   | 0.51 (0.84:0.04)   | 279 (276:3)
RankerOld    | 7398 (8688:358) | 141 (197:29) | 2190 (2011:179) | 111 (107:4)   | 9.37 (10.73:1.99)  | 0.61 (1.04:0.04)   | 365 (360:5)
Schewe (G)   | 4550 (5495:665) | 439 (774:35) | 2640 (2315:325) | 55 (1:54)     | 21.05 (24.28:7.80) | 6.57 (7.39:5.21)   | 937 (928:9)

Ranker       | 47 (52:18)      | 22 (27:10)   |                 |               | 7.83 (8.99:1.30)   | 0.51 (0.84:0.04)   | 279 (276:3)
Piterman (G) | 73 (82:22)      | 28 (34:14)   | 1435 (1124:311) | 416 (360:56)  | 7.29 (7.39:6.65)   | 5.99 (6.04:5.62)   | 14 (12:2)
Safra (G)    | 83 (91:30)      | 29 (35:17)   | 1562 (1211:351) | 387 (350:37)  | 14.11 (15.05:8.37) | 6.71 (6.92:5.79)   | 172 (158:14)
Spot         | 75 (85:15)      | 24 (32:10)   | 1087 (936:151)  | 683 (501:182) | 0.86 (0.99:0.06)   | 0.02 (0.02:0.02)   | 13 (13:0)
Fribourg (G) | 91 (104:13)     | 23 (31:9)    | 1120 (1055:65)  | 601 (376:225) | 17.79 (19.53:7.22) | 9.25 (10.15:5.48)  | 81 (80:1)
LTL2dstar    | 73 (82:21)      | 28 (34:13)   | 1465 (1195:270) | 465 (383:82)  | 3.31 (3.84:0.11)   | 0.04 (0.05:0.02)   | 136 (130:6)
Seminator 2  | 79 (91:15)      | 21 (29:10)   | 1266 (1131:135) | 571 (367:204) | 9.51 (11.25:0.08)  | 0.22 (0.39:0.02)   | 363 (362:1)
Roll         | 18 (19:14)      | 10 (9:11)    | 2116 (1858:258) | 569 (443:126) | 31.23 (37.85:7.28) | 8.19 (12.23:2.74)  | 1109 (1106:3)
Notice that the numbers for the LTL dataset do not differ as much as for random, witnessing the easier structure of the BAs in LTL.
6.2 Comparison with Other Complementation Techniques
In our second experiment, we compared the improved Ranker with other state-of-the-art tools. We compared the sizes of output BAs; therefore, we postprocessed each output automaton with autfilt (simplification level --high). Scatter plots are given in Fig. 11, where we compare Ranker with Spot (which had the best results on average among the other tools except Roll) and with Roll; summary statistics are in the lower part of Table 1. Observe that Ranker has by far the lowest mean (except for Roll) and the third lowest median (after Seminator 2 and Roll, but with fewer timeouts). Moreover, comparing the numbers in the columns wins and losses, we can see that Ranker gives strictly better results than the other tools (wins) more often than the other way around (losses).
In Fig. 11a we see that indeed, in the majority of cases, Ranker gives a smaller BA than Spot, especially for harder BAs (Spot, however, behaves slightly better on the simpler BAs from LTL). The results in Fig. 11b do not seem so clear. Roll uses a learning-based approach, more heavyweight and completely orthogonal to any of the other tools, and can in some cases output a tiny automaton, but it does not scale, as witnessed by its number of timeouts, which is much higher than that of any other tool. It is, therefore, positively surprising that Ranker could in most of the cases still obtain a much smaller automaton than Roll.
Regarding runtimes, the prototype implementation in Ranker is comparable to Seminator 2, but slower than Spot and LTL2dstar (Spot is the fastest tool). Implementations of the other approaches clearly do not target speed. We note that the number of timeouts of Ranker is still higher than that of some other tools (in particular Piterman, Spot, and Fribourg); further state space reduction targeting this particular issue is our future work.
7 Related Work
BA complementation has remained in the interest of researchers since the first introduction of BAs by Büchi in [8]. Together with the hunt for efficient complementation techniques, effort has been put into establishing the lower bound. First, Michel showed that the lower bound is n! (approx. (0.36n)^n) [31], and later Yan refined the result to (0.76n)^n [43].
[Fig. 11: Comparison of the complement size obtained by Ranker and other state-of-the-art tools: (a) Ranker vs Spot; (b) Ranker vs Roll. Horizontal and vertical dashed lines represent timeouts. Axes are logarithmic.]
The complementation approaches can be roughly divided into several branches. Ramsey-based complementation, the very first complementation construction, where the language of an input automaton is decomposed into a finite number of equivalence classes, was proposed by Büchi and was further enhanced in [7]. Determinization-based complementation was presented by Safra in [36] and later improved by Piterman in [34] and Redziejowski in [35]. Various optimizations for the determinization of BAs were further proposed in [29]. The main idea of this approach is to convert an input BA into an equivalent deterministic automaton with a different acceptance condition that can be easily complemented (e.g., a Rabin automaton). The complemented automaton is then converted back into a BA (often for the price of some blow-up). Slice-based complementation tracks the acceptance condition using a reduced abstraction on a run tree [42,21]. A learning-based approach was introduced in [27,26]. Allred and Ultes-Nitsche then presented a novel optimal complementation algorithm in [1]. For some special types of BAs, e.g., deterministic [25], semi-deterministic [5], or unambiguous [28], there exist specific complementation algorithms. Semi-determinization-based complementation converts an input BA into a semi-deterministic BA [11], which is then complemented [4].
Rank-based complementation, studied in [24,15,14,37,22], extends the subset construction for determinization of finite automata by storing additional information in each macrostate to track the acceptance condition of all runs of the input automaton. Optimizations of an alternative (sub-optimal) rank-based construction from [24] going through alternating Büchi automata were presented in [15]. Furthermore, the work in [22] introduces an optimization of Schewe's construction, in some cases producing smaller automata (this construction is not compatible with our optimizations). As shown in [9], the rank-based construction can be optimized using simulation relations. We identified several heuristics that help to reduce the size of the complement in [19]; these are compatible with the heuristics in this paper.
Acknowledgements. We thank the anonymous reviewers for their useful remarks that helped us improve the quality of the paper. This work was supported by the Czech Science Foundation project 20-07487S and the FIT BUT internal project FIT-S-20-6427.
References
1. Allred, J.D., Ultes-Nitsche, U.: A simple and optimal complementation algorithm for Büchi automata. In: Proceedings of the Thirty-Third Annual IEEE Symposium on Logic in Computer Science (LICS 2018), pp. 46–55. IEEE Computer Society Press (July 2018)
2. Babiak, T., Blahoudek, F., Duret-Lutz, A., Klein, J., Křetínský, J., Müller, D., Parker, D.,
Strejček, J.: The Hanoi omega-automata format. In: Computer Aided Verification - 27th
International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceed-
ings, Part I. Lecture Notes in Computer Science, vol. 9206, pp. 479–486. Springer (2015).
https://doi.org/10.1007/978-3-319-21690-4_31
3. Blahoudek, F., Heizmann, M., Schewe, S., Strejček, J., Tsai, M.H.: Complementing semi-deterministic Büchi automata. In: Tools and Algorithms for the Construction and Analysis of Systems. pp. 770–787. Springer Berlin Heidelberg, Berlin, Heidelberg (2016)
4. Blahoudek, F., Duret-Lutz, A., Strejček, J.: Seminator 2 can complement generalized Büchi
automata via improved semi-determinization. In: Proceedings of the 32nd International Con-
ference on Computer-Aided Verification (CAV’20). Lecture Notes in Computer Science, vol.
12225, pp. 15–27. Springer (Jul 2020)
5. Blahoudek, F., Heizmann, M., Schewe, S., Strejček, J., Tsai, M.: Complementing semi-
deterministic Büchi automata. In: Tools and Algorithms for the Construction and Analysis of
Systems - 22nd International Conference, TACAS 2016, Held as Part of the European Joint
Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands,
April 2-8, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9636, pp. 770–787.
Springer (2016). https://doi.org/10.1007/978-3-662-49674-9_49
6. Boigelot, B., Jodogne, S., Wolper, P.: On the use of weak automata for deciding linear
arithmetic with integer and real variables. In: Automated Reasoning, First International Joint
Conference, IJCAR 2001, Siena, Italy, June 18-23, 2001, Proceedings. Lecture Notes in
Computer Science, vol. 2083, pp. 611–625. Springer (2001). https://doi.org/10.1007/3-540-
45744-5_50
7. Breuers, S., Löding, C., Olschewski, J.: Improved Ramsey-based Büchi complementation.
In: Proc. of FOSSACS’12. pp. 150–164. Springer (2012)
8. Büchi, J.R.: On a decision method in restricted second order arithmetic. In: Proc. of Inter-
national Congress on Logic, Method, and Philosophy of Science 1960. Stanford Univ. Press,
Stanford (1962)
9. Chen, Y., Havlena, V., Lengál, O.: Simulations in rank-based Büchi automata complementa-
tion. In: Programming Languages and Systems - 17th Asian Symposium, APLAS 2019, Nusa
Dua, Bali, Indonesia, December 1-4, 2019, Proceedings. Lecture Notes in Computer Science,
vol. 11893, pp. 447–467. Springer (2019). https://doi.org/10.1007/978-3-030-34175-6_23
10. Chen, Y., Heizmann, M., Lengál, O., Li, Y., Tsai, M., Turrini, A., Zhang, L.: Ad-
vanced automata-based algorithms for program termination checking. In: Proceedings of
the 39th ACM SIGPLAN Conference on Programming Language Design and Implemen-
tation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. pp. 135–150. ACM (2018).
https://doi.org/10.1145/3192366.3192405
11. Courcoubetis, C., Yannakakis, M.: Verifying temporal properties of finite-state probabilis-
tic programs. In: 29th Annual Symposium on Foundations of Computer Science, White
Plains, New York, USA, 24-26 October 1988. pp. 338–345. IEEE Computer Society (1988).
https://doi.org/10.1109/SFCS.1988.21950
12. Duret-Lutz, A., Lewkowicz, A., Fauchille, A., Michaud, T., Renault, É., Xu, L.: Spot 2.0: a
framework for LTL and 𝜔-automata manipulation. In: Automated Technology for Verification
and Analysis. pp. 122–129. Springer International Publishing, Cham (2016)
13. Fogarty, S., Vardi, M.Y.: Büchi complementation and size-change termination. In: Proc. of
TACAS’09. pp. 16–30. Springer (2009)
14. Friedgut, E., Kupferman, O., Vardi, M.: Büchi complementation made tighter. International
Journal of Foundations of Computer Science 17, 851–868 (2006)
15. Gurumurthy, S., Kupferman, O., Somenzi, F., Vardi, M.Y.: On complementing non-
deterministic Büchi automata. In: Correct Hardware Design and Verification Methods,
12th IFIP WG 10.5 Advanced Research Working Conference, CHARME 2003, L'Aquila,
Italy, October 21-24, 2003, Proceedings. LNCS, vol. 2860, pp. 96–110. Springer (2003).
https://doi.org/10.1007/978-3-540-39724-3_10
16. Havlena, V., Lengál, O., Smahlíková, B.: Sky is not the limit: Tighter rank bounds for elevator
automata in Büchi automata complementation (technical report). CoRR abs/2110.10187
(2021), https://arxiv.org/abs/2110.10187
17. Havlena, V., Lengál, O., Šmahlíková, B.: Deciding S1S: Down the rabbit hole and through
the looking glass. In: Proceedings of NETYS’21. pp. 215–222. No. 12754 in LNCS, Springer
Verlag (2021). https://doi.org/10.1007/978-3-030-91014-3_15
18. Havlena, V., Lengál, O., Šmahlíková, B.: Ranker (2021), https://github.com/vhavlena/ranker
19. Havlena, V., Lengál, O.: Reducing (To) the Ranks: Efficient Rank-Based Büchi Automata
Complementation. In: Proc. of CONCUR’21. LIPIcs, vol. 203, pp. 2:1–2:19. Schloss
Dagstuhl, Dagstuhl, Germany (2021). https://doi.org/10.4230/LIPIcs.CONCUR.2021.2,
ISSN: 1868-8969
20. Heizmann, M., Hoenicke, J., Podelski, A.: Termination analysis by learning terminating
programs. In: Proc. of CAV’14. pp. 797–813. Springer (2014)
21. Kähler, D., Wilke, T.: Complementation, disambiguation, and determinization of Büchi au-
tomata unified. In: Proc. of ICALP’08. pp. 724–735. Springer (2008)
22. Karmarkar, H., Chakraborty, S.: On minimal odd rankings for Büchi complemen-
tation. In: Proc. of ATVA’09. LNCS, vol. 5799, pp. 228–243. Springer (2009).
https://doi.org/10.1007/978-3-642-04761-9_18
23. Klein, J., Baier, C.: On-the-fly stuttering in the construction of deterministic omega-
automata. In: Proc. of CIAA'07. LNCS, vol. 4783, pp. 51–61. Springer (2007).
https://doi.org/10.1007/978-3-540-76336-9_7
24. Kupferman, O., Vardi, M.Y.: Weak alternating automata are not that weak. ACM Trans.
Comput. Log. 2(3), 408–429 (2001). https://doi.org/10.1145/377978.377993
25. Kurshan, R.P.: Complementing deterministic Büchi automata in polynomial time. J. Comput.
Syst. Sci. 35(1), 59–71 (1987). https://doi.org/10.1016/0022-0000(87)90036-5
26. Li, Y., Sun, X., Turrini, A., Chen, Y., Xu, J.: ROLL 1.0: 𝜔-regular language learn-
ing library. In: Proc. of TACAS’19. LNCS, vol. 11427, pp. 365–371. Springer (2019).
https://doi.org/10.1007/978-3-030-17462-0_23
27. Li, Y., Turrini, A., Zhang, L., Schewe, S.: Learning to complement Büchi automata. In: Proc.
of VMCAI’18. pp. 313–335. Springer (2018)
28. Li, Y., Vardi, M.Y., Zhang, L.: On the power of unambiguity in Büchi complementation. In:
Proc. of GandALF’20. EPTCS, vol. 326, pp. 182–198. Open Publishing Association (2020).
https://doi.org/10.4204/EPTCS.326.12
29. Löding, C., Pirogov, A.: New optimizations and heuristics for determinization of Büchi
automata. In: Automated Technology for Verification and Analysis. pp. 317–333. Springer
International Publishing, Cham (2019). https://doi.org/10.1007/978-3-030-31784-3_18
30. Mayr, R., Clemente, L.: Advanced automata minimization. In: Proc. of POPL’13. pp. 63–74
(2013)
31. Michel, M.: Complementation is more difficult with automata on infinite words. CNET, Paris
15 (1988)
32. Nielson, F., Nielson, H.R., Hankin, C.: Principles of program analysis. Springer (1999).
https://doi.org/10.1007/978-3-662-03811-6
33. Oei, R., Ma, D., Schulz, C., Hieronymi, P.: Pecan: An automated theorem prover for automatic
sequences using Büchi automata. CoRR abs/2102.01727 (2021), https://arxiv.org/abs/2102.01727
34. Piterman, N.: From nondeterministic Büchi and Streett automata to deterministic parity
automata. In: Proc. of LICS’06. pp. 255–264. IEEE (2006)
35. Redziejowski, R.R.: An improved construction of deterministic omega-automaton using
derivatives. Fundam. Informaticae 119(3-4), 393–406 (2012). https://doi.org/10.3233/FI-2012-744
36. Safra, S.: On the complexity of 𝜔-automata. In: Proc. of FOCS’88. pp. 319–327. IEEE (1988)
37. Schewe, S.: Büchi complementation made tight. In: Proc. of STACS’09. LIPIcs, vol. 3, pp.
661–672. Schloss Dagstuhl (2009). https://doi.org/10.4230/LIPIcs.STACS.2009.1854
38. Sistla, A.P., Vardi, M.Y., Wolper, P.: The Complementation Problem for Büchi Automata with
Applications to Temporal Logic. Theoretical Computer Science 49(2-3), 217–237 (1987)
39. Tabakov, D., Vardi, M.Y.: Experimental evaluation of classical automata constructions. In:
Proc. of LPAR’05. pp. 396–411. Springer (2005)
40. Tsai, M.H., Fogarty, S., Vardi, M.Y., Tsay, Y.K.: State of Büchi complementation. In: Imple-
mentation and Application of Automata. pp. 261–271. Springer Berlin Heidelberg, Berlin,
Heidelberg (2011)
41. Tsai, M.H., Tsay, Y.K., Hwang, Y.S.: GOAL for games, omega-automata, and logics. In:
Computer Aided Verification. pp. 883–889. Springer Berlin Heidelberg, Berlin, Heidelberg
(2013)
42. Vardi, M.Y., Wilke, T.: Automata: From logics to algorithms. Logic and Automata 2, 629–736
(2008)
43. Yan, Q.: Lower bounds for complementation of 𝜔-automata via the full automata technique.
In: Automata, Languages and Programming. pp. 589–600. Springer Berlin Heidelberg, Berlin,
Heidelberg (2006)
On-The-Fly Solving for Symbolic Parity Games
Maurice Laveaux1, Wieger Wesselink1, and Tim A.C. Willemse1,2
1Eindhoven University of Technology, Eindhoven, The Netherlands
2ESI (TNO), Eindhoven, The Netherlands
{m.laveaux, j.w.wesselink, t.a.c.willemse}@tue.nl
Abstract. Parity games can be used to represent many different kinds
of decision problems. In practice, tools that use parity games often rely
on a specification in a higher-order logic from which the actual game
can be obtained by means of an exploration. For many of these decision
problems we are only interested in the solution for a designated vertex in
the game. We formalise how to use on-the-fly solving techniques during
the exploration process, and show that this can help to decide the winner
of such a designated vertex in an incomplete game. Furthermore, we
define partial solving techniques for incomplete parity games and show
how these can be made resilient to work directly on the incomplete game,
rather than on a set of safe vertices. We implement our techniques for
symbolic parity games and study their effectiveness in practice, showing
that speed-ups of several orders of magnitude are feasible and overhead
(if unavoidable) is typically low.
1 Introduction
A parity game is a two-player game with an ω-regular winning condition, played
by players ◇ (‘even’) and □ (‘odd’) on a directed graph. The true complexity of
solving parity games is still a major open problem, with the most recent break-
throughs yielding algorithms running in quasi-polynomial time, see, e.g., [18,7].
Apart from their intriguing status, parity games pop up in various fundamental
results in computer science (e.g., in the proof of decidability of a monadic second-
order theory). In practice, parity games provide an elegant, uniform framework
to encode many relevant decision problems, which include model checking prob-
lems, synthesis problems and behavioural equivalence checking problems.
Often, a decision problem that is encoded as a parity game can be answered
by determining which of the two players wins a designated vertex in the game
graph. Depending on the characteristics of the game, it may be the case that
only a fraction of the game is relevant for deciding which player wins a vertex.
For instance, deciding whether a transition system satisfies an invariant can be
encoded by a simple, solitaire (i.e., single player) parity game. In such a game,
player □ wins all vertices that are sinks (i.e., have no successors), and all states
leading to such sinks, so checking whether sinks are reachable from a designated
vertex suffices to determine whether this vertex is won by □, too. Clearly, as soon
as a sink is detected, any further inspection of the game becomes irrelevant.
A complicating factor is that in practice, the parity games that encode deci-
sion problems are not given explicitly. Rather, they are specified in some higher-
order logic such as a parameterised Boolean equation system, see, e.g. [11]. Ex-
ploring the parity game from such a higher-order specification is, in general,
time- and memory-consuming. To counter this, symbolic exploration techniques
have been proposed, see e.g. [19]. These explore the game graph on-the-fly and
exploit efficient symbolic data structures such as LDDs [13] to represent sets of
vertices and edges. Many parity game solving algorithms can be implemented
quite effectively using such data structures [20,28,29], so that in the end, explor-
ing the game graph often remains the bottleneck.
In this paper, we study how to combine the exploration of a parity game
and the on-the-fly solving of the explored part, with the aim to speed-up the
overall solving process. The central problem when performing on-the-fly solving
during the exploration phase is that we have to deal with incomplete information
when determining the winner for a designated vertex. Moreover, in the symbolic
setting, the exploration order may be unpredictable when advanced strategies
such as chaining and saturation [9] are used.
To formally reason about all possible exploration strategies and the artefacts
they generate, we introduce the concept of an incomplete parity game, and an
ordering on these. Incomplete parity games are parity games where for some
vertices not all outgoing edges are necessarily known. In practice, these could be
identified by, e.g., the todo queue in a classical breadth-first search. The extra
information captured by an incomplete parity game allows us to characterise
the safe set for a given player α. This is a set of vertices for which it can be
established that if player α wins the vertex, then she cannot lose the vertex if
more information becomes available. We prove an optimality result for safe sets,
which, informally, states that a safe set for player α is also the largest set with
this property (see Theorem 1).
The vertices won by player α in an α-safe set can be determined using a
standard parity game solving algorithm such as, e.g., Zielonka’s recursive al-
gorithm [31] or Priority Promotion [2]. However, these algorithms may be less
efficient as on-the-fly solvers. For this reason, we study three symbolic partial
solvers: solitaire winning cycle detection, forced winning cycle detection and fa-
tal attractors [17]. In particular cases, first determining the safe set for a player
and only subsequently solving the game using one of these partial solvers will
incur an additional overhead. As a final result, we therefore prove that all these
solvers can be (modified to) run on the incomplete game as a whole, rather than
on the safe set of a player (see Propositions 1-3).
As a proof of concept, we have implemented an (open source) symbolic tool
for the mCRL2 toolset [6] that explores a parity game specified by a parame-
terised Boolean equation system and solves these games on-the-fly. We report
on the effectiveness of our implementation on typical parity games stemming
from, e.g., model checking and equivalence checking problems, showing that it
can speed up the process by several orders of magnitude, while adding low
overhead if the entire game is needed for solving.
Related Work. Our work is related to existing techniques for solving symbolic
parity games such as [20,19], as we extend these existing methods with on-the-
fly solving. Naturally, our work is also related to existing work for on-the-fly
model checking. This includes work for on-the-fly (explicit) model checking of
regular alternation-free modal mu-calculus formulas [23] and work for on-the-
fly symbolic model checking of RCTL [1]. Compared to these, our method is
more general as it can be applied to the full modal mu-calculus (with data),
which subsumes RCTL and the alternation-free subset. Optimisations such as
the observation that checking LTL formulas of type AG reduces to reachability
checks [14] are a special case of our methods and partial solvers. Furthermore, our
methods are not restricted to model checking problems only and can be applied
to any parity game, including decision problems such as equivalence checking [8].
Furthermore, our method is agnostic to the exploration strategy employed.
Structure of the paper. In Section 2 we recall parity games. In Section 3 we
introduce incomplete parity games and show how partial solving can be applied
correctly. In Section 4 we present several partial solvers that we employ for
on-the-fly solving. Finally, in Section 5 we discuss the implementation of these
techniques and apply them to several practical examples. The omitted proofs for
the supporting lemmas can be found in [22].
2 Preliminaries
A parity game is an infinite-duration, two-player game that is played on a finite
directed graph. The objective of the two players, called even (denoted by ◇) and
odd (denoted by □), is to win vertices in the graph.
Definition 1. A parity game is a directed graph G = (V, E, p, (V◇, V□)), where
– V is a finite set of vertices, partitioned in sets V◇ and V□ of vertices owned
by ◇ and □, respectively;
– E ⊆ V × V is the edge relation;
– p : V → N is a function that assigns a priority to each node.
Henceforth, let G = (V, E, p, (V◇, V□)) be an arbitrary parity game. Throughout
this paper, we use α to denote an arbitrary player and ᾱ denotes the opponent.
We write vE to denote the set of successors {w ∈ V | (v, w) ∈ E} of vertex
v. The set sinks(G) is defined as the largest set U ⊆ V satisfying for all v ∈ U
that vE = ∅; i.e., sinks(G) is the set of all sinks: vertices without successors.
If we are only concerned with the sinks of player α, we write sinksα(G); i.e.,
sinksα(G) = Vα ∩ sinks(G). We write G↾U, for U ⊆ V, to denote the subgame
(U, (U × U) ∩ E, p↾U, (V◇ ∩ U, V□ ∩ U)), where p↾U(v) = p(v) for all vertices
v ∈ U.
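To make these definitions concrete, the following is a minimal explicit-state sketch in Python; it is our own illustration, not the symbolic, LDD-based representation used by the tool discussed in Section 5, and the names `Game`, `EVEN` and `ODD` are introduced here for illustration only.

```python
# A minimal explicit-state rendering of Definition 1 and the notation above.
# EVEN plays the diamond vertices, ODD the box vertices.
EVEN, ODD = 0, 1

class Game:
    def __init__(self, owner, prio, succ):
        self.owner = owner          # vertex -> EVEN or ODD
        self.prio = prio            # vertex -> priority p(v)
        self.succ = succ            # vertex -> set of successors vE

    def vertices(self):
        return set(self.owner)

    def sinks(self):
        # sinks(G): vertices without successors
        return {v for v in self.owner if not self.succ[v]}

    def pre(self, U):
        # pre(G, U): vertices with at least one successor in U
        return {v for v in self.owner if self.succ[v] & U}
```

For instance, the game of Figure 1 below can be encoded by mapping each ui to its owner, priority and successor set.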
Example 1. Consider the graph depicted in Figure 1, representing a parity game.
Diamond-shaped vertices are owned by player ◇, whereas box-shaped vertices
are owned by player □. The priority of a vertex is written inside the vertex.
Vertex u1 is a sink owned by player ◇.
On-The-Fly Solving for Symbolic Parity Games 139
[Fig. 1. An example parity game over vertices u0, u1, u2, u3, u4 with priorities 2, 3, 0, 1, 2, respectively.]
Plays and strategies. The game is played as follows. Initially, a token is placed on
a vertex of the graph. The owner of a vertex on which the token resides gets to
decide the successor vertex (if any) that the token is moved to next. A maximal
sequence of vertices (i.e., an infinite sequence or a finite sequence ending in a
sink) visited by the token by following this simple rule is called a play. A finite
play π is won by player ◇ if the sink in which it ends is owned by player □, and
it is won by player □ if the sink is owned by player ◇. An infinite play π is won
by player ◇ if the minimal priority that occurs infinitely often along π is even,
and it is won by player □ otherwise.
A strategy σα : V∗Vα → V for player α is a partial function that prescribes
where player α moves the token next, given the sequence of vertices visited by the
token. A play v0v1 . . . is consistent with a strategy σ if and only if σ(v0 . . . vi) =
vi+1 for all i for which σ(v0 . . . vi) is defined. Strategy σα is winning for player
α in vertex v if all plays consistent with σα and starting in v are won by α.
Player α wins vertex v if and only if she has a winning strategy σα for vertex v.
The parity game solving problem asks to compute the set of vertices W◇, won
by player ◇, and the set W□, won by player □. Note that since parity games are
determined [31,24], every vertex is won by one of the two players. That is, the
sets W◇ and W□ partition the set V.
Example 2. Consider the parity game depicted in Figure 1. In this game, the
strategy σ◇, partially defined as σ◇(πu0) = u2 and σ◇(πu2) = u0, for arbitrary
π, is winning for player ◇ in u0 and u2. Player □ wins vertex u3 using strategy
σ□(πu3) = u4, for arbitrary π. Note that player ◇ is always forced to move the
token from u4 to u3. Vertex u1 is a sink, owned by player ◇, and hence, won by
player □.
Dominions. A strategy σα is said to be closed on a set of vertices U ⊆ V iff
every play, consistent with σα and starting in a vertex v ∈ U, remains in U. If
player α has a strategy that is closed on U, we say that the set U is α-closed.
A dominion for player α is a set of vertices U ⊆ V such that player α has a
strategy σα that is closed on U and which is winning for α. Note that the sets
W◇ and W□ are dominions for player ◇ and player □, respectively, and, hence,
every vertex won by player α must belong to an α-dominion.
Example 3. Reconsider the parity game of Figure 1. Observe that player □ has
a closed strategy on {u3, u4}, which is also winning for player □. Hence, the
set {u3, u4} is a □-dominion. Furthermore, the set {u2, u3, u4} is ◇-closed.
However, none of the strategies for which {u2, u3, u4} is closed for player ◇ is
winning for her; therefore {u2, u3, u4} is not a ◇-dominion.
Predecessors, control predecessors and attractors. Let U ⊆ V be a set of vertices.
We write pre(G, U) to denote the set of predecessors {v ∈ V | ∃u ∈ U : u ∈ vE}
of U in G. The control predecessor set of U for player α in G, denoted
cpreα(G, U), contains those vertices for which α is able to force entering U in
one step. It is defined as follows:
cpreα(G, U) = (Vα ∩ pre(G, U)) ∪ (Vᾱ \ (pre(G, V \ U) ∪ sinks(G)))
Note that both pre and cpre are monotone operators on the complete lattice
(2^V, ⊆). The α-attractor to U in G, denoted Attrα(G, U), is the set of vertices
from which player α can force play to reach a vertex in U:
Attrα(G, U) = µZ.(U ∪ cpreα(G, Z))
The α-attractor to U can be computed by means of a fixed point iteration,
starting at U and adding α-control predecessors in each iteration until a stable
set is reached. We note that the α-attractor to an α-dominion D is again an
α-dominion.
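The fixed point iteration just described is straightforward to spell out; the sketch below does so on the explicit `Game` representation introduced after Definition 1 (again our own illustration; the tool computes attractors symbolically, with chaining).

```python
# Control predecessor cpreα(G, Z): α-vertices with a successor in Z, plus
# ᾱ-vertices that are not sinks and cannot avoid Z.
def cpre(game, alpha, Z):
    sinks = game.sinks()
    own = {v for v in game.vertices()
           if game.owner[v] == alpha and game.succ[v] & Z}
    opp = {v for v in game.vertices()
           if game.owner[v] != alpha and v not in sinks
           and game.succ[v] <= Z}
    return own | opp

# Attrα(G, U) = µZ.(U ∪ cpreα(G, Z)), computed by iteration until stable.
def attractor(game, alpha, U):
    U = set(U)
    Z = set(U)
    while True:
        Z_new = U | cpre(game, alpha, Z)
        if Z_new == Z:
            return Z
        Z = Z_new
```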
Example 4. Consider the parity game G of Figure 1 once again. The set of
◇-control predecessors of {u2} is {u0}. Note that since player □ can avoid moving
to u2 from vertex u3 by moving to vertex u4, vertex u3 is not among the ◇-
control predecessors of {u2}. The ◇-attractor to {u2} is the set {u0, u2}, which
is the largest set of vertices for which player ◇ has a strategy to force play to
the set of vertices {u2}.
3 Incomplete Parity Games
In many practical applications that rely on parity game solving, the parity game
is gradually constructed by means of an exploration, often starting from an ‘ini-
tial’ vertex. This is, for instance, the case when using parity games in the context
of model checking or when deciding behavioural preorders or equivalences. For
such applications, it may be profitable to combine exploration and solving, so
that the costly exploration can be terminated when the winner of a particular
vertex of interest (often the initial vertex) has been determined. The example
below, however, illustrates that one cannot naively solve the parity game con-
structed so far.
Example 5. Consider the parity game G in Figure 2, consisting of all vertices
and only the solid edges. This game could, for example, be the result of an
exploration starting from u4. Then G↾{u0, u1, u2, u3, u4, u5} is a subgame for
which we can conclude that all vertices form a ◇-dominion. However, after
exploring the dotted edges, player □ can escape to vertex u4 from vertex u5.
Consequently, vertices u4 and u5 are no longer won by player ◇ in the extended
game. Furthermore, observe that the additional edge from u3 to u5 does not
affect the previously established fact that player ◇ wins this vertex.
[Fig. 2. A parity game where the dotted edges are not yet known; vertices u0, u1, u2, u3, u4, u5 carry priorities 2, 3, 0, 2, 1, 2, respectively.]
To facilitate reasoning about games with incomplete information, we first intro-
duce the notion of an incomplete parity game.
Definition 2. An incomplete parity game is a structure ⅁ = (G, I), where G is
a parity game (V, E, p, (V◇, V□)), and I ⊆ V is a set of vertices with potentially
unexplored successors. We refer to the set I as the set of incomplete vertices;
the set V \ I is the set of complete vertices.
Observe that (G, ∅) is a ‘standard’ parity game. We permit ourselves to use
the notation for parity game notions such as plays, strategies, dominions, etcetera
also in the context of incomplete parity games. In particular, for ⅁ = (G, I),
we will write pre(⅁, U) and Attrα(⅁, U) to indicate pre(G, U) and Attrα(G, U),
respectively. Furthermore, we define ⅁↾U as the structure (G↾U, I ∩ U).
Intuitively, while exploring a parity game, we extend the set of vertices and
edges by exploring the incomplete vertices. Doing so gives rise to potentially
new incomplete vertices. At each stage in the exploration, the incomplete parity
game extends incomplete parity games explored in earlier stages. We formalise
the relation between incomplete parity games, abstracting from any particular
order in which vertices and edges are explored.
Definition 3. Let ⅁ = ((V, E, p, (V◇, V□)), I) and ⅁′ = ((V′, E′, p′, (V′◇, V′□)), I′) be
incomplete parity games. We write ⅁ ⊑ ⅁′ iff the following conditions hold:
(1) V ⊆ V′, V◇ ⊆ V′◇ and V□ ⊆ V′□;
(2) E ⊆ E′ and ((V \ I) × V′) ∩ E′ ⊆ E;
(3) p = p′↾V;
(4) I′ ∩ V ⊆ I.
Conditions (1) and (3) are self-explanatory. Condition (2) states that, on the
one hand, no edges are lost, and, on the other hand, E′ can only add edges
from vertices that are incomplete: for complete vertices, E′ specifies no new
successors. Finally, condition (4) captures that the set of incomplete vertices I′
cannot contain vertices that were previously complete. We note that the ordering
⊑ is reflexive, anti-symmetric and transitive.
Example 6. Suppose that ⅁ = (G, I) is the incomplete parity game depicted in
Figure 2, where G is the game with all vertices and only the solid edges, and
I = {u3, u5}. Then ⅁ ⊑ ⅁′, where ⅁′ = (G′, I′) is the incomplete parity game
where G′ is the depicted game with all vertices and both the solid edges and
dotted edges, and I′ = ∅.
Let us briefly return to Example 5. We concluded that the winner of vertex
u4 (and also u5) changed when adding new information. The reason is that
player □ has a strategy to reach an incomplete vertex owned by her. Such an
incomplete vertex may present an opportunity to escape from plays that would
be non-winning otherwise. On the other hand, the incomplete vertex u3 has
already been sufficiently explored to allow for concluding that this vertex is
won by player ◇, even if more successors are added to u3. This suggests that
for some subset of vertices, we can decide their winner in an incomplete parity
game and preserve that winner in all future extensions of the game. We formally
characterise this set of vertices in the definition below.
Definition 4. Let ⅁ = (G, I), with G = (V, E, p, (V◇, V□)), be an incomplete
parity game. The α-safe vertices for ⅁, denoted by safeα(⅁), is the set V \
Attrᾱ(G, Vᾱ ∩ I).
Example 7. Consider the incomplete parity game ⅁ of Example 6 once more. We
have safe◇(⅁) = {u0, u1, u2, u3} and safe□(⅁) = {u0, u1, u2, u4, u5}.
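Definition 4 translates directly into code on top of the earlier sketches; the following is again our own explicit-state illustration, with `I` the set of incomplete vertices.

```python
# safeα(⅁) = V \ Attrᾱ(G, Vᾱ ∩ I): remove everything the opponent can
# attract towards her own incomplete (and hence unsafe) vertices.
def safe(game, alpha, I):
    opponent = 1 - alpha
    unsafe = {v for v in I if game.owner[v] == opponent}
    return game.vertices() - attractor(game, opponent, unsafe)
```

On the game of Figure 2 with I = {u3, u5}, this reproduces the two sets of Example 7.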
In the remainder of this section, we show that it is indeed the case that while
exploring a parity game, one can only safely determine the winners in the sets
safe◇(⅁) and safe□(⅁), respectively. More specifically, we claim (Lemma 1) that
all α-dominions found in ⅁↾safeα(⅁) are preserved in extensions of the game, and
(Lemma 2) that vertices not in safeα(⅁) are not necessarily won by the same
player in extensions of the game.
Lemma 1. Given two incomplete games ⅁ and ⅁′ such that ⅁ ⊑ ⅁′. Any α-
dominion in ⅁↾safeα(⅁) is also an α-dominion in ⅁′.
Example 8. Recall that in Example 7, we found that safe◇(⅁) = {u0, u1, u2, u3}.
Observe that in the incomplete parity game ⅁ of Example 6, restricted to vertices
{u0, u1, u2, u3}, all vertices are won by player ◇, and, hence, {u0, u1, u2, u3} is
a ◇-dominion. Following Lemma 1 we can indeed conclude that this remains a
◇-dominion in all extensions of ⅁, and, in particular, for the (complete) parity
game of Example 6.
Lemma 2. Let ⅁ be an incomplete parity game. Suppose that W is an α-
dominion in ⅁. If W ⊈ safeα(⅁), then there is an (incomplete) parity game ⅁′
such that ⅁ ⊑ ⅁′ and all vertices in W \ safeα(⅁) are won by ᾱ.
As a corollary of the above lemma, we find that α-dominions that contain
vertices outside of the α-safe set are not guaranteed to be dominions in all
extensions of the incomplete parity game.
Corollary 1. Let ⅁ be an incomplete parity game. Suppose that W is an α-
dominion in ⅁. If W ⊈ safeα(⅁), then there is an (incomplete) parity game ⅁′
such that ⅁ ⊑ ⅁′ and W is not an α-dominion in ⅁′.
The theorem below summarises the two previous results, claiming that the
sets safe◇(⅁) and safe□(⅁) are the optimal subsets that can be used safely when
combining solving and the exploration of a parity game.
Theorem 1. Let ⅁ = (G, I), with G = (V, E, p, (V◇, V□)), be an incomplete
parity game. Define Wα as the union of all α-dominions in ⅁↾safeα(⅁), and let
W? = V \ (W◇ ∪ W□). Then W? is the largest set of vertices v for which there
are incomplete parity games ⅁α and ⅁ᾱ such that ⅁ ⊑ ⅁α and ⅁ ⊑ ⅁ᾱ, and v is
won by α in ⅁α and v is won by ᾱ in ⅁ᾱ.
Proof. Let ⅁, with G = (V, E, p, (V◇, V□)), be an incomplete parity game. Pick
a vertex v ∈ W?. Suppose that in G, vertex v ∈ W? is won by player α. Let
⅁α = ⅁. Then ⅁ ⊑ ⅁α and v is also won by α in ⅁α.
Next, we argue that there must be a game ⅁ᾱ such that ⅁ ⊑ ⅁ᾱ and v is
won by ᾱ in ⅁ᾱ. Since v ∈ W? is won by player α in G, v must belong to an
α-dominion in G. Towards a contradiction, assume that v ∈ safeα(⅁). Then there
must also be an α-dominion containing v in G↾safeα(⅁), since ᾱ cannot escape
the set safeα(⅁). But then v ∈ Wα. Contradiction, so v ∉ safeα(⅁). So, v must
be part of an α-dominion D in G such that D ⊈ safeα(⅁). By Lemma 2, we find
that there is an incomplete parity game ⅁ᾱ such that ⅁ ⊑ ⅁ᾱ and all vertices in
D \ safeα(⅁), and vertex v ∈ D in particular, are won by ᾱ in ⅁ᾱ.
Finally, we argue that W? cannot be larger. Pick a vertex v ∉ W?. Then there
must be some player α such that v ∈ Wα, and, consequently, there must be an
α-dominion D ⊆ safeα(⅁) such that v ∈ D. But then by Lemma 1, we find
that v is won by α in all incomplete parity games ⅁′ such that ⅁ ⊑ ⅁′.
4 On-the-fly Solving
In the previous section we saw that for any solver solveα, which accepts a parity
game as input and returns an α-dominion Wα, a correct on-the-fly solving algo-
rithm can be obtained by computing Wα = solveα(⅁↾safeα(⅁)) while exploring
an (incomplete) parity game ⅁. While this approach is clearly sound, computing
the set of safe vertices can be expensive for large state spaces and potentially
wasteful when no dominions are found afterwards. We next introduce safe at-
tractors which, we show, can be used to search for specific dominions without
first computing the α-safe set of vertices.
4.1 Safe Attractors
We start by observing that the α-attractor to a set U in an incomplete parity
game does not make a distinction between the set of complete and incomplete
vertices. Consequently, it may wrongly conclude that α has a strategy to force
play to U when the attractor strategy involves incomplete vertices owned by ᾱ.
We thus need to make sure that such vertices are excluded from consideration.
This can be achieved by considering the set of unsafe vertices Vᾱ ∩ I as potential
vertices that can be used by the other player to escape. We define the safe α-
attractor as the least fixed point of the safe control predecessor. The latter is
defined as follows:
spreα(⅁, U) = (Vα ∩ pre(⅁, U)) ∪ (Vᾱ \ (pre(⅁, V \ U) ∪ sinks(⅁) ∪ I))
Lemma 3. Let ⅁ be an incomplete parity game. For all vertex sets X ⊆ safeα(⅁)
it holds that cpreα(⅁↾safeα(⅁), X) = spreα(⅁, X).
The safe α-attractor to U, denoted SAttrα(⅁, U), is the set of vertices from
which player α can force play to safely reach U in ⅁:
SAttrα(⅁, U) = µZ.(U ∪ spreα(⅁, Z))
Lemma 4. Let ⅁ be an incomplete parity game, and X ⊆ safeα(⅁). Then
Attrα(⅁↾safeα(⅁), X) = SAttrα(⅁, X).
In particular, we can conclude the following:
Corollary 2. Let ⅁ be an incomplete parity game, and let X ⊆ safeα(⅁) be an
α-dominion. Then SAttrα(⅁′, X) is an α-dominion for all ⅁′ satisfying ⅁ ⊑ ⅁′.
One application of the above corollary is the following: since on-the-fly solving is
typically performed repeatedly, previously found dominions can be expanded by
computing the safe α-attractor towards these already solved vertices. Another
corollary is the following, which states that complete sinks can safely be attracted
towards.
Corollary 3. Let ⅁ = (G, I) be an incomplete parity game and let ⅁′ be such
that ⅁ ⊑ ⅁′. Then SAttrα(⅁′, sinksᾱ(⅁) \ I) is an α-dominion in ⅁′.
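In code, the safe variants differ from cpre/attractor above in exactly one condition: opponent-owned incomplete vertices never qualify as safe control predecessors. The sketch below (again our own explicit-state illustration) makes this explicit.

```python
# Safe control predecessor spreα(⅁, Z): as cpre, but opponent vertices in I
# are treated as potential escapes and therefore excluded.
def spre(game, alpha, Z, I):
    sinks = game.sinks()
    own = {v for v in game.vertices()
           if game.owner[v] == alpha and game.succ[v] & Z}
    opp = {v for v in game.vertices()
           if game.owner[v] != alpha and v not in sinks
           and v not in I and game.succ[v] <= Z}
    return own | opp

# SAttrα(⅁, U) = µZ.(U ∪ spreα(⅁, Z))
def safe_attractor(game, alpha, U, I):
    U = set(U)
    Z = set(U)
    while True:
        Z_new = U | spre(game, alpha, Z, I)
        if Z_new == Z:
            return Z
        Z = Z_new
```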
4.2 Partial Solvers
In practice, a full-fledged solver, such as Zielonka’s algorithm [31] or one of
the Priority Promotion variants [2], may be costly to run often while exploring
a parity game. Instead, cheaper partial solvers may be used that search for
a dominion of a particular shape. We study three such partial solvers in this
section, with a particular focus on solvers that lend themselves to parity games
that are represented symbolically using, e.g., BDDs [5], MDDs [25] or LDDs [13].
For the remainder of this section, we fix an arbitrary incomplete parity game
⅁ = ((V, E, p, (V◇, V□)), I).
Winning solitaire cycles. A simple cycle in ⅁ can be represented by a finite
sequence of distinct vertices v0v1 . . . vn satisfying v0 ∈ vnE. Such a cycle is an
α-solitaire cycle whenever all vertices on that cycle are owned by player α.
Observe that if all vertices on an α-solitaire cycle have a priority that is of
the same parity as the owner α, then all vertices on that cycle are won by player
α. Formally, these are thus cycles through vertices in the set Pα ∩ Vα, where
P◇ = {v ∈ V \ sinks(⅁) | p(v) mod 2 = 0} and P□ = {v ∈ V \ sinks(⅁) | p(v)
mod 2 = 1}. Let Cαsol(⅁) represent the largest set of α-solitaire winning cycles.
Then Cαsol(⅁) = νZ.(Pα ∩ Vα ∩ pre(⅁, Z)).
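As a greatest fixed point, Cαsol(⅁) can be computed by shrinking a candidate set until it stabilises; a possible explicit-state rendering (our illustration, reusing the sketches above) is:

```python
# Cαsol(⅁) = νZ.(Pα ∩ Vα ∩ pre(⅁, Z)): start from the α-owned non-sink
# vertices with α's parity and drop vertices until every survivor has a
# successor that also survives, i.e. lies on an α-solitaire winning cycle.
def solitaire_cycles(game, alpha):
    sinks = game.sinks()
    Z = {v for v in game.vertices()
         if game.owner[v] == alpha and v not in sinks
         and game.prio[v] % 2 == alpha}
    while True:
        Z_new = {v for v in Z if game.succ[v] & Z}
        if Z_new == Z:
            return Z
        Z = Z_new
```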
Proposition 1. The set Cαsol(⅁) is an α-dominion and we have Cαsol(⅁) ⊆ safeα(⅁).
Proof. We first prove that Cαsol(⅁) ⊆ safeα(⅁). We show, by means of an induction
on the fixed point approximants Ai of the attractor, that Cαsol(⅁) ∩ Attrᾱ(⅁, Vᾱ ∩
I) = ∅. The base case follows immediately, as Cαsol(⅁) ∩ A0 = Cαsol(⅁) ∩ ∅ = ∅.
For the induction, we assume that Cαsol(⅁) ∩ Ai = ∅; we show that also Cαsol(⅁) ∩
((Vᾱ ∩ I) ∪ cpreᾱ(⅁, Ai)) = ∅. First, observe that Cαsol(⅁) ⊆ Vα; hence, it suffices
to prove that Cαsol(⅁) ∩ (Vα \ (pre(⅁, V \ Ai) ∪ sinks(⅁))) = ∅. But this follows
immediately from the fact that for every vertex v ∈ Cαsol(⅁), we have v ∈ Pα ∩
Vα ∩ pre(⅁, Cαsol(⅁)); more specifically, we have vE ∩ Cαsol(⅁) ≠ ∅ for all v ∈ Cαsol(⅁).
The fact that Cαsol(⅁) is an α-dominion follows from the fact that for every
vertex v ∈ Cαsol(⅁), there is some w ∈ vE ∩ Cαsol(⅁). This means that player α
must have a strategy that is closed on Cαsol(⅁). Since all vertices in Cαsol(⅁) are of
the priority that is beneficial to α, this closed strategy is also winning for α.
Observe that winning solitaire cycles can be computed without first computing
the α-safe set. Parity games that stand to profit from detecting winning solitaire
cycles are those originating from verifying safety properties.
Winning forced cycles. In general, a cycle in ⅁↾safe◇(⅁) through vertices in P◇
can contain vertices of both players, providing player □ an opportunity to break
the cycle if that is beneficial to her. Nevertheless, if breaking a cycle always
inadvertently leads to another cycle through P◇, then we may conclude that all
vertices on these cycles are won by player ◇. We call these cycles winning forced
cycles for player ◇. A dual argument applies to cycles through P□. Let Cαfor(⅁)
represent the largest set of vertices that are on winning forced cycles for player
α. More formally, we define Cαfor(⅁) = νZ.(Pα ∩ safeα(⅁) ∩ cpreα(⅁, Z)).
Lemma 5. The set Cαfor(⅁) is an α-dominion and we have Cαfor(⅁) ⊆ safeα(⅁).
A possible downside of the above construction is that it again requires to first
compute safeα(⅁), which, in particular cases, may incur an additional overhead.
Instead, we can compute the same set using the safe control predecessor. We
define Cαsfor(⅁) = νZ.(Pα ∩ spreα(⅁, Z)).
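Operationally this is the same shrinking iteration as for solitaire cycles, with the safe control predecessor in place of the plain predecessor; a sketch under the same assumptions as before:

```python
# Cαsfor(⅁) = νZ.(Pα ∩ spreα(⅁, Z)): forced winning cycle detection that,
# by Proposition 2 below, coincides with Cαfor(⅁) while avoiding the
# up-front computation of safeα(⅁).
def forced_cycles(game, alpha, I):
    sinks = game.sinks()
    P_alpha = {v for v in game.vertices()
               if v not in sinks and game.prio[v] % 2 == alpha}
    Z = set(P_alpha)
    while True:
        Z_new = P_alpha & spre(game, alpha, Z, I)
        if Z_new == Z:
            return Z
        Z = Z_new
```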
Proposition 2. We have Cαfor(⅁) = Cαsfor(⅁).
Proof. Let τ(Z) = Pα ∩ spreα(⅁, Z). We use set inclusion to show that Cαfor(⅁) is
indeed a fixed point of τ.
ad Cαfor(⅁) ⊆ τ(Cαfor(⅁)). Pick a vertex v ∈ Cαfor(⅁). By definition of Cαfor(⅁),
we have v ∈ Pα ∩ safeα(⅁) ∩ cpreα(⅁, Cαfor(⅁)). Observe that safeα(⅁) ∩
cpreα(⅁, Cαfor(⅁)) = safeα(⅁) ∩ cpreα(⅁↾safeα(⅁), Cαfor(⅁)). But then, since
Cαfor(⅁) ⊆ safeα(⅁), we find, by Lemma 3, that cpreα(⅁↾safeα(⅁), Cαfor(⅁)) =
spreα(⅁, Cαfor(⅁)). Hence, v ∈ Pα ∩ spreα(⅁, Cαfor(⅁)) = τ(Cαfor(⅁)).
ad Cαfor(⅁) ⊇ τ(Cαfor(⅁)). Again pick a vertex v ∈ τ(Cαfor(⅁)). Then v ∈
Pα ∩ spreα(⅁, Cαfor(⅁)). Since Cαfor(⅁) ⊆ safeα(⅁), by Lemma 3, we again have
spreα(⅁, Cαfor(⅁)) = cpreα(⅁↾safeα(⅁), Cαfor(⅁)). But then it must be the case
that v ∈ safeα(⅁). Moreover, cpreα(⅁↾safeα(⅁), Cαfor(⅁)) ⊆ cpreα(⅁, Cαfor(⅁)).
So v ∈ Pα ∩ safeα(⅁) ∩ cpreα(⅁, Cαfor(⅁)) = Cαfor(⅁).
We show next that for any Z = τ(Z), we have Z ⊆ Cαfor(⅁). Let Z be such. We first
show that for every v ∈ Z ∩ Vα, there is some w ∈ vE ∩ Z, and for every v ∈ Z ∩ Vᾱ,
we have v ∉ sinks(⅁), v ∉ I and vE ⊆ Z. Pick v ∈ Z ∩ Vα. Then v ∈ τ(Z) ∩ Vα =
Pα ∩ Vα ∩ spreα(⅁, Z) ⊆ pre(⅁, Z). But then vE ∩ Z ≠ ∅. Next, let v ∈ Z ∩ Vᾱ.
Then v ∈ τ(Z) ∩ Vᾱ = Pα ∩ Vᾱ ∩ spreα(⅁, Z) ⊆ Vᾱ \ (pre(⅁, V \ Z) ∪ sinks(⅁) ∪ I).
So v ∉ pre(⅁, V \ Z) ∪ sinks(⅁) ∪ I. Consequently, vE ⊆ Z, v ∉ sinks(⅁) and
v ∉ I.
Since for every v ∈ Z ∩ Vα, we have vE ∩ Z ≠ ∅, there must be a strategy
for player α to move to another vertex in Z. Let σ be this strategy. Moreover,
since for all v ∈ Z ∩ Vᾱ we have vE ⊆ Z, we find that σ is closed on Z and since
Z ∩ sinks(⅁) = ∅, strategy σ induces forced cycles. Moreover, since Z ⊆ Pα, we
can conclude that all vertices in Z are on winning forced cycles.
Finally, we must argue that Z ⊆ safeα(⅁). But this follows from the fact that
Z ∩ Vᾱ ∩ I = ∅, and, hence, also Z ∩ Attrᾱ(⅁, Vᾱ ∩ I) = ∅. Since Z is contained
within Pα ∩ safeα(⅁), we find that Z ⊆ Cαfor(⅁).
Fatal attractors. Both solitaire cycles and forced cycles utilise the fact that the
parity winning condition becomes trivial if the only priorities that occur on
a play are of the parity of a single player. Fatal attractors [17] were originally
conceived to solve parts of a game using algorithms that have an appealing worst-
case running time; for a detailed account, we refer to [17]. While ibid. investigates
several variants, the main idea behind a fatal attractor is that it identifies cycles
in which the priorities are non-decreasing until the dominating priority of the
attractor is (re)visited. We focus on a simplified (and cheaper) variant of the
psolB algorithm of [17], which is based on the concept of a monotone attractor,
which, in turn, relies on the monotone control predecessor defined below, where
P≥c = {v ∈ V | p(v) ≥ c}:
Mcpreα(⅁, Z, U, c) = P≥c ∩ cpreα(⅁, Z ∪ U)
The monotone attractor for a given priority is then defined as the least fixed point
of the monotone control predecessor for that priority, formally MAttrα(⅁, U, c) =
µZ.Mcpreα(⅁, Z, U, c). A fatal attractor for priority c is then the largest set of
vertices closed under the monotone attractor for priority c; i.e., Fα(⅁, c) =
νZ.(P=c ∩ safeα(⅁) ∩ MAttrα(⅁↾safeα(⅁), Z, c)), where P=c = P≥c \ P≥c+1.
Lemma 6 (See [17], Theorem 2). For even c, we have that MAttr◇(⅁↾
safe◇(⅁), F◇(⅁, c), c) ⊆ safe◇(⅁) and MAttr◇(⅁↾safe◇(⅁), F◇(⅁, c), c) is a ◇-
dominion. If c is odd then we have MAttr□(⅁↾safe□(⅁), F□(⅁, c), c) ⊆ safe□(⅁)
and MAttr□(⅁↾safe□(⅁), F□(⅁, c), c) is a □-dominion.
Our simplified version of the psolB algorithm, here dubbed solB, computes
fatal attractors for all priorities in descending order, accumulating ◇- and □-
dominions and extending these dominions using a standard ◇- or □-attractor.
This can be implemented using a simple loop over these priorities.
In line with the previous solvers, we can also modify this solver to employ
a safe monotone control predecessor, which uses a construction that is similar
in spirit to that of the safe control predecessor. Formally, we define the safe
monotone control predecessor as follows:
sMcpreα(⅁, Z, U, c) = P≥c ∩ spreα(⅁, Z ∪ U)
The corresponding safe monotone α-attractor, denoted sMAttrα(⅁, U, c), is de-
fined as follows: sMAttrα(⅁, U, c) = µZ.sMcpreα(⅁, Z, U, c). We define the safe
fatal attractor for priority c as the set Fαs(⅁, c) = νZ.(P=c ∩ sMAttrα(⅁, Z, c)).
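The nested fixed points spell out as an inner least fixed point (the safe monotone attractor) inside an outer greatest fixed point; a compact explicit-state sketch (our illustration, reusing spre from above):

```python
# sMAttrα(⅁, U, c) = µZ.(P≥c ∩ spreα(⅁, Z ∪ U)): attract towards U while
# only passing through vertices of priority at least c.
def smattr(game, alpha, U, c, I):
    P_geq_c = {v for v in game.vertices() if game.prio[v] >= c}
    Z = set()
    while True:
        Z_new = P_geq_c & spre(game, alpha, Z | U, I)
        if Z_new == Z:
            return Z
        Z = Z_new

# Fαs(⅁, c) = νZ.(P=c ∩ sMAttrα(⅁, Z, c)): the priority-c vertices whose
# monotone attractor keeps coming back to them.
def safe_fatal_attractor(game, alpha, c, I):
    P_eq_c = {v for v in game.vertices() if game.prio[v] == c}
    Z = set(P_eq_c)
    while True:
        Z_new = P_eq_c & smattr(game, alpha, Z, c, I)
        if Z_new == Z:
            return Z
        Z = Z_new
```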
Proposition 3. Let ⅁ be an incomplete parity game. We have F◇s(⅁, c) =
F◇(⅁, c) for even c, and for odd c we have F□s(⅁, c) = F□(⅁, c).
Similar to algorithm solB, the algorithm solBs computes safe fatal attrac-
tors for priorities in descending order and collects the safe-α-attractor extended
dominions obtained this way.
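A possible rendering of this loop is sketched below; it is our own reconstruction of the idea, not the tool's implementation, and it omits details such as removing solved vertices between iterations.

```python
# solBs, sketched: scan priorities in descending order, take the safe fatal
# attractor for the player winning that parity, extend the corresponding
# dominion (its safe monotone attractor) with the safe attractor, and
# accumulate the results per player.
def solBs(game, I):
    won = {EVEN: set(), ODD: set()}
    for c in sorted(set(game.prio.values()), reverse=True):
        alpha = c % 2                  # EVEN wins even c, ODD wins odd c
        F = safe_fatal_attractor(game, alpha, c, I)
        if F:
            D = smattr(game, alpha, F, c, I)      # dominion, cf. Lemma 6
            won[alpha] |= safe_attractor(game, alpha, D | won[alpha], I)
    return won
```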
5 Experimental Results
We experimentally evaluate the techniques of Section 4. For this, we use games
stemming from practical model checking and equivalence checking problems.
Our experiments are run, single-threaded, on an Intel Xeon 6136 CPU @ 3 GHz
PC. The sources for these experiments can be obtained from the downloadable
artefact [21].
5.1 Implementation
We have implemented a symbolic exploration technique for parity games in the
mCRL2 toolset [6]. Our tool exploits techniques such as read and write depen-
dencies [20,4], and uses sophisticated exploration strategies such as chaining and
saturation [9]. We use MDD-like data structures [25] called List Decision Dia-
grams (LDDs), and the corresponding Sylvan implementation [13], to represent
parity games symbolically. Sylvan also offers efficient implementations for set
operations and relational operations, such as predecessors, facilitating the im-
plementation of attractor computations, the described (partial) solvers, and a
full solver based on Zielonka’s recursive algorithm [31], which remains one of the
most competitive algorithms in practice, both explicitly and symbolically [28,12].
For the attractor set computation we have also implemented chaining to deter-
mine (multi-)step α-predecessors more efficiently.
For all three on-the-fly solving techniques of Section 4, we have implemented
1) a variant that runs the standard (partial) solver on the α-safe subgame and
removes the found dominion using the standard attractor (within that subgame),
and 2) a variant that uses (partial) solvers with the safe attractors. Moreover,
we also conduct experiments using the full solver running on an α-safe subgame.
An important design aspect is to decide how the exploration and the on-the-fly
solving should interleave. For this we have implemented a time-based heuristic
that keeps track of the time spent on solving and exploration steps. The time
measurements are used to ensure that (approximately) ten percent of total time
is spent on solving by delaying the next call to the solver. We do not terminate
the partial solver when it requires more time, and thus the ten percent target is only approximate.
As a result of this heuristic, cheap solvers will be called more frequently than
more expensive (and more powerful) ones, which may cause the latter to explore
larger parts of the game graph.
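In pseudo-Python, the heuristic amounts to the loop below; `explore_step` and `partial_solve` are placeholder callbacks, not the tool's actual API, and the bookkeeping is simplified.

```python
import time

# Interleave exploration and on-the-fly solving so that roughly `budget`
# (ten percent) of the measured time is spent solving; the next solver call
# is simply delayed until the ratio drops below the budget again.
def explore_and_solve(explore_step, partial_solve, budget=0.10):
    t_explore = t_solve = 0.0
    while True:
        t0 = time.monotonic()
        finished = explore_step()          # returns True when fully explored
        t_explore += time.monotonic() - t0
        if finished:
            break
        if t_solve <= budget * (t_explore + t_solve):
            t0 = time.monotonic()
            decided = partial_solve()      # True if the initial vertex is won
            t_solve += time.monotonic() - t0
            if decided:
                break
```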
5.2 Cases
Table 1 provides an overview of the models and a description of the property
that is being checked. The properties are written in the modal µ-calculus with
data [15]. For the equivalence checking case we have mutated the original model
to introduce a defect. For each property, we indicate the nesting depth (ND) and
alternation depth (AD) [10], and whether the parity game is solitaire (Yes/No). The
nesting depth indicates how many different priorities occur in the resulting game;
for our encoding this is at most ND+2 (the additional ones encode the constants
‘true’ and ‘false’). The alternation depth is an indication of a game’s complexity
due to alternating priorities.
Table 1. Models and formulas.
Model  Ref.  Prop.  Result  ND  AD  Sol.  Description
SWP    [30]  1      false   1   1   Y     No error transition
             2      false   3   3   N     Infinitely often enabled then infinitely often taken
WMS    [27]  1      false   1   1   Y     Job failed to be done
             2      false   1   1   Y     No zombie jobs
             3      true    3   2   Y     A job can become alive again infinitely often
             4      false   2   2   N     Branching bisimulation with a mutation
BKE    [3]   1      true    1   1   Y     No secret leaked
             2      false   2   1   N     No deadlock
CCP    [26]  1      false   2   1   N     No deadlock
             2      false   2   1   N     After access there is always accessover possible
PDI    n/a   1      true    2   1   N     Controller reaches state before it can connect again
             2      false   2   1   N     Connection impermissible can always happen or we establish a connection
             3      false   3   1   N     When connected, move to not ready for connection and do not establish a connection until it is allowed again
             4      true    2   1   N     The interlocking moves to the state connection closed before it is allowed to successfully establish a connection
We use MODEL-i to indicate the parity game belonging to model MODEL
and property i. Models SWP, BKE and CCP are protocol specifications. The
model PDI is a specification of an EULYNX SCI-LX SysML interface model that
is used for a train interlocking system. Finally, WMS is the specification of a
workload management system used at CERN. Using tools in mCRL2 [6], we have
converted each model and property combination into a so-called parameterised
Boolean equation system [16], a higher-level logic that can be used to represent
the underlying parity game.
Parity games SWP-1, WMS-1, WMS-2 and BKE-1 encode typical safety
properties where some action should not be possible. In terms of the alternation-
free modal mu-calculus with regular expressions, such properties are of the shape
[true*.a]false. These properties are violated exactly when the vertex encoding
‘false’ can be reached. Parity games SWP-2, WMS-3 and WMS-4 are more
complex properties with alternating priorities, where WMS-4 encodes branching
bisimulation using the theory presented in [8]. The parity games BKE-2 and
CCP-1 encode a ‘no deadlock’ property given by a formula which states that
after every path there is at least one outgoing transition. Finally, CCP-2 and
all PDI cases contain formulas with multiple fixed points that yield games with
multiple priorities but no (dependent) alternation.
Table 2. Experiments with parity games where on-the-fly solving cannot terminate
early. All run times are in seconds. The number of vertices is given in millions. Memory
is given in gigabytes. Bold-faced numbers indicate the lowest value.
Game Strategy Vertices (106) Explore (s) Solve (s) Total (s) Mem (GB)
BKE-1 full 40 640 65 705 14
solitaire 40/40 629/615 153/100 782/715 15/15
cycles 40/40 635/644 149/160 785/804 15/15
fatal 40/40 624/625 152/164 776/789 15/15
partial 40 651 147 798 15
PDI-1 full 114 27 0.1 28 2
solitaire 114/114 28/27 4/0 33/28 2/2
cycles 114/114 29/28 7/7 36/35 2/2
fatal 114/114 28/28 4/7 32/35 2/2
partial 114 28 9 37 2
PDI-4 full 474 286 0 287 2
solitaire 474/474 284/281 46/14 331/295 2/2
cycles 474/474 284/287 92/91 376/378 2/2
fatal 474/474 285/283 80/91 365/374 2/2
partial 474 286 64 350 2
5.3 Results
In Tables 2 and 3 we compare the on-the-fly solving strategies presented in
Section 4. In the ‘Strategy’ column we indicate the on-the-fly solving strategy
that is used. Here full refers to a complete exploration followed by solving with
the Zielonka recursive algorithm. We use solitaire to refer to solitaire winning
cycle detection, cycles for forced winning cycle detection, fatal to refer to fatal
attractors and finally partial for on-the-fly solving with a Zielonka solver on safe
regions. For solvers with a standard variant and a variant that utilises the safe
attractors, the first number indicates the result of applying the (standard) solver
on safe vertices, and the second number (following the slash ‘/’) indicates the
result when using the solver that utilises safe attractors.
The column ‘Vertices’ indicates the number of vertices explored in the game.
In the next columns we indicate the time spent on exploring and solving specif-
ically and the total time in seconds. We exclude the initialisation time that is
common to all experiments. Finally, the last column indicates memory used by
the tool in gigabytes. We report the average of 5 runs and have set a timeout
of 1200 seconds per run; timed-out runs are marked as such in the tables. Table 2
contains all benchmarks that
require a full exploration of the game graph, providing an indication of the over-
Table 3. Experiments with parity games in which at least one partial solver terminates
early. All run times are in seconds. The number of vertices is given in millions. For
solvers with two variants, the first number indicates the result of applying the solver
on safe vertices, and the number following the slash ‘/’ the result when using the solver
that uses safe attractors. Memory is given in gigabytes. Bold-faced numbers indicate the lowest
value.
Game Strategy Vertices (106) Explore (s) Solve (s) Total (s) Mem (GB)
SWP-1 full 13304 n/a
solitaire 15.1/0.4 8.5/1.4 27.3/0.1 35.8/1.5 2.8/1.5
cycles 25.2/0.9 12.3/1.8 42.7/1.0 55.0/2.8 3.2/1.5
fatal 15.1/0.4 9.0/1.3 29.4/0.4 38.4/1.7 3.1/1.5
partial 27.1 13.1 50.4 63.5 3.6
SWP-2 full 1987 n/a
solitaire 1631/1987 /163/11 / /
cycles 1774/1774 /154/91 / /
fatal 0.007/0.007 0.9/0.9 0.4/0.2 1.3/1.0 1.4/1.2
partial 0.007 0.9 0.4 1.3 1.4
WMS-1 full 270 2.8 0.4 3.3 0.2
solitaire 270/240 2.8/2.5 0.8/0.4 3.6/2.9 0.3/0.2
cycles 270/270 2.9/3.2 0.8/8.0 3.7/11.2 0.3/0.5
fatal 270/270 2.6/3.2 0.8/8.5 3.4/11.7 0.3/0.5
partial 270 2.7 0.8 3.5 0.3
WMS-2 full 317 3.3 0.3 3.6 0.2
solitaire 7/7 0.2/0.2 1.0/0.5 1.2/0.8 0.1/0.1
cycles 7/66 0.2/0.8 1.0/2.7 1.2/3.4 0.1/0.2
fatal 7/66 0.2/0.7 1.0/2.9 1.3/3.6 0.1/0.2
partial 7 0.2 1.1 1.3 0.1
WMS-3 full 317 2.6 0.1 2.7 0.2
solitaire 317/317 2.6/2.6 0.4/0.3 3.1/2.9 0.2/0.2
cycles 317/317 2.7/2.7 0.4/0.6 3.1/3.3 0.2/0.2
fatal 5/1 0.2/0.1 0.5/0.1 0.7/0.2 0.1/0.1
partial 5 0.2 0.3 0.5 0.1
WMS-4 full 366 n/a
solitaire 0.03/0.03 38/38 0.8/0.1 39/38 2/2
cycles 0.03/0.03 37/37 0.8/0.3 38/37 2/2
fatal 0.03/0.03 37/37 0.8/0.3 38/37 2/2
partial 0.03 37 0.7 38 2
BKE-2 full 119 942 36.5 979 28
solitaire 0.0007/0.0001 0.2/0.1 0.0/0.0 0.2/0.2 0.9/0.9
cycles 0.0007/0.0003 0.2/0.2 0.0/0.0 0.2/0.2 0.9/0.9
fatal 0.0007/0.0003 0.2/0.2 0.0/0.0 0.2/0.2 0.9/0.9
partial 0.0007 0.2 0.0 0.2 0.9
CCP-1 full 0.4 28 4.2 32 2
solitaire 0.003/0.003 1.0/1.0 0.1/0.1 1.1/1.1 2/2
cycles 0.003/0.003 1.0/1.0 0.1/0.1 1.1/1.1 2/2
fatal 0.006/0.003 1.3/1.1 0.1/0.1 1.4/1.2 1.5/1.5
partial 0.003 1.0 0.1 1.1 1.5
CCP-2 full 0.9 35 33 68 1.7
solitaire 0.02/0.007 1.6/1.1 0.2/0.0 1.8/1.1 1.5/1.5
cycles 0.02/0.007 1.9/1.1 0.2/0.1 2.1/1.2 1.5/1.5
fatal 0.02/0.007 1.6/1.2 0.2/0.1 1.8/1.3 1.5/1.5
partial 0.02 1.6 0.2 1.8 1.5
PDI-2 full 229 31 12 43 2
solitaire 229/229 33/32 34/12 67/45 2/2
cycles 30/30 15/14 3/5 17/19 2/2
fatal 30/30 15/15 3/5 18/19 2/2
partial 123 23 29 51 2
PDI-3 full 436 228 8 236 2
solitaire 436/436 230/228 36/32 266/260 2/2
cycles 78/162 65/102 19/64 84/166 2/2
fatal 75/84 64/67 19/23 83/90 2/2
partial 110 82 30 112 2
head in cases where this is unavoidable; Table 3 contains all benchmarks where
at least one of the partial solvers allows exploration to terminate early.
For games SWP-1, WMS-1, WMS-2 in Table 3 we find that solitaire, and in
particular its safe attractor variant, is able to determine the solution the fastest.
Also, for all entries in Table 2 this is the solver with the least overhead. Next, we
observe that for cases such as WMS-1 and PDI-3 using the safe attractor variants
of the solvers can be detrimental. Our observation is that first computing safe
sets (especially using chaining) can be quick when most vertices are owned by
one player and carry one priority, whereas the computation of the safe attractor,
which uses the more involved safe control predecessor, is more costly in such cases.
There are also cases (WMS-3, WMS-4, CCP-1 and CCP-2) where the safe attractor
variants are faster; these cases all have multiple priorities. In cases where
these solvers are slow (for example PDI-3) we also observe that more states are
explored before termination, because the earlier mentioned time-based heuristic
results in calling the solver significantly less frequently.
For parity games SWP-2 and WMS-3 only fatal and partial are able to find
a solution early, which shows that more powerful partial solvers can be useful.
From Table 2 and the cases in which the safe attractor variants perform poorly
we learn that the partial solvers can, as expected, cause overhead. In our benchmarks
this overhead is on average 30 percent, but when a partial solver terminates early it can
be very beneficial, achieving speed-ups of up to several orders of magnitude.
6 Conclusion
In this work we have developed the theory to reason about on-the-fly solving
of parity games, independent of the strategy that is used to explore games. We
have introduced the notion of safe vertices, shown their correctness, proven an
optimality result, and we have studied partial solvers and shown that these can
be made to run without determining the safe vertices first, which is useful
for on-the-fly solving. Finally, we have demonstrated the practical purpose of our
method and observed that solitaire winning cycle detection with safe attractors
is almost always beneficial with minimal overhead, but also that more powerful
partial solvers can be useful.
Based on our experiments, one can make an educated guess which partial
solver to select in particular cases; we believe that this selection could even be
steered by analysing the parameterised Boolean equation system representing the
parity game. It would furthermore be interesting to study (practical) improve-
ments for the safe attractors, and their use in Zielonka’s recursive algorithm.
Acknowledgements We would like to thank Jeroen Meijer and Tom van Dijk
for their help regarding the Sylvan library when implementing our prototype.
This work was supported by the TOP Grants research programme with project
number 612.001.751 (AVVA), which is (partly) financed by the Dutch Research
Council (NWO).
References
1. Beer, I., Ben-David, S., Landver, A.: On-the-fly model checking of RCTL formulas.
In: Hu, A., Vardi, M. (eds.) CAV. LNCS, vol. 1427, pp. 184–194. Springer (1998).
https://doi.org/10.1007/BFb0028744
2. Benerecetti, M., Dell’Erba, D., Mogavero, F.: Solving parity games via
priority promotion. Formal Methods Syst. Des. 52(2), 193–226 (2018).
https://doi.org/10.1007/s10703-018-0315-1
3. Blom, S., Groote, J.F., Mauw, S., Serebrenik, A.: Analysing the BKE-security
protocol with µCRL. Electron. Notes Theor. Comput. Sci. 139(1), 49–90 (2005).
https://doi.org/10.1016/j.entcs.2005.09.005
4. Blom, S., van de Pol, J., Weber, M.: LTSmin: Distributed and symbolic reachability.
In: Touili, T., Cook, B., Jackson, P.B. (eds.) CAV. LNCS, vol. 6174, pp. 354–359.
Springer (2010). https://doi.org/10.1007/978-3-642-14295-6_31
5. Bryant, R.E.: Symbolic Boolean manipulation with ordered binary-
decision diagrams. ACM Comput. Surv. 24(3), 293–318 (1992).
https://doi.org/10.1145/136035.136043
6. Bunte, O., Groote, J.F., Keiren, J.J.A., Laveaux, M., Neele, T., de Vink, E.P.,
Wesselink, W., Wijs, A., Willemse, T.A.C.: The mCRL2 toolset for analysing
concurrent systems - improvements in expressivity and usability. In: Vojnar,
T., Zhang, L. (eds.) TACAS. LNCS, vol. 11428, pp. 21–39. Springer (2019).
https://doi.org/10.1007/978-3-030-17465-1_2
7. Calude, C.S., Jain, S., Khoussainov, B., Li, W., Stephan, F.: Deciding parity games
in quasipolynomial time. In: Hatami, H., McKenzie, P., King, V. (eds.) STOC. pp.
252–263. ACM (2017). https://doi.org/10.1145/3055399.3055409
8. Chen, T., Ploeger, B., van de Pol, J., Willemse, T.A.C.: Equivalence checking
for infinite systems using parameterized Boolean equation systems. In: Caires, L.,
Vasconcelos, V.T. (eds.) CONCUR. LNCS, vol. 4703, pp. 120–135. Springer (2007).
https://doi.org/10.1007/978-3-540-74407-8_9
9. Ciardo, G., Marmorstein, R.M., Siminiceanu, R.: The saturation algorithm for
symbolic state-space exploration. Int. J. Softw. Tools Technol. Transf. 8(1), 4–25
(2006). https://doi.org/10.1007/s10009-005-0188-7
10. Cleaveland, R., Klein, M., Steffen, B.: Faster model checking for the modal mu-
calculus. In: von Bochmann, G., Probst, D.K. (eds.) CAV. LNCS, vol. 663, pp.
410–422. Springer (1992). https://doi.org/10.1007/3-540-56496-9_32
11. Cranen, S., Luttik, B., Willemse, T.A.C.: Proof graphs for parameterised Boolean
equation systems. In: D’Argenio, P.R., Melgratti, H.C. (eds.) CONCUR. LNCS,
vol. 8052, pp. 470–484. Springer (2013). https://doi.org/10.1007/978-3-642-40184-8_33
12. van Dijk, T.: Oink: An implementation and evaluation of modern parity game
solvers. In: Beyer, D., Huisman, M. (eds.) TACAS. LNCS, vol. 10805, pp. 291–308.
Springer (2018). https://doi.org/10.1007/978-3-319-89960-2_16
13. van Dijk, T., van de Pol, J.: Sylvan: multi-core framework for deci-
sion diagrams. Int. J. Softw. Tools Technol. Transf. 19(6), 675–696 (2017).
https://doi.org/10.1007/s10009-016-0433-2
14. Eiríksson, Á.T., McMillan, K.L.: Using formal verification/analysis methods on
the critical path in system design: A case study. In: Wolper, P. (ed.) CAV. LNCS,
vol. 939, pp. 367–380. Springer (1995). https://doi.org/10.1007/3-540-60045-0_63
15. Groote, J.F., Willemse, T.A.C.: Model-checking processes with data. Sci. Comput.
Program. 56(3), 251–273 (2005). https://doi.org/10.1016/j.scico.2004.08.002
On-The-Fly Solving for Symbolic Parity Games 153
16. Groote, J.F., Willemse, T.A.C.: Parameterised Boolean equation systems. Theor.
Comput. Sci. 343(3), 332–369 (2005). https://doi.org/10.1016/j.tcs.2005.06.016
17. Huth, M., Kuo, J.H., Piterman, N.: Fatal attractors in parity games. In:
Pfenning, F. (ed.) FOSSACS. LNCS, vol. 7794, pp. 34–49. Springer (2013).
https://doi.org/10.1007/978-3-642-37075-5_3
18. Jurdzi´nski, M., Lazi´c, R.: Succinct progress measures for solving
parity games. In: LICS. pp. 1–9. IEEE Computer Society (2017).
https://doi.org/10.1109/LICS.2017.8005092
19. Kant, G., van de Pol, J.: Efficient instantiation of parameterised
Boolean equation systems to parity games. In: Wijs, A., Bosnacki, D.,
Edelkamp, S. (eds.) GRAPHITE. EPTCS, vol. 99, pp. 50–65 (2012).
https://doi.org/10.4204/EPTCS.99.7
20. Kant, G., van de Pol, J.: Generating and solving symbolic parity games. In:
Bosnacki, D., Edelkamp, S., Lluch-Lafuente, A., Wijs, A. (eds.) GRAPHITE.
EPTCS, vol. 159, pp. 2–14 (2014). https://doi.org/10.4204/EPTCS.159.2
21. Laveaux, M.: Downloadable sources for the case study (2022).
https://doi.org/10.5281/zenodo.5896966
22. Laveaux, M., Wesselink, W., Willemse, T.A.C.: On-the-fly solving for symbolic
parity games. CoRR abs/2201.09607 (2022), https://arxiv.org/abs/2201.09607
23. Mateescu, R., Sighireanu, M.: Efficient on-the-fly model-checking for regular
alternation-free mu-calculus. Sci. Comput. Program. 46(3), 255–281 (2003).
https://doi.org/10.1016/S0167-6423(02)00094-1
24. McNaughton, R.: Infinite games played on finite graphs. Ann. Pure Appl. Logic
65(2), 149–184 (1993). https://doi.org/10.1016/0168-0072(93)90036-D
25. Miller, D.M.: Multiple-valued logic design tools. In: ISMVL. pp. 2–11. IEEE Com-
puter Society (1993). https://doi.org/10.1109/ISMVL.1993.289589
26. Pang, J., Fokkink, W.J., Hofman, R.F.H., Veldema, R.: Model checking a cache
coherence protocol of a java DSM implementation. J. Log. Algebraic Methods
Program. 71(1), 1–43 (2007). https://doi.org/10.1016/j.jlap.2006.08.007
27. Remenska, D., Willemse, T.A.C., Verstoep, K., Templon, J., Bal, H.E.:
Using model checking to analyze the system behavior of the LHC
production grid. Future Gener. Comput. Syst. 29(8), 2239–2251 (2013).
https://doi.org/10.1016/j.future.2013.06.004
28. Sanchez, L., Wesselink, W., Willemse, T.A.C.: A comparison of BDD-based par-
ity game solvers. In: Orlandini, A., Zimmermann, M. (eds.) GandALF. EPTCS,
vol. 277, pp. 103–117 (2018). https://doi.org/10.4204/EPTCS.277.8
29. Stasio, A.D., Murano, A., Vardi, M.Y.: Solving parity games: Explicit vs symbolic.
In: ampeanu, C. (ed.) CIAA. LNCS, vol. 10977, pp. 159–172. Springer (2018).
https://doi.org/10.1007/978-3-319-94812-6 14
30. Tanenbaum, A.S., Wetherall, D.: Computer networks, 5th Edition. Pearson (2011),
https://www.worldcat.org/oclc/698581231
31. Zielonka, W.: Infinite games on finitely coloured graphs with applications to
automata on infinite trees. Theor. Comput. Sci. 200(1-2), 135–183 (1998).
https://doi.org/10.1016/S0304-3975(98)00009-7
154 M. Laveaux, W. Wesselink and T.A.C. Willemse
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
On-The-Fly Solving for Symbolic Parity Games 155
Equivalence Checking
Distributed Coalgebraic Partition Refinement
Fabian Birkmann, Hans-Peter Deifel*, and Stefan Milius**
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
{fabian.birkmann,hans-peter.deifel,stefan.milius}@fau.de
Abstract. Partition refinement is a method for minimizing automata and transition systems of various types. Recently we have developed a partition refinement algorithm and the tool CoPaR that is generic in the transition type of the input system and matches the theoretical run time of the best known algorithms for many concrete system types. Genericity is achieved by modelling transition types as functors on sets and systems as coalgebras. Experimentation has shown that memory consumption is a bottleneck for handling systems with a large state space, while running times are fast. We have therefore extended an algorithm due to Blom and Orzan, which is suitable for a distributed implementation, to the coalgebraic level of genericity, and implemented it in CoPaR. Experiments show that this allows us to handle much larger state spaces. Running times are low in most experiments, but there is a significant penalty for some.
1 Introduction
Minimization is an important and basic algorithmic task on state-based systems,
concerned with reducing the state space as much as possible while retaining
the system’s behaviour. It is used for equivalence checking of systems and as a
subtask in model checking tools in order to handle larger state spaces and thus
mitigate the state-explosion problem.
We focus on the task of identifying behaviourally equivalent states modulo bisimilarity. For classic labelled transition systems this notion obeys the principle ‘states s and t are bisimilar if for every transition s −a→ s′, there exists a transition t −a→ t′ with s′ and t′ bisimilar’, and symmetrically for transitions from t. Bisimilarity is a rather fine-grained branching-time notion of equivalence (cf. [17]); it is widely used and preserves all properties expressible as µ-calculus formulas. Moreover, it has been generalized to yield equivalence notions for many other types of state-based systems and automata.
Due to the above principle, bisimilarity is defined by a fixed point, to be understood as a greatest fixed point, and is hence approximable from above. This is used by partition refinement algorithms: the initial partition considers all states tentatively equivalent and is then iteratively refined using observations
* Supported by the Deutsche Forschungsgemeinschaft (DFG) within the Research and Training Group 2475 “Cybercrime and Forensic Computing” (393541319/GRK2475/1-2019)
** Supported by Deutsche Forschungsgemeinschaft (DFG) under project MI 717/7-1.
about the states until a fixed point is reached. Consequently, such procedures
run in polynomial time and can also be efficiently implemented, in contrast to
coarser system equivalences such as trace equivalence and language equivalence
of nondeterministic systems which are PSPACE-complete [23]. This makes mini-
mization under bisimilarity interesting even in cases where the main equivalence
is linear-time, such as for automata.
Efficient partition refinement algorithms exist for various systems: Kanellakis and Smolka provide a minimization algorithm with run time O(m·n) for labelled transition systems with n states and m transitions. Even faster algorithms have been developed over the past 50 years for many types of systems. For example, Hopcroft’s algorithm for minimizing deterministic automata has run time in O(n·log n) [21]; it was later generalized to variable input alphabets, with run time O(n·|A|·log n) [18,24]. The Paige-Tarjan algorithm minimizes transition systems in time O((m+n)·log n) [31], and generalizations to labelled transition systems have the same time complexity [13,22,36]. For the minimization of weighted systems (a.k.a. lumping), Valmari and Franceschinis [38] have developed a simple O((m+n)·log n) algorithm for systems with rational weights. Buchholz [10] gave an algorithm for weighted automata, and Högberg et al. [20] one for (bottom-up) weighted tree automata, both with run time in O(m·n).
In previous work [16,42], we have provided an efficient partition refinement algorithm, which is generic in the system type, captures all the above system types, and matches or, in some cases, even improves on the run time complexity of the respective specialized algorithms. Subsequently, we have shown how to extend the generic complexity analysis to weighted tree automata and implemented the algorithm in the tool CoPaR [11,41], again matching the previous best run time complexity and improving it in the case of weighted tree automata with weights from a non-cancellative monoid. The algorithm is based on ideas of Paige and Tarjan, which leads to its efficiency. Genericity is achieved by modelling state-based systems as coalgebras, following the paradigm of universal coalgebra [34], in which the transition structure of systems is encapsulated by a set functor. The algorithm and tool are modular in the sense that functors can be built from a preimplemented set of basic functors by standard set constructions such as cartesian product, disjoint union and functor composition. The tool then automatically derives a parser for input coalgebras of the composed type and provides a corresponding partition refinement implementation off the shelf. In addition, new basic functors F may easily be added to the set of basic functors by implementing a simple refinement interface for them plus a parser for encoded F-coalgebras. Our experiments with the tool have shown that run time scales well with the size of systems. However, memory usage becomes a bottleneck with growing system size, a problem that has previously also been observed by Valmari [37] for partition refinement. One strategy to address this is to distribute the algorithm across multiple computers, which store and process only a part of the state space and communicate via message passing. For ordinary labelled transition systems and Markov systems this has been investigated in a series of papers by Blom and Orzan [4–9], who were also motivated to mitigate the memory bottleneck of sequential partition refinement algorithms.
Our contribution in this paper is an extension of CoPaR by an efficient distributed partition refinement algorithm in coalgebraic generality. Like in Blom and Orzan’s work, our algorithm is a distributed version of a simple but effective algorithm called “the naive method” [23], or “the final chain algorithm” in coalgebraic generality [25,42]. We first generalize signature refinement introduced by Blom and Orzan to the level of coalgebras. We also combine generalized signatures (Section 3) with the previous encodings of set functors and their coalgebras [11,41] via the new notion of a signature interface (Definition 3.1). This is a key idea to make coalgebraic signature refinement and the final chain algorithm implementable in a tool like CoPaR. In addition, we demonstrate how signature interfaces of functors can be combined (Construction 3.3 and Proposition 3.4) along standard functor constructions. This yields a modularity principle similar to that of the previous sequential algorithm. However, this is a new feature for signature refinement and also, to our knowledge, for the final chain algorithm. Consequently, our distributed, modular and generic implementation of the final chain algorithm is new (already as a sequential algorithm).
We also provide experiments demonstrating its scalability and show that much larger state spaces can indeed be handled. Our benchmarks include weighted tree automata for non-cancellative monoids, a type of system for which our previous sequential implementation is heavily limited by its memory requirements. For those systems the running times of the distributed algorithm are even faster than those of the sequential algorithm. In a second set of benchmarks stemming from the PRISM benchmark suite [27] we again show that larger systems can now be handled; however, for some of these there is a penalty in run time.
Related work. Balcazar et al. [1] have proved that the problem of bisimilarity checking for labelled transition systems is P-complete, which implies that it is hard to parallelize efficiently. Nevertheless, parallel algorithms have been proposed by Rajasekaran and Lee [33]. These are designed for shared-memory machines and hence do not distribute RAM requirements over multiple machines.
Symbolic techniques are an orthogonal approach to reduce the memory usage of partition refinement algorithms and have been explored e.g. by Wimmer et al. [40] and van Dijk and van de Pol [15].
Two other orthogonal extensions of generic coalgebraic minimization and CoPaR have been presented in recent work. First, a non-trivial extension computes (1) reachable states and (2) the transition structure of the minimized systems [12]. Second, Wißmann et al. [43] have shown how to compute distinguishing formulas in a Hennessy-Milner style logic for a pair of behaviourally inequivalent states.
2 Preliminaries
Our algorithmic framework and the tool CoPaR [41,42] are based on modelling state-based systems abstractly as coalgebras for a (set) functor that encapsulates the transition type, following the paradigm of universal coalgebra [34]. We now recall some standard notations for sets and maps and basic notions and examples in coalgebra. We fix a singleton set 1 = {∗}; for every set X we have a unique map ! : X → 1 and the identity map idX : X → X. We denote composition of maps by (−)·(−), in applicative order. Given maps f : X → A, g : X → B, we define ⟨f, g⟩ : X → A×B by ⟨f, g⟩(x) = (f(x), g(x)). The type of transitions of states in a system is modelled by a set functor F. Informally, F assigns to every set X a set FX of structured collections of elements of X, and an F-coalgebra is a map c : S → FS which assigns to every state s ∈ S in a system a structured collection c(s) ∈ FS of successor states of s. The functor F also determines a canonical notion of behavioural equivalence of states of a coalgebra; this arises by stipulating that morphisms of coalgebras are behaviour-preserving maps.
Definition 2.1. A functor F : Set → Set assigns to each set X a set FX and to each map f : X → Y a map Ff : FX → FY, preserving identities and composition (F idX = idFX, F(g·f) = Fg·Ff). An F-coalgebra (S, c) consists of a set S of states and a transition structure c : S → FS. A morphism h : (S, c) → (S′, c′) of F-coalgebras is a map h : S → S′ that preserves the transition structure, i.e. Fh·c = c′·h. Two states s, t ∈ S of a coalgebra c : S → FS are behaviourally equivalent (s ∼ t) if there exists a coalgebra morphism h with h(s) = h(t).
Example 2.2. We mention several types of systems which are instances of the general notion of coalgebra and the ensuing notion of behavioural equivalence. All these are possible input systems for our tool CoPaR.
(1) Transition systems. The finite powerset functor Pω maps a set X to the set PωX of all finite subsets of X, and a map f : X → Y to the map Pωf = f[−] : PωX → PωY taking direct images. Coalgebras for Pω are finitely branching (unlabelled) transition systems. Two states are behaviourally equivalent iff they are (strongly) bisimilar in the sense of Milner [29,30] and Park [32]. Similarly, finitely branching labelled transition systems with label alphabet A are coalgebras for the functor FX = Pω(A×X).
(2) Deterministic automata. For an input alphabet A, the functor given by FX = 2×X^A, where 2 = {0,1}, sends a set X to the set of pairs of boolean values and functions A → X. An F-coalgebra (S, c) is a deterministic automaton (without an initial state). For each state s ∈ S, the first component of c(s) determines whether s is a final state, and the second component is the successor function A → S mapping each input letter a ∈ A to the successor state of s under input letter a. States s, t ∈ S are behaviourally equivalent iff they accept the same language in the usual sense.
(3) Weighted tree automata simultaneously generalize tree automata and weighted (word) automata. Inputs of such automata stem from a finite signature Σ, i.e. a finite set of input symbols, each with a prescribed natural number, its arity. Weights are taken from a commutative monoid (M, +, 0). A (bottom-up) weighted tree automaton (WTA) (over M with inputs from Σ) consists of a finite set S of states, an output map f : S → M, and for each k ≥ 0, a transition map µk : Σk → M^(S^k × S), where Σk denotes the set of k-ary input symbols in Σ; the maximum arity of symbols in Σ is called the rank.
Every signature Σ gives rise to its associated polynomial functor, also denoted Σ, which assigns to a set X the set ∐_{n∈N} Σn × X^n, where ∐ denotes disjoint union (coproduct). Further, for a given monoid (M, +, 0) the monoid-valued functor M^(−) sends a set X to the set of maps f : X → M that are finitely supported, i.e. f(x) = 0 for almost all x ∈ X. Given a map f : X → Y, M^(f) : M^(X) → M^(Y) sends a map v : X → M in M^(X) to the map y ↦ Σ_{x∈X, f(x)=y} v(x), corresponding to the standard image measure construction (a small Haskell sketch of this action is given after this example).
Weighted tree automata are coalgebras for the composite functor FX = M × M^(ΣX); indeed, given a coalgebra c = ⟨c1, c2⟩ : S → M × M^(ΣS), its first component c1 is the output map, and the second component c2 is equivalent to the family of transition maps µk described above.
As proven by Wißmann et al. [41, Prop. 6.6], the coalgebraic behavioural equivalence is precisely backward bisimulation of weighted tree automata as introduced by Högberg et al. [20, Def. 16].
(4) The bag functor B : Set → Set sends a set X to the set of all finite multisets (or bags) over X. This is the special case of the monoid-valued functor for the monoid (N, +, 0). Accordingly, B-coalgebras are weighted transition systems with positive integers as weights, or they may be regarded as finitely branching transition systems where multiple transitions between a pair of states are allowed. Behavioural equivalence coincides with weighted (or strong) bisimilarity.
(5) Markov chains. The finite distribution functor Dω is a subfunctor of the monoid-valued functor R^(−) for the usual monoid of addition on the real numbers. It maps a set X to the set of all finite probability distributions on X. That means that DωX is the set of all finitely supported maps d : X → [0,1] such that Σ_{x∈X} d(x) = 1. The action of Dω on maps is the same as that of R^(−).
As shown by Rutten and de Vink [35], coalgebras c : S → (DωS + 1)^A are precisely Larsen and Skou’s probabilistic transition systems [28] (a.k.a. labelled Markov chains [14]) with the label alphabet A. In fact, for each state s ∈ S and action label a ∈ A, that state either cannot perform an a-action (when c(s)(a) ∈ 1) or the distribution c(s)(a) determines for every state t ∈ S the probability with which s transitions to t with an a-action.
Coalgebraic behavioural equivalence is precisely probabilistic bisimilarity in the sense of Larsen and Skou, see Rutten and de Vink [35, Cor. 4.7].
(6) Markov decision processes are systems which feature both non-deterministic and probabilistic branching. They are coalgebras for composite functors such as Pω(A × Dω(−)) or Pω(Dω(A × (−))) (simple/general Segala systems); Bartels et al. [2] list further functors for various species of probabilistic systems.
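The image-measure action of the monoid-valued functor from Example 2.2(3) is easy to make concrete. The following Haskell sketch is our own illustration (not CoPaR code); it represents finitely supported maps as finite dictionaries and uses a Num constraint merely as a stand-in for an arbitrary commutative monoid:

import qualified Data.Map.Strict as Map

-- A finitely supported map X -> M; absent keys denote the monoid unit 0.
type MVal x m = Map.Map x m

-- The action M^(f) on a map f : X -> Y: the weight of y is the sum
-- of the weights of all its preimages, as in the formula above.
mMap :: (Ord y, Num m) => (x -> y) -> MVal x m -> MVal y m
mMap f = Map.fromListWith (+) . map (\(x, w) -> (f x, w)) . Map.toList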
Encodings. To supply coalgebras as inputs to CoPaR, and in order to speak about the size of a coalgebra in terms of states and transitions, we need
Definition 2.3 [12, Def. 3.1]. An encoding of a set functor F consists of a set A of labels and a family of maps ♭X : FX → B(A×X), one for every set X, such that the map ⟨F!, ♭X⟩ : FX → F1 × B(A×X) is injective.
The encoding of a coalgebra c : S → FS is ⟨F!, ♭S⟩·c : S → F1 × B(A×S). For s ∈ S we write s −a→ t whenever (a, t) is contained in the bag ♭S(c(s)). The number of states and edges of a given encoded input coalgebra are n = |S| and m = Σ_{s∈S} |♭S(c(s))|, respectively, where |b| = Σ_{x∈X} b(x) for a bag b : X → N.
An encoding of a set functor F specifies how F-coalgebras are represented as directed graphs, and the required injectivity ensures that different coalgebras have different encodings.
Example 2.4. We recall a few key examples of encodings used by CoPaR [42]; for the required injectivity, see [12, Prop. 3.3].
(1) For the finite powerset functor Pω one takes a singleton label set A = 1 and ♭X : PωX → B(1×X) is the obvious inclusion: ♭X(U)(∗, x) = 1 iff x ∈ U ⊆ X.
(2) For the monoid-valued functor M^(−) we take labels A = M, and the map ♭X : M^(X) → B(M×X) is given by ♭X(t)(m, x) = 1 if t(x) = m ≠ 0 and 0 else.
(3) As a special case, the bag functor B has labels A = N, and the map ♭X : BX → B(N×X) is given by ♭X(t)(n, x) = 1 if t(x) = n and 0 else (a Haskell sketch follows below).
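To make Definition 2.3 concrete, the first and third encodings of Example 2.4 can be sketched in Haskell as follows. This is our own illustration (the names are ours, not CoPaR's internal code), with bags represented simply as lists of label–successor pairs:

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Powerset encoding (Example 2.4(1)): the label set A = 1 is (),
-- and the bag contains ((), x) once for each element x of the subset.
flatP :: Set.Set x -> [((), x)]
flatP u = [ ((), x) | x <- Set.toList u ]

-- Bag-functor encoding (Example 2.4(3)): labels are multiplicities;
-- there is an entry (n, x) whenever x has non-zero multiplicity n.
flatB :: Map.Map x Int -> [(Int, x)]
flatB t = [ (n, x) | (x, n) <- Map.toList t, n /= 0 ]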
Remark 2.5. (1) Readers familiar with category theory may wonder about the naturality of the encodings ♭X. It turns out [12] that in almost all instances, our encodings are not natural transformations, except for polynomial functors. As shown in op. cit., all our encodings satisfy a property called uniformity, which implies that they are subnatural transformations [12, Prop. 3.15].
(2) Having an encoding of a set functor F does not imply a reduction of the problem of minimizing F-coalgebras to that of coalgebras for B(A × (−)). In fact, the behavioural equivalence of F-coalgebras and coalgebras for B(A × (−)) may be very different unless ♭X is natural, which is not the case for most encodings.
Functors in CoPaR can be combined by product, coproduct or composition, leading to modularity. But in order to automatically handle combined functors, our tool crucially depends on the ability to form products and coproducts of encodings [41,42]. We refrain from going into technical details, but note for further use that given a pair of functors F1, F2 with encodings Ai, ♭X,i one obtains encodings for the functors F1×F2 (cartesian product) and F1+F2 (disjoint union) with the label set A = A1 + A2.
Input syntax and processing. We briefly recall the input format of CoPaR and how inputs are processed; for more details see [41, Sec. 3.1]. CoPaR accepts input files representing a finite F-coalgebra. The first line of an input file specifies the functor F, which is written as a term according to the following grammar:
    T ::= X | Pω T | B T | Dω T | M^(T) | Σ
    Σ ::= C | T + T | T × T | T^A        C ::= N | A        A ::= {s1, . . . , sn} | n        (1)
where n ∈ N denotes the set {0, . . . , n−1}, the sk are strings subject to the usual conventions for variable names (a letter or an underscore character followed by alphanumeric characters or underscores), exponents F^A are written F^A with a caret, and M is one of the monoids (Z,+,0), (R,+,0), (C,+,0), (Pω(64),∪,∅) (the monoid of 64-bit words with bitwise or), and (N,max,0) (the additive monoid of the tropical semiring). Note that C effectively ranges over at most countable sets, and A over finite sets. A term T determines a functor F : Set → Set in the evident way, with X interpreted as the argument.
The remaining lines of an input file specify a finite coalgebra c : S → FS. Each line has the form s: t for a state s ∈ S, where t represents the element c(s) ∈ FS. The syntax for t depends on the specified functor F and follows the structure of the term T defining F; the details are explained in [41, Sec. 3.1.2]. Fig. 1 from op. cit. shows two coalgebras and the corresponding input files.
Fig. 1: Examples of input files with encoded coalgebras [41].
(a) Markov chain (functor D X):
    q: {p: 0.5, r: 0.5}
    p: {q: 0.4, r: 0.6}
    r: {r: 1}
(b) Deterministic finite automaton (functor {f,n} x X^{a,b}):
    q: (n, {a: p, b: r})
    p: (n, {a: q, b: r})
    r: (f, {a: q, b: p})
[The original figure additionally draws both systems as labelled state graphs.]
After reading the functor term T, CoPaR builds a parser for the functor-specific input format and then parses the input coalgebra given in that format into an intermediate format which internally represents the encoding of the input coalgebra (Definition 2.3). For composite functors the parsed coalgebra then undergoes a substantial amount of preprocessing, which also affects how transitions are counted; see [41, Sec. 3.5] for more details.
3 Coalgebraic Partition Refinement
As mentioned in the introduction, the sequential partition refinement algorithm
previously implemented in CoPaR is based on ideas used in the Paige-Tarjan
algorithm [31] for transition systems. However, as has been mentioned by Blom
and Orzan [8], the Paige-Tarjan algorithm carefully selects the block of states to
split in each iteration, and the data structures used for this selection take a lot of
memory and require modification to allow a distributed implementation. Hence,
Blom and Orzan have built their distributed algorithm from a rather simple
sequential partition refinement algorithm based on what Kanellakis and Smolka
refer to as the naive method [23]. We now recall this algorithm and subsequently
show how it can be adapted to the coalgebraic level of generality.
Signature Refinement. Given a finite labelled transition system with state set S, a partition on S may be presented by a function π : S → N, i.e. two states s, t ∈ S lie in the same block of the partition iff π(s) = π(t). The signature of a state s ∈ S is the set of outgoing transitions to blocks of π:
    sigπ(s) = {(a, π(t)) | s −a→ t} ⊆ Pω(A×N).        (2)
A signature refinement step then refines π by putting s, t ∈ S into different blocks iff sigπ(s) ≠ sigπ(t). Concretely, we put πnew(s) = hash(sigπ(s)) using a perfect, deterministic hash function hash. The signature refinement algorithm (Fig. 2) starts with a trivial initial partition on S and repeats the refinement step until the partition stabilizes, i.e. until two subsequent partitions have the same size.
Coalgebraic Signature Refinement. Regarding a labelled transition system as a coalgebra c : S → Pω(A×S) (Example 2.2(1)), signatures are obtained by postcomposing the transition structure with the partition under the functor:
    sigπ = ( S −c→ Pω(A×S) −Pω(A×π)−→ Pω(A×N) ).        (3)
Variables: old and new partitions represented by π, πnew : S → N with sizes l, lnew, resp.; set H for counting block numbers
 1 foreach s ∈ S do
 2   πnew(s) ← 0;
 3 end
 4 lnew ← 1;
 5 while l ≠ lnew do
 6   π ← πnew; H ← ∅;
 7   foreach s ∈ S do
 8     πnew(s) ← hash(sigπ(s));
 9     H ← H ∪ {πnew(s)};
10   end
11   l ← lnew;
12   lnew ← |H|;
13 end
Fig. 2: Signature refinement for labelled transition systems
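The loop of Fig. 2 transcribes almost literally into Haskell. The following sketch is our own illustration, not CoPaR code: an LTS is given as a list of labelled edges (all edge targets are assumed to occur in the state list), and the signature itself is used in place of a perfect hash, which is semantically equivalent:

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type State = Int
type Lab   = Char
type LTS   = [(State, Lab, State)]

-- sig_pi(s) = { (a, pi(t)) | s -a-> t }, cf. equation (2).
sigOf :: LTS -> Map.Map State Int -> State -> Set.Set (Lab, Int)
sigOf lts part s = Set.fromList [ (a, part Map.! t) | (s', a, t) <- lts, s' == s ]

-- Repeat the refinement step until two subsequent partitions
-- have the same number of blocks.
refine :: LTS -> [State] -> Map.Map State Int
refine lts states = go (Map.fromList [ (s, 0) | s <- states ])
  where
    blocksOf part = Set.size (Set.fromList (Map.elems part))
    go part =
      let sigs  = Map.fromList [ (s, sigOf lts part s) | s <- states ]
          ids   = Map.fromList (zip (Set.toList (Set.fromList (Map.elems sigs))) [0 ..])
          part' = Map.map (ids Map.!) sigs
      in if blocksOf part' == blocksOf part then part' else go part'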
The generalisation to coalgebras for arbitrary F is immediate: the signature of a state of an F-coalgebra c : S → FS w.r.t. a partition π is given by the function sigπ = Fπ·c. In the refinement step of the above algorithm two states are identified by the next partition if they have the same signatures currently:
    πnew(s) = πnew(t) ⟺ sigπ(s) = sigπ(t) ⟺ (Fπ)(c(s)) = (Fπ)(c(t)).        (4)
Hence, the algorithm in fact simply applies F(−)·c to the initial partition corresponding to the trivial quotient ! : S → 1 until stability is reached. Note that this is precisely the Final Chain Algorithm of König and Küpper [25, Alg. 3.2] computing behavioural equivalence of a given F-coalgebra. Its correctness thus proves correctness of coalgebraic signature refinement, which is the algorithm in Fig. 2 with sigπ = Fπ·c. Since we represent functors and their coalgebras by encodings, we use an interface to F to compute signatures based on encodings.
Definition 3.1. Given a functor F with encoding A, ♭X, a signature interface consists of a function sig : F1 × B(A×N) → FN such that for every finite set S and every partition π : S → N we have
    Fπ = ( FS −⟨F!, ♭S⟩−→ F1 × B(A×S) −F1×B(A×π)−→ F1 × B(A×N) −sig−→ FN ).        (5)
Given a coalgebra c : S → FS, a state s ∈ S and a partition π : S → N, the two arguments of sig should be understood as follows. The first argument is the value F!(c(s)) ∈ F1, which intuitively provides an observable output of the state s. The second argument is the bag B(A×π)(♭S(c(s))) formed by those pairs (a, n) of labels a and numbers n of blocks of the partition π to which s has an edge; that is, that bag contains one pair (a, n) for each edge s −a→ s′ where π(s′) = n. Thus, when supplied with these inputs, sig correctly computes the signature of s; indeed, to see this, precompose equation (5) with the coalgebra structure c.
Example 3.2. (1) The constant functor C has the label set A = ∅, so we have B(∅ × N) ≅ 1, and we define the function sig : C × B(∅×N) → C by sig(c, −) = c.
(2) The powerset functor Pω has the label set A = 1, and we define the function sig : Pω1 × B(1×N) → PωN by sig(z, b) = {n : b(∗, n) ≠ 0}.
(3) The monoid-valued functor R^(−) has the label set A = R, and we define the function sig : R × B(R×N) → R^(N) by sig(z, b)(n) = Σ{r | b(r, n) ≠ 0}.
Next we show how signature interfaces can be combined by products (×) and coproducts (+). This is the key to the modularity of the implementation (be it distributed or sequential) of coalgebraic signature refinement in CoPaR.
Construction 3.3. Given a pair of functors F1, F2 with encodings Ai, ♭X,i and signature interfaces sigi, we put A = A1 + A2 and define the following functions:
(1) for the product functor F = F1 × F2 we take sig : F1 × B(A×N) → F1N × F2N,
        sig(t, b) = ( sig1(pr1(t), filter1(b)), sig2(pr2(t), filter2(b)) ).
Here, pri : F1 → Fi1 is the projection map and filteri : B(A×N) → B(Ai×N) is given by filteri(b)(a, n) = b(ini a, n), where ini : Ai → A is the injection map.
(2) for the coproduct functor F = F1 + F2 we take
        sig : F1 × B(A×N) → F1N + F2N,    sig(ini t, b) = ini(sigi(t, filteri(b))).
Proposition 3.4. The functions sig defined in Construction 3.3 yield signature interfaces for the functors F1 × F2 and F1 + F2, respectively.
As a consequence of this result, it suffices to implement signature interfaces only for basic functors according to the grammar in (1), i.e. the trivial identity and constant functors as well as the functors Pω, B, Dω and the supported monoid-valued functors M^(−). Signature interfaces of products, coproducts and exponents, the latter being a special form of product, are derived using Construction 3.3. Functor composition can be reduced to these constructions by a technique called desorting [42, Sec. 8.2], which transforms a coalgebra of a composite functor into a coalgebra for a coproduct of basic functors whose signature interfaces can then be combined by + (see also [41, Sec. 3.5]). As for the previous Paige-Tarjan-style algorithm, this leads to the modularity in the functor of the coalgebraic signature refinement algorithm: signature interfaces for composed functors are automatically derived in CoPaR. Moreover, a new basic functor F may be added by implementing a signature interface for F, effectively extending the grammar of supported functors in (1) by a clause F T.
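For illustration, Construction 3.3(1) can be sketched in Haskell as a combinator on signature functions. This is our own simplified rendering (CoPaR's actual code is organized around the type class shown in Section 4), with Either a1 a2 playing the role of the label set A = A1 + A2:

import Data.Either (partitionEithers)

-- Combine signature functions for F1 and F2 into one for F1 x F2.
-- Splitting the edge list by label component implements filter_1, filter_2.
sigProd :: (f1 -> [(a1, Int)] -> s1)       -- signature interface of F1
        -> (f2 -> [(a2, Int)] -> s2)       -- signature interface of F2
        -> (f1, f2) -> [(Either a1 a2, Int)] -> (s1, s2)
sigProd sig1 sig2 (t1, t2) edges = (sig1 t1 es1, sig2 t2 es2)
  where
    (es1, es2) = partitionEithers
      [ case lab of
          Left  a -> Left  (a, n)
          Right a -> Right (a, n)
      | (lab, n) <- edges ]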
4 The Distributed Algorithm
Our distributed algorithm for coalgebraic signature refinement is a generalization of Blom and Orzan’s original algorithm [8] to coalgebras. We highlight differences to op. cit. at the end of this section.
We assume a distributed high-bandwidth cluster of W workers w1, . . . , wW that is failure-free, i.e. nodes do not crash, messages do not get lost, and between two nodes the order of messages is preserved. The communication is based on non-blocking send operations and blocking receive operations. Messages are triples of the form (from, to, data), where the data field may be structured and will often contain a tag to simplify interpretation.
Description. The distributed algorithm is based on the sequential algorithm presented in Fig. 2, using a distributed hashtable to keep track of the partition. As for the sequential algorithm, the input consists of an F-coalgebra (S, c) with |S| = n states. We split the state space evenly among the workers as a preprocessing step. We write Si with |Si| = n/W for the set of states of worker wi. The input for worker wi is the encoding of that part of the transition structure of the input coalgebra which is needed to compute the signatures of the states in Si. This information is presented to wi as the list of all outgoing edges of states of Si in the encoding of the coalgebra (S, c), i.e. the list of all s −a→ t with s ∈ Si (cf. Definition 2.3). We refer to the block number π(s) of a state s ∈ S as its ID.
After processing the input, the algorithm runs in two phases. In the Initialization Phase (Fig. 3) the workers exchange update demands about the IDs stored in the distributed hashtable. If wi has an edge s −a→ s′ into some state s′ of wj, then during refinement wi needs to be kept up to date about the ID of s′ and thus instructs wj to do so. Worker wj remembers this information by storing wi in the set Ins′ = {wi | ∃s ∈ Si, a ∈ A. s −a→ s′} of incoming edges of s′ (lines 14–16). Hence, for each edge s −a→ s′ with s ∈ Si and s′ ∈ Sj, worker wi sends a message to wj, informing wj to add wi to Ins′ (lines 5–8).
Variables: set V of visited states; process count d; for each s ∈ Si a list Ins of workers with an edge into s
 1 V ← ∅; d ← 0;
 2 foreach s ∈ Si do
 3   Ins ← [];
 4 end
 5 foreach edge s → s′ of wi with s′ ∉ V do
 6   V ← V ∪ {s′};
 7   send(wi, wj, s′);
 8 end
 9 foreach 1 ≤ j ≤ W do
10   send(wi, wj, DONE);
11 end
12 waitFor(d = W);
13 return [Ins | s ∈ Si];
14 on receive (wk, wi, s) do
15   Ins ← (wk :: Ins);
16 end
17 on receive (_, _, DONE) do
18   d ← d + 1;
19 end
Fig. 3: Initialization Phase of worker wi
The main phase is the Refinement Phase (Fig. 4), mimicking the refinement loop of the undistributed algorithm. In each iteration all workers compute their part of the new partition, i.e. the IDs hs = hash(sigπ(s)) for each of their states s ∈ Si (line 5). In addition, every worker wi is responsible for sending the computed ID of s ∈ Si to workers in Ins that need it for computation of their own signatures in the next iteration (lines 6–9). The IDs are also sent to a designated worker counterOf(hs) (lines 10–12). This ensures that IDs are counted precisely once at the end of the round when the partition size is computed after all messages have been received (lines 14–17). The actual counting (line 19) is a
Variables: old, respectively new partitions π, πnew with sizes l, lnew; finished workers d; ID-counting set H
 1 πnew ← 0!; l ← 1; lnew ← 0; H ← ∅;
 2 while l ≠ lnew do
 3   l ← lnew; π ← πnew;
 4   foreach s ∈ Si do
 5     πnew(s) ← hash(sigπ(s));
 6     foreach wj ∈ Ins do
 7       send(wi, wj,
 8         ⟨UPD, s, πnew(s)⟩);
 9     end
10     send(wi,
11       counterOf(πnew(s)),
12       ⟨COUNT, πnew(s)⟩);
13   end
14   foreach 1 ≤ j ≤ W do
15     send(wi, wj, DONE);
16   end
17   waitFor(d = W);
18   l ← lnew;
19   lnew ← distribSum(sizeOf(H));
20   synchronize;
21 end
22 on receive (wk, wi, (UPD, s, hs)) do
23   πnew(s) ← hs;
24 end
25 on receive (wk, wi, (COUNT, hs)) do
26   H ← H ∪ {hs};
27 end
28 on receive (_, wi, DONE) do
29   d ← d + 1;
30 end
Fig. 4: Refinement Phase of worker wi
primitive operation in the MPI library; for an explicit O(log W) algorithm using messages see e.g. Blom and Orzan [8, Fig. 6]. Finally, the workers synchronize before starting the next iteration (line 20). The Refinement Phase stops if two consecutive partitions have the same size (line 2).
Correctness. The Initialization Phase (Fig. 3) terminates since every worker reaches line 10, sends DONE to all workers and thus also receives it (lines 17–19) a total of W times, allowing it to progress past line 12. An analogous argument proves termination of every iteration of the Refinement Phase (Fig. 4). The sequential algorithm is correct; hence we know the loop of the Refinement Phase terminates when all IDs are computed and counted correctly, since then the distributed and the sequential algorithm compute precisely the same partitions.
To show that the signatures are computed correctly, we note that if all DONE messages have been received in a round, then, by order-preservation of messages, all messages sent previously in this round have also been received. This ensures that no workers are missing from the lists Ins computed in the Initialization Phase and that during the Refinement Phase new IDs are sent to all concerned workers (Fig. 4, lines 6–8). This establishes correctness of the signature computation, and the signatures coincide on all workers since we assume that the hash function is deterministic. Finally, the use of the counterOf function (line 11) ensures that each ID is included in the counting set of exactly one worker. Thus, the distributed sum of the sizes of all counting sets is equal to the size of the partition.
Complexity. Let us assume that not only states, but also outgoing transitions are distributed evenly among the workers, i.e. every worker has about m/W outgoing transitions. In the Initialization Phase, the loop sending messages runs in O(m/W) and receiving takes O(W · n/W) = O(n), since for worker wi every other worker wj might have an edge into every state in Si. Both are executed in parallel, so in total the phase runs in O(max(m/W, n)) = O(m/W + n). In the Refinement Phase, we assume the run time of computing signatures and their hashes is linear in the number of edges. Then the loop for computing and hashing (O(m/W)) and counting (O(n/W)) signatures runs in total in O((m+n)/W), since it is performed by all workers independently. Each worker receives at most m/W ID-updates each round and the partition size is computable in O(W), giving the complexity of one refinement step as O((m+n)/W). As many as n iterations might be needed, for a total complexity of O(m/W + n) + n · O((n+m)/W) = O((mn + n²)/W + n).
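For a rough sense of scale: under these assumptions, a coalgebra with n = 10^6 states and m = 5·10^7 edges distributed over W = 32 workers incurs on the order of (m+n)/W ≈ 1.6·10^6 operations per worker and refinement round, so the per-round work shrinks linearly in W, while the worst-case number n of rounds does not.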
Remark 4.1. The above analysis assumes that signature interfaces are implemented with a run time linear in their input bag. This could in fact be theoretically realized for all basic functors (whence also for their combinations) currently implemented in CoPaR, which would involve using bucket sort for the grouping of bag elements by the target block (second component), e.g. for monoid-valued functors. However, since the table used in bucket sort would be very large (the size of the last partition) and memory-consciousness is our main motivation, we opted for an implementation using a standard n log n sorting algorithm instead.
Implementation details. CoPaR is implemented in Haskell. We were able to reuse, with only minor adjustments, major parts of the code base of CoPaR dedicated to the representation and processing of coalgebras. This includes the implemented functors and their encodings together with the corresponding parser and preprocessing algorithms (see Section 2). As explained in Section 3, the sequential Paige-Tarjan-style algorithm of CoPaR was not used; we implemented an additional “algorithmic frontend” to our “coalgebraic backend”. To compute signatures during the Refinement Phase, each functor implements the signature interface (Definition 3.1), which is written in Haskell as follows:

class Hashable (Signature f) => SignatureInterface f where
  type Signature f :: Type
  sig :: F1 f -> [(Label f, Int)] -> Signature f
We require in the second line a type Signature f that serves as an implementation-specific datatype representation of FN. In the type of sig, the types f, Label f and F1 f correspond to the name of F, its label type and the set F1, respectively.
Example 4.2. The Haskell implementation of the signature interface for the finite powerset functor Pω from Example 3.2(2) is as follows:

data P x = P x                 -- already defined in CoPaR
type instance Label P = ()     -- also already defined

instance SignatureInterface P where
  type Signature P = Set Int
  sig :: F1 P -> [((), Int)] -> Set Int
  sig _ = setFromList . map snd
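Along the same lines, a signature interface for the bag functor B could be sketched as follows. This is our own hypothetical rendering (CoPaR's actual instance may differ in representation); it sums the edge weights into each target block, as required by the action of Bπ:

import qualified Data.Map.Strict as Map

-- Hypothetical instance sketch for the bag functor:
--   type instance Label B = Int         (labels are multiplicities)
--   type Signature B = Map.Map Int Int  (a bag of block numbers)
sigB :: f1 -> [(Int, Int)] -> Map.Map Int Int
sigB _ edges = Map.fromListWith (+) [ (blk, w) | (w, blk) <- edges ]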
Signature interfaces for the other basic functors according to the grammar in (1) are implemented similarly. For combined functors CoPaR automatically derives their signature interface based on Construction 3.3.
In the algorithm itself, each worker runs three threads in parallel: the first thread is for computing, the second one for sending, and the third one for receiving signatures. This allows us to keep calls to the MPI interface separated from (pure) signature computation, simplifying the logic and allowing the workers to scatter the ID of one state while simultaneously computing the signature of the next one, so that neither signature computation nor network traffic becomes a bottleneck. For inter-thread communication and synchronization we rely on Haskell’s software transactional memory [19] to ease concurrent programming, e.g. to avoid race conditions.
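As a rough illustration of this thread structure (our own sketch, not CoPaR's actual code), Haskell's software transactional memory makes such a hand-off between a computing and a sending thread straightforward, e.g. via a transactional queue:

import Control.Concurrent (forkIO)
import Control.Concurrent.STM

main :: IO ()
main = do
  -- queue of (state, new ID) pairs, filled by the computing thread
  queue <- newTQueueIO :: IO (TQueue (Int, Int))
  _ <- forkIO $                        -- computing thread (dummy IDs here)
    mapM_ (atomically . writeTQueue queue) [ (s, s `mod` 3) | s <- [0 .. 9] ]
  -- sending thread: drain the queue (here it just prints; CoPaR would
  -- hand the updates over to MPI instead)
  mapM_ (\_ -> do (s, i) <- atomically (readTQueue queue)
                  putStrLn ("state " ++ show s ++ " -> block " ++ show i))
        [0 .. 9 :: Int]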
Comparison to Blom and Orzan’s algorithm. We now discuss a few differences between our algorithm and Blom and Orzan’s original one [8].
In Blom and Orzan’s algorithm for LTSs the sets Ins of s ∈ Si are in fact lists and contain worker wk a total of r times if there exist r edges from states in Sk to s. This induces a redundancy in messages of ID updates, since wi sends r (instead of one) messages with the ID of s to wk. If the LTS has an average fanout of f, then each worker has t = n/W · f outgoing transitions; this is the number of ID updates received every round. Since there are only n states, at most n/t = W/f of those messages are necessary. In our scenario, we have W ≪ f for large coalgebras, hence the overhead becomes massive; e.g. for W = 10, f = 100 already 90% of all ID messages are redundant. We use sets instead of lists for Ins to avoid this redundancy.
Signature computation and communication do not proceed simultaneously in Blom and Orzan’s original algorithm. However, in their optimized version [9] and in Blom et al.’s algorithm for state-labelled continuous-time Markov chains [4] they do.
Another difference of our implementation is that we decided to hash the signatures directly on the workers of the respective states, while Blom and Orzan decided to first send the signatures to some dedicated hashing worker which is then (uniquely) responsible for hashing, i.e. computing a new ID. This method allows new IDs to be computed in constant time. However, for the more complex functors supported by CoPaR, sending signatures could result in very large messages, so we opted for minimizing network traffic at the cost of slower signature computation.
5 Evaluation
To illustrate the practical utility and scalability of the algorithm and its implementation in CoPaR, we report on a number of benchmarks performed on a selection of randomly generated and real-world data. In previous evaluations of sequential CoPaR [41], we were limited by the 16GB RAM of a standard workstation. Here we demonstrate that our distributed implementation fulfills its main objective of handling larger systems without lifting the memory restriction per process. All benchmarks were run on a high-performance computing cluster consisting of nodes with two Xeon 2660 v2 “Ivy Bridge” chips (10 cores per chip + SMT) with 2.2GHz clock rate and 64GB RAM. The nodes are connected by a fat-tree InfiniBand interconnect fabric with 40GBit/s bandwidth. Most execution runs were performed using 32 workers on 8 nodes, resulting in 4 worker processes per node. No process used more than 16GB RAM. Execution times of the sequential algorithm were taken using one node of the cluster. No times are given for executions that previously ran out of 16GB memory [41]; those were not run on the cluster.
Weighted Tree Automata. In previous work [41], we have determined the size of the largest weighted tree automata for different parameters that the sequential version of CoPaR could handle in 16GB of RAM. Here, we demonstrate that the distributed version can indeed overcome these memory constraints and process much larger inputs.
Recall from Example 2.2 that weighted tree automata are coalgebras for the functor FX = M × M^(ΣX). For these benchmarks, we use ΣX = 4 × X^r with rank r ∈ {1,...,5} and the monoids (2,∨,0) (available as the finite powerset functor in CoPaR), (N,max,0) and (Pω(64),∪,∅). To generate a random automaton with n states, we uniformly chose k = 50·n transitions from the set of all possible transitions (using an efficient sampling algorithm by Vitter [39]), resulting in a coalgebra encoding with n′ = 51·n states and m = (r+1)·k edges. We took care to restrict the state and transition weights to at most 50 different monoid elements in each example, to avoid the situation where all states are already distinguished in the first iteration of the algorithm.
[Figure: maximum memory per worker (MB, 2^8–2^11) and overall computation time (s, 50–200) plotted against the number of workers used (2^3–2^7).]
Table 1 lists results for both the sequential and distributed implementation when run on the same input. These are the largest WTAs for their respective rank and monoid that sequential CoPaR could handle using at most 16GB of RAM [41]. In contrast, the distributed implementation uses less than 1GB per worker for those examples and is thus able to handle much larger inputs. Incidentally, the distributed implementation is also faster despite the overhead incurred by network communication. This can partly be attributed to the input-parsing stage, which does not need inter-worker synchronization and is thus perfectly parallelizable.
To test the scaling properties of the distributed algorithm, we ran CoPaR with the same input WTA but a varying number of worker processes. For this we chose the WTA for the monoid (2,∨,0) with ΣX = 4 × X^5 having 86852 states with 4342600 transitions and file size 186MB. The figure above depicts the maximum memory usage per worker and the overall running time. The results show that both quantities scale nicely with up to 32 workers; and while the running time even increases when using up to 128 workers, the memory usage per worker (the main motivation for this work) continues to decrease significantly.
Monoid           r  k          n        Mem. (MB)  Time (s)  Seq. Time (s)
(Pω(64),∪,∅)     5  4630750    92615    849        61        511
                 4  4171550    83431    663        52        642
                 3  4721250    94425    639        59        528
                 2  6704100    134082   675        76        471
                 1  7605350    152107   642        79        566
                 3  47212500   944250   6786       675       –
(N,max,0)        5  4722550    94451    871        61        445
                 4  4643950    92879    754        56        463
                 3  5039950    100799   628        64        391
                 2  5904200    118084   633        74        403
                 1  7845650    156913   677        82        438
                 3  50399500   1007990  5644       645       –
(2,∨,0)          5  4342600    86852    701        71        537
                 4  4624550    92491    728        67        723
                 3  6710350    134207   825        113       689
                 2  6900000    138000   715        129       467
                 1  7743150    154863   621        160       449
                 3  65000000   1300000  7092       1377      –
Table 1: Maximally manageable WTAs for sequential CoPaR; “Mem.” and “Time” are the memory and time required for the distributed algorithm and are the maximum over all workers. “Seq. Time” is the time needed by sequential CoPaR.
PRISM Models. Finally, we show how our distributed partition refinement implementation performs on models from the benchmark suite [27] of the PRISM model checker [26]. These model (aspects of) real-world protocols and are thus a good fit to evaluate how CoPaR performs on inputs that arise in practice. Specifically, we use the fms and wlan_time_bounded families of systems. These are continuous-time Markov chains, regarded as coalgebras for FX = R^(X), and Markov decision processes, regarded as coalgebras for FX = N × Pω(N × DωX), respectively. Again, our translation to coalgebras took care to force a coarse initial partition in the algorithm.
The results in Table 2 show that the distributed implementation is again able to handle larger systems than sequential CoPaR in 16GB of RAM per process. For the fms benchmarks, the distributed implementation is again faster than the sequential one. However, this is not the case for the wlan examples. The larger run times might be explained by the much higher number of iterations of the refinement phase (i-column of the table). This means that only few states are distinguished in each phase, and thus signatures are re-computed more often and more network traffic is incurred.
Model           n        m         Mem. (MB)  Time (s)  i    Seq. Time (s)
fms (n=4)       35910    237120    13         2         4    4
fms (n=5)       152712   1111482   62         8         5    17
fms (n=6)       537768   4205670   163        26        5    68
fms (n=7)       1639440  13552968  514        84        5    232
fms (n=8)       4459455  38533968  1690       406       7    –
wlan_tb (K=0)   582327   771088    90         297       306  39
wlan_tb (K=1)   1408676  1963522   147        855       314  105
wlan_tb (K=2)   1632799  5456481   379        2960      374  –
Table 2: Benchmarks on PRISM models: n and m are the numbers of states and edges of the input coalgebra; i is the number of refinement steps (iterations). The other columns are analogous to Table 1.
6 Conclusions and Future Work
We have presented a new and simple partition refinement algorithm in coalgebraic
genericity which easily lends itself to a distributed implementation. Our algorithm
is based on König and Küpper’s final chain algorithm [25] and Blom and Orzan’s
signature refinement algorithm for labelled transition systems [8]. We have
provided a distributed implementation in the tool CoPaR. Like the previous
sequential Paige-Tarjan style partition refinement algorithm, our new algorithm
is modular in the system type. This is made possible by combining signature
interfaces by product and coproduct, which is used by CoPaR for handling
combined type functors. Experimentation has shown that with the distributed
algorithm CoPaR can handle larger state spaces in general. Run times stay low for
weighted tree automata, whereas we observed severe penalties on some models
from the PRISM benchmark suite.
An additional optimization of the coalgebraic signature refinement algorithm
should be possible using Blom and Orzan’s idea [9] to mark in each iteration
those states whose signatures can change in the next iteration and only recompute
signatures for those states in the next round. This might mitigate the run time
penalties we have seen in some of the PRISM benchmarks.
Further work on CoPaR concerns symbolic techniques: we have a prototype
sequential implementation of the coalgebraic signature refinement algorithm
where state spaces are represented using BDDs. In a subsequent step it could be
investigated whether this can be distributed. In another direction the distributed
algorithm might be extended to compute distinguishing formulas, as recently
achieved for the sequential algorithm [43], for which there is also an implemented
prototype. Finally, there is still work required to integrate all these new fea-
tures, i.e. distribution, distinguishing formulas, reachability and computation of
minimized systems, into one version of CoPaR.
Data Availability Statement The software CoPaR and the input files that
were used to produce the results in this paper are available for download [3]. The
latest version of CoPaR can be obtained at https://git8.cs.fau.de/software/copar.
References
1. Balcazar, J., Gabarro, J., Santha, M.: Deciding bisimilarity is P-complete. Form. Asp. Comput. 4(6A), 638–648 (1992)
2. Bartels, F., Sokolova, A., de Vink, E.: A hierarchy of probabilistic system types. In: Coalgebraic Methods in Computer Science, CMCS 2003. Electron. Notes Theor. Comput. Sci., vol. 82, pp. 57–75. Elsevier (2003)
3. Birkmann, F., Deifel, H.P., Milius, S.: Software and Benchmarks for Distributed Coalgebraic Partition Refinement (Jan 2022). https://doi.org/10.5281/zenodo.5907084
4. Blom, S., Haverkort, B.R., Kuntz, M., van de Pol, J.: Distributed Markovian bisimulation reduction aimed at CSL model checking. In: Proceedings of the 7th International Workshop on Parallel and Distributed Methods in verifiCation (PDMC 2008). Electron. Notes Theor. Comput. Sci., vol. 220, pp. 35–50. Elsevier (2008)
5. Blom, S., Orzan, S.: A distributed algorithm for strong bisimulation reduction of state spaces. In: Brim, L., Grumberg, O. (eds.) Proc. Parallel and Distributed Model Checking (PDMC). Electron. Notes Theor. Comput. Sci., vol. 68, pp. 523–538. Elsevier (2002)
6. Blom, S., Orzan, S.: Distributed branching bisimulation reduction of state spaces. In: Sokolsky, O., Viswanathan, M. (eds.) Proc. Parallel and Distributed Model Checking (PDMC). Electron. Notes Theor. Comput. Sci., vol. 89, pp. 99–113. Elsevier (2003)
7. Blom, S., Orzan, S.: Distributed state space minimization. In: Arts, T., Fokkink, W. (eds.) Proc. Eighth International Workshop on Formal Methods for Industrial Critical Systems (FMICS). Electron. Notes Theor. Comput. Sci., vol. 80, pp. 109–123. Elsevier (2003)
8. Blom, S., Orzan, S.: A distributed algorithm for strong bisimulation reduction of state spaces. International Journal on Software Tools for Technology Transfer 7(1), 74–86 (2005). https://doi.org/10.1007/s10009-004-0159-4
9. Blom, S., Orzan, S.: Distributed state space minimization. International Journal on Software Tools for Technology Transfer 7(3), 280–291 (Jun 2005). https://doi.org/10.1007/s10009-004-0185-2
10. Buchholz, P.: Bisimulation relations for weighted automata. Theoret. Comput. Sci. 393, 109–123 (2008)
11. Deifel, H.P., Milius, S., Schröder, L., Wißmann, T.: Generic partition refinement and weighted tree automata. In: ter Beek, M., et al. (eds.) Proc. International Symposium on Formal Methods (FM). Lecture Notes Comput. Sci., vol. 11800, pp. 280–297. Springer (2019)
12. Deifel, H.P., Milius, S., Wißmann, T.: Coalgebra encoding for efficient minimization. In: Kobayashi, N. (ed.) Proc. 6th International Conference on Formal Structures for Computation and Deduction (FSCD). LIPIcs, vol. 195, pp. 28:1–28:19. Schloss Dagstuhl (2021)
13. Derisavi, S., Hermanns, H., Sanders, W.: Optimal state-space lumping in Markov chains. Inf. Process. Lett. 87(6), 309–315 (2003)
14. Desharnais, J., Edalat, A., Panangaden, P.: Bisimulation for labelled Markov processes. Inform. Comput. 179(2), 163–193 (2002)
15. van Dijk, T., van de Pol, J.: Multi-core symbolic bisimulation minimisation. International Journal on Software Tools for Technology Transfer 20(2), 157–177 (Apr 2018). https://doi.org/10.1007/s10009-017-0468-z
16. Dorsch, U., Milius, S., Schröder, L., Wißmann, T.: Efficient coalgebraic partition refinement. In: Meyer, R., Nestmann, U. (eds.) Proc. 28th International Conference on Concurrency Theory (CONCUR). LIPIcs, vol. 85, pp. 28:1–28:16. Schloss Dagstuhl (2017)
17. van Glabbeek, R.: The linear time – branching time spectrum I; the semantics of concrete, sequential processes. In: Bergstra, J., Ponse, A., Smolka, S. (eds.) Handbook of Process Algebra, pp. 3–99. Elsevier (2001)
18. Gries, D.: Describing an algorithm by Hopcroft. Acta Informatica 2, 97–109 (1973)
19. Harris, T., Marlow, S., Peyton Jones, S.: Composable memory transactions. In: PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming. pp. 48–60. ACM Press (January 2005), https://www.microsoft.com/en-us/research/publication/composable-memory-transactions/
20. Högberg (Björklund), J., Maletti, A., May, J.: Bisimulation minimisation for weighted tree automata. In: Developments in Language Theory, 11th International Conference, DLT 2007, Turku, Finland, July 3–6, 2007, Proceedings. Lecture Notes Comput. Sci., vol. 4588, pp. 229–241. Springer (2007). https://doi.org/10.1007/978-3-540-73208-2
21. Hopcroft, J.: An n log n algorithm for minimizing states in a finite automaton. In: Theory of Machines and Computations. pp. 189–196. Academic Press (1971)
22. Huynh, D., Tian, L.: On some equivalence relations for probabilistic processes. Fund. Inform. 17, 211–234 (1992)
23. Kanellakis, P.C., Smolka, S.A.: CCS expressions, finite state processes, and three problems of equivalence. Inform. Comput. 86(1), 43–68 (1990). https://doi.org/10.1016/0890-5401(90)90025-D
24. Knuutila, T.: Re-describing an algorithm by Hopcroft. Theoret. Comput. Sci. 250, 333–363 (2001)
25. König, B., Küpper, S.: Generic partition refinement algorithms for coalgebras and an instantiation to weighted automata. In: Theoretical Computer Science, IFIP TCS 2014. Lecture Notes Comput. Sci., vol. 8705, pp. 311–325. Springer (2014)
26. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: Computer Aided Verification, CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer (2011)
27. Kwiatkowska, M.Z., Norman, G., Parker, D.: The PRISM benchmark suite. In: Ninth International Conference on Quantitative Evaluation of Systems, QEST 2012, London, United Kingdom, September 17–20, 2012. pp. 203–204. IEEE Computer Society (2012). https://doi.org/10.1109/QEST.2012.14
28. Larsen, K.G., Skou, A.: Bisimulation through probabilistic testing. Inform. Comput. 94(1), 1–28 (1991)
29. Milner, R.: A Calculus of Communicating Systems, Lecture Notes Comput. Sci., vol. 92. Springer (1980)
30. Milner, R.: Communication and Concurrency. International Series in Computer Science, Prentice Hall (1989)
31. Paige, R., Tarjan, R.: Three partition refinement algorithms. SIAM J. Comput. 16(6), 973–989 (1987)
32. Park, D.: Concurrency on automata and infinite sequences. In: Deussen, P. (ed.) Proc. Conf. on Theoretical Computer Science. Lecture Notes Comput. Sci., vol. 104, pp. 167–183 (1981)
33. Rajasekaran, S., Lee, I.: Parallel algorithms for relational coarsest partition problems. IEEE Trans. Parallel Distributed Syst. 9(7), 687–699 (1998). https://doi.org/10.1109/71.707548
34. Rutten, J.: Universal coalgebra: a theory of systems. Theoret. Comput. Sci. 249, 3–80 (2000)
35. Rutten, J., de Vink, E.: Bisimulation for probabilistic transition systems: a coalgebraic approach. Theoret. Comput. Sci. 221, 271–293 (1999)
36. Valmari, A.: Bisimilarity minimization in O(m log n) time. In: Applications and Theory of Petri Nets, PETRI NETS 2009. Lecture Notes Comput. Sci., vol. 5606, pp. 123–142. Springer (2009)
37. Valmari, A.: Simple bisimilarity minimization in O(m log n) time. Fundam. Inform. 105(3), 319–339 (2010). https://doi.org/10.3233/FI-2010-369
38. Valmari, A., Franceschinis, G.: Simple O(m log n) time Markov chain lumping. In: Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2010. Lecture Notes Comput. Sci., vol. 6015, pp. 38–52. Springer (2010)
39. Vitter, J.S.: An efficient algorithm for sequential random sampling. ACM Trans. Math. Softw. 13(1), 58–67 (1987). https://doi.org/10.1145/23002.23003
40. Wimmer, R., Herbstritt, M., Hermanns, H., Strampp, K., Becker, B.: Sigref – A Symbolic Bisimulation Tool Box. In: Graf, S., Zhang, W. (eds.) Automated Technology for Verification and Analysis, vol. 4218, pp. 477–492. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11901914_35
41. Wißmann, T., Deifel, H.P., Milius, S., Schröder, L.: From generic partition refinement to weighted tree automata minimization. Form. Asp. Comput. 33, 695–727 (2021)
42. Wißmann, T., Dorsch, U., Milius, S., Schröder, L.: Efficient and modular coalgebraic partition refinement. Log. Methods Comput. Sci. 16(1), 8:1–8:63 (2020)
43. Wißmann, T., Milius, S., Schröder, L.: Explaining behavioural inequivalence generically in quasilinear time. In: Haddad, S., Varacca, D. (eds.) Proc. 32nd International Conference on Concurrency Theory (CONCUR). LIPIcs, vol. 203, pp. 32:1–32:18. Schloss Dagstuhl (2021)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Distributed Coalgebraic Partition Refinement 177
From Bounded Checking to Verification of
Equivalence via Symbolic Up-to Techniques
Vasileios Koutavas1, Yu-Yang Lin1(✉), and Nikos Tzevelekos2
1Trinity College Dublin, Dublin, Ireland {Vasileios.Koutavas,linhouy}@tcd.ie
2Queen Mary University of London, London, UK nikos.tzevelekos@qmul.ac.uk
Abstract.
We present a bounded equivalence verification technique
for higher-order programs with local state. This technique combines
fully abstract symbolic environmental bisimulations similar to symbolic
game semantics, novel up-to techniques, and lightweight state invariant
annotations. This yields an equivalence verification technique with no
false positives or negatives. The technique is bounded-complete, in that
all inequivalences are automatically detected given large enough bounds.
Moreover, several hard equivalences are proved automatically or after
being annotated with state invariants. We realise the technique in a tool
prototype called Hobbit and benchmark it with an extensive set of new
and existing examples. Hobbit can prove many classical equivalences
including all Meyer and Sieber examples.
Keywords: Contextual equivalence · bounded model checking · symbolic bisimulation · up-to techniques · operational game semantics.
1 Introduction
Contextual equivalence is a relation over program expressions which guaran-
tees that related expressions are interchangeable in any program context. It
encompasses verification properties like safety and termination. It has attracted
considerable attention from the semantics community (cf. the 2017 Alonzo Church
Award), and has found its main applications in the verification of cryptographic
protocols [4], compiler correctness [26] and regression verification [10,11,9,17].
In its full generality, contextual equivalence is hard as it requires reasoning
about the behaviour of all program contexts, and becomes even more difficult in
languages with higher-order features (e.g. callbacks) and local state. Advances in bisimulations [16,29,3], logical relations [1,13,15] and game semantics [18,25,8,20] have offered powerful theoretical techniques for hand-written proofs of contextual equivalence in higher-order languages with state. However, these advancements have yet to be fully integrated in verification tools for contextual equivalence in programming languages, especially in the case of bisimulation techniques. Existing tools [12,24,14] only tackle carefully delineated language fragments.
This publication has emanated from research supported in part by a grant from
Science Foundation Ireland under Grant number 13/RC/2094_2.
© The Author(s) 2022
D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 178–195, 2022.
https://doi.org/10.1007/978-3-030-99527-0_10
In this paper we aim to push the frontier further by proposing a bounded
model checking technique for contextual equivalence for the entirety of a higher-
order language with local state (Sec. 3). This technique, realised in a tool called Hobbit,³ automatically detects inequivalent program expressions given sufficient bounds, and proves hard equivalences automatically or semi-automatically.
Our technique uses a labelled transition system (LTS) for open expressions in order to express equivalence as a bisimulation. The LTS is symbolic both for higher-order arguments (Sec. 4), similarly to symbolic game models [8,20] and derived proof techniques [3,15], and first-order ones (Sec. 6), adopting established techniques (e.g. [6]) and tools such as Z3 [23]. This enables the definition of a fully abstract symbolic environmental bisimulation, the bounded exploration of which is the task of the Hobbit tool. Full abstraction guarantees that our tool finds all inequivalences given sufficient bounds, and only reports true inequivalences. As is corroborated by our experiments, this makes Hobbit a practical inequivalence detector, similar to traditional bounded model checking [2] which has been proved an effective bug detection technique in industrial-scale C code [6,7,30].
However, while proficient in bug finding, bounded model checking can rarely
prove the absence of errors, and in our setting prove an equivalence: a bound
is usually reached before all—potentially infinite—program runs are explored.
Inspired by hand-written equivalence proofs, we address this challenge by propos-
ing two key technologies: new bisimulation up-to techniques, and lightweight user
guidance in the form of state invariant annotations. Hence we increase signifi-
cantly the number of equivalences proven by Hobbit, including for example all
classical equivalences due to Meyer and Sieber [21].
Up-to techniques [28] are specific to bisimulation and concern the reduction of the size of bisimulation relations, oftentimes turning infinite transition systems into finite ones by focusing on a core part of the relation. Although extensively studied in the theory of bisimulation, up-to techniques have not been used in practice in an equivalence checker. We specifically propose three novel up-to techniques: up to separation and up to re-entry (Sec. 5), dealing with infinity in the LTS due to the higher-order nature of the language, and up to state invariants (Sec. 7), dealing with infinity due to state updates. Up to separation allows us to reduce the knowledge of the context the examined program expressions are running in, similar to a frame rule in separation logic. Up to re-entry removes the need of exploring unbounded nestings of higher-order function calls under specific conditions. Up to state invariants allows us to abstract parts of the state and make finite the number of explored configurations by introducing state invariant predicates in configurations.
State invariants are common in equivalence proofs of stateful programs, both in handwritten (e.g. [16]) and tool-based proofs. In the latter they are expressed manually in annotations (e.g. [9]) or automatically inferred (e.g. [14]). In Hobbit we follow the manual approach, leaving heuristics for automatic invariant inference for future work. An important feature of our annotations is the ability to express relations between the states of the two compared terms, enabled by the up to state invariants technique. This leads to finite bisimulation transition systems in examples where concrete value semantics are infinite state.
³ Higher Order Bounded BIsimulation Tool (Hobbit), https://github.com/LaifsV1/Hobbit.
The above technologies, combined with standard up-to techniques, transform Hobbit from a bounded checker into an equivalence prover able to reason about infinite behaviour in a finite manner in a range of examples, including classical example equivalences (e.g. all in [21]) and some that previous work on up-to techniques cannot algorithmically decide [3] (cf. Ex. 22). We have benchmarked Hobbit on examples from the literature and newly designed ones (Sec. 8). Due to the undecidable nature of contextual equivalence, up-to techniques are not exhaustive: no set of up-to techniques is guaranteed to finitise all examples. Indeed there are a number of examples where the bisimulation transition system is still infinite and Hobbit reaches the exploration bound. For instance, Hobbit is not able to prove examples with inner recursion and well-bracketing properties, which we leave to future work. Nevertheless, our approach provides a contextual equivalence tool for a higher-order language with state that can prove many equivalences and inequivalences which previous work could not handle due to syntactic restrictions and other limitations (Sec. 9).
Related work Our paper marries techniques from environmental bisimulations up-to [16,29,28,3] with the work on fully abstract game models for higher-order languages with state [18,8,20]. The closest to our technique is that of Biernacki et al. [3], which introduces up-to techniques for a symbolic LTS similar to ours, albeit with symbolic values restricted to higher-order types, resulting in infinite LTSs in examples such as Ex. 21, and with inequivalence decided outside the bisimulation by (non-)termination, precluding the use of up-to techniques in examples such as Ex. 22. Close in spirit is the line of research on logical relations [1,13,15], which provides a powerful tool for hand-written proofs of contextual equivalence. Also related are the tools Hector [12] and Coneqct [24], and SyTeCi [14], based on game semantics and step-indexed logical relations respectively (cf. Sec. 9).
2 High-Level Intuitions
Contextual equivalence requires that two program expressions lead to the same
observable result in any program context these may be fed in. Instead of working
directly with this definition, we can translate programs into a semantic model
that is fully abstract, reducing contextual equivalence to semantic equality.
The semantic model we use is that of Game Semantics [18]. We model programs as formal interactions between two players: a Proponent (corresponding to the program) and an Opponent (standing for any program context). Concretely, these interactions are sets of traces produced from a Labelled Transition System (LTS), the nodes and labels of which are called configurations and moves respectively. The LTS captures the interaction of the program with its environment, which is realised via function applications and returns: moves can be questions (i.e. function applications) or answers (returns), and belong to proponent or opponent. E.g. a program calling an external function will issue a proponent question, while the return of the external function will be an opponent answer. In the examples that follow, moves that correspond to the opponent shall be underlined.
[Figure: LTS diagrams for N (top), M (middle) and the pruned LTSs NC1 and MC1 (bottom), showing app/ret moves exchanged between opponent and proponent and the store contents (x1 : 0, x2 : 0, . . . ) at each configuration.]
Fig. 1. Sample LTS’s modelling expressions in Section 2.
Example 1. Consider the expression N = (fun f -> f (); 0) of type (unit → unit) → int. Evaluating N leads to a function g being returned (i.e. g is λf. f (); 0). When g is called with some input f1, it will always return 0 but in the process it may call the external function f1. The call to f1 may immediately return or it may call g again (i.e. reenter), and so on. The LTS for N is as in Fig. 1 (top).
Given two expressions M, N, checking their equivalence will amount to checking bisimulation equivalence of their (generally infinite) LTS's. Our checking routine performs a bounded analysis that aims to either find a finite counterexample and thus prove inequivalence, or build a bisimulation relation that shows the equivalence of the expressions. The former case is easier as it is relatively rapid to explore a bisimulation graph up to a given depth. The latter one is harder, as the target bisimulation can be infinite. To tackle part of this infinity, we use three novel up-to techniques for environmental bisimulation.
Up-to techniques roughly assert that if a core set of configurations in the
bisimulation graph explored can be proven to be part of a relation satisfying a
definition that is more permissive than standard bisimulation, then a superset
of configurations forms a proper bisimulation relation. This has the implication
that a bounded analysis can be used to explore a finite part of the bisimulation
graph to verify potentially infinitely many configurations. As there can be no
complete set of up-to techniques, the pertaining question is how useful they are
in practice. In the remainder of this section we present the first of our up-to
techniques, called up to separation, via an example equivalence. The intuition
behind this technique comes from Separation Logic and amounts to saying that
functions that access separate regions of the state can be explored independently.
As a corollary, a function that manipulates only its own local references may be
explored independently of itself, i.e. it suffices to call it once.
Loc: l, k     Var: x, y, z     Const: c
Type: T ::= bool | int | unit | T → T | T1 ∗ · · · ∗ Tn
Exp: e, M, N ::= v | (e̅) | op(e̅) | e e | if e then e else e | ref l = v in e | !l | l := e | let (x̅) = e in e
Val: u, v ::= c | x | fix f(x).e | (v̅)
ECxt: E ::= [·]T | (v̅, E, e̅) | op(v̅, E, e̅) | E e | v E | l := E | if E then e else e | let (x̅) = E in e
Cxt: D ::= [·]i,T | e | (D̅) | op(D̅) | D D | l := D | if D then D else D | fix f(x).D | ref l = D in D | let (x̅) = D in D
St: s, t ∈ Loc ⇀fin Val

s; op(c̅) ⇝ s; w   if oparith(c̅) = w
s; (fix f(x).e) v ⇝ s; e[v/x][fix f(x).e/f]
s; let (x̅) = (v̅) in e ⇝ s; e[v̅/x̅]
s; ref l = v in e ⇝ s[l ↦ v]; e   if l ∉ dom(s)
s; !l ⇝ s; v   if s(l) = v
s; l := v ⇝ s[l ↦ v]; ()
s; if c then e1 else e2 ⇝ s; ei   if (c, i) ∈ {(tt, 1), (ff, 2)}
s; E[e] → s′; E[e′]   if s; e ⇝ s′; e′
Fig. 2. Syntax and reduction semantics of the language λimp.
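To make the reduction semantics concrete, the following is a minimal executable sketch of Fig. 2 in OCaml (the ML-like syntax our examples already use). It covers only integer constants, recursive functions, sequencing and references; the names (expr, step, eval) are our own, not Hobbit's internals, and unit is modelled as Const 0 for brevity.

(* A minimal sketch of the small-step semantics of Fig. 2, restricted to
   integers, recursive functions and references. Locations are strings and
   the store is an association list; unit is modelled as Const 0. *)
type expr =
  | Const of int
  | Var of string
  | Fix of string * string * expr        (* fix f(x). e *)
  | App of expr * expr
  | Ref of string * expr * expr          (* ref l = v in e *)
  | Deref of string                      (* !l *)
  | Assign of string * expr              (* l := e *)
  | Seq of expr * expr                   (* e; e *)

type store = (string * expr) list        (* newest binding shadows older ones *)

let is_value = function Const _ | Fix _ -> true | _ -> false

(* Naive substitution: adequate for closed programs with distinct binders. *)
let rec subst x v e = match e with
  | Var y -> if x = y then v else e
  | Const _ | Deref _ -> e
  | Fix (f, y, b) -> if x = f || x = y then e else Fix (f, y, subst x v b)
  | App (e1, e2) -> App (subst x v e1, subst x v e2)
  | Ref (l, e1, e2) -> Ref (l, subst x v e1, subst x v e2)
  | Assign (l, e1) -> Assign (l, subst x v e1)
  | Seq (e1, e2) -> Seq (subst x v e1, subst x v e2)

(* One reduction step: the base relation plus evaluation-context search. *)
let rec step (s : store) (e : expr) : (store * expr) option =
  match e with
  | App ((Fix (f, x, b) as fn), v) when is_value v ->
    Some (s, subst x v (subst f fn b))
  | App (e1, e2) when not (is_value e1) ->
    Option.map (fun (s', e1') -> (s', App (e1', e2))) (step s e1)
  | App (v1, e2) -> Option.map (fun (s', e2') -> (s', App (v1, e2'))) (step s e2)
  | Ref (l, v, e2) when is_value v -> Some ((l, v) :: s, e2)
  | Deref l -> Option.map (fun v -> (s, v)) (List.assoc_opt l s)
  | Assign (l, v) when is_value v -> Some ((l, v) :: s, Const 0)
  | Assign (l, e1) -> Option.map (fun (s', e1') -> (s', Assign (l, e1'))) (step s e1)
  | Seq (v, e2) when is_value v -> Some (s, e2)
  | Seq (e1, e2) -> Option.map (fun (s', e1') -> (s', Seq (e1', e2))) (step s e1)
  | _ -> None

(* Iterate to a normal form: s; e ⇓ holds when this reaches a value. *)
let rec eval s e = match step s e with
  | Some (s', e') -> eval s' e'
  | None -> (s, e)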
Example 2. Consider M = (fun f -> ref x=0 in f (); !x) and N from Ex. 1. The LTSs corresponding to M and N are shown in Fig. 1 (middle and top). Regarding M, we can see that the opponent is always allowed to reenter the proponent function g, which creates a new reference xn each time. This makes each configuration unique, which prevents us from finding cycles and thus from finitising the bisimulation graph. Moreover, both the LTS for M and N are infinite because of the stack discipline they need to adhere to when O issues reentrant calls.

With separation, however, we could prune the two LTS's as in Fig. 1 (bottom). We denote the configurations after the first opponent call as C1. Any opponent call after C1 leads to a configuration which differs from C1 either by a state component that is not accessible anymore and can thus be separated, or by a stack component that can be similarly separated. Hence, the LTS's that we need to consider are finite and thus the expressions are proven equivalent.
3 Language and Semantics
We develop our technique for the language λimp, a simply typed lambda calculus with local state whose syntax and reduction semantics are shown in Fig. 2. Expressions (Exp) include the standard lambda expressions with recursive functions (fix f(x).e), together with location creation (ref l = v in e), dereferencing (!l), and assignment (l := e), as well as standard base type constants (c) and operations (op(e̅)). Locations are mapped to values, including function values, in a store (St). We write · for the empty store and let loc(χ) denote the set of free locations in χ.
The language λimp is simply-typed, with typing judgements of the form Δ; Σ ⊢ e : T, where Δ is a type environment (omitted when empty), Σ a store typing and T a value type (Type); Σs is the typing of store s. The rules of the type system are standard and omitted here. Values consist of boolean, integer, and unit constants,
functions and arbitrary length tuples of values. To keep the presentation of our
technique simple we do not include reference types as value types, effectively
keeping all locations local. Exchange of locations between expressions can be
encoded using get and set functions. In Ex. 22 we show the encoding of a classic
equivalence with location exchange between expressions and their context. Future
work extensions to our technique to handle location types can be informed from
previous work [18,14].
The reduction semantics is by small-step transitions between configurations containing a store and an expression, ⟨s; e⟩ → ⟨s′; e′⟩, defined using single-hole evaluation contexts (ECxt) over a base relation ⇝. Holes [·]T are annotated with the type T of closed values they accept, which we may omit to lighten notation. Beta substitution of x with v in e is written as e[v/x]. We write s; e ⇓ to denote s; e →∗ t; v for some t, v. We write χ̅ to mean a syntactic sequence, and assume standard syntactic sugar from the lambda calculus. In our examples we assume an ML-like syntax and implementation of the type system, which is also the concrete syntax of Hobbit.
We consider environments Γ ∈ N ⇀fin Val which map natural numbers to closed values. The concatenation of two such environments Γ1 and Γ2, written Γ1, Γ2, is defined when dom(Γ1) ∩ dom(Γ2) = ∅. We write (i1 ↦ v1, . . . , in ↦ vn) for a concrete environment mapping i1, . . . , in to v1, . . . , vn, respectively. When indices are unimportant we omit them and treat Γ environments as lists.
General contexts D contain multiple, non-uniquely indexed holes [·]i,T, where T is the type of value that can replace the hole. Notation D[Γ] denotes the context D with each hole [·]i,T replaced with Γ(i), provided that i ∈ dom(Γ) and Σ ⊢ Γ(i) : T, for some Σ. We omit hole types where possible, and indices when all holes in D are annotated with the same i. In the latter case we write D[v] instead of D[(i ↦ v)] and allow to replace all holes of D with a closed expression e, written D[e]. We assume the Barendregt convention for locations, thus replacing context holes avoids location capture. Standard contextual equivalence [22] follows.
Definition 3 (Contextual Equivalence). Expressions ⊢ e1 : T and ⊢ e2 : T are contextually equivalent, written as e1 ≡ e2, when for all contexts D such that ⊢ D[e1] : unit and ⊢ D[e2] : unit we have ⟨· ; D[e1]⟩ ⇓ iff ⟨· ; D[e2]⟩ ⇓.
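As a concrete illustration of Definition 3 (in OCaml syntax, mirroring λimp), the two expressions below agree on a single call, but a context that calls its hole twice observes the hidden state of the second and distinguishes them, so they are not contextually equivalent. The names e1, e2 and d are ours, chosen for illustration only.

let e1 = fun () -> 0
let e2 = let x = ref 0 in fun () -> x := !x + 1; !x - 1

(* The context D = let f = [·] in f (); f () *)
let d f = ignore (f ()); f ()

let () =
  assert (d e1 = 0);  (* both calls return 0 *)
  assert (d e2 = 1)   (* the second call observes the first call's update *)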
4 LTS with Symbolic Higher-Order Transitions
Our Labelled Transition System (LTS) has symbolic transitions for both higher-order and first-order transitions. For simplicity we first present our LTS with symbolic higher-order and concrete first-order transitions. We develop our theory and most up-to techniques on this simpler LTS. We then show its extension with symbolic first-order transitions and develop up to state invariants, which relies on this extension. We extend the syntax with abstract function names α:

Val: u, v, w ::= c | fix f(x).e | (v̅) | αT

Abstract function names αT are annotated with the type T of function they represent, omitted where possible; an(χ) is the set of abstract names in χ.
PropApp:  ⟨A; Γ; K; s; E[α v]⟩ −app(α,D)→ ⟨A; Γ, Γ′; E[·], K; s; ·⟩   if (D, Γ′) ∈ ulpatt(v)
PropRet:  ⟨A; Γ; K; s; v⟩ −ret(D)→ ⟨A; Γ, Γ′; K; s; ·⟩   if (D, Γ′) ∈ ulpatt(v)
OpApp:    ⟨A; Γ; K; s; ·⟩ −app(i,D[α̅])→ ⟨A ⊎ α̅; Γ; K; s; e⟩   if Σs ⊢ Γ(i) : T → T′, (D, α̅) ∈ ulpatt(T) and Γ(i) D[α̅] ⇛ e
OpRet:    ⟨A; Γ; E[·]T, K; s; ·⟩ −ret(D[α̅])→ ⟨A ⊎ α̅; Γ; K; s; E[D[α̅]]⟩   if (D, α̅) ∈ ulpatt(T)
Tau:      ⟨A; Γ; K; s; e⟩ −τ→ ⟨A; Γ; K; s′; e′⟩   if s; e → s′; e′
Response: C −η→ ⟨⊥⟩   if η ≠ ✓
Term:     ⟨A; Γ; ·; s; ·⟩ −✓→ ⟨⊥⟩
Fig. 3. The Labelled Transition System.
We define our LTS (shown in Fig. 3) by opponent and proponent call and return transitions, based on Game Semantics [18]. Proponent transitions are the moves of an expression interacting with its context. Opponent transitions are the moves of the context surrounding this expression. These transitions are over proponent and opponent configurations ⟨A; Γ; K; s; e⟩ and ⟨A; Γ; K; s; ·⟩, respectively. In these configurations:
– A is a set of abstract function names used so far in the interaction;
– Γ is an environment indexing proponent functions known to the opponent;⁴
– K is a stack of proponent continuations, created by nested proponent calls;
– s is the store containing proponent locations;
– e is the expression reduced in proponent configurations; ê denotes e or ·.
In addition, we introduce a special configuration ⟨⊥⟩ which is used in order to represent expressions that cannot perform given transitions (cf. Remark 6). We let a trace be a sequence of app and ret moves (i.e. labels), as defined in Fig. 3.

For the LTS to provide a fully abstract model of the language, it is necessary that functions which are passed as arguments or return values from proponent to opponent be abstracted away, as the actual syntax of functions is not directly observable in λimp. This is achieved by deconstructing such values v to:
– an ultimate pattern D (cf. [19]), which is a context obtained from v by replacing each function in v with a distinct numbered hole; together with
– an environment Γ mapping indices of these holes to values, and D[Γ] = v.
We let ulpatt(v) contain all such pairs (D, Γ) for v; e.g.: ulpatt((λx.e1, 5)) = { (([·]i, 5), [i ↦ λx.e1]) | for any i }.
We extend ulpatt to types through the use of symbolic function names: ulpatt(T) is the largest set of pairs (D, Γ) such that ⊢ D[Γ] : T, where rng(Γ) = α̅T̅, and D does not contain functions.
⁴ thus, Γ is encoding the environment of Environmental Bisimulations (e.g. [16])
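The following OCaml sketch shows the idea of ultimate-pattern decomposition: functions in a value are stripped out into an indexed environment, leaving numbered holes. The types and names here (value, patt, ulpatt) are our own simplification, not the paper's formal definition.

(* Decompose a value into an ultimate pattern and an environment Γ. *)
type value = VInt of int | VFun of (value -> value) | VPair of value * value
type patt  = PInt of int | PHole of int | PPair of patt * patt

let ulpatt (v : value) : patt * (int * value) list =
  let env = ref [] and next = ref 0 in
  let rec go = function
    | VInt n -> PInt n
    | VFun _ as f ->
      let i = !next in incr next; env := (i, f) :: !env; PHole i
    | VPair (a, b) -> let pa = go a in let pb = go b in PPair (pa, pb)
  in
  let p = go v in (p, List.rev !env)

(* ulpatt (λx.x, 5) = (([·]0, 5), [0 ↦ λx.x]), matching the example above. *)
let () =
  match ulpatt (VPair (VFun (fun x -> x), VInt 5)) with
  | (PPair (PHole 0, PInt 5), [ (0, VFun _) ]) -> ()
  | _ -> assert false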
In Fig. 3, proponent application and return transitions (PropApp, PropRet) use ultimate pattern matching for values and accumulate the functions generated by the proponent in the Γ environment of the configuration, leaving only their indices on the label of the transition itself. Opponent application and return transitions (OpApp, OpRet) use ultimate pattern matching for types to generate opponent-generated values, which can only contain abstract functions. This eliminates the need for quantifying over all functions in opponent transitions but still includes infinite quantification over all base values. Symbolic first-order values in Sec. 6 will obviate the latter.
At opponent application, the following preorder performs a beta reduction when the opponent applies a concrete function. This technicality is needed for soundness.

Definition 4 (⇛). For application v u we write v u ⇛ e to mean e = α u, when v = α; and e = e′[u/x][fix f(x).e′/f], when v = fix f(x).e′.
In our LTS, C ranges over configurations and η over transition labels; =η̂⇒ means −τ→∗ when η = τ, and −τ→∗ −η→ −τ→∗ otherwise. Standard weak (bi-)simulation follows.
Definition 5 (Weak Bisimulation). Binary relation R is a weak simulation when for all C1 R C2 and C1 −η→ C′1, there exists C′2 such that C2 =η̂⇒ C′2 and C′1 R C′2. If R, R⁻¹ are weak simulations then R is a weak bisimulation. Similarity (≲) and bisimilarity (≈) are the largest weak simulation and bisimulation, respectively.
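For intuition, here is a naive bounded weak-simulation check over an explicit finite LTS in OCaml. Hobbit's actual LTS is symbolic and generally infinite, so this is only a sketch of the bounded exploration; it also assumes the LTS has no τ-cycles, and all names are our own.

(* States are ints, labels are strings, "tau" is the silent label. *)
type lts = (int * string * int) list

let succ (l : lts) s =
  List.filter_map (fun (a, lab, b) -> if a = s then Some (lab, b) else None) l

(* All states reachable by tau-steps (assumes no tau-cycles). *)
let rec tau_closure l s =
  s :: List.concat_map
         (fun (lab, b) -> if lab = "tau" then tau_closure l b else [])
         (succ l s)

(* Weak successors of s under lab: tau* lab tau* (just tau* when lab = "tau"). *)
let weak_succ l s lab =
  let pre = tau_closure l s in
  let mid =
    List.concat_map
      (fun t ->
         List.filter_map
           (fun (lab', b) -> if lab' = lab then Some b else None)
           (succ l t))
      pre
  in
  let mid = if lab = "tau" then pre @ mid else mid in
  List.concat_map (tau_closure l) mid

(* Bounded check: every step of s1 in l1 is weakly matched from s2 in l2. *)
let rec simulates l1 l2 depth s1 s2 =
  depth = 0 ||
  List.for_all
    (fun (lab, s1') ->
       List.exists (fun s2' -> simulates l1 l2 (depth - 1) s1' s2')
                   (weak_succ l2 s2 lab))
    (succ l1 s1)

let bisimilar_up_to l1 l2 depth s1 s2 =
  simulates l1 l2 depth s1 s2 && simulates l2 l1 depth s2 s1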
Remark 6. Any proponent configuration that cannot match a standard bisimulation transition challenge can trivially respond to the challenge by transitioning into ⟨⊥⟩ by the Response rule in Fig. 3. By the same rule, this configuration can trivially perform all transitions except a special termination transition, labelled with ✓. However, regular configurations that have no pending proponent calls (K = ·) can perform the special termination transition (Term rule), signalling the end of a complete trace, i.e. a completed computation. This mechanism allows us to encode complete trace equivalence, which coincides with contextual equivalence [18], as bisimulation equivalence. In a bisimulation proof, if a proponent configuration is unable to match a bisimulation transition with a regular transition, it can still transition to ⟨⊥⟩ where it can simulate every transition of the other expression apart from ✓, which leads to a complete trace.
Our mechanism for treating unmatched transitions has the benefit of enabling us to use the standard definition of bisimulation over our LTS. This is in contrast to previous work [3,15], where termination/non-termination needed to be proven independently or baked into the simulation conditions. More importantly, our approach allows us to use bisimulation up-to techniques even when one of the related configurations diverges, which is not possible in previous symbolic LTSs [18,15,3], and is necessary in examples such as Ex. 22.
Definition 7 (Bisimilar Expressions). Expressions ⊢ e1 : T and ⊢ e2 : T are bisimilar, written e1 ≈ e2, when ⟨· ; · ; · ; · ; e1⟩ ≈ ⟨· ; · ; · ; · ; e2⟩.
Theorem 8 (Soundness and Completeness). e1 ≡ e2 iff e1 ≈ e2.
As a final remark, the LTS presented in this section is finite state only for a small
number of trivial equivalence examples. The following section addresses sources
of infinity in the transition systems through bisimulation up-to techniques.
5 Up-to Techniques
We start with the definition of a sound up-to technique.

Definition 9 (Weak Bisimulation up to f). R is a weak simulation up to f when for all C1 R C2 and C1 −η→ C′1, there is C′2 with C2 =η̂⇒ C′2 and C′1 f(R) C′2. If R, R⁻¹ are weak simulations up to f then R is a weak bisimulation up to f.

Definition 10 (Sound up-to technique). A function f is a sound up-to technique when for any R which is a simulation up to f we have R ⊆ (≲).

Hobbit employs the standard techniques: up to identity, up to garbage collection, up to beta reductions and up to name permutations. Here we present two novel up-to techniques: up to separation and up to re-entry.
Up to Separation Our experience with Hobbit has shown that one of the
most effective up-to techniques for finitising bisimulation transition systems is
the novel up to separation which we propose here. The intuition of this technique
is that if different functions operate on disjoint parts of the store, they can be
explored in disjoint parts of the bisimulation transition system. Taken to the
extreme, a function that does not contain free locations can be applied only
once in a bisimulation test as two copies of the function will not interfere with
each other, even if they allocate new locations after application. To define up to
separation we need to define a separating conjunction for configurations.
Definition 11 (Stack Interleaving). Let K1, K2 be lists of evaluation contexts from ECxt (Fig. 2); we define the interleaving operation K1 #k̅ K2 inductively, and write K1 # K2 to mean K1 #k̅ K2 for unspecified k̅. We let · #· · = · and:

E1, K1 #(1,k̅) K2 = E1, (K1 #k̅ K2)        K1 #(2,k̅) E2, K2 = E2, (K1 #k̅ K2).
Definition 12 (Separating Conjunction). Let C1 = ⟨A1; Γ1; K1; s1; ê1⟩ and C2 = ⟨A2; Γ2; K2; s2; ê2⟩ be well-formed configurations. We define:

C1 ⊕1k̅ C2 ≝ ⟨A1 ∪ A2; Γ1, Γ2; K1 #k̅ K2; s1, s2; ê1⟩   when ê2 = ·
C1 ⊕2k̅ C2 ≝ ⟨A1 ∪ A2; Γ1, Γ2; K1 #k̅ K2; s1, s2; ê2⟩   when ê1 = ·

provided dom(s1) ∩ dom(s2) = ∅. We let C1 ⊕ C2 denote ∃i, k̅. C1 ⊕ik̅ C2.
The function sep provides the up to separation technique; it is defined as:

UpTo⊕:   from C1 R C2 and C3 R C4, infer (C1 ⊕ik̅ C3) sep(R) (C2 ⊕ik̅ C4)
UpTo⊕⊥L: from C1 R ⟨⊥⟩ and C3 R C4, infer (C1 ⊕ C3) sep(R) ⟨⊥⟩
UpTo⊕⊥R: from C1 R C2 and C3 R ⟨⊥⟩, infer (C1 ⊕ C3) sep(R) ⟨⊥⟩

Soundness follows by extending [28,27] with a weaker, sufficient proof obligation.
Lemma 13. Function sep is a sound up-to technique.
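On stores, Definition 12 amounts to a disjoint merge; a tiny OCaml sketch (with stores as association lists, as in the interpreter sketch of Sec. 3, and names of our choosing):

(* Merge two stores only when their domains are disjoint, as Def. 12 requires. *)
let disjoint s1 s2 =
  List.for_all (fun (l, _) -> not (List.mem_assoc l s2)) s1

let sep_conj s1 s2 = if disjoint s1 s2 then Some (s1 @ s2) else None

let () =
  assert (sep_conj [("x", 0)] [("y", 1)] = Some [("x", 0); ("y", 1)]);
  assert (sep_conj [("x", 0)] [("x", 1)] = None)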
Many example equivalences have a finite transition system when using up to
separation in conjunction with the simple techniques of the preceding section.
Example 14. The following is a classic example equivalence from Meyer and Sieber [21]: the expressions below are equivalent at type (unit → unit) → unit.

M = fun f -> ref x = 0 in f ()        N = fun f -> f ()

For both functions, after the initial application of the function by the opponent, the proponent calls f, growing the stack K in the two configurations. At that point the opponent can apply the same functions again. The LTS of both M and N is thus infinite because K can grow indefinitely, and so is a bisimulation proving this equivalence. It is additionally infinite because the opponent can keep applying the initial functions even after these return. However, if we apply the up-to separation technique immediately after the first opponent application, the Γ environments become empty, and thus no second application of the same functions can happen. The LTS thus becomes trivially small. Note that no other up-to technique is needed here. Hobbit applies up-to separation after every opponent application transition and explores the configuration containing the application expression and the smallest possible Γ; this does not lead to false-negative (or false-positive) results.
Example 15. This example is due to Bohr and Birkedal [5] and includes a non-synchronised divergence.

M = fun f ->
      ref l1 = false in ref l2 = false in
      f (fun () -> if !l1 then _bot_ else l2 := true);
      if !l2 then _bot_ else l1 := true
N = fun f -> f (fun () -> _bot_)

Note that _bot_ is a diverging computation. This is a hard example to prove using environmental bisimulation even with up-to techniques, requiring quantification over contexts within the proof. However, with up-to separation, after the opponent applies the initial functions the Γ environments are emptied, thus leaving only one application of M and N that needs to be explored by the bisimulation. Applications of the inner function provided as argument to f only lead to a small number of reachable configurations. Hobbit can indeed prove this equivalence.
Up to Proponent Function Re-entry The higher-order nature of λimp and its LTS allows infinite nesting of opponent and proponent calls. Although up to separation avoids those in a number of examples, here we present a second novel up-to technique, which we call up to proponent function re-entry (or simply, up to re-entry). This technique has connections to the induction hypothesis in the definition of environmental bisimulations in [16]. However, up to re-entry is specifically aimed at avoiding nested calls to proponent functions, and it is designed to work with our symbolic LTS. In combination with other techniques this eliminates the need to consider configurations with unbounded stacks K in many classical equivalences, including those in [21].
UpToReentry:
  C1 = ⟨A; Γ1; K1; s1; ·⟩ R ⟨A; Γ2; K2; s2; ·⟩ = C2
  ∀η̅, C, A′, Γ′1, Γ′2, s′1, s′2. ( app(i, _) ∉ {η̅} and
    ⟨A; Γ1; ·; s1; ·⟩ −app(i,C)→ =η̅⇒ ≍ ⟨A′; Γ′1; ·; s′1; ·⟩ and
    ⟨A; Γ2; ·; s2; ·⟩ −app(i,C)→ =η̅⇒ ≍ ⟨A′; Γ′2; ·; s′2; ·⟩ )
  implies Γ′1 = Γ1 and Γ′2 = Γ2 and s1 = s′1 and s2 = s′2
  C1 −app(i,C)→ =η̅⇒ −app(i,C)→ ⟨A′; Γ1; K′1, K1; s1; e′1⟩
  C2 −app(i,C)→ =η̅⇒ −app(i,C)→ ⟨A′; Γ2; K′2, K2; s2; e′2⟩
  ──────────────────────────────────────────────────────────
  ⟨A′; Γ1; K′1, K1; s1; e′1⟩ reent(R) ⟨A′; Γ2; K′2, K2; s2; e′2⟩

Fig. 4. Up to Proponent Function Re-entry (omitting rules for ⊥-configurations).
Up to re-entry is realised by the function reent in Fig. 4. The intuition of this up-to technique is that if the application of related functions at index i in the Γ environments has no potential to change the local stores (up to garbage collection, encoded by (≍)) or increase the Γ environments, then there are no additional observations to be made by nested calls to the i-functions; thus configurations reached by such nested calls are added to the relation by this up-to technique. Soundness follows similarly to up-to separation.

In Hobbit we require the user to flag the functions to be considered for the up to re-entry technique. This annotation is later combined with state invariant annotations, as they are often used together. Inequivalences found while using the up to re-entry and state invariant annotations could be false negatives due to incorrect user annotations. Hobbit ensures that no such false negatives are reported by re-running discovered inequivalences with these two techniques off.

Below is an example where the state invariant needed is trivial, and up to separation together with up to re-entry are sufficient to prove the equivalence.
Example 16.

M = ref x = 0 in fun f -> f (); !x        N = fun f -> f (); 0

This is like Ex. 2 except the reference in M is created outside of the function body. The LTS for M is as follows, where labels ⟨•; !x1⟩ are continuations.
[Figure: the LTS for M, in which the opponent may reenter g indefinitely, pushing a continuation ⟨•; !x1⟩ for each nested call while the store stays x1 : 0.]
Again, the opponent is allowed to reenter g as before. With up-to re-entry, however, the opponent skips nested calls to g, as these do not modify the state.
[Figure: the pruned LTS for M under up-to re-entry, exploring a single application of g with the store remaining x1 : 0 throughout.]
N mirrors the above LTS without the x1 reference and with continuation ⟨•; 0⟩.
6 Symbolic First-Order Transitions
We extend λimp constants (Const) with a countable set of symbolic constants ranged over by κ. We define symbolic environments σ ::= · | (κ ⋈ e), σ, where ⋈ is either = or ≠ and e is an arithmetic expression over constants, and interpret them as conjunctions of (in-)equalities, with the empty environment interpreted as true.

Definition 17 (Satisfiability). Symbolic environment σ is satisfiable if there exists an assignment δ, mapping the symbolic constants of σ to actual constants, such that δσ is a tautology; we then write δ ⊨ σ.
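A small OCaml sketch of symbolic environments and Definition 17, with satisfiability checked against an explicit candidate assignment δ. Hobbit discharges such constraints with Z3; this brute-force checker, whose types and names are our own, is purely illustrative.

(* σ as a list of constraints over symbolic constants κ, read conjunctively. *)
type sym = string
type aexp = K of sym | C of int | Add of aexp * aexp
type constr = Eq of aexp * aexp | Neq of aexp * aexp
type senv = constr list

(* Evaluate an arithmetic expression under a (total) assignment δ. *)
let rec aeval (d : (sym * int) list) = function
  | K k -> List.assoc k d
  | C n -> n
  | Add (a, b) -> aeval d a + aeval d b

let satisfied_by d (sigma : senv) =
  List.for_all
    (function
      | Eq (a, b) -> aeval d a = aeval d b
      | Neq (a, b) -> aeval d a <> aeval d b)
    sigma

(* σ = (κ1 = κ2 + 1), (κ1 ≠ 0) is satisfied by δ = [κ1 ↦ 3; κ2 ↦ 2]. *)
let () =
  let sigma = [ Eq (K "k1", Add (K "k2", C 1)); Neq (K "k1", C 0) ] in
  assert (satisfied_by [ ("k1", 3); ("k2", 2) ] sigma)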
We extend reduction configurations with a symbolic environment σ, written as σ ⊢ ⟨s; e⟩. Symbolic constants are implicitly annotated with their type. We modify the reduction semantics from Fig. 2 to consider symbolic constants:

σ ⊢ ⟨s; op(c̅)⟩ → σ, (κ = op(c̅)) ⊢ ⟨s; κ⟩   if κ fresh
σ ⊢ ⟨s; if κ then e1 else e2⟩ → σ, (κ = tt) ⊢ ⟨s; e1⟩   if σ, (κ = tt) is sat.
σ ⊢ ⟨s; if κ then e1 else e2⟩ → σ, (κ = ff) ⊢ ⟨s; e2⟩   if σ, (κ = ff) is sat.

All other reduction semantics rules carry the σ. The LTS from Sec. 4 is modified to operate over configurations of the form σ ⊢ C or ⟨⊥⟩. We let C̃ range over both forms of configurations. All LTS rules for proponent transitions simply carry the σ; rule Tau may increase σ due to the inner reduction. Opponent transitions generate fresh symbolic constants instead of actual constants: labels app(i, D[α̅]) and ret(D[α̅]) in rules OpApp and OpRet of Fig. 3, respectively, contain D with symbolic, instead of concrete, constants. We adapt (bi-)simulation as follows.
Definition 18. Binary relation R on symbolic configurations is a weak simulation when for all C̃1 R C̃2 and C̃1 −η1→ C̃′1, there exist C̃′2, η2 such that C̃2 =η̂2⇒ C̃′2 and C̃′1 R C̃′2, where (C̃′1.σ, C̃′2.σ) is sat. and ∀δ. δ ⊨ (C̃′1.σ, C̃′2.σ) implies δη1 = δη2.

Lemma 19. (σ1 ⊢ C1) ≈ (σ2 ⊢ C2) iff for all δ ⊨ σ1, σ2 we have δC1 ≈ δC2.

Corollary 20 (Soundness, Completeness). (· ⊢ C1) ≈ (· ⊢ C2) iff C1 ≈ C2.
The up-to techniques we have developed in previous sections apply unmodified to the extended LTS, as the techniques do not involve symbolic constants, with the exception of up to beta, which requires adapting the definition of a beta move to consider all possible δ. The introduction of symbolic first-order transitions allows us to prove many interesting first-order examples, such as the equivalence of bubble sort and insertion sort, an example borrowed from Hector [12] (omitted here, see the Hobbit distribution). Below is a simpler example showing the equivalence of two integer swap functions which, by leveraging Z3 [23], Hobbit is able to prove.
Example 21.

M = let swap xy =
      let (x,y) = xy
      in (y, x)
    in swap
N = fun xy -> let (x,y) = xy in
    ref x = x in ref y = y in
    x := !x - !y; y := !x + !y;
    x := !y - !x; (!x, !y)
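For intuition, the two swap functions of Ex. 21 can be tested concretely in OCaml (Hobbit, of course, proves the equivalence symbolically for all integers; note the arithmetic identities hold exactly even under machine-integer wrap-around). The names swap_pure and swap_refs are ours.

let swap_pure (a, b) = (b, a)

let swap_refs (a, b) =
  let x = ref a and y = ref b in
  x := !x - !y;         (* x = a - b             *)
  y := !x + !y;         (* y = (a - b) + b = a   *)
  x := !y - !x;         (* x = a - (a - b) = b   *)
  (!x, !y)

let () =
  List.iter (fun p -> assert (swap_pure p = swap_refs p))
    [ (0, 0); (1, 2); (-5, 7); (max_int, min_int) ]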
7 Up to State Invariants
The addition of symbolic constants into λimp and the LTS not only allows us to consider all possible opponent-generated constants simultaneously in a symbolic execution of proponent expressions, but also allows us to define an additional powerful up-to technique: up to state invariants. We define this technique in two parts, up to abstraction and up to tautology, realised by abs and taut.⁵

UpToabs:  from (σ1 ⊢ C1) R (σ2 ⊢ C2), infer (σ1 ⊢ C1)[c̅/κ̅] abs(R) (σ2 ⊢ C2)[c̅/κ̅]
UpTotaut: from (σ1, σ′1 ⊢ C1) R (σ2, σ′2 ⊢ C2), where σ1, σ2, σ′1, σ′2 is sat. and σ1, σ2, ¬(σ′1, σ′2) is not sat., infer (σ1 ⊢ C1) taut(R) (σ2 ⊢ C2)
The first function, abs, allows us to derive the equivalence of configurations by abstracting constants with fresh symbolic constants (of the same type) and instead prove equivalent the more abstract configurations. The second function, taut, allows us to introduce tautologies into the symbolic environments. These are predicates which are valid; i.e., they hold for all instantiations of the abstract variables. Combining the two functions we can introduce a tautology I(c̅) into the symbolic environments, and then abstract the constants c̅, in the predicate but also in the configurations, with symbolic ones, obtaining I(κ̅), which encodes an invariant that always holds.

Currently in Hobbit, up to abstraction and tautology are combined and applied in a principled way. Functions can be annotated with the following syntax:

F = fun x {κ̅ | l1 as C1[κ̅], ..., ln as Cn[κ̅] | φ} -> e

The annotation instructs Hobbit to use the two techniques when the opponent applies related functions where at least one of them has such an annotation. If both functions contain annotations, then they are combined and the same κ̅ are used in both annotations. The techniques are used again when the proponent returns from the functions, and when the proponent calls the opponent from within the functions.⁶ As discussed in Sec. 5, the same annotation enables up to re-entry in Hobbit.
When Hobbit uses the above two up-to techniques it 1) pattern-matches the values currently in each location li with the value context Ci, where fresh symbolic constants κ̅ are in its holes, obtaining a substitution [c̅/κ̅]; 2) applies the up to tautology technique for the formula φ[c̅/κ̅]; and 3) applies the up to abstraction technique by replacing φ[c̅/κ̅] in the symbolic environment with φ, and the contents of locations li with Ci[κ̅].
⁵ Hobbit also implements an up to σ-normalisation and garbage collection technique.
⁶ Finer-grain control of application of these up-to techniques is left to future work.
Example 22. Following is an example by Meyer and Sieber [21] featuring location passing, adapted to λimp where locations are local.

M = let loc_eq loc1 loc2 = [. . . ] in
    fun q -> ref x = 0 in
    let locx = (fun () -> !x), (fun v -> x := v) in
    let almostadd_2 locz {w | x as w | w mod 2 == 0} =
      if loc_eq (locx, locz) then x := 1 else x := !x + 2
    in q almostadd_2; if !x mod 2 = 0 then _bot_ else ()
N = fun q -> _bot_
In this example we simulate general references as a pair of read-write functions. Function loc_eq implements a standard location equality test. The two higher-order expressions are equivalent because the opponent can only increase the contents of x through the function almostadd_2. As the number of times the opponent can call this function is unbounded, the LTS is infinite. However, the annotation of function almostadd_2 applies the up to state invariants technique when the function is called (and, less crucially, when it returns), replacing the concrete value of x with a symbolic integer constant w satisfying the invariant w mod 2 == 0. This makes the LTS finite, up to permutations of symbolic constants. Moreover, up to separation removes the outer functions from the Γ environments, thus preventing re-entrant calls to these functions. Note that the up-to techniques are applied even though one of the configurations is diverging (_bot_). This would not be possible with the LTS and bisimulation of [3].
8 Implementation and Evaluation
We implemented the LTS and up-to techniques for λimp in a tool prototype called Hobbit, which we ran on a test suite of 105 equivalences and 68 inequivalences (3338 and 2263 lines of code for equivalences and inequivalences, respectively). Hobbit is bounded in the total number of function calls it explores per path. We ran Hobbit with a default bound of 6 calls except where a larger bound was found to prove or disprove equivalence: 46 examples required a larger bound, and the largest bound used was 348. To illustrate the impact of up-to techniques, we checked all files (pairs of expressions to be checked for equivalence) in five configurations: default (all up-to techniques on), up to separation off, annotations (up to state invariants and re-entry) off, up to re-entry off, and everything off. The tool stops at the first trace that disproves equivalence, after enumerating all traces up to the bound, or after timing out at 150 seconds. Time taken and exit status (equivalent, inequivalent, inconclusive) were recorded for each file; an overview of the experiment can be seen in the following table. All experiments ran on an Ubuntu 18.04 machine with 32 GB RAM and an Intel Core i7 1.90 GHz CPU, with intermediate calls to Z3 4.8.10 to prune invalid internal symbolic branching and decide symbolic bisimulation conditions. All constraints passed to Z3 are propositional satisfiability queries in conjunctive normal form (CNF).
         default          sep. off           annot. off        ree. off          all off
eq.      72 | 0 [5.6s]    32 | 0 [1622.9s]   47 | 0 [178.3s]   57 | 0 [177.6s]   3 | 0 [2098.5s]
ineq.    0 | 68 [20.0s]   0 | 66 [312.8s]    0 | 68 [19.6s]    0 | 68 [20.1s]    0 | 65 [515.7s]

a | b [c] for a (out of 105) equivalences and b (out of 68) inequivalences reported, taking c seconds in total.
We can observe that Hobbit was sound and bounded-complete for our examples: there were no false reports, and all inequivalences were identified. Up-to techniques also had a significant impact on proving equivalence. With all techniques on, Hobbit proved 68.6% of our equivalences, a dramatic improvement over the 2.9% proven with none on. The most significant technique was up-to separation (necessary for 55.6% of the equivalences proven, and reducing time taken by 99.99%), which was useful when functions could be independently explored by the context. Next were annotations (necessary for 34.7% of equivalences, decreasing time by 96.9%) and up-to re-entry (20.8% of files, decreasing time by 96.8%). Although the latter two required manual annotation, they enabled equivalences where our annotation language was able to capture the proof conditions. Note that, since turning off invariant annotations also turns off re-entry, only 10 files needed up-to re-entry on top of invariant annotations. In contrast, inequivalences did not benefit as much. This was expected, as without up-to techniques Hobbit is still based on bounded model checking, which is theoretically sound and complete for inequivalences, and finds the shortest counterexample traces using breadth-first search. Nonetheless, with up-to techniques turned off, inequivalences were discovered in 515.7s (vs. 20s with techniques on) and three files timed out, due to the techniques reducing the size and branching factor of configurations. This suggests that the reduction in state space is still relevant when searching for counterexamples.
9 Comparison with Existing Tools
There are two main classes of tools for contextual equivalence checking. The first
one includes semantics-driven tools that tackle higher-order languages with state
like ours. In this class belong game-based tools Hector [
12
] and Coneqct [
24
],
which can only address carefully crafted fragments of the language, delineated by
type restrictions and bounded data types. The most advanced tool in this class
is SyTeCi [
14
], which is based on logical relations and removes a good part of
the language restrictions needed in the previous tools. The second class concerns
tools that focus on first-order languages, typically variants of C, with main tools
including Rêve [
9
], SymDiff [
17
] and RVT [
11
]. These are highly optimised
for handling internal loops, a problem orthogonal to handling the interactions
between higher-order functions and their environment, addressed by Hobbit and
related tools. We believe the techniques used in these tools may be useful when
adapted to Hobbit, which we leave for future work.
In the higher-order contextual equivalence setting, the most relevant tool to
compare with Hobbit is SyTeCi. This is because SyTeCi supersedes previous
tools by proving examples with fewer syntactical limitations. We ran the tools on
examples from both SyTeCi’s and our own benchmarks—7 and 15 equivalences,
and 2 and 7 inequivalences from SyTeCi and Hobbit respectively—with a
timeout of 150s and using Z3. Unfortunately, due to differences in parsing
and SyTeCi’s syntactical restrictions, the input languages were not entirely
compatible and only few manually translated programs were chosen.
                          SyTeCi               Hobbit
SyTeCi eq. examples       3 | 0 | 4 (0.03s)    1 | 0 | 6 (<0.01s)
Hobbit eq. examples       8 | 0 | 7 (0.4s)     15 | 0 | 0 (<0.01s)
SyTeCi ineq. examples     0 | 2 | 0 (0.06s)    0 | 2 | 0 (0.02s)
Hobbit ineq. examples     2 | 3 | 2 (0.52s)    0 | 7 | 0 (0.45s)

a | b | c (d) for a equivalences, b inequivalences and c inconclusive results reported, taking d seconds in total.
We were unable to translate many of our examples because of restrictions
in the input syntax supported by SyTeCi. Some of these restrictions were
inessential (e.g. absence of tuples) while others were substantial: the tool does not
support programs where references are allocated both inside and outside functions
(e.g. Ex. 15), or with non-synchroniseable recursive calls. Moreover, SyTeCi relies
on Constrained Horn Clause satisfiability which is undecidable. In our testing
SyTeCi sometimes timed out on examples; in private correspondence with its
creator this was attributed to Z3’s ability to solve Constrained Horn Clauses.
Finally, SyTeCi was sound for equivalences, but not always for inequivalences as
can be seen in the table above; the reason is unclear and may be due to bugs. On
the other hand, SyTeCi was able to solve equivalences we are not able to handle;
e.g. synchronisable recursive calls and examples with well-bracketing properties.
10 Conclusion
Our experience with Hobbit suggests that our technique provides a significant
contribution to verification of contextual equivalence. In the higher-order case,
Hobbit does not impose language restrictions as present in other tools. Our
tool is able to solve several examples that cannot be solved by SyTeCi, which
is the most advanced tool in this family. In the first-order case, the problem of
contextual equivalence differs significantly as the interactions that a first-order
expression can have with its context are limited; e.g. equivalence analyses do not
need to consider callbacks or re-entrant calls. Moreover, the distinction between
global and local state is only meaningful in higher-order languages where a
program phrase can invoke different calls of the same function, each with its own
state. Therefore, tools for first-order languages focus on what in our setting are
internal transitions and the complexities arising from e.g. unbounded datatypes
and recursion, whereas we focus on external interactions with the context.
As for limitations, Hobbit does not handle synchronised internal recursion
and well-bracketed state, which SyTeCi can often solve. More generally, Hobbit
is not optimised for internal recursion as first-order tools are. In this work we
have also disallowed reference types in
λimp
to simplify the technical development;
location exchange is encoded via function exchange (cf. Ex. 22). We intend to
address these limitations in future work and explore applications of Hobbit to
real-world examples.
References
1.
Ahmed, A., Dreyer, D., Rossberg, A.: State-dependent representation independence.
In: POPL. Association for Computing Machinery (2009)
2.
Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs.
In: TACAS. Springer Berlin Heidelberg (1999)
3.
Biernacki, D., Lenglet, S., Polesiuk, P.: A complete normal-form bisimilarity for
state. In: FOSSACS 2019, ETAPS 2019, Prague, Czech Republic. Springer (2019)
4.
Blanchet, B.: A computationally sound mechanized prover for security protocols.
In: IEEE Symposium on Security and Privacy (2006)
5.
Bohr, N., Birkedal, L.: Relational reasoning for recursive types and references. In:
Kobayashi, N. (ed.) APLAS. LNCS, vol. 4279, pp. 79–96. Springer (2006)
6.
Clarke, E., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In:
TACAS. Springer Berlin Heidelberg (2004)
7.
Cordeiro, L., Kroening, D., Schrammel, P.: JBMC: Bounded model checking for
Java Bytecode. In: TACAS. Springer (2019)
8.
Dimovski, A.: Program verification using symbolic game semantics. TCS 560 (2014)
9.
Felsing, D., Grebing, S., Klebanov, V., Rümmer, P., Ulbrich, M.: Automating
regression verification. In: ACM/IEEE ASE ’14. ACM (2014)
10.
Godlin, B., Strichman, O.: Inference rules for proving the equivalence of recursive
procedures. Acta Informatica 45(6) (2008)
11. Godlin, B., Strichman, O.: Regression verification. In: DAC. ACM (2009)
12.
Hopkins, D., Murawski, A.S., Ong, C.L.: Hector: An equivalence checker for a
higher-order fragment of ML. In: CAV. LNCS, Springer (2012)
13.
Hur, C.K., Dreyer, D., Neis, G., Vafeiadis, V.: The marriage of bisimulations and
Kripke logical relations. SIGPLAN Not. (2012)
14.
Jaber, G.: SyTeCi: Automating contextual equivalence for higher-order programs
with references. Proc. ACM Program. Lang. 4(POPL) (2020)
15.
Jaber, G., Tabareau, N.: Kripke open bisimulation - A marriage of game semantics
and operational techniques. In: APLAS. Springer (2015)
16.
Koutavas, V., Wand, M.: Small bisimulations for reasoning about higher-order
imperative programs. In: POPL. ACM (2006)
17.
Lahiri, S.K., Hawblitzel, C., Kawaguchi, M., Rebêlo, H.: SYMDIFF: A language-
agnostic semantic diff tool for imperative programs. In: CAV. Springer (2012)
18.
Laird, J.: A fully abstract trace semantics for general references. In: ICALP, Wroclaw,
Poland. LNCS, Springer (2007)
19.
Lassen, S.B., Levy, P.B.: Typed normal form bisimulation. In: Computer Science
Logic. Springer Berlin Heidelberg (2007)
20.
Lin, Y., Tzevelekos, N.: Symbolic execution game semantics. In: FSCD. Schloss
Dagstuhl - Leibniz-Zentrum für Informatik (2020)
21.
Meyer, A.R., Sieber, K.: Towards fully abstract semantics for local variables. In:
POPL. Association for Computing Machinery (1988)
22.
Morris, Jr., J.H.: Lambda Calculus Models of Programming Languages. Ph.D.
thesis, MIT, Cambridge, MA (1968)
23.
de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) Tools and Algorithms for the Construction and Analysis of Systems.
pp. 337–340. Springer Berlin Heidelberg, Berlin, Heidelberg (2008)
24.
Murawski, A.S., Ramsay, S.J., Tzevelekos, N.: A contextual equivalence checker for
IMJ*. In: ATVA. Springer (2015)
25. Murawski, A.S., Tzevelekos, N.: Nominal game semantics. FTPL 2(4) (2016)
26.
Patterson, D., Ahmed, A.: The next 700 compiler correctness theorems (functional
pearl). Proc. ACM Program. Lang. 3(ICFP) (2019)
27. Pous, D.: Coinduction all the way up. In: ACM/IEEE LICS. ACM (2016)
28.
Pous, D., Sangiorgi, D.: Enhancements of the bisimulation proof method. In:
Advanced Topics in Bisimulation and Coinduction. CUP (2012)
29.
Sangiorgi, D., Kobayashi, N., Sumii, E.: Environmental bisimulations for higher-
order languages. In: LICS. IEEE Computer Society (2007)
30.
Schrammel, P., Kroening, D., Brain, M., Martins, R., Teige, T., Bienmüller, T.:
Successful use of incremental BMC in the automotive industry. In: FMICS (2015)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Equivalence Checking for Orthocomplemented
Bisemilattices in Log-Linear Time
Simon Guilloud(✉) and Viktor Kunčak
EPFL IC LARA, Station 14, CH-1015 Lausanne, Switzerland
{Simon.Guilloud,Viktor.Kuncak}@epfl.ch
Abstract. Motivated by proof checking, we consider the problem of efficiently establishing equivalence of propositional formulas by relaxing the completeness requirements while still providing certain guarantees. We present a quasilinear time algorithm to decide the word problem on a natural class of algebraic structures we call orthocomplemented bisemilattices, a subtheory of Boolean algebra. The starting point for our procedure is a variation of the Aho–Hopcroft–Ullman algorithm for isomorphism of trees, which we generalize to directed acyclic graphs. We combine this algorithm with a term rewriting system we introduce to decide equivalence of terms. We prove that our rewriting system is terminating and confluent, implying the existence of a normal form. We then show that our algorithm computes this normal form in log-linear (and thus sub-quadratic) time. We provide pseudocode and a minimal working implementation in Scala.
1 Introduction
Reasoning about propositional logic and its extensions is a basis of many verification
algorithms [19]. Propositional variables may correspond to, for example, sub-formulas
in first-order logic theories of SMT solvers [2,5,26], hypotheses and lemmas inside proof
assistants [13,27,32], or abstractions of sets of states. In particular, it is often of interest
to establish that two propositional formulas are equivalent. The equivalence problem
for propositional logic is coNP-complete as a negation of propositional satisfiability [8].
From proof complexity point of view [18] many known proof systems, including (non-
extended) resolution [31] and cutting planes [29] have exponential-sized shortest proofs
for certain propositional formulas. SAT and SMT solvers rely on DPLL-style algorithms
[9,10] and do not have polynomial run-time guarantees on equivalence checking, even if
formulas are syntactically close. Proof assistants implement such algorithms as tactics,
so they have similar difficulties. A consequence of this is that implemented systems may
take a very long time (or fail to acknowledge) that a large formula is equivalent to its
minor variant differing in, for example, reordering of internal conjuncts or disjuncts.
Similar situations also arise in program verifiers [12,21,30,34,35], where assertions act
as lemmas in a proof.
We acknowledge the financial support of the Swiss National Science Foundation project 200021_197288 “A Foundational Verifier”.
© The Author(s) 2022
D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 196–214, 2022.
https://doi.org/10.1007/978-3-030-99527-0_11
It is thus natural to ask for an approximation of the propositional equivalence prob-
lem: can we find an expressive theory supporting many of the algebraic laws of Boolean
algebra but for which we can still have a complete and efficient algorithm for formula
equivalence? By efficient, we mean about as fast, up to logarithmic factors, as the simple
linear-time syntactic comparison of formula trees.
We can use such an efficient equivalence algorithm to construct more flexible proof systems. Consider any sound proof system for propositional logic and replace the notion of identical sub-formulas with our notion of fast equivalence. For example, the axiom schema 𝑝 → (𝑞 → 𝑝) becomes 𝑝 → (𝑞 → 𝑝′) for all equivalent 𝑝 and 𝑝′. The new system remains sound. It accepts all the previously admissible inference steps, but also some new ones, which makes it more flexible.
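As a sketch of that relaxation (in OCaml, with hypothetical names; the paper's implementation is in Scala), a checker for instances of the schema 𝑝 → (𝑞 → 𝑝) can take the equivalence test as a parameter: syntactic equality gives the classical rule, while an OCBSL equivalence check gives the more flexible one.

type formula = Var of string | Implies of formula * formula  (* fragment only *)

(* Accept f as an instance of p -> (q -> p'), where p and p' need only be
   equivalent under the supplied test, not syntactically identical. *)
let matches_axiom_k (equiv : formula -> formula -> bool) (f : formula) =
  match f with
  | Implies (p, Implies (_, p')) -> equiv p p'
  | _ -> false

(* Instantiated with syntactic equality: *)
let _ = matches_axiom_k (=) (Implies (Var "a", Implies (Var "b", Var "a")))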
L1: 𝑥 ⊔ 𝑦 = 𝑦 ⊔ 𝑥                     L1′: 𝑥 ∧ 𝑦 = 𝑦 ∧ 𝑥
L2: 𝑥 ⊔ (𝑦 ⊔ 𝑧) = (𝑥 ⊔ 𝑦) ⊔ 𝑧         L2′: 𝑥 ∧ (𝑦 ∧ 𝑧) = (𝑥 ∧ 𝑦) ∧ 𝑧
L3: 𝑥 ⊔ 𝑥 = 𝑥                         L3′: 𝑥 ∧ 𝑥 = 𝑥
L4: 𝑥 ⊔ 1 = 1                         L4′: 𝑥 ∧ 0 = 0
L5: 𝑥 ⊔ 0 = 𝑥                         L5′: 𝑥 ∧ 1 = 𝑥
L6: ¬¬𝑥 = 𝑥                           L6′: same as L6
L7: 𝑥 ⊔ ¬𝑥 = 1                        L7′: 𝑥 ∧ ¬𝑥 = 0
L8: ¬(𝑥 ⊔ 𝑦) = ¬𝑥 ∧ ¬𝑦                L8′: ¬(𝑥 ∧ 𝑦) = ¬𝑥 ⊔ ¬𝑦
Table 1. Laws of the algebraic structures (𝑆, ∧, ⊔, 0, 1, ¬). Our algorithm is complete (and log-linear time) for structures that satisfy laws L1–L8 and L1′–L8′. We call these structures orthocomplemented bisemilattices (OCBSL).
L9: 𝑥 ⊔ (𝑥 ∧ 𝑦) = 𝑥                    L9′: 𝑥 ∧ (𝑥 ⊔ 𝑦) = 𝑥
L10: 𝑥 ⊔ (𝑦 ∧ 𝑧) = (𝑥 ⊔ 𝑦) ∧ (𝑥 ⊔ 𝑧)   L10′: 𝑥 ∧ (𝑦 ⊔ 𝑧) = (𝑥 ∧ 𝑦) ⊔ (𝑥 ∧ 𝑧)
Table 2. Neither the absorption law (L9, L9′) nor distributivity (L10, L10′) holds in OCBSL. Without
L9, L9′, the operations ∧ and ⊔ induce different partial orders. If an OCBSL satisfies L10, L10′,
then it also satisfies L9, L9′ and is precisely a Boolean algebra.
1.1 Problem Statement
This paper proposes to approximate propositional formula equivalence using a new al-
gorithm that solves exactly the word problem for structures we call orthocomplemented
bisemilattices (axiomatized in Table 1), in only log-linear time. In general, the word
problem for an algebraic theory with signature 𝑆 and axioms 𝐴 is the problem of de-
termining, given two terms 𝑡₁ and 𝑡₂ in the language of 𝑆 with free variables, whether
𝑡₁ = 𝑡₂ is a consequence of the axioms. Our main interest in the problem is that ortho-
complemented bisemilattices (OCBSL) are a generalisation of Boolean algebra. This
structure satisfies a weaker set of axioms that omits the distributivity law as well as its
weaker variant, the absorption law (Table 2). Hence, this problem is a relaxation “up
to distributivity” of the propositional formula equivalence. A positive answer implies
formulas are equivalent in all Boolean algebras, hence also in propositional logic.
Definition 1 (Word Problem for Orthocomplemented Bisemilattices). Consider the
signature with two binary operations ∧, ⊔, a unary operation ¬, and constants 0, 1. The
OCBSL word problem is the problem of determining, given two terms 𝑡₁ and 𝑡₂ in this
signature, containing free variables, whether 𝑡₁ = 𝑡₂ is a consequence (in the sense
of first-order logic with equality) of the universally quantified axioms L1–L8, L1′–L8′ in
Table 1.
Contribution. We present an 𝒪(𝑛 log²(𝑛)) algorithm for the word problem of orthocom-
plemented bisemilattices. In the process, we introduce a confluent and terminating rewriting
system for OCBSL on terms modulo commutativity. We analyze the algorithm to show
its correctness and complexity. We present its executable description and a Scala imple-
mentation at https://github.com/epfl-lara/OCBSL.
1.2 Related Work
The word problem on lattices has been studied in the past. The structure we consider
is, in general, not a lattice. Whitman [33] showed decidability of the word problem on
free lattices, essentially by showing that the natural order relation on lattices between two
words can be decided by an exhaustive search. The word problem on orthocomplemented
lattices has typically been solved by defining a suitable sequent calculus for the order
relation with a cut rule for transitivity [4,17]. Because a cut elimination theorem can be
proved similarly to the original from Gentzen [11], the proof space is finite and a proof
search procedure can decide validity of the implication in the logic, which translates to
the original word problem.
The word problem for free lattices was shown to be in PTIME by Hunt et al. [15],
and the word problem for orthocomplemented lattices was shown to be in PTIME by
Meinander [25]. Those algorithms essentially rely on proof-search methods similar to
the previous ones, but bound the search space. These results make no mention of a spe-
cific degree of the polynomial; our analysis suggests that, as described, these algorithms
run in 𝒪(𝑛⁴). Related techniques of locality have been applied more broadly and also
yield polynomial bounds, with the specific exponents depending on the local Horn clauses
that axiomatize the theory [3,24].
Aside from its use in equivalence checking, the problem is also of indepen-
dent interest because OCBSL are a natural weakening of Boolean Algebra and ortho-
complemented lattices. They are dual to complemented lattices in the sense illustrated
by Figure 1. A slight weakening of OCBSL, called de Morgan bisemilattice, has been
used to simulate electronic circuits [6,22]. OCBSL may be applicable in this scenario
as well. Moreover, our algorithm can also be adapted to decide, in log-linear time, the
word problem for this weaker theory.
To the best of our knowledge, no solution was presented in the past for the word
problem for orthocomplemented bisemilattices (OCBSL). Moreover, we are not aware
of log-linear algorithms for the related, previously studied theories either.
1.3 Overview of the Algorithm
It is common to represent a term, like a Boolean formula, as an abstract syntax tree.
In such a tree, a node corresponds to either a function symbol, a constant symbol or a
variable, and the children of a function node represent the arguments of the function. In
general, for a function symbol 𝑓, the trees 𝑓(𝑥, 𝑦) and 𝑓(𝑦, 𝑥) are distinct; the children of a
node are stored in a specific order. Commutativity of a function symbol 𝑓corresponds to
the fact that children of a node labelled by 𝑓are instead unordered. Our algorithm thus
uses as its starting point a variation of the algorithm of Aho, Hopcroft, and Ullman [14]
for tree isomorphism, as it corresponds to deciding equality of two terms modulo com-
mutativity. However, the theory we consider contains many more axioms than merely
commutativity. Our approach is to find an equivalent set of reduction rules, themselves
understood modulo commutativity, that is suitable to compute a normal form of a given
formula with respect to those axioms using the ideas of term rewriting [1]. The interest
of tree isomorphism in our approach is two-fold: first, it helps to find application cases
of our reduction rules, and second, it compares the two terms of our word problem. In
the final algorithm, both aspects are realized simultaneously.
[Figure 1: three diagrams relating the orders 𝑎 ⊑ 𝑏 and ¬𝑏 ⊑ ¬𝑎 in (a) complemented lattices, (b) orthocomplemented bisemilattices, and (c) orthocomplemented lattices.]
Fig. 1. Bisemilattices satisfying absorption or de Morgan laws.
2 Preliminaries
2.1 Lattices and Bisemilattices
To define and situate our problem, we present a collection of algebraic structures satis-
fying certain subsets of the laws in Tables 1 and 2.
A structure (𝑆, ∧) that is commutative (L1′), associative (L2′), and idempotent (L3′) is
a semilattice. A semilattice induces a partial order relation ⊑ on 𝑆 defined by 𝑎 ⊑ 𝑏 ⟺
(𝑎 ∧ 𝑏) = 𝑎. Indeed, one can verify that (∃𝑐. (𝑏 ∧ 𝑐) = 𝑎) ⟺ (𝑏 ∧ 𝑎) = 𝑎, from which tran-
sitivity follows. Antisymmetry is immediate. In such a partially ordered set (poset) 𝑆, two
elements 𝑎 and 𝑏 always have a greatest lower bound, or 𝑔𝑙𝑏, 𝑎 ∧ 𝑏. Conversely, a poset
such that any two elements have a 𝑔𝑙𝑏 is always a semilattice. A structure (𝑆, ∧, 0, 1) that
satisfies L1′, L2′, L3′, L4′, and L5′ is a bounded upper-semilattice. Equivalently, 1 is the
maximum element and 0 the minimum element in the corresponding poset. Similarly,
a structure (𝑆, ⊔, 0, 1) that satisfies L1 to L5 is a bounded lower-semilattice. In that
case, we write the corresponding ordering relation ⊑′. Note that it points in the direc-
tion opposite to ⊑, so that 1 is always the “maximum” element and 0 the “minimum”
element. A structure (𝑆, ∧, ⊔) is a bisemilattice if (𝑆, ∧) is an upper semilattice and
(𝑆, ⊔) a lower semilattice. There are in general no specific laws relating the two semi-
lattices of a bisemilattice. They can be the same semilattice or completely different. If
the bisemilattice satisfies the absorption law (L9), then the two semilattices are related
in such a way that (𝑎 ∧ 𝑏) = 𝑎 ⟺ (𝑎 ⊔ 𝑏) = 𝑏, i.e. the two orders ⊑ and ⊑′ are equal and the
structure is called a lattice. A bisemilattice is consistently bounded if both semilattices
are bounded and if 0_∧ = 0_⊔ = 0 and 1_∧ = 1_⊔ = 1, which will be the case in this
paper. A structure (𝑆, ∧, ⊔, ¬, 0, 1) that satisfies L1 to L7 and L1′ to L7′ is called a com-
plemented bisemilattice, with complement operation ¬. A complemented bisemilattice
satisfying de Morgan’s law (L8 and L8′) is an orthocomplemented bisemilattice; this
implies ¬0 = ¬(¬1 ∧ 0) = ¬¬1 ⊔ ¬0 = 1. A structure satisfying L1–L9 and L1′–L9′ is an
orthocomplemented lattice. Both de Morgan laws (L8, L8′) and absorption laws (L9
and L9′) relate the two semilattices, in a way summarised in Figure 1. In bisemilattices,
orthocomplementation is (merely) equivalent to 𝑎 ⊑ 𝑏 ⟺ ¬𝑏 ⊑′ ¬𝑎. Indeed, we have:

𝑎 ⊑ 𝑏  ⟺(def)  𝑎 ∧ 𝑏 = 𝑎  ⟺(L8)  ¬𝑎 ⊔ ¬𝑏 = ¬𝑎  ⟺(def)  ¬𝑏 ⊑′ ¬𝑎

In the presence of L1–L8, L1′–L8′, the law of absorption (L9 and L9′) is implied
by distributivity. In fact, an orthocomplemented bisemilattice with distributivity is a
lattice and even a Boolean algebra. In this sense, we can consider orthocomplemented
bisemilattices as “Boolean algebra without distributivity”.
2.2 Term Rewriting Systems
We next review the basics of term rewriting systems. For a more complete treatment, see [1].
Definition 2. A term rewriting system is a list of rewriting rules of the form 𝑒ₗ = 𝑒ᵣ,
with the meaning that an occurrence of 𝑒ₗ in a term 𝑡 can be replaced by 𝑒ᵣ. 𝑒ₗ and 𝑒ᵣ
can contain free variables. To apply the rule, 𝑒ₗ is unified with a subterm of 𝑡, and that
subterm is replaced by 𝑒ᵣ under the same unifier. If applying a rewriting rule to 𝑡₁ yields
𝑡₂, we say that 𝑡₁ reduces to 𝑡₂ and write 𝑡₁ → 𝑡₂. We denote by →* the transitive closure
of → and by ↔* its transitive symmetric closure.
An axiomatic system such as L1–L9, L1′–L9′ induces a term rewriting system, inter-
preting equalities from left to right. In that case, 𝑡₁ ↔* 𝑡₂ coincides with the validity of
the equality 𝑡₁ = 𝑡₂ in the theory given by the axioms [1, Theorem 3.1.12].
Definition 3. A term rewriting system is terminating if there exists no infinite chain of
reducing terms 𝑡₁ → 𝑡₂ → 𝑡₃ → ⋯.

Fact 1. If there is a well-founded order < (or, in particular, a measure 𝑚) on terms such
that 𝑡₁ → 𝑡₂ ⟹ 𝑡₂ < 𝑡₁ (or, in particular, 𝑚(𝑡₂) < 𝑚(𝑡₁)), then the term rewriting
system is terminating.
Definition 4. A term rewriting system is confluent iff for all 𝑡₁, 𝑡₂, 𝑡₃: 𝑡₁ →* 𝑡₂ ∧ 𝑡₁ →* 𝑡₃
implies ∃𝑡₄. 𝑡₂ →* 𝑡₄ ∧ 𝑡₃ →* 𝑡₄.

Theorem 1 (Church–Rosser Property) [1, Chapter 2]. A term rewriting system is
confluent if and only if ∀𝑡₁, 𝑡₂. (𝑡₁ ↔* 𝑡₂) ⟹ (∃𝑡₃. 𝑡₁ →* 𝑡₃ ∧ 𝑡₂ →* 𝑡₃).
A terminating and confluent term rewriting system directly implies decidability of
the word problem for the underlying structure, as it makes it possible to compute the
normal form of two terms to check if they are equivalent. Note that commutativity is not
a terminating rewriting rule, but similar results hold if we consider the set of all terms,
as well as rewrite rules, modulo commutativity [1, Chapter 11], [28]. To efficiently ma-
nipulate terms modulo commutativity and achieve log-linear time, we will employ an
algorithm for comparing trees with unordered children.
3 Directed Acyclic Graph Equivalence
The structure of formulas with commutative nodes corresponds to the usual mathematical
definition of a labelled rooted tree, i.e. an acyclic graph with one distinguished vertex
(root) where there is no order on the children of a node. For this reason, we use as our
starting point the algorithm of Hopcroft, Ullman and Aho for tree isomorphism [14, Page
84, Example 3.2], which has also been studied subsequently [7,23].
To account for structure sharing, we further generalize this representation to singly-
rooted, labeled, Directed Acyclic Graphs, which we simply call DAGs. Our DAGs gener-
alize rooted directed trees. Any DAG can be transformed into a rooted tree by duplicating
subgraphs corresponding to nodes with multiple parents, as in Figure 2. This transforma-
tion in general results in an exponential blowup in the number of nodes. Dually, using
DAGs instead of trees can exponentially shrink the space needed to represent certain terms.
Fig. 2. A DAG and the corresponding tree.
Fig. 3. Two equivalent DAGs with different numbers of nodes.
Checking for equality between ordered trees or DAGs is easy in linear time: we
simply recursively check equality between the children of two nodes.
Definition 5. Two ordered nodes 𝜏 and 𝜋 with children 𝜏₀, ..., 𝜏ₘ and 𝜋₀, ..., 𝜋ₙ are
equivalent (noted 𝜏 ∼ 𝜋) iff
𝑙𝑎𝑏𝑒𝑙(𝜏) = 𝑙𝑎𝑏𝑒𝑙(𝜋), 𝑚 = 𝑛, and ∀𝑖 < 𝑛. 𝜏ᵢ ∼ 𝜋ᵢ
For unordered trees or DAGs, equivalence checking is less trivial, as the naive al-
gorithm has exponential complexity due to the need to find an adequate permutation.

Definition 6. Two unordered nodes 𝜏 and 𝜋 with children 𝜏₀, ..., 𝜏ₘ and 𝜋₀, ..., 𝜋ₙ are
equivalent (noted 𝜏 ∼ 𝜋) iff
𝑙𝑎𝑏𝑒𝑙(𝜏) = 𝑙𝑎𝑏𝑒𝑙(𝜋), 𝑚 = 𝑛, and there exists a permutation 𝑝 s.t. ∀𝑖 < 𝑛. 𝜏_{𝑝(𝑖)} ∼ 𝜋ᵢ

For trees, note that this definition of equivalence corresponds exactly to isomor-
phism. It is known that DAG isomorphism is GI-complete, so it is conjectured to have
complexity greater than PTIME. Fortunately, this does not prevent our solution, because
our notion of equivalence on DAGs is not the same as isomorphism on DAGs. In partic-
ular, two DAGs can be equivalent without having the same number of nodes, i.e. without
being isomorphic, as Figure 3 illustrates.
Algorithm 1: Unordered DAG equivalence. The operator ++ is concatenation.
input : two unordered DAGs 𝜏 and 𝜋
output: True if 𝜏 and 𝜋 are equivalent, False otherwise.
1  codes ← HashMap[(String, List[Int]), Int]
2  map ← HashMap[Node, Int]
3  𝑠𝜏: List ← ReverseTopologicalOrder(𝜏)
4  𝑠𝜋: List ← ReverseTopologicalOrder(𝜋)
5  for (𝑛: Node in 𝑠𝜏 ++ 𝑠𝜋) do
6      𝑙ₙ ← [map(𝑐) for 𝑐 in children(𝑛)]
7      𝑟ₙ ← (label(𝑛), sort(𝑙ₙ))
8      if codes contains 𝑟ₙ then
9          map(𝑛) ← codes(𝑟ₙ)
10     else
11         codes(𝑟ₙ) ← codes.size
12         map(𝑛) ← codes(𝑟ₙ)
13     end
14 end
15 return map(𝜏) == map(𝜋)
Algorithm 1 is the generalization of Hopcroft, Ullman, and Aho’s algorithm. It de-
cides in log-linear time whether two labelled (unordered) DAGs are equivalent according
to Definition 6. The algorithm generalizes straightforwardly to DAGs with a mix of ordered
and unordered nodes: if a node is ordered, we skip the sorting operation in line 7.
The algorithm works bottom-up. We first sort the DAG in reverse topological
order using, for example, Kahn’s algorithm [16]. This way, we explore the DAG starting
from a leaf and finishing with the root. It is guaranteed that when we treat a node, all its
children have already been treated.
The algorithm recursively assigns codes to the nodes of both DAGs. In
the unlabelled case:
– The first node, necessarily a leaf, is assigned the integer 0.
– The second node is assigned 0 if it is a leaf, or 1 if it is a parent of the first node.
– For any node, the algorithm makes a list of the integers assigned to that node’s chil-
dren and sorts it (if the node is commutative). We call this the signature of the node.
Then it checks whether that signature has already been seen. If yes, it assigns to the node the
number that has been given to other nodes with the same signature. Otherwise, it
assigns a new integer to that node and its signature.
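To make this concrete, here is a minimal executable Scala sketch of Algorithm 1 (our illustration, not the paper’s implementation; the Node class, its string labels, and the recursive post-order traversal are assumptions made for brevity, whereas the paper uses Kahn’s algorithm [16] for the reverse topological order):

    import scala.collection.mutable

    // Sketch of Algorithm 1: assign integer codes bottom-up; two nodes receive
    // the same code iff they are equivalent in the sense of Definition 6.
    // Children are unordered, hence the sort of child codes. Nodes are memoized
    // by reference so that shared subgraphs of a DAG are coded only once.
    final class Node(val label: String, val children: List[Node])

    def equivalent(tau: Node, pi: Node): Boolean = {
      val codes = mutable.HashMap.empty[(String, List[Int]), Int] // signature -> code
      val memo  = new java.util.IdentityHashMap[Node, Integer]()  // node -> code
      def code(n: Node): Int = {
        val cached = memo.get(n)
        if (cached != null) cached.intValue
        else {
          // The label together with the sorted child codes is the node's signature.
          val sig = (n.label, n.children.map(code).sorted)
          val c = codes.getOrElseUpdate(sig, codes.size) // fresh code for new signatures
          memo.put(n, c)
          c
        }
      }
      code(tau) == code(pi)
    }

For instance, the sketch reports 𝑓(𝑥, 𝑦) and 𝑓(𝑦, 𝑥) as equivalent, since after sorting both yield the same signature.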
Lemma 1 (Algorithm 1 Correctness). The codes assigned to any two nodes 𝑛 and 𝑚
of 𝑠𝜏 ++ 𝑠𝜋 are equal if and only if 𝑛 ∼ 𝑚.

Proof. Let 𝑛 and 𝑚 denote any two DAG nodes. By induction on the height of 𝑛:
– In the case where 𝑛 is a leaf, we have 𝑟ₙ = (𝑙𝑎𝑏𝑒𝑙(𝑛), 𝑁𝑖𝑙). Note that for any node
𝑛, 𝑚𝑎𝑝(𝑛) = codes(𝑟ₙ). Since every time the map codes is updated, it is with a
completely new number, codes(𝑟ₙ) = codes(𝑟ₘ) if and only if 𝑟ₙ = 𝑟ₘ, i.e. iff
𝑙𝑎𝑏𝑒𝑙(𝑚) = 𝑙𝑎𝑏𝑒𝑙(𝑛) and 𝑚 has no children (like 𝑛).
– In the case where 𝑛 has children 𝑛ᵢ, again codes(𝑟ₙ) = codes(𝑟ₘ) if and only if
𝑟ₘ = 𝑟ₙ, which is equivalent to 𝑙𝑎𝑏𝑒𝑙(𝑚) = 𝑙𝑎𝑏𝑒𝑙(𝑛) and 𝑠𝑜𝑟𝑡(𝑙ₘ) = 𝑠𝑜𝑟𝑡(𝑙ₙ). This
means there is a permutation 𝑝 of the children of 𝑛 such that ∀𝑖. codes(𝑛_{𝑝(𝑖)}) =
codes(𝑚ᵢ). By the induction hypothesis, this is equivalent to ∀𝑖. 𝑛_{𝑝(𝑖)} ∼ 𝑚ᵢ. Hence we
find that 𝑚𝑎𝑝(𝑛) = 𝑚𝑎𝑝(𝑚) if and only if both:
1. their labels are equal, and
2. there exists a permutation 𝑝 s.t. ∀𝑖. 𝑛_{𝑝(𝑖)} ∼ 𝑚ᵢ,
i.e. 𝑛 and 𝑚 have the same code if and only if 𝑛 ∼ 𝑚.
Corollary 1. The algorithm returns True if and only if 𝜏 ∼ 𝜋.
Time Complexity. Using Kahn’s algorithm, sorting 𝜏 and 𝜋 is done in linear time. Then
the loop touches every node a single time. Inside the loop, the first line takes linear time
in the number of children of the node, and the second line takes log-linear
time in the number of children. Since we use HashMaps, the last instructions
take effectively constant time (because the hash code is computed from the address of the
node, not its content).
For a general DAG, the algorithm thus runs in time at most log-quadratic in the number
of nodes. Note however that for DAGs with a bounded number of children per node, as well
as for DAGs with a bounded number of parents per node, the algorithm is log-linear. In
fact, the algorithm is log-linear with respect to the total number of edges in the graph.
For this reason, the algorithm is still only log-linear in the input size. It also follows that
the algorithm is always at most log-linear with respect to the tree or formula underlying
the DAG, which may be much larger than the DAG itself. Moreover, there exist cases
where the algorithm is log-linear in the number of nodes but the underlying tree is
exponentially larger; the full binary symmetric graph is such an example.
4 Word Problem on Orthocomplemented Bisemilattices
We will use the previous algorithm for DAG equivalence, applied to a formula in the
language of bisemilattices (𝑆, ∧, ⊔), to account for commutativity (axioms L1, L1′), but
we need to combine it with the remaining axioms. From now on we work with axioms
L1–L8, L1′–L8′ in Table 1. The plan is to express those axioms as reduction rules. Of
the rules L2–L8 and L2′–L8′, all but L8 and L8′ reduce the size of the term when applied
from left to right, and hence seem suitable as rewrite rules.
It may seem that the simplest way to deal with the de Morgan law is to use it (along
with double negation elimination) to transform all terms into negation normal form. It
happens, however, that doing this causes trouble when trying to detect application cases
of rule L7 (complementation). Indeed, consider the following term:

𝑓 = (𝑎 ∧ 𝑏) ⊔ ¬(𝑎 ∧ 𝑏)

Using complementation it clearly reduces to 1, but pushing into negation normal form,
it would first be transformed to (𝑎 ∧ 𝑏) ⊔ (¬𝑎 ⊔ ¬𝑏). To detect that these two disjuncts
are actually opposite requires recursively verifying that ¬(𝑎 ∧ 𝑏) = (¬𝑎 ⊔ ¬𝑏).
It is actually simpler to apply the de Morgan law the following way:

𝑥 ∧ 𝑦 = ¬(¬𝑥 ⊔ ¬𝑦)

Instead of removing negations from the formula, we remove one of the binary semilattice
operators. (Which one we keep is arbitrary; we chose to keep ⊔.) Now, when we check if
rule L7 can be applied to a disjunction node (i.e. whether it has two children 𝑦 and 𝑧 such that 𝑦 = ¬𝑧),
there are two cases for each child 𝑥: if 𝑥 is not itself a negation, i.e. it starts with ⊔, we compute the code of ¬𝑥
from the code of 𝑥 in constant time. If 𝑥 = ¬𝑥′ then ¬𝑥 ∼ 𝑥′, so the code of ¬𝑥 is simply
the code of 𝑥′, in constant time as well. Hence we obtain the codes of all children and
their negations, and we can sort those codes to look for collisions, all of it in time linear
in the number of children.
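As a small illustration of this transformation (a sketch under our own term representation, mirroring the case classes of Figure 7 but with a hypothetical Conjunction constructor that the rest of the algorithm never sees):

    // Eliminate the second semilattice operator using x ∧ y = ¬(¬x ⊔ ¬y), keeping ⊔.
    sealed trait Term
    case class Variable(id: String)              extends Term
    case class Literal(b: Boolean)               extends Term
    case class Negation(child: Term)             extends Term
    case class Disjunction(children: List[Term]) extends Term
    case class Conjunction(children: List[Term]) extends Term // removed below

    def removeConj(t: Term): Term = t match {
      case Variable(_) | Literal(_) => t
      case Negation(c)              => Negation(removeConj(c))
      case Disjunction(cs)          => Disjunction(cs.map(removeConj))
      // ∧(x1, ..., xn) becomes ¬⊔(¬x1, ..., ¬xn)
      case Conjunction(cs)          => Negation(Disjunction(cs.map(c => Negation(removeConj(c)))))
    }

After this pass, every node is a variable, a constant, a negation, or a disjunction, i.e. a term of the language (𝑆, ⊔, ¬, 0, 1) over which the rules below operate.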
We now restate the axioms L1–L8, L1′–L8′ in this updated language in Table 3.

A1: ⊔(..., 𝑥ᵢ, 𝑥ⱼ, ...) = ⊔(..., 𝑥ⱼ, 𝑥ᵢ, ...)      A1′: ¬⊔(¬𝑥, ¬𝑦) = ¬⊔(¬𝑦, ¬𝑥)
A2: ⊔(𝑥, ⊔(𝑦)) = ⊔(𝑥, 𝑦)                          A2′: ¬⊔(¬𝑥, ¬¬⊔(¬𝑦)) = ¬⊔(¬𝑥, ¬𝑦)
    ⊔(𝑥) = 𝑥
A3: ⊔(𝑥, 𝑥, 𝑦) = ⊔(𝑥, 𝑦)                          A3′: ¬⊔(¬𝑥, ¬𝑥, ¬𝑦) = ¬⊔(¬𝑥, ¬𝑦)
A4: ⊔(1, 𝑥) = 1                                    A4′: ¬⊔(¬0, ¬𝑦) = 0
A5: ⊔(0, 𝑥) = ⊔(𝑥)                                 A5′: ¬⊔(¬1, ¬𝑥) = ¬⊔(¬𝑥)
A6: ¬¬𝑥 = 𝑥
A7: ⊔(𝑥, ¬𝑥, 𝑦) = 1                                A7′: ¬⊔(¬𝑥, ¬¬𝑥, ¬𝑦) = 0
A8: ¬⊔(𝑥₁, ..., 𝑥ᵢ) = ¬⊔(¬¬𝑥₁, ..., ¬¬𝑥ᵢ)          A8′: ¬¬⊔(¬𝑥₁, ..., ¬𝑥ᵢ) = ⊔(¬𝑥₁, ..., ¬𝑥ᵢ)
Table 3. Laws of algebraic structures (𝑆, ⊔, 0, 1, ¬), equivalent to L1–L8, L1′–L8′ under the de Morgan
transformation.
It is straightforward and not surprising that axiom A8, as well as A1′–A8′, all follow
from axioms A1–A7, so A1–A7 are actually complete for our theory.
4.1 Confluence of the Rewriting System
In our equivalence algorithm, A1 is taken care of by the arbitrary but consistent ordering
of the nodes. Axioms A2-A7 form a term rewriting system. Since all those rules reduce
the size of the term, the system is terminating in a number of steps linear in the size of
the term. We will next show that it is confluent. We will thus obtain the existence of
a normal form for every term, and will finally show how our algorithm computes that
normal form.
Definition 7. Consider a pair of reduction rules 𝑙₀ → 𝑟₀ and 𝑙₁ → 𝑟₁ with disjoint sets
of free variables, such that 𝑙₀ = 𝐷[𝑠], 𝑠 is not a variable, and 𝜎 is the most general unifier
with 𝜎𝑠 = 𝜎𝑙₁. Then (𝜎𝑟₀, (𝜎𝐷)[𝜎𝑟₁]) is called a critical pair.

Informally, a critical pair is a most general pair of terms (with respect to unification)
(𝑡₁, 𝑡₂) such that for some 𝑡₀, 𝑡₀ → 𝑡₁ and 𝑡₀ → 𝑡₂ via two “overlapping” rules. They are
found by matching the left-hand side of a rule with a non-variable subterm of the same
or another rule.
Example 1 (Critical Pairs).
1. Matching the left-hand side of A6 with the subterm ¬𝑥 of rule A7, we obtain the pair
(1, ⊔(¬𝑥, 𝑥, 𝑦))
which arises from reducing the term 𝑡₀ = ⊔(¬𝑥, ¬¬𝑥, 𝑦) in two different ways.
2. Matching the left-hand sides of A2 and A7 gives
(⊔(𝑥, 𝑦, ¬⊔(𝑦)), 1)
which arises from reducing ⊔(𝑥, ⊔(𝑦), ¬⊔(𝑦)) using A2 or A7.
3. Matching the left-hand sides of A5 and A7 gives
(¬0, 1)
which arises from reducing ⊔(0, ¬0) in two different ways.
Proposition 1 ([1, Chapter 6]). A terminating term rewriting system is confluent if and
only if all critical pairs (𝑡₁, 𝑡₂) are joinable, i.e. ∃𝑡₃. 𝑡₁ →* 𝑡₃ ∧ 𝑡₂ →* 𝑡₃.

In the first of the previous examples, the pair is clearly joinable by commutativity and
a single application of rule A7 itself. The second example is more interesting. Observe
that ⊔(𝑥, 𝑦, ¬⊔(𝑦)) = 1 is a consequence of our axioms, but the left part cannot be
reduced to 1 in general in our system. To solve this problem we need to add the rule A9:
⊔(𝑥, 𝑦, ¬⊔(𝑦)) = 1. Similarly, the third example forces us to add A10: ¬0 = 1 to our
set of rules. From A10 and A6 we then find the expected critical pair A11: ¬1 = 0.
A1: ⊔(..., 𝑥ᵢ, 𝑥ⱼ, ...) = ⊔(..., 𝑥ⱼ, 𝑥ᵢ, ...)
A2: ⊔(𝑥, ⊔(𝑦)) = ⊔(𝑥, 𝑦)
    ⊔(𝑥) = 𝑥
A3: ⊔(𝑥, 𝑥, 𝑦) = ⊔(𝑥, 𝑦)
A4: ⊔(1, 𝑥) = 1
A5: ⊔(0, 𝑥) = ⊔(𝑥)
A6: ¬¬𝑥 = 𝑥
A7: ⊔(𝑥, ¬𝑥, 𝑦) = 1
A9: ⊔(𝑥, 𝑦, ¬⊔(𝑦)) = 1
A10: ¬0 = 1
A11: ¬1 = 0
Table 4. Terminating and confluent set of rewrite rules equivalent to L1–L8, L1′–L8′.
4.2 Complete Terminating Confluent Rewrite System
The analysis of all possible pairs of rules to find all critical pairs is straightforward. It
turns out that A9, A10, and A11 are the only rules we need to add to our system to
obtain confluence. We have checked the complete list of critical pairs for rules A2–A11
(we omit the details due to lack of space). All those pairs are joinable, i.e. reduce to the
same term, which implies, by Proposition 1, that the system is confluent. Table 4 shows
the complete set of reduction rules (as well as commutativity).
Since the system A2–A11, considered over the language (𝑆, ⊔, ¬, 0, 1) modulo com-
mutativity of ⊔, is terminating and confluent, the existence of a normal form
reduction follows. For any term 𝑡, we denote its normal form by 𝑡↓. In particular, for any two terms
𝑡₁ and 𝑡₂, we have 𝑡₁ = 𝑡₂ in our theory iff 𝑡₁ ↔* 𝑡₂ iff 𝑡₁↓ and 𝑡₂↓ are equivalent terms
modulo commutativity. We finally reach our conclusion: an algorithm that computes
the normal form (modulo commutativity) of any term gives a decision procedure for the
word problem for orthocomplemented bisemilattices.
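For illustration (our own example, not one from the paper), the rules of Table 4 reduce the following term to its normal form:

⊔(𝑥, 0, ¬¬𝑦, ¬⊔(𝑦))  →A6  ⊔(𝑥, 0, 𝑦, ¬⊔(𝑦))  →A2  ⊔(𝑥, 0, 𝑦, ¬𝑦)  →A5  ⊔(𝑥, 𝑦, ¬𝑦)  →A7  1

By confluence, applying the rules in any other order (for example A5 first) reaches the same normal form 1.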
5 Algorithm and Complexity
The rewriting system readily gives us a quadratic algorithm. Indeed, using our base
algorithm for DAG equivalence, we can check, in linear time, for application cases of
any one of the rewriting rules A2–A11 of Table 4, modulo commutativity. Since a term can
only be reduced up to 𝑛 times, the total time spent before finding the normal form of a
term is at most quadratic. It is however possible to find the normal form of a term in a
single pass of our equivalence algorithm, resulting in a more efficient algorithm.
5.1 Combining Rewrite Rules and Tree Isomorphism
We give an overview of how to combine rules A2–A7, A9, A10, A11 within the tree
isomorphism algorithm, which we present using Scala-like 1 pseudocode in Figure 7.
1https://www.scala-lang.org/
For conciseness, we omit the dynamic programming optimizations allowed by struc-
ture sharing in DAGs (which would store the normal form and additionally check whether a
node was already processed). For each rule, we indicate the most relevant lines of the
algorithm in Figure 7.
A2 (Associativity, Lines 10, 20, 32, 42) When analysing a ⊔ node, after the recursive
call, find all children that are themselves ⊔ and replace them by their own children.
This is simple enough to implement, but there is actually a caveat with this in terms of
complexity. We will come back to it in Section 5.2.

A3 (Idempotence, Lines 8, 31, 35) This corresponds to the fact that we eliminate du-
plicate children in disjunctions. When reaching a ⊔ node, after having sorted the codes
of its children, remove all duplicates before computing its own code.

A4, A5 (Bounds, Lines 8, 31, 35, 11, 36) To account for those axioms, we reserve
special codes for the nodes 1 and 0. For A4, when we reach some ⊔ node, if it has 1 as
one of its children, we accordingly replace the whole node by 1. For A5, we just remove
nodes with the same code as 0 from the parent ⊔ node before computing its own code.

A6 (Involution, Lines 17, 22) When reaching a negation node, if its child is itself a
negation node, replace the parent node by its grandchild before assigning it a code.

A7 (Complement, Lines 11, 36) As explained earlier, our representation of nodes lets us
do the following to detect cases of A7: first remember that we have already applied double
negation elimination, so that two “opposite” nodes cannot both start with a negation.
Then we can simply separate the children into negated and non-negated ones (after the
recursive call), sort them using their assigned codes, and look for collisions.

A9 (Also Complement, Lines 11, 36) This rule is slightly more tricky to apply. When
analysing a ⊔ node 𝑥, after computing the codes of all children of 𝑥, find all children of
the form ¬⊔. For every such node, take the set of its own children and verify whether it is
a subset of the set of all children of 𝑥. If yes, then rule A9 applies. Said otherwise, we
look for collisions between grandchildren (through a negation) and children of every
⊔ node.

A10, A11 (Identities, Lines 17, 26) These rules are simple. In a ¬ node, if its child has
the same code as 0 (resp. 1), assign code 1 (resp. 0) to the negated node.
5.2 Case of Quadratic Runtime for the Basic Algorithm
All the rules we introduced in the previous section into Algorithm 1take time (log)linear
in the number of children of a node to apply, which is not more than the time we spent in
the DAG/tree isomorphism algorithm. For A3, checking for duplicates is done in linear
time in an ordered data structure. A4 and A5 (Bounds) consist in searching for specific
values, which take logarithmic time in the size of the list. A6 (Involution) takes constant
time. A7 (Complement) is detected by finding a collision between two separate ordered
Equivalence Checking for Orthocomplemented Bisemilattices in Log-Linear Time 207
lists, also easily done in (log) linear time. A9 (Also complement) consists in verifying
if grandchildren of a node are also children, and since children are sorted this takes log-
linear time in the number of grandchildren. Since a node is the grandchild of only one
other node, the same computation as in the original algorithm holds. A10 and A11 take
constant time. Hence, the total time complexity is (𝑛log(𝑛)), as in the algorithm for
tree isomorphism.
As stated in Section 3 regarding the algorithm for DAG equivalence, whose com-
plexity we aim to preserve, the time complexity analysis crucially relies on the fact that
in a tree, a node is never the child (or grandchild) of more than one node during the
execution. However, this is generally not true in the presence of associativity. Indeed,
consider the term represented in Figure 4. The 5th ⊔ has 2 children, but after applying
A2, the 4th ⊔ has 3 children, the 3rd ⊔ has 4 children, and so on. In the generalization of
such an example, since an 𝑥ᵢ is the child of all higher ⊔ nodes, our key property does not hold
and the algorithm runtime would be quadratic. Of course, such a simple counterexam-
ple is easily solved by applying a leading pass of associativity reduction before actually
running the whole algorithm. It turns out however that this is not sufficient, since cases of
associativity can appear after the application of the other A-rules.

Fig. 4. A term with quadratic runtime (nested disjunctions over 𝑥₁, ..., 𝑥₆)
In fact, there is only one rule that can create cases of rule A2, and this rule is A6
(Involution). The remaining rules whose right-hand side can start with a ⊔ have their
left-hand side already starting with ⊔. It may seem simple enough to also apply double
negation elimination in a leading pass, but unfortunately, cases of A6 can also be created
by other rules. It is easy to see, for similar reasons, that only the application of A2b
(⊔(𝑥) = 𝑥) can create such cases. And unfortunately, such cases of A2b can arise from
rules A3 and A5, which can only be detected using the full algorithm. To summarize,
the typical problematic case is depicted in Figure 5. This term is clearly equivalent to
⊔(𝑥₁, 𝑥₂, 𝑥₃, 𝑥₄), but to detect this we must first find that 𝑧₁ and 𝑧₂ are equivalent to 0, so
we cannot simply solve it with an early pass.
5.3 Final Log-Linear Time Algorithm
Fortunately, we can solve this problem at a logarithmic-only price. Observe that if we
were able to detect early the nodes which would cancel to 0, the problem would not exist:
when analysing a node, we would first call the algorithm on all subnodes equivalent to
0, remove them, and then, when there is a single child left, remove the trivial disjunct,
the double negation, and the successive disjunction (as in Figure 5) before doing the
recursive call on the unique nontrivial child. However, we of course cannot know in
advance which child will be equivalent to 0.

Fig. 5. A non-trivial term with quadratic runtime
Fig. 6. The term of Figure 5 during the algorithm’s execution

Moreover, note (still using Figure 5) that if the 𝑧-child is as large as the non-trivial
node, then even if we do the “useless” work, we at least obtain that the size of the tree is
divided by two, and hence the potential depth of the tree as well. By standard complexity
analysis, the time penalty would then only be a logarithmic factor.
The previous analysis suggests the following solution, reflected in Figure 7, lines
28–29. When analysing a node, make the recursive calls on its children in order of their size,
starting with the smallest and going up to the second biggest. If any of those children is non-zero,
proceed as normal. If all (but possibly the last) children are equivalent to zero, then
replace the current node by its biggest (and at this point non-analyzed) child, i.e. apply the
second half of rule A2 (associativity). If applicable, apply double negation elimination
and associativity as well before continuing the recursive call.
We illustrate this on the example of Figure 5. Consider the algorithm when reaching
the second node. There are two cases:
1. Suppose 𝑧₁ is a smaller tree than the non-trivial child. In this case the algorithm
will compute a code for 𝑧₁, find that it is 0, and delete it. Then the non-trivial node
is a single child, so the whole disjunction is removed. Hence, the double negation
can be removed and the two consecutive disjunctions of 𝑥₁ and 𝑥₂ merged, obtaining
the term illustrated in Figure 6. In particular, we did not compute a code for the two
deleted nodes, which is exactly what we wanted in our initial analysis.
2. Suppose 𝑧₁ is a larger tree than the non-trivial child. In this case, we would first re-
cursively compute the code of the non-trivial child and then detect that 𝑧₁ ∼ 0. We
indeed computed the code of the disjunction that contains 𝑥₂ when it was unnec-
essary, since we apply associativity anyway. This “useless” work consists in sorting
and applying axioms to the true children of the node (in this case 𝑥₂, 𝑥₃ and 𝑥₄) and
takes time quasilinear in the number of such children. In particular, it is bounded by
the size of the subtree itself, and we know it is the smaller of the two.
An analogous situation can arise from the use of rule A3 (idempotence), but here, triv-
ially, the two subtrees must have the same number of (real) subnodes, so the same
reasoning holds.
Denote by |𝑛| the size of a node 𝑛, i.e. the number of descendants of 𝑛. We compute
the penalty of useless work we incur by computing the children of a node 𝑛 in the wrong
order, i.e. by computing a non-0 child 𝑛𝑤 when all others are 0. 𝑛𝑤 cannot be the largest
child of 𝑛, for otherwise we would have found that all other children are 0 before needing
to compute 𝑛𝑤. Hence |𝑛𝑤| ≤ |𝑛|∕2. It follows that the total amount of useless work is
bounded by log(𝑛)·𝑊(𝑛), where

𝑊(𝑛) ≤ 𝑛∕2 + Σᵢ 𝑊(𝑛ᵢ)   for   Σᵢ 𝑛ᵢ < 𝑛.

It is clear that 𝑊(𝑛) is maximized when 𝑛 has exactly two children of equal size:

𝑊(𝑛) ≤ 𝑛∕2 + 2·𝑊(𝑛∕2)

By observing that we can divide 𝑛 by 2 only log(𝑛) times,

𝑊(𝑛) ≤ Σ_{𝑚=1}^{log(𝑛)} 2^𝑚 · 𝑛∕2^𝑚

so we obtain 𝑊(𝑛) = 𝒪(𝑛 log(𝑛)) and hence the total runtime is 𝒪(𝑛 (log 𝑛)²).
6 Conclusion
We have described a decision procedure with log-linear time complexity for the word
problem on orthocomplemented bisemilattices. This algorithm can also be simplified
to apply to weaker theories. Dually, we believe it can be generalized to decide some
stronger theories (still weaker than Boolean algebras) efficiently. While the word prob-
lem for orthocomplemented lattices was known to be in PTIME [15] and as such the
membership of orthocomplemented bisemilattices in PTIME may not come as a sur-
prise, this is, to the best of our knowledge, the first time that this result has been ex-
plicitly stated, and the first time that an algorithm with such low log-linear complexity
was proposed for this or a related problem. The algorithm has not only low complexity
but, according to our experience, is easy to implement. It can be used as an approxi-
mation for Boolean algebra equivalence, and we plan to use it as the basis of a kernel
for a proof assistant. We also envision possible uses of the algorithm in SMT and SAT
solvers. The algorithm is able to detect many natural and non-trivial cases of equiva-
lence even on formulas that may be too large for existing solvers to deal with, so it may
also complement an existing repertoire of subroutines used in more complex reasoning
tasks. For a minimal working implementation in Scala closely following Figure 7, see
https://github.com/epfl-lara/OCBSL.
1  def equivalentTrees(tau: Term, pi: Term): Boolean =
2    val codesSig: HashMap[(String, List[Int]), Int] = Empty
3    codesSig.update(("zero", Nil), 0); codesSig.update(("one", Nil), 1)
4    val codesNodes: HashMap[Term, Int] = Empty
5    def updateCodes(sig: (String, List[Int]), n: Node): Unit = ... // codesSig, codesNodes
6    def bool2const(b: Boolean): String = if b then "one" else "zero"
7    def rootCode(n: Term): Int =
8      val L = pDisj(n, Nil).map(codesNodes).sorted.filter(_ != 0).distinct
9      if L.isEmpty then updateCodes(("zero", Nil), n)
10     else if L.length == 1 then codesNodes.update(n, L.head)
11     else if L.contains(1) or checkForContradiction(L) then updateCodes(("one", Nil), n)
12     else updateCodes(("or", L), n)
13     codesNodes(n)
14   def pDisj(n: Node, acc: List[Node]): List[Node] = n match
15     case Variable(id) => updateCodes((id.toString, Nil), n); return n :: acc
16     case Literal(b) => updateCodes((bool2const(b), Nil), n); return n :: acc
17     case Negation(child) => pNeg(child, n, acc)
18     case Disjunction(children) => children.foldLeft(acc)(pDisj)
19   def pNeg(n: Node, parent: Node, acc: List[Node]): List[Node] = n match // under negation
20     case Negation(child) => pDisj(child, acc)
21     case Variable(id) => updateCodes((id.toString, Nil), n)
22       updateCodes(("neg", List(codesNodes(n))), parent)
23       parent :: acc
24     case Literal(b) => updateCodes((bool2const(b), Nil), n)
25       updateCodes((bool2const(!b), Nil), parent)
26       parent :: acc
27     case Disjunction(children) =>
28       val r0 = orderBySize(children)
29       val r1 = r0.tail.foldLeft(Nil)(pDisj)
30       val r2 = r1.map(codesNodes).sorted.filter(_ != 0).distinct
31       if isEmpty(r2) then pNeg(r0.head, parent, acc)
32       else val s1 = pDisj(r0.head, r1)
33       val s2 = s1 zip (s1 map codesNodes)
34       val s3 = s2.sorted.filter(_ != 0).distinct // all wrt. 2nd element
35       if s3.contains(1) or checkForContradiction(s3)
36       then updateCodes(("one", Nil), n); updateCodes(("zero", Nil), parent)
37         parent :: acc
38       else if isEmpty(s3) then updateCodes(("zero", Nil), n)
39         updateCodes(("one", Nil), parent)
40         parent :: acc
41       else if s3.length == 1 then pNeg(s3.head._1, parent, acc)
42       else updateCodes(("or", s3 map (_._2)), n)
43         updateCodes(("neg", List(codesNodes(n))), parent)
44         parent :: acc
45   return rootCode(tau) == rootCode(pi)
Fig. 7. Final algorithm. distinctBy runs in log-linear time. checkForContradiction detects appli-
cation cases of A7 and A9 (Complement). Maintenance of size field used by orderBySize elided.
References
1. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cam-
bridge (1998). https://doi.org/10.1017/CBO9781139172752
2. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds,
A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) Computer Aided Verifica-
tion. pp. 171–177. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-22110-1_14
3. Basin, D.A., Ganzinger, H.: Automated complexity analysis based on ordered resolution. J.
ACM 48(1), 70–109 (2001). https://doi.org/10.1145/363647.363681
4. Bruns, G.: Free Ortholattices. Canadian Journal of Mathematics 28(5), 977–985 (Oct 1976).
https://doi.org/10.4153/CJM-1976-095-6
5. Bruttomesso, R., Pek, E., Sharygina, N., Tsitovich, A.: The OpenSMT Solver. In: Hutchi-
son, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nier-
strasz, O., Pandu Rangan, C., Steffen, B., Sudan, M., Terzopoulos, D., Tygar, D., Vardi, M.Y.,
Weikum, G., Esparza, J., Majumdar, R. (eds.) Tools and Algorithms for the Construction and
Analysis of Systems, vol. 6015, pp. 150–153. Springer Berlin Heidelberg, Berlin, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-12002-2_12
6. Brzozowski, J.: De Morgan bisemilattices. In: Proceedings 30th IEEE International
Symposium on Multiple-Valued Logic (ISMVL 2000). pp. 173–178 (May 2000).
https://doi.org/10.1109/ISMVL.2000.848616
7. Buss, S.R.: Alogtime algorithms for tree isomorphism, comparison, and canonization. In:
Gottlob, G., Leitsch, A., Mundici, D. (eds.) Computational Logic and Proof Theory. pp. 18–
33. Springer Berlin Heidelberg, Berlin, Heidelberg (1997)
8. Cook, S.A.: The complexity of theorem-proving procedures. In: Proceedings of the Third
Annual ACM Symposium on Theory of Computing. p. 151–158. STOC ’71, Association for
Computing Machinery, New York, NY, USA (1971). https://doi.org/10.1145/800157.805047
9. Davis, M., Logemann, G., Loveland, D.: A machine program for theorem-proving. Commun.
ACM 5(7), 394–397 (Jul 1962). https://doi.org/10.1145/368273.368557
10. Ganzinger, H., Hagen, G., Nieuwenhuis, R., Oliveras, A., Tinelli, C.: DPLL(T): Fast De-
cision Procedures. In: Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C.,
Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., Sudan, M., Terzopoulos, D., Ty-
gar, D., Vardi, M.Y., Weikum, G., Alur, R., Peled, D.A. (eds.) Computer Aided Verifi-
cation, vol. 3114, pp. 175–188. Springer Berlin Heidelberg, Berlin, Heidelberg (2004).
https://doi.org/10.1007/978-3-540-27813-9_14
11. Gentzen, G.: Untersuchungen über das logische Schließen. I. Mathematische Zeitschrift 39,
176–210 (1935)
12. Hamza, J., Voirol, N., Kunčak, V.: System FR: Formalized foundations
for the Stainless verifier. Proc. ACM Program. Lang. 3 (November 2019).
https://doi.org/10.1145/3360592
13. Harrison, J.: HOL Light: An Overview. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel,
M. (eds.) Theorem Proving in Higher Order Logics, vol. 5674, pp. 60–66. Springer Berlin
Heidelberg, Berlin, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_4
14. Hopcroft, J., Ullman, J., Aho, A.: The Design and Analysis of Computer Algorithms.
Addison-Wesley (1974)
15. Hunt III, H.B., Rosenkrantz, D.J., Bloniarz, P.A.: On the Computational Complex-
ity of Algebra on Lattices. SIAM Journal on Computing 16(1), 129–148 (Feb 1987).
https://doi.org/10.1137/0216011
16. Kahn, A.B.: Topological sorting of large networks. Communications of the ACM 5(11), 558–
562 (Nov 1962). https://doi.org/10.1145/368996.369025
17. Kalmbach, G.: Orthomodular Lattices. Academic Press Inc, London ; New York (Mar 1983)
18. Krajíček, J.: Proof Complexity. Encyclopedia of Mathematics and Its Applications, Vol. 170,
Cambridge University Press (2019)
19. Kroening, D., Strichman, O.: Decision Procedures - An Algorithmic Point of View. Springer
(2016)
20. Kuncak, V.: Modular Data Structure Verification. Ph.D. thesis, EECS Department, Mas-
sachusetts Institute of Technology (February 2007), http://hdl.handle.net/1721.1/38533
21. Leino, K.R.M., Polikarpova, N.: Verified calculations. In: Cohen, E., Rybalchenko, A. (eds.)
Verified Software: Theories, Tools, Experiments. pp. 170–190. Springer Berlin Heidelberg,
Berlin, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54108-7_9
22. Lewis, D.W.: Hazard detection by a quinary simulation of logic devices with bounded
propagation delays. In: Proceedings of the 9th Design Automation Workshop. pp. 157–
164. DAC ’72, Association for Computing Machinery, New York, NY, USA (Jun 1972).
https://doi.org/10.1145/800153.804941
23. Lindell, S.: A logspace algorithm for tree canonization (extended abstract). In: Pro-
ceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing. p.
400–404. STOC ’92, Association for Computing Machinery, New York, NY, USA (1992).
https://doi.org/10.1145/129712.129750
24. McAllester, D.A.: Automatic recognition of tractability in inference relations. Journal of the
ACM 40(2), 284–303 (1993). https://doi.org/10.1145/151261.151265
25. Meinander, A.: A solution of the uniform word problem for ortholattices.
Mathematical Structures in Computer Science 20(4), 625–638 (Aug 2010).
https://doi.org/10.1017/S0960129510000125
26. Merz, S., Vanzetto, H.: Automatic Verification of TLA+ Proof Obligations with SMT
Solvers. In: Bjørner, N., Voronkov, A. (eds.) Logic for Programming, Artificial Intelligence,
and Reasoning. pp. 289–303. Lecture Notes in Computer Science, Springer, Berlin, Heidel-
berg (2012). https://doi.org/10.1007/978-3-642-28717-6_23
27. Naumowicz, A., Korniłowicz, A.: A brief overview of Mizar. In: Berghofer, S., Nipkow, T.,
Urban, C., Wenzel, M. (eds.) Theorem Proving in Higher Order Logics. pp. 67–72. Springer
Berlin Heidelberg, Berlin, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_5
28. Peterson, G.E., Stickel, M.E.: Complete sets of reductions for some equational theories. J.
ACM 28(2), 233–264 (Apr 1981). https://doi.org/10.1145/322248.322251
29. Pudlák, P.: The Lengths of Proofs. In: Studies in Logic and the Foundations of Mathematics,
vol. 137, pp. 547–637. Elsevier (1998). https://doi.org/10.1016/S0049-237X(98)80023-2
30. Tschannen, J., Furia, C.A., Nordio, M., Polikarpova, N.: Autoproof: Auto-active functional
verification of object-oriented programs. In: Baier, C., Tinelli, C. (eds.) Tools and Al-
gorithms for the Construction and Analysis of Systems. pp. 566–580. Springer (2015).
https://doi.org/10.1007/978-3-662-46681-0_53
31. Urquhart, A.: Hard examples for resolution. J. ACM 34(1), 209–219 (Jan 1987).
https://doi.org/10.1145/7531.8928
32. Wenzel, M., Paulson, L.C., Nipkow, T.: The Isabelle Framework. In: Theorem Proving in
Higher Order Logics. pp. 33–38. Lecture Notes in Computer Science, Springer, Berlin, Hei-
delberg (2008). https://doi.org/10.1007/978-3-540-71067-7_7
33. Whitman, P.M.: Free Lattices. Annals of Mathematics 42(1), 325–330 (1941).
https://doi.org/10.2307/1969001
34. Zee, K., Kuncak, V., Rinard, M.: Full functional verification of linked data structures. In:
ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI) (2008).
https://doi.org/10.1145/1375581.1375624, see also [20]
35. Zee, K., Kuncak, V., Rinard, M.: An integrated proof language for imperative programs. In:
ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI) (2009).
https://doi.org/10.1145/1543135.1542514
Monitoring and Analysis
A Theoretical Analysis of
Random Regression Test Prioritization
Pu Yi1, Hao Wang1, Tao Xie1(✉), Darko Marinov2, and Wing Lam3
1Peking University, Beijing, China
lukeyi@pku.edu.cn,tony.wanghao@stu.pku.edu.cn,taoxie@pku.edu.cn
2University of Illinois Urbana-Champaign, Urbana, IL, USA
marinov@illinois.edu
3George Mason University, Fairfax, VA, USA
winglam@gmu.edu
Abstract. Regression testing is an important activity to check software
changes by running the tests in a test suite to inform the developers
whether the changes lead to test failures. Regression test prioritization
(RTP) aims to inform the developers faster by ordering the test suite
so that tests likely to fail are run earlier. Many RTP techniques have
been proposed and are often compared with the random RTP baseline
by sampling some of the n! different test-suite orders for a test suite
with ntests. However, there is no theoretical analysis of random RTP.
We present such an analysis, deriving probability mass functions and ex-
pected values for metrics and scenarios commonly used in RTP research.
Using our analysis, we revisit some of the most highly cited RTP papers
and find that some presented results may be due to insufficient sampling.
Future RTP research can leverage our analysis and need not use random
sampling but can use our simple formulas or algorithms to more precisely
compare with random RTP.
Keywords: Regression Test Prioritization · Random · Analysis
1 Introduction
Software developers commonly check their code by running tests. Regression
testing [48] runs tests after code changes, to check whether the changes break
the existing functionality. A test that passes before the changes but fails after
indicates that the changes should be debugged (unless the test is flaky [25]).
Finding test failures faster enables the developers to start debugging earlier.
A popular regression testing approach is regression test prioritization (RTP) [12,
19,21,23,38,39,48], which runs the tests from a test suite in an order that aims
to find test failures sooner. For example, Google [14] and Microsoft [42] report on
using RTP in industry. More formally, a test suite T is an (unordered) set of tests,
and RTP techniques produce a test-suite order—a permutation of the tests in
the test suite—in which to run the tests. Various RTP techniques have been pro-
posed in the literature since the seminal papers from 20+ years ago [12,36,38,47]
that have garnered thousands of citations.
RTP techniques are often compared with random RTP. Our inspection [44]
of the 100 most cited papers on RTP shows that 56 papers use random RTP
as a comparison baseline. Although random RTP often performs worse than ad-
vanced techniques, recent papers still use random RTP, because it has a small
overhead and may perform well in certain scenarios. We additionally check pa-
pers published in the latest testing conferences (ICST and ISSTA 2020/2021)
and find that 50% (2/4) of the RTP papers [6,15,30,34] use random RTP. While
random RTP has been used as a baseline for 20+ years, all evaluations have
been empirical, performed by randomly sampling some of the n! orders for a test
suite with ntests. The selected sample size varies (20, 50, 100, 200, 1000), with
no clear correlation with n; some papers do not even report the sample size [44].
However, no prior work has presented a theoretical analysis of random RTP.
Before we summarize our analysis, we describe some metrics and scenarios
most commonly used in RTP research. We first introduce some terms: failure
is simply a failing test, fault is the root cause (bug in the code) for the failure,
and we say that a failure detects a fault if the failure is caused by the fault [36].
In general, many failures may detect the same fault, and one failure may detect
many faults. We capture the relationship between failures and faults by a failure-
to-fault matrix. To compare RTP techniques, researchers quantify how fast (test-
suite) orders find all faults (not failures because having many failures that detect
the same fault is not as valuable as having a few failures that detect many faults).
RTP evaluations involve three aspects: RTP metric, failure-to-fault matrix,
and allowed orders. The most widely used metric is Average Percentage of Faults
Detected (APFD) [38], denoted as αfor short. Another popular metric is Cost-
Cognizant APFD (APFDc) [11], denoted as γfor short. Section 2formally defines
these metrics based on the failure-to-fault matrix; each metric assigns to an order
a value between 0 and 1, with higher values indicating better orders. Traditional
RTP research used seeded faults, which allow fairly precisely deriving the failure-
to-fault matrix [10,22,37] that can arbitrarily map failures and faults. Recent
RTP research mostly uses real failures, e.g., analyzing real regression testing
runs from continuous integration systems [14,15,23,24,27,34], making it rather
difficult to precisely derive the failure-to-fault matrix. As a result, the increas-
ingly popular failure-to-fault matrices are all-to-one, where all failures map to
the same one fault, and one-to-one, where each failure maps to a distinct fault.
To describe allowed orders, we note that real test suites often partition tests,
e.g., in JUnit [20], each test method belongs to a test class. Traditional research
ignores this partitioning and allows all n! orders (a(T) for short) of n tests.
We introduced compatible⁴ orders [46] (c(T) for short) that consider the parti-
tioning and allow only orders that do not interleave tests from different classes.
We present the first theoretical analysis for the cases most commonly used
in RTP research. We introduce an algorithm for efficiently computing the ex-
act probability mass functions (PMFs) of αfor all failure-to-fault matrices and
a(T). We demonstrate the efficiency of our algorithm on the benchmarks from
⁴ Our original term was class-compatible [46] because we considered as tests only test
methods in test classes, but the concept easily generalizes to other kinds of tests.
Fig. 1: Example metrics for two orders (Com. is compatible) for n = 5, m = 3;
class C1 has 3 tests with costs 40, 20, 60; class C2 has 2 tests with costs 100, 80; C1.t1
detects fault F1; C1.t3 detects F2; C2.t1 detects F2 and F3; C2.t2 detects F3.
the largest RTP dataset for Java projects [34]. For the common all-to-one and
one-to-one cases, we further derive a closed-form formula and a good approxima-
tion, respectively. We also derive closed-form formulas for the expected values
for both α and γ for the general failure-to-fault matrix, for both a(T) and
c(T), and we compare these values in various scenarios. Interestingly, on aver-
age, a(T) can perform much better (up to 1/2) than c(T) for certain scenarios,
but cannot perform much worse (only up to 1/6) for any scenario; Section 5.1
presents this comparison, including two scenarios near the limits (1/2 and 1/6).
We finally derive two interesting properties of the α and γ metrics. Using
these properties, we revisit some of the highly cited papers on RTP and find that
some presented results may be biased due to insufficient sampling. Overall, our
theoretical analysis provides new insights into the random RTP widely used in
prior work but only via empirical sampling. Our results show that in many cases
researchers need not run sampling but can use simple formulas or algorithms to
obtain more precise statistics for the random RTP metrics.
2 Preliminaries
Our notation largely follows the prior work that introduced APFD (α) [38] and
APFDc(γ) [11], but we make explicit the failure-to-fault matrix. Let nbe the
number of tests and mbe the number of faults detected by (some of) these
tests. Let Mbe a failure-to-fault matrix, i.e., a n×mBoolean matrix such that
Mj,i = true iff (failure of) test jdetects fault i, and each fault has at least one
failure (i.e., i.j.Mj,i). Let Tbe the set of tests in the test suite. We denote
the set of tests that detect the fault ias Ti={j|Mj,i}. In general, Tiand Ti
for i=ineed not be disjoint because one failing test can detect multiple faults.
The total number of failures is k=|{j|∃i.Mj,i}|, and we use ki=|Ti|.
For an order o(a permutation of T), we use <oto compare the positions of
two tests tand tin the order: t <otdenotes that tprecedes tin o, and tot
denotes that t=tor t <ot. We denote the jth test in an order oas tj(o). Let
τi(o) = minjMtj(o),i be the position of the first test to detect the fault iin o.
Prior work [11,38] defined metrics αand γ(using the notation T F instead of τ).
We use α(o) and γ(o) to indicate α and γ, respectively, for a given order o. We
drop o from <_o, ≤_o, t_j(o), τ_i(o), α(o), and γ(o) when clear from the context.
The most popular RTP metric is α [38], defined for an order o as follows.

Definition 1 (α). APFD is defined as

    α = 1 − (Σ_{i=1}^{m} τ_i)∕(nm) + 1∕(2n)    (1)
Plotting the percentage of faults detected against the percentage of executed
tests, α represents the area under the curve, as shown by two examples in Fig. 1.
The diagonal lines interpolate the percentage of faults detected and lead to nice
properties of mean/median α values and symmetry (Section 6). α ranges between
0 and 1, more precisely between 1∕(2n) and 1 − 1∕(2n). A larger α indicates that
an order detects faults earlier, on average.
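As a concrete reading of Definition 1 (our sketch for illustration, not code from the paper), the following Scala function computes α from the positions τ_i:

    // APFD per Equation (1): n tests; tau(i) is the 1-based position of the
    // first test that detects fault i, so tau has one entry per fault (m total).
    def apfd(n: Int, tau: Seq[Int]): Double =
      1.0 - tau.sum.toDouble / (n.toDouble * tau.length) + 1.0 / (2.0 * n)

For order o1 of Fig. 1 (n = 5), the faults F1, F2, and F3 are first detected at positions 2, 3, and 1, so apfd(5, Seq(2, 3, 1)) = 1 − 6∕15 + 1∕10 = 0.7.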
While α effectively considers the number of tests, the “cost cognizant” metric
γ considers the cost of tests [11]. The cost can be measured in various ways, but
most work uses the test runtime. We use σ(t) to denote the cost (runtime) of a
test t; the total cost of a set of tests T is σ(T) = Σ_{t∈T} σ(t).
Definition 2 (γ). APFDc is defined as

    γ = (Σ_{i=1}^{m} [Σ_{j=τ_i}^{n} σ(t_j) − (1∕2)·σ(t_{τ_i})]) ∕ (m · σ(T))    (2)
Plotting the percentage of faults detected against the percentage of total
test-suite cost, γ represents the area under the curve, as shown in Fig. 1. Note
that α can be viewed as a special case of γ where ∀t, t′ ∈ T. σ(t) = σ(t′).
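The cost-cognizant variant can be transcribed the same way (again our sketch, with hypothetical parameter names): cost(j - 1) holds σ(t_j) for the j-th test in the order, and tau is as above.

    // APFDc per Equation (2): for each fault i, sum the costs of the tests from
    // position tau(i) to n, subtract half the cost of the detecting test itself,
    // and normalize by m times the total test-suite cost.
    def apfdc(cost: Seq[Double], tau: Seq[Int]): Double = {
      val m = tau.length
      val total = cost.sum
      tau.map(t => cost.drop(t - 1).sum - 0.5 * cost(t - 1)).sum / (m * total)
    }

Under our reading of order o1 in Fig. 1 (test costs 80, 40, 60, 20, 100 in execution order), apfdc(Seq(80, 40, 60, 20, 100), Seq(2, 3, 1)) ≈ 0.678.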
In practice, tests often belong to classes⁵—e.g., JUnit [20] test methods be-
long to test classes, Maven [28] test classes belong to modules, and pytest [35]
test functions belong to test files—and the tests from each class run together. Our
prior work [46] defined compatible orders as those where all tests from each class
are consecutive. We use T_C to denote the set of tests in a class C. An order o
is compatible iff ∀C, j ≤ j′ ≤ j′′. t_j(o) ∈ T_C ∧ t_{j′′}(o) ∈ T_C ⟹ t_{j′}(o) ∈ T_C. For
example, o2 in Fig. 1 is compatible, while o1 is not. To distinguish the cases for
all orders from the cases for only compatible orders, we use the subscripts a and
c, respectively, e.g., E_a[x] and E_c[x] represent the expected value of x for the
uniform selection of all orders and compatible orders, respectively, and P_a(A)
and P_c(A) represent the probability of event A for the uniform selection of all
orders and compatible orders, respectively. We denote the set of all orders and
all compatible orders for T as a(T) and c(T), respectively [46].
We analyze RTP techniques in scenarios, each of which consists of a test suite
with ntests, mfaults, the failure-to-fault matrix, the cost of each test, and for
c(T) the class of each test. To analyze compatible orders, we introduce some
new notation to indicate the class of tests. We use Ti,C =TiTCto denote the
5The term class for a set of tests that run together need not represent a test class.
set of tests in class C that detect fault i. Let 𝒞 be the set of all classes, and 𝒞_i be the set of classes that contain at least one test that detects fault i, i.e., 𝒞_i = {C ∈ 𝒞 | T_{i,C} ≠ ∅}. Let C(t) be the class that t belongs to, i.e., t ∈ T_{C(t)}. The number of compatible orders is |c(T)| = |𝒞|!·Π_{C∈𝒞} |T_C|!.
For a set of orders S, be it a(T) or c(T), the probability mass function (PMF) of a metric, α or γ, is a function p from the metric value to its probability: p(x) = P(metric = x) = |{o ∈ S | metric(o) = x}|/|S|. We next derive some PMFs, as all prior RTP work shows only sampled distributions of random RTP.
3 PMF of α
To analyze the PMF of the metric α, we first propose an algorithm to calculate the PMF of α for the general case of M. We then discuss two special cases, i.e., all-to-one and one-to-one, which are the most common in recent RTP research.
3.1 Algorithm to Calculate the PMF of α for the General Case
To calculate the PMF of α, a naïve algorithm would enumerate all n! orders and compute α for each order. In theory, α can take O(n!) different values: e.g., when m = Σ_{i=1}^n n^i and all n tests fail and detect n, n², …, nⁿ different faults, respectively, then each of the n! orders has a different α. In practice, however, the number of faults m and the number of failing tests k are usually small; e.g., in our evaluation dataset [34], 2906 out of 2980 (98%) scenarios have k ≤ 10. We present an algorithm that computes the exact PMF with O(n²mk·k!) time complexity. Despite the k! factor, the algorithm runs in reasonable time in practice, under 30 sec for any of the 2906 scenarios. When k > 10, one can resort to sampling.
We next describe the intuition for our algorithm. Σ_{i=1}^m τ_i is the only part of α that depends on the (test-suite) order, so we first calculate the PMF of this sum and then convert it to the PMF of α. Iterating over the faults does not lead to a nice recursive formulation. Our key insight is to instead iterate over the positions of all k failing tests. We view Σ_{i=1}^m τ_i as a weighted sum

    Σ_{i=1}^m τ_i = Σ_{j=1}^k w_j·ϕ_j    (3)

where ϕ_j is the position of the j-th failing test in the order, and w_j ≥ 0 is the weight, calculated as the number of faults detected first by the j-th failing test (Line 11 of Algorithm 1). For example, consider the order o1 in Fig. 1. The relative order of the k = 4 failing tests is ρ = ⟨C2.t2, C1.t1, C1.t3, C2.t1⟩; we use the metavariable ρ to distinguish this notation from o, the order of all n tests. For this relative order, w = ⟨1, 1, 1, 0⟩ because the m = 3 faults are detected first by C2.t2, C1.t1, and C1.t3. The positions for this relative order ρ are ϕ = ⟨1, 2, 3, 5⟩ because the 4 failing tests in ρ appear in these positions in the order o1.
We call a ϕ = ⟨ϕ_1, …, ϕ_k⟩ valid if 1 ≤ ϕ_1 < … < ϕ_k ≤ n. Both sequences ϕ and w = ⟨w_1, …, w_k⟩ can vary for different orders.
Algorithm 1: Calculate the PMF of α

 1  Input: n, m, M // the number of tests and faults, and the failure-to-fault matrix
 2  Output: p // the PMF of α: p(x) = P(α = x)
 3  Function PMF() // main function; return the PMF of α for all orders
 4    k = |{j | ∃i. M_{j,i}}| // number of failing tests in M; in practice k ≪ n
 5    q = PMF_sum() // compute the PMF of Σ_{i=1}^m τ_i
 6    return λx. q(mn − mnx + m/2) // convert that PMF to the PMF of α
 7  Function PMF_sum() // return the PMF of Σ_{i=1}^m τ_i for all orders
 8    P = ⟨PMF_rorder(ρ), ρ ∈ perms({j | ∃i. M_{j,i}})⟩ // enumerate all relative orders
 9    return λx. Σ_{p∈P} p(x)/|P| // average the PMFs of Σ_{i=1}^m τ_i over the relative orders
10  Function PMF_rorder(ρ) // return the PMF of Σ_{i=1}^m τ_i for a relative order ρ
11    w = ⟨|{i | M_{ρ_j,i} ∧ ∄j′ < j. M_{ρ_{j′},i}}|, j ∈ 1..k⟩ // w are the weights in formula (3)
12    return λs. f(w,k,n)(s)/(n choose k) // the total number of valid ϕ is (n choose k)
13  // the function should be memoized to reuse the results for repeated (w,g,h)
    Function f(w, g, h) // return f_{g,h} given weights w, calculated with formula (4)
14    if g > h then
15      return λs. 0
16    if g = 0 then
17      return λs. 1_{s=0}
18    return λs. f(w,g,h−1)(s) + f(w,g−1,h−1)(s − w_g·h)
While ϕ has (n choose k) valid possibilities, we note that w has at most k! possibilities (with k! ≪ (n choose k), as k ≪ n in practice) because w depends only on ρ. Therefore, we first fix w by enumerating the k! relative orders of the k failing tests. Then, for each relative order, the problem of calculating the PMF of Σ_{i=1}^m τ_i = Σ_{j=1}^k w_j·ϕ_j becomes “given w, count the number of valid ϕ such that Σ_{j=1}^k w_j·ϕ_j = s” for each s, which can be solved recursively as follows.
Let f_{g,h}(s) be the number of assignments of the values ϕ_1, …, ϕ_g such that 1 ≤ ϕ_1 < … < ϕ_g ≤ h and Σ_{j=1}^g w_j·ϕ_j = s. The problem is to find f_{k,n}(s). As the base cases: (1) f_{g,h}(s) = 0 for g > h, because ϕ_g ≥ g > h contradicts ϕ_g ≤ h; (2) f_{0,h}(s) = 1_{s=0}, where 1 is the indicator function, because only the empty sequence ⟨⟩ is valid and Σ_{j=1}^0 w_j·ϕ_j = 0. For all h ≥ g > 0, the number of assignments for f_{g,h}(s) splits into two cases: (1) if ϕ_g ≤ h−1, the number is equal to f_{g,h−1}(s) by definition; (2) if ϕ_g = h, the number for s is equal to the number of assignments of ϕ_1, …, ϕ_{g−1} such that ϕ_{g−1} ≤ ϕ_g − 1 = h−1 and Σ_{j=1}^{g−1} w_j·ϕ_j = (Σ_{j=1}^g w_j·ϕ_j) − w_g·ϕ_g = s − w_g·h, which is f_{g−1,h−1}(s − w_g·h). In total,

    f_{g,h}(s) = 0                                          if g > h
                 1_{s=0}                                    if g = 0
                 f_{g,h−1}(s) + f_{g−1,h−1}(s − w_g·h)      otherwise    (4)
After solving f_{k,n}, we get the PMF of Σ_{i=1}^m τ_i for each relative order of the k failing tests. Because each of the k! relative orders has the same probability by symmetry, we simply take the average of their PMFs to get the PMF of Σ_{i=1}^m τ_i for all orders. Finally, we convert the PMF of Σ_{i=1}^m τ_i to the PMF of α.
Table 1: Number of tests, failures, runtime (in ms), and Jensen-Shannon (JS) distance for the 10 largest scenarios [34] and one synthetic scenario (TSmax)

Test    #Tests  #Failures   Runtime [ms]            Jensen-Shannon
suite   (n)     (k)         all-to-one  one-to-one  distance (§3.2.2)
TS1     2118     1             513         505      0.0000
TS2     1986     2             563         629      0.0005
TS3     2080     3             617         871      0.0003
TS4     1929     4             680        1147      0.0004
TS5     1795     5             731        1408      0.0006
TS6      339     6             627         732      0.0040
TS7      465     7             678         756      0.0034
TS8      813     8             829        2009      0.0023
TS9       52     9            1496        1846      0.0442
TS10     161    10           10989       27095      0.0150
TSmax   2118    10           32801      242400      0.0011
We next describe Algorithm 1 in more detail. The input is the number of tests n, the number of faults m, and the failure-to-fault matrix M. The main function PMF invokes PMF_sum to get the PMF of Σ_{i=1}^m τ_i and converts it to the PMF of α. The function PMF_sum enumerates all relative orders ρ of the k failing tests, invokes PMF_rorder(ρ) to get the PMF of Σ_{i=1}^m τ_i for each relative order, and averages these PMFs to get the PMF of Σ_{i=1}^m τ_i for all (relative) orders. The function PMF_rorder(ρ) computes the weights w from formula (3), invokes f(w,k,n) to get f_{k,n} for w, and converts it to the PMF of Σ_{i=1}^m τ_i.
We finally discuss the time complexity and the empirical performance of Algorithm 1. The major cost comes from computing the function f. Because there are O(k!) different w, 0 ≤ g ≤ k, and g ≤ h ≤ n, there are O(nk·k!) different inputs for which to compute f. With memoization, f is computed only once for each input. Each computation takes O(nm) time because |support(f_{g,h})| = O(nm), as 1 ≤ τ_i ≤ n for 1 ≤ i ≤ m. Therefore, the cost of computing f for all inputs is O(n²mk·k!). The other costs in the algorithm are lower than the cost of f; hence, the overall time complexity of Algorithm 1 is O(n²mk·k!).
Implementation: While top-down recursion makes the algorithm easier to present, for better performance our implementation uses bottom-up dynamic programming to compute f. Our implementation fits in only 117 lines of C++.
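As an illustration of that implementation choice, here is a minimal bottom-up C++ sketch of computing f_{k,n} from formula (4); the array layout and names are our assumptions, not the 117-line artifact.

  #include <vector>

  // Returns f_{k,n}(s) for s = 0..maxS. w[1..k] are the weights from formula (3)
  // (index 0 unused); maxS bounds the support, e.g., n * (w_1 + ... + w_k).
  std::vector<double> f_bottom_up(const std::vector<long long>& w, int k, int n, int maxS) {
    std::vector<std::vector<double>> prev(k + 1, std::vector<double>(maxS + 1, 0.0)),
                                     cur = prev;
    prev[0][0] = 1.0;                          // f_{0,0}(0) = 1; f_{g,0} = 0 for g > 0
    for (int h = 1; h <= n; h++) {
      cur[0][0] = 1.0;                         // f_{0,h}(s) = 1_{s=0}
      for (int g = 1; g <= k && g <= h; g++)
        for (int s = 0; s <= maxS; s++) {
          double v = prev[g][s];               // case phi_g <= h-1
          long long d = s - w[g] * h;          // case phi_g = h
          cur[g][s] = (d >= 0) ? v + prev[g - 1][d] : v;
        }
      std::swap(prev, cur);                    // rows with g > h stay correctly zero
    }
    return prev[k];                            // f_{k,n}
  }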
Dataset: We use the RTP dataset with the most Java projects [34] for our evaluation. In this dataset, each test is a test class, and each class is a Maven module [28]. The dataset has 2980 scenarios, and 2906 (98%) of them have k ≤ 10. For each k ≤ 10, we select the scenario with the maximum number of tests (n) from the dataset. We also create a synthetic scenario with 2118 tests (the largest number of tests in the dataset) and 10 failures. We use both all-to-one and one-to-one failure-to-fault matrices on the selected scenarios.
Evaluation: As Table 1 shows, the code finishes in under 30 sec (on a common laptop) for all real scenarios; it takes more time on the synthetic one, but the runtime is still 33 sec for all-to-one and 4 min for one-to-one.
3.2 PMFs of α for Special Cases

As mentioned in Section 1, recent RTP research uses real failures and faults, with two kinds of failure-to-fault matrices: all-to-one and one-to-one. We discuss the PMFs of α for these two commonly used cases.
3.2.1 All-to-One: We first derive the PMF of α for all-to-one. In this case, m = 1, k ≥ 1, and w_1 = 1, ∀j > 1. w_j = 0 in formula (3). Therefore, the recursive formula (4) becomes f_{g,h}(s) = f_{g,h−1}(s) + f_{g−1,h−1}(s) for g > 1, which is similar to Pascal's triangle. This observation hints that the PMF of α for all-to-one may have a closed formula with binomial coefficients.
Theorem 3 (The PMF of α for the all-to-one failure-to-fault matrix).

    P(α = 1 − s/n + 1/(2n)) = (n−s choose k−1)/(n choose k),  s ∈ {1, 2, …, n−k+1}    (5)

Proof. For all-to-one, the α value depends solely on τ_1, which is essentially ϕ_1 in formula (3). For 1 ≤ s ≤ n−k+1, τ_1 = s holds as long as s = ϕ_1 < … < ϕ_k ≤ n. To satisfy the condition, we just need to choose the k−1 positions after position s. Therefore, (n−s choose k−1) out of the (n choose k) ways to choose k positions in n satisfy the condition, so P(τ_1 = s) = (n−s choose k−1)/(n choose k), and formula (5) directly follows.
With (5), we can compute the PMF of α for all-to-one in O(n) time. We can compute the needed binomial coefficients iteratively, starting from (k−1 choose k−1) = 1, with the recurrence (n+1 choose k−1) = ((n+1)/(n−k+2))·(n choose k−1) for n ≥ k−1, and get (n choose k) = (n/k)·(n−1 choose k−1).
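For instance, the following C++ sketch (ours, not the paper's artifact) performs this O(n) computation, using the iterative ratio (n−s−1 choose k−1)/(n−s choose k−1) = (n−s−k+1)/(n−s) and the starting value P(τ_1 = 1) = (n−1 choose k−1)/(n choose k) = k/n:

  #include <utility>
  #include <vector>

  // PMF of alpha for all-to-one via formula (5): returns (alpha, probability)
  // pairs for s = 1..n-k+1. Assumes 1 <= k <= n.
  std::vector<std::pair<double, double>> pmf_all_to_one(int n, int k) {
    std::vector<std::pair<double, double>> pmf;
    double p = double(k) / n;                                 // P(tau_1 = 1)
    for (int s = 1; s <= n - k + 1; s++) {
      pmf.push_back({1.0 - double(s) / n + 1.0 / (2.0 * n), p});
      if (s < n - k + 1) p *= double(n - s - k + 1) / (n - s);  // next binomial ratio
    }
    return pmf;
  }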
3.2.2 One-to-One: We next consider the PMF of α for one-to-one. In this case, m = k and each failing test finds a distinct fault, so for every relative order of the k failing tests, ∀j. w_j = 1 in formula (3). Therefore, running Algorithm 1 and memoizing on w, the complexity becomes O(n²k² + k!); the k! term arises because we still iterate through all the relative orders. We can avoid the k! term if we check in advance that the failure-to-fault matrix is one-to-one, so the complexity is O(n²k²).
Moreover, considering formula (4) when ∀j. w_j = 1, f_{k,n} essentially models the problem of “counting the number of partitions of s into k distinct summands from {1, 2, …, n}”. Specifically, f_{g,h}(s) can be viewed as the number of partitions of s into g distinct summands in {1, 2, …, h}, and f_{g,h}(s) = f_{g,h−1}(s) + f_{g−1,h−1}(s−h) holds because the summand g can be either less than h or exactly h, corresponding to f_{g,h−1}(s) and f_{g−1,h−1}(s−h), respectively. To the best of our knowledge, no closed formula is known for this problem. Considering that in our evaluation dataset 99.8% (2975/2980) of the scenarios have n²k² < 10⁹, the O(n²k²) algorithm is efficient enough for almost all practical cases.
Approximation: Furthermore, we can approximate the PMF by ignoring the distinct-summand constraint, i.e., “counting the number of partitions of s into k summands from {1, 2, …, n}”. This problem has a nice generating function, (x + x² + … + xⁿ)^k, where the coefficient of x^s is the number of partitions [43]:

    Σ_{i=0}^{⌊(s−k)/n⌋} (k choose i)·(−1)^i·(s−ni−1 choose k−1)    (6)
We can calculate these coefficients using two algorithms with different tradeoffs. The first algorithm pre-calculates the binomial coefficients with Pascal's triangle and then calculates all the coefficients with formula (6). The first step takes O(nk²) because s−ni−1 ≤ nk and i ≤ k. The second step also takes O(nk²) because each of the O(nk) coefficients takes O(k) to compute, as s−k ≤ nk. Thus, the overall time complexity of the first algorithm is O(nk²). The second algorithm calculates the generating function directly with the fast Fourier transform [4] by first converting x + x² + … + xⁿ to the point-value representation, raising each point value to the k-th power, and interpolating to get the coefficients. The second algorithm takes O(nk·log(nk)) because the length of the polynomial is O(nk). Comparing the complexities, the first algorithm is better when k is small compared to n (i.e., k·log k < log n), and the second is better otherwise.
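A C++ sketch of the first algorithm follows; it is our transcription of formula (6), and using double precision for the (potentially huge) counts is our simplifying assumption, not part of the paper:

  #include <vector>

  // Coefficient of x^s in (x + x^2 + ... + x^n)^k for s = 0..nk, via formula (6).
  std::vector<double> composition_counts(int n, int k) {
    int maxS = n * k;
    std::vector<double> col(maxS + 1, 0.0);      // col[a] = C(a, k-1)
    col[k - 1] = 1.0;
    for (int a = k - 1; a < maxS; a++) col[a + 1] = col[a] * (a + 1) / (a - k + 2);
    std::vector<double> bk(k + 1, 1.0);          // bk[i] = C(k, i)
    for (int i = 1; i <= k; i++) bk[i] = bk[i - 1] * (k - i + 1) / i;
    std::vector<double> coeff(maxS + 1, 0.0);
    for (int s = k; s <= maxS; s++)
      for (int i = 0; i * n <= s - k; i++)       // i <= floor((s-k)/n)
        coeff[s] += (i % 2 ? -1.0 : 1.0) * bk[i] * col[s - n * i - 1];
    return coeff;  // dividing by the total n^k yields the approximate PMF of the sum
  }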
To evaluate the approximation, we use the Jensen–Shannon (JS) distance [16] between the exact and the approximated PMFs. We check our approximation on the same real scenarios as in Section 3.1. As Table 1 shows, the approximation yields PMFs with a small JS distance, the largest being only 0.0442 for n = 52, k = 9.
3.3 PMF of γ

The PMF of γ is more complex than that of α because, even for the simplest all-to-one failure-to-fault matrix, the number of possible values of γ can be Ω(2ⁿ). For example, consider n tests with costs 1, 2, 4, …, 2^{n−1}, where only one test fails and detects the only fault. The γ value depends on the sum of the costs of the tests that precede the failure. 2^{n−1} different sets of tests can precede the failure, and every set has a distinct sum of costs. Even for the example in Fig. 1, the support of the PMF of γ (33 values) is much bigger than that of α (8 values).
4 Expected Values for All Orders a(T)

While some comparisons of RTP techniques use full samples of PMFs, many use just the arithmetic mean of the samples. We next derive formulas for expected values to obtain the mean faster and without the imprecision from sampling. In this section, we consider the case where the order o is uniformly selected from a(T), allowing all n! orders of the n tests. Because α is a special case of γ where ∀t, t′ ∈ T. σ(t) = σ(t′), we first derive γ.
To start with a simple example, consider a test suite with only one failing test (k = 1). For a random order, the test can be at any position with equal probability. Intuitively, the expected position across all of the orders is the middle of the sequence, hence α and γ should be about 1/2. In fact, we will show that they are exactly 1/2. Moreover, the expected values of both α and γ are 1/2 as long as each fault is detected by only one failing test (i.e., ∀i. k_i = |T_i| = 1, which includes one-to-one). In general, the failure-to-fault matrix can be more complex: many tests could detect the same fault, and a test could detect many faults. To compute the expected values of α and γ, we first prove a useful lemma.
Lemma 4. For every fault i,

    ∀t ∉ T_i. P_a(t < t_{τ_i}) = P_a(∀t′ ∈ T_i. t < t′) = 1/(k_i + 1)    (7)

Proof. Since τ_i is the position of the first test from T_i in the order, t precedes t_{τ_i} iff t precedes every t′ ∈ T_i. Consider the relative position of each t ∉ T_i with respect to all the tests from T_i in a random order. By symmetry, it is equally likely that t is in any of the k_i + 1 relative positions created by the relative order of the k_i tests from T_i. Therefore, the probability that t is in the relative position preceding all the k_i tests from T_i is 1/(k_i + 1).
We first use this lemma to compute E_a[γ].

Theorem 5 (The expected value of γ for a(T)).

    E_a[γ] = 1 − (Σ_{i=1}^m (σ(T∖T_i)/(k_i+1) + σ(T_i)/(2k_i))) / (m·σ(T))    (8)

Proof. From (2), the two key terms in γ are σ(t_{τ_i}) and Σ_{j=τ_i}^n σ(t_j). By symmetry, any test t ∈ T_i can be the first in the order, or equivalently t = t_{τ_i}, with probability 1/k_i. Thus

    E_a[σ(t_{τ_i})] = Σ_{t∈T_i} P(t = t_{τ_i})·σ(t) = σ(T_i)/k_i    (9)

Next, consider that Σ_{j=τ_i}^n σ(t_j) = Σ_{t∈T} σ(t)·1_{t_{τ_i} ≤ t} can also be calculated as Σ_{t∈T_i} σ(t)·1_{t_{τ_i} ≤ t} + Σ_{t∉T_i} σ(t)·1_{t_{τ_i} ≤ t}. For every test t ∈ T_i, t_{τ_i} ≤ t by definition, so ∀t ∈ T_i. E_a[1_{t_{τ_i} ≤ t}] = 1. For every test t ∉ T_i, E_a[1_{t_{τ_i} ≤ t}] = P_a(t_{τ_i} ≤ t) = 1 − P_a(t < t_{τ_i}) = k_i/(k_i+1). The last equality stems from Lemma 4. Therefore, by the linearity of expectation, we get

    E_a[Σ_{j=τ_i}^n σ(t_j)] = σ(T_i) + (k_i/(k_i+1))·σ(T∖T_i)    (10)

From (2), (9), and (10), we get (8).
Corollary 5.1 (The expected value of α for a(T)).

    E_a[α] = 1 − ((n+1)·Σ_{i=1}^m 1/(k_i+1)) / (nm) + 1/(2n)    (11)
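Formulas (8) and (11) are directly computable; the following small C++ sketch (ours, for illustration) evaluates both from the per-fault counts k_i, the costs σ(T_i), and the total cost σ(T):

  #include <vector>

  // E_a[alpha] via formula (11): kk[i] = k_i for each of the m faults.
  double expected_alpha_all(int n, const std::vector<int>& kk) {
    double s = 0.0;
    for (int ki : kk) s += 1.0 / (ki + 1);
    return 1.0 - (n + 1) * s / (double(n) * kk.size()) + 1.0 / (2.0 * n);
  }

  // E_a[gamma] via formula (8): total = sigma(T), sigmaTi[i] = sigma(T_i).
  double expected_gamma_all(double total, const std::vector<double>& sigmaTi,
                            const std::vector<int>& kk) {
    double s = 0.0;
    for (std::size_t i = 0; i < kk.size(); i++)
      s += (total - sigmaTi[i]) / (kk[i] + 1) + sigmaTi[i] / (2.0 * kk[i]);
    return 1.0 - s / (kk.size() * total);
  }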
Revisiting the case where each fault can be detected by only one failing test, setting ∀i. k_i = 1 in (8) or (11) gives exactly 1/2 = E_a[α] = E_a[γ]. In fact, even in the general case of an arbitrary failure-to-fault matrix, we find that the two expected values are similar if not the same, inspiring us to derive the following bound:
Theorem 6 (The expected difference of α and γ for a(T)).

    −1/12 < E_a[α] − E_a[γ] < 1/(2n)    (12)

Proof. From formulas (8) and (11), we have E_a[α] − E_a[γ] = ∆_γ − ∆_α + 1/(2n), where ∆_γ = (Σ_{i=1}^m (1/(2k_i) − 1/(k_i+1))·σ(T_i)) / (m·σ(T)) and ∆_α = (Σ_{i=1}^m 1/(k_i+1)) / (nm). Since k_i ≥ 1, we have −1/12 ≤ 1/(2k_i) − 1/(k_i+1) ≤ 0 (by basic calculus, the minimum is attained for k_i = 2 or k_i = 3), which, combined with σ(T_i) ≤ σ(T), gives −1/12 ≤ ∆_γ ≤ 0. Since k_i ≥ 1, we also have 0 < 1/(k_i+1) ≤ 1/2, which gives 0 < ∆_α ≤ 1/(2n). Thus, we have −1/12 ≤ ∆_γ − ∆_α + 1/(2n) < 1/(2n). However, ∆_γ − ∆_α + 1/(2n) = −1/12 would require ∆_α = 1/(2n) and thus ∀i. k_i = 1, in which case ∆_γ = 0 and ∆_γ − ∆_α + 1/(2n) = 0 ≠ −1/12. Therefore, the equality cannot hold, and −1/12 < E_a[α] − E_a[γ] < 1/(2n).
5 Expected Values for Compatible Orders c(T)

In this section, we consider the expected values of α and γ for c(T). Compatible orders do not interleave tests from different classes, as defined in Section 2. Similar to a(T), we first prove a useful lemma for c(T).
Lemma 7. For every fault i (note that if t ∉ T_i, the class C(t) may contain another t′ ∈ T_i),

    ∀t ∉ T_i. P_c(t < t_{τ_i}) = P_c(∀t′ ∈ T_i. t < t′) =
        1/(|𝒞_i|·(|T_{i,C(t)}|+1))   if C(t) ∈ 𝒞_i
        1/(|𝒞_i|+1)                  if C(t) ∉ 𝒞_i    (13)

Proof. For the case C(t) ∈ 𝒞_i, two conditions must hold for t ∉ T_{i,C(t)} to precede all tests that detect fault i. First, among all classes in 𝒞_i, C(t) must be the first in the order, and by symmetry, each class in 𝒞_i can be the first with the same probability 1/|𝒞_i|. Second, t must precede all tests from T_{i,C(t)}, which (similar to Lemma 4) holds with probability 1/(|T_{i,C(t)}|+1). The two conditions are independent because they are about the class order and the test order inside the class, respectively, and these orders are independent of each other. Therefore, the probability that t precedes the first test that detects fault i is 1/(|𝒞_i|·(|T_{i,C(t)}|+1)).
For the case C(t) ∉ 𝒞_i, only one condition—C(t) precedes all classes in 𝒞_i—must hold for t to precede the first test that detects fault i, which (similar to Lemma 4) happens with probability 1/(|𝒞_i|+1).
Theorem 8 (The expected value of γ for c(T)).

    E_c[γ] = 1 − (1/(m·σ(T)))·Σ_{i=1}^m ( Σ_{C∉𝒞_i} σ(T_C)/(|𝒞_i|+1) + (1/|𝒞_i|)·Σ_{C∈𝒞_i} ( σ(T_C∖T_{i,C})/(|T_{i,C}|+1) + σ(T_{i,C})/(2|T_{i,C}|) ) )    (14)
Proof. We first compute the two key terms, σ(t_{τ_i}) and Σ_{j=τ_i}^n σ(t_j), in γ. For a test t ∈ T_i to be the first, its class C(t) ∈ 𝒞_i must be the first among all classes in 𝒞_i, which happens with probability 1/|𝒞_i|, and t must be the first among all tests in T_{i,C(t)}, which happens with probability 1/|T_{i,C(t)}|. These two events are independent, so the joint probability is 1/(|𝒞_i|·|T_{i,C(t)}|). By σ(t_{τ_i}) = Σ_{t∈T_i} σ(t)·1_{t=t_{τ_i}}, we have

    E_c[σ(t_{τ_i})] = Σ_{t∈T_i} σ(t)/(|𝒞_i|·|T_{i,C(t)}|) = (1/|𝒞_i|)·Σ_{C∈𝒞_i} σ(T_{i,C})/|T_{i,C}|    (15)

Next, consider Σ_{j=τ_i}^n σ(t_j) = Σ_{t∈T} σ(t)·1_{t_{τ_i} ≤ t}. Each t is either (1) t ∈ T_i, where 1_{t_{τ_i} ≤ t} = 1 by the definition of τ_i; or (2) t ∉ T_i, where E_c[1_{t_{τ_i} ≤ t}] = E_c[1_{t_{τ_i} < t}] = P_c(t_{τ_i} < t) = 1 − P_c(t < t_{τ_i}) can be obtained from Lemma 7. Combining these cases, we have

    E_c[Σ_{j=τ_i}^n σ(t_j)] = σ(T_i) + (|𝒞_i|/(|𝒞_i|+1))·Σ_{C∉𝒞_i} σ(T_C) + Σ_{C∈𝒞_i} (1 − 1/(|𝒞_i|·(|T_{i,C}|+1)))·σ(T_C∖T_{i,C})    (16)

From (2), (15), and (16), we get (14).
Corollary 8.1 (The expected value of α for c(T)).

    E_c[α] = 1 − (1/(nm))·Σ_{i=1}^m ( Σ_{C∉𝒞_i} |T_C|/(|𝒞_i|+1) + (1/|𝒞_i|)·Σ_{C∈𝒞_i} (|T_C|+1)/(|T_{i,C}|+1) ) + 1/(2n)    (17)
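Formula (14) likewise admits a direct transcription. The following C++ sketch uses our own per-class encoding (σ(T_C), σ(T_{i,C}), and |T_{i,C}| per fault) and is an illustration, not the paper's code:

  #include <vector>

  struct ClassStat { double sigmaTC; double sigmaTiC; int cntTiC; };

  // E_c[gamma] via formula (14). For fault i, perFault[i][c] describes class c:
  // sigma(T_C), sigma(T_{i,C}), and |T_{i,C}| (zero iff C is not in C_i).
  double expected_gamma_compat(double total,  // total = sigma(T)
                               const std::vector<std::vector<ClassStat>>& perFault) {
    double acc = 0.0;
    for (const auto& stats : perFault) {
      int ci = 0;                                       // |C_i|
      for (const auto& st : stats) if (st.cntTiC > 0) ci++;
      double outside = 0.0, inside = 0.0;
      for (const auto& st : stats) {
        if (st.cntTiC == 0) outside += st.sigmaTC / (ci + 1);
        else inside += (st.sigmaTC - st.sigmaTiC) / (st.cntTiC + 1)
                       + st.sigmaTiC / (2.0 * st.cntTiC);
      }
      acc += outside + inside / ci;
    }
    return 1.0 - acc / (perFault.size() * total);
  }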
We next discuss the expected difference of E_c[α] and E_c[γ]. Unlike the case of a(T), where the difference has a rather small bound, we find that the difference can be rather large for c(T).
Theorem 9 (The expected difference of α and γ for c(T)).

    −1/2 < E_c[α] − E_c[γ] ≤ 1/2 − 1/(2n)    (18)

Proof. From (14) and (17), we get E_c[α] − E_c[γ] = ∆_γ − ∆_α + 1/(2n), where

    ∆_γ = (1/(m·σ(T)))·Σ_{i=1}^m ( Σ_{C∉𝒞_i} σ(T_C)/(|𝒞_i|+1) + (1/|𝒞_i|)·Σ_{C∈𝒞_i} ( σ(T_C∖T_{i,C})/(|T_{i,C}|+1) + σ(T_{i,C})/(2|T_{i,C}|) ) ),
    ∆_α = (1/(nm))·Σ_{i=1}^m ( Σ_{C∉𝒞_i} |T_C|/(|𝒞_i|+1) + (1/|𝒞_i|)·Σ_{C∈𝒞_i} (|T_C|+1)/(|T_{i,C}|+1) ).

∆_γ > 0 because all the terms in ∆_γ are positive. From ∀i, C ∈ 𝒞_i. |𝒞_i| ≥ 1 and |T_{i,C}| ≥ 1, we have

    ∆_γ ≤ (1/(m·σ(T)))·Σ_{i=1}^m ( Σ_{C∉𝒞_i} σ(T_C)/(1+1) + (1/1)·Σ_{C∈𝒞_i} ( σ(T_C∖T_{i,C})/(1+1) + σ(T_{i,C})/(2·1) ) ) = (1/(m·σ(T)))·(1/2)·Σ_{i=1}^m σ(T) = 1/2.

Similarly,

    ∆_α ≤ (1/(nm))·Σ_{i=1}^m ( Σ_{C∉𝒞_i} |T_C|/(|𝒞_i|+1) + (1/|𝒞_i|)·Σ_{C∈𝒞_i} (|T_C|+1)/(1+1) ) ≤ (n+1)/(2n),

where the last step can be checked by distinguishing |𝒞_i| = 1 (where the per-fault term equals (n+1)/2 exactly) from |𝒞_i| ≥ 2. From 0 ≤ |T_{i,C}| ≤ |T_C|, we also have ∆_α ≥ 1/n. Combining 0 < ∆_γ ≤ 1/2 and 1/n ≤ ∆_α ≤ (n+1)/(2n), we get −1/2 < ∆_γ − ∆_α + 1/(2n) ≤ 1/2 − 1/(2n).
Considering the many inequalities in the preceding proof, one may expect the bounds to be loose, but we show two scenarios where the bounds are close to tight. Both scenarios have only one fault. Scenario one has two classes: C1 has only one passing test t with cost q·N (q > 0 is arbitrary), and C2 has N failing tests, each with cost q/N. We assume N ≫ 1. t must be the first or the last test in any compatible order, each with probability 1/2 (when C1 is first or second). E_c[α] is close to 1, and E_c[γ] is only about 1/2. Precisely, E_c[α] − E_c[γ] = (N²−2N+2)/(2N²+2N) ≈ 1/2 when N ≫ 1. Scenario two has two classes: C2 has N failing tests with cost q/N, and C3 has N² passing tests, each with cost q/N³. The two classes have only two orders, each with probability 1/2. E_c[γ] is close to 1, and E_c[α] is only about 1/2. Precisely, E_c[α] − E_c[γ] = 1/(N+1) − (N²+2)/(2N²+2N) + 1/(2N) ≈ −1/2 when N ≫ 1.
5.1 Comparison of a(T) and c(T)

Compatible orders put more constraints on the PMF, which could increase or decrease the average α or γ values. To compare how orders in a(T) and c(T) perform on average, we compare E_a[α] with E_c[α] and E_a[γ] with E_c[γ].

Theorem 10 (Difference of E_c[γ] and E_a[γ]).

    1/(2n) − 1/2 ≤ E_c[γ] − E_a[γ] ≤ 1/6    (19)
Proof. From (8) and (14), we have

    E_c[γ] − E_a[γ] = (1/(m·σ(T)))·Σ_{i=1}^m ( σ(T_i)/(2k_i) + σ(T∖T_i)/(k_i+1) − Σ_{C∉𝒞_i} σ(T_C)/(|𝒞_i|+1) − (1/|𝒞_i|)·Σ_{C∈𝒞_i} ( σ(T_C∖T_{i,C})/(|T_{i,C}|+1) + σ(T_{i,C})/(2|T_{i,C}|) ) )    (20)

Because ∀i. 1 ≤ k_i ≤ n, |𝒞_i| ≥ 1, and |T_{i,C}| ≥ 1, the positive terms in (20) contribute at least σ(T)/(2n) per fault and the subtracted terms at most σ(T)/2 per fault, so

    E_c[γ] − E_a[γ] ≥ (1/(m·σ(T)))·Σ_{i=1}^m (1/(2n) − 1/2)·σ(T) = 1/(2n) − 1/2.

For the other side, because ∀i. |𝒞_i| ≤ k_i and |T_{i,C}| ≤ k_i, we have

    E_c[γ] − E_a[γ]
      ≤ (1/(m·σ(T)))·Σ_{i=1}^m ( σ(T_i)/(2k_i) + σ(T∖T_i)/(k_i+1) − Σ_{C∉𝒞_i} σ(T_C)/(k_i+1) − ((Σ_{C∈𝒞_i} σ(T_C)) − σ(T_i))/(|𝒞_i|·(k_i+1)) − σ(T_i)/(2|𝒞_i|·k_i) )
      = (1/(m·σ(T)))·Σ_{i=1}^m (1 − 1/|𝒞_i|)·( (Σ_{C∈𝒞_i} σ(T_C))/(k_i+1) − σ(T_i)·(1/(k_i+1) − 1/(2k_i)) )
      ≤ (1/(m·σ(T)))·Σ_{i=1}^m (1 − 1/|𝒞_i|)·(Σ_{C∈𝒞_i} σ(T_C))/(k_i+1)
      ≤ (1/(m·σ(T)))·Σ_{i=1}^m ((|𝒞_i|−1)/(|𝒞_i|·(|𝒞_i|+1)))·Σ_{C∈𝒞_i} σ(T_C)
      ≤ (1/(m·σ(T)))·Σ_{i=1}^m σ(T)/6 = 1/6
The third-to-last inequality holds because ∀k_i ≥ 1. 1/(k_i+1) − 1/(2k_i) ≥ 0. The last inequality holds because ∀|𝒞_i| ≥ 1. (|𝒞_i|−1)/(|𝒞_i|·(|𝒞_i|+1)) ≤ 1/6, which can be shown with simple calculus, and because Σ_{C∈𝒞_i} σ(T_C) ≤ σ(T).
Corollary 10.1 (Difference of E_c[α] and E_a[α]).

    1/(2n) − 1/2 ≤ E_c[α] − E_a[α] ≤ 1/6    (21)
We give two scenarios where the preceding bounds are close to tight. In both scenarios, we set ∀t, t′ ∈ T. σ(t) = σ(t′), so that α = γ and E_c[α] − E_a[α] = E_c[γ] − E_a[γ]. The first scenario has one fault F; each of the |𝒞| classes contains n/|𝒞| tests, tests from only one class detect F, and all tests in that class detect F. In this scenario, E_a[α] = 1 − |𝒞|(n+1)/(n(n+|𝒞|)) + 1/(2n), and E_c[α] = 1 − (|𝒞|−1)/(2|𝒞|) − 1/(2n). If we consider |𝒞| = √n, then when n ≫ 1, E_a[α] ≈ 1 but E_c[α] ≈ 1/2, hence E_c[α] − E_a[α] ≈ −1/2. The second scenario has one fault F and two classes with 1 and n−1 tests, and each class contains exactly one test that detects F. In this scenario, E_a[α] = 2/3 + 1/(6n) and E_c[α] = 3/4. When n ≫ 1, E_c[α] − E_a[α] ≈ 1/12, close to the upper bound of 1/6.
In brief, measured by α or γ, compatible orders can be much worse on average than all orders (up to 1/2) but cannot be much better (up to 1/6).
6 Properties of Metrics and Checking Prior RTP Work

Prior work on random RTP uses sampling and often visualizes α and γ values as boxplots that may show the median, mean, quartiles (25% and 75%), and “whiskers” (1.5 times the interquartile range) of the sampled distribution. For papers that show these boxplots, we identify two properties, focusing on a(T) because it is used in almost all prior work instead of c(T) [46]:

Mean/Median at Least Half: E_a[α], Med_a(α), E_a[γ], Med_a(γ) ≥ 1/2.
Symmetric PMF: E_a[α] = 1/2 ∨ Med_a(α) = 1/2 ∨ E_a[γ] = 1/2 ∨ Med_a(γ) = 1/2 ∨ ∀i. k_i = 1 ⟹ the PMFs of α and γ are symmetric around 1/2.
To check the boxplots from prior work, we search on Google Scholar for
papers related to “test prioritization” and keep only the papers that contain
both “test” and “prioriti” in the titles. We sort these papers based on their
citation count and check the top 100 papers with the highest citation count [44].
6.1 Mean/Median at Least Half

Lemma 11. For every order o ∈ a(T) and its reverse order o′ ∈ a(T),

    γ(o) + γ(o′) ≥ 1    (22)

The equality holds iff ∀i. k_i = 1.

Proof sketch. To give some intuition: when ∀i. k_i = 1, the test that first detects fault i does not change by reversing the order, so the “prefixes” of that test in o and o′ complement each other and form the entire test suite. In this case, γ(o) + γ(o′) = 1. If ∃i. k_i ≥ 2, the test that first detects fault i in o is not the same test in o′, and the “prefixes” of these two tests in o and o′ do not form the entire test suite, so γ(o) + γ(o′) > 1. We omit the details due to the space limit.
Theorem 12 (Measures of central tendency are at least half).

    min{E_a[α], Med_a(α), E_a[γ], Med_a(γ)} ≥ 1/2    (23)

The equality holds iff ∀i. k_i = 1.

Proof sketch. From (22), we get E_a[γ] = (1/2)·(Σ_{o∈a(T)} (γ(o) + γ(o′)))/n! ≥ 1/2, and the equality holds iff ∀i. k_i = 1. Because α can be viewed as a special case of γ, we also have the same result for E_a[α]. The same result for Med_a(α) and Med_a(γ) can also be derived from (22). We omit the details due to the space limit.
When we inspect the top 100 most cited RTP papers, we find at least five papers with boxplots clearly showing a mean or median below 1/2. These papers range from seminal papers [12, Figs. 2b, 2c, 2e] (year 2000) and [13, Fig. 3: schedule, tcas] (2002) to more recent ones [29, Fig. 4] (2007), [5, Fig. 2] (2016; a co-author of that paper is also a co-author of this paper), and [41, Fig. 5] (2017). Instead of sampling random orders an arbitrary number of times, future RTP research could use our formulas or algorithm to obtain correct mean and median values.
6.2 Symmetric PMF

We also prove that the α and γ PMFs are symmetric when (23)'s equality holds.

Theorem 13 (Symmetry of the α and γ PMFs). If E_a[α] = 1/2 ∨ Med_a(α) = 1/2 ∨ E_a[γ] = 1/2 ∨ Med_a(γ) = 1/2 ∨ ∀i. k_i = 1, then

    ∀δ. P(α = 1/2 − δ) = P(α = 1/2 + δ) ∧ P(γ = 1/2 − δ) = P(γ = 1/2 + δ)    (24)

Proof. From Theorem 12, min{E_a[α], Med_a(α), E_a[γ], Med_a(γ)} = 1/2 ⟺ ∀i. k_i = 1. Moreover, ∀i. k_i = 1 implies ∀o. α(o) + α(o′) = 1 ∧ γ(o) + γ(o′) = 1. Each order has exactly one reverse order, so the PMFs of α and γ are symmetric around 1/2.
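Theorem 13 can be sanity-checked by brute force on tiny suites. The following self-contained C++ program (our illustration; the scenario is made up) enumerates all n! orders of a one-to-one scenario and verifies that the PMF of Σ τ_i — and hence of α — is symmetric:

  #include <algorithm>
  #include <cstdio>
  #include <map>
  #include <vector>

  int main() {
    int n = 5, m = 2;                                 // tiny hypothetical scenario
    std::vector<std::vector<bool>> M(n, std::vector<bool>(m, false));
    M[1][0] = M[3][1] = true;                         // one-to-one: k_i = 1 for all i
    std::vector<int> order(n);
    for (int j = 0; j < n; j++) order[j] = j;
    std::map<long long, long long> pmf;               // counts of S = sum of tau_i
    do {
      long long S = 0;
      for (int i = 0; i < m; i++)
        for (int pos = 0; pos < n; pos++)
          if (M[order[pos]][i]) { S += pos + 1; break; }
      pmf[S]++;
    } while (std::next_permutation(order.begin(), order.end()));
    // alpha = 1 - S/(nm) + 1/(2n) equals 1/2 iff S = m(n+1)/2, so the alpha PMF
    // is symmetric around 1/2 iff the counts of S are symmetric around m(n+1)/2.
    long long c2 = (long long)m * (n + 1);            // twice the center of symmetry
    bool sym = true;
    for (const auto& [S, cnt] : pmf) {
      auto it = pmf.find(c2 - S);
      sym = sym && it != pmf.end() && it->second == cnt;
    }
    std::printf("PMF symmetric around 1/2: %s\n", sym ? "yes" : "no");
  }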
When we inspect the top 100 most cited RTP papers again, we find at least three papers relevant to this property. Based on the information in these papers, we believe that ∀i. k_i = 1 holds for them. Ideally, we would confirm each paper's failure-to-fault matrix, but papers often omit such details. On a positive note, the authors of one paper [38] released their dataset, which we analyzed and confirmed that ∀i. k_i = 1. The papers that violate this property include the most widely cited paper on RTP [38, Fig. 5: schedule, schedule2, tcas] (year 2001; 1563 citations per Google Scholar) as well as older [36, Fig. 4: schedule, schedule2, tcas] (1999) and newer [40, Fig. 2] (2015) papers.
Instead of randomly sampling orders to approximate PMFs, future RTP pa-
pers could use our algorithm to compute exact PMFs. While we find only five
and three papers that definitely violate Mean/Median at Least Half and Sym-
metric PMF, respectively, we suspect that many others may violate these or
similar properties. However, due to the lack of data in many papers (e.g., no
boxplot for random RTP), we cannot easily identify all violations.
7 Related Work

Some prior work [45,49] considers expected values of α and γ but in contexts different from ours. Random testing (but not random RTP) has been studied for a long time [7–9,17,18,31–33,50]. The most related are theoretical analyses of random test generation. Böhme and Paul [2,3] analyze how random sampling of test inputs compares to systematic generation: random sampling can be more efficient when the cost to systematically generate a test input exceeds the cost to randomly sample an input by some factor. Böhme et al. [1] analyze the connection between Shannon's entropy and the discovery rate of a fuzzer that randomly generates inputs. They provide the foundation for identifying random seeds for the fuzzer to improve the overall efficiency. Their analysis also enables future systematic approaches for test generation to be compared more efficiently with random ones. Similarly, our analysis can help future RTP work compare more efficiently against random RTP and avoid insufficient sampling. Beyond random test generation, Majumdar and Niksic [26] present a theoretical analysis of the effectiveness of randomly inserted partition faults for finding bugs in distributed systems. In contrast, our analysis is on test-suite orders for random RTP.
8 Conclusion

Regression test prioritization (RTP) is a popular regression testing approach. The majority of highly cited RTP papers have compared RTP techniques with random RTP. However, all evaluations have been empirical, with no prior theoretical analysis of random RTP. This paper has presented such an analysis by introducing an algorithm for efficiently computing the exact probability mass function of APFD, deriving closed-form formulas and approximations for various metrics and scenarios, and proving two interesting properties of APFD and APFDc. Overall, our analysis provides new insights into random RTP, and our results show that future RTP work often need not use random sampling but can use our simple formulas or algorithms to evaluate random RTP more precisely.

Acknowledgments. We thank Anjiang Wei, Dezhi Ran, and Sasa Misailovic for their help. This work was partially supported by US NSF grants CCF-1763788 and CCF-1956374, NSFC grant No. 62161146003, the Tencent Foundation, and the XPLORER PRIZE. We acknowledge support for research on regression testing from Dragon Testing, Microsoft, and Qualcomm. Tao Xie is the corresponding author, and he is also affiliated with the Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, China.
References
1. Böhme, M., Manès, V.J.M., Cha, S.K.: Boosting fuzzer efficiency: An information theoretic perspective. In: ESEC/FSE (2020)
2. Böhme, M., Paul, S.: On the efficiency of automated testing. In: FSE (2014)
3. Böhme, M., Paul, S.: A probabilistic analysis of the efficiency of automated software testing. TSE (2016)
4. Brigham, E.O.: The fast Fourier transform and its applications. Prentice-Hall, Inc.
(1988)
5. Busjaeger, B., Xie, T.: Learning for test prioritization: An industrial case study.
In: FSE (2016)
6. Cheng, R., Zhang, L., Marinov, D., Xu, T.: Test-case prioritization for configuration
testing. In: ISSTA (2021)
7. Claessen, K., Hughes, J.: QuickCheck: A lightweight tool for random testing of
Haskell programs. In: ICFP (2000)
8. Csallner, C., Smaragdakis, Y., Xie, T.: DSD-Crasher: A hybrid analysis tool for
bug finding. TOSEM (2008)
9. Duran, J.W., Ntafos, S.C.: An evaluation of random testing. TSE (1984)
10. Elbaum, S., Kallakuri, P., Malishevsky, A., Rothermel, G., Kanduri, S.: Under-
standing the effects of changes on the cost-effectiveness of regression testing tech-
niques. STVR (2003)
11. Elbaum, S., Malishevsky, A., Rothermel, G.: Incorporating varying test costs and
fault severities into test case prioritization. In: ICSE (2001)
12. Elbaum, S., Malishevsky, A.G., Rothermel, G.: Prioritizing test cases for regression
testing. In: ISSTA (2000)
13. Elbaum, S., Malishevsky, A.G., Rothermel, G.: Test case prioritization: A family
of empirical studies. TSE (2002)
14. Elbaum, S., Rothermel, G., Penix, J.: Techniques for improving regression testing
in continuous integration development environments. In: FSE (2014)
15. Elsner, D., Hauer, F., Pretschner, A., Reimer, S.: Empirically evaluating readily
available information for regression test optimization in continuous integration. In:
ISSTA (2021)
16. Endres, D.M., Schindelin, J.E.: A new metric for probability distributions. Trans-
actions on Information Theory (2003)
17. Fraser, G., Zeller, A.: Generating parameterized unit tests. In: ISSTA (2011)
18. Hamlet, R.: Random testing. In: Encyclopedia of Software Engineering (1994)
19. Jiang, B., Zhang, Z., Chan, W.K., Tse, T.H.: Adaptive random test case prioriti-
zation. In: ASE (2009)
20. JUnit (2022), https://junit.org
21. Kim, J.M., Porter, A.: A history-based test prioritization technique for regression
testing in resource constrained environments. In: ICSE (2002)
22. Kim, J.M., Porter, A., Rothermel, G.: An empirical study of regression test appli-
cation frequency. STVR (2005)
23. Liang, J., Elbaum, S., Rothermel, G.: Redefining prioritization: Continuous prior-
itization for continuous integration. In: ICSE (2018)
24. Lu, Y., Lou, Y., Cheng, S., Zhang, L., Hao, D., Zhou, Y., Zhang, L.: How does
regression test prioritization perform in real-world software evolution? In: ICSE
(2016)
25. Luo, Q., Hariri, F., Eloussi, L., Marinov, D.: An empirical analysis of flaky tests.
In: FSE (2014)
26. Majumdar, R., Niksic, F.: Why is random testing effective for partition tolerance
bugs? In: POPL (2017)
27. Mattis, T., Rein, P., Dürsch, F., Hirschfeld, R.: RTPTorrent: An open-source dataset for evaluating regression test prioritization. In: MSR (2020)
28. Maven (2022), https://maven.apache.org
29. Mirarab, S., Tahvildari, L.: A prioritization approach for software test cases based
on Bayesian networks. In: FASE (2007)
30. Mondal, S., Nasre, R.: Summary of Hansie: Hybrid and consensus regression test
prioritization. In: ICST (2021)
31. Ntafos, S.: On random and partition testing. In: ISSTA (1998)
32. Ozkan, B.K., Majumdar, R., Oraee, S.: Trace aware random testing for distributed
systems. OOPSLA (2019)
33. Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-directed random test
generation. In: ICSE (2007)
34. Peng, Q., Shi, A., Zhang, L.: Empirically revisiting and enhancing IR-based test-
case prioritization. In: ISSTA (2020)
35. pytest (2022), https://docs.pytest.org
36. Rothermel, G., Untch, R., Chu, C., Harrold, M.: Test case prioritization: An em-
pirical study. In: ICSM (1999)
37. Rothermel, G., Elbaum, S., Malishevsky, A., Kallakuri, P., Davia, B.: The impact
of test suite granularity on the cost-effectiveness of regression testing. In: ICSE
(2002)
38. Rothermel, G., Untch, R.H., Chu, C., Harrold, M.J.: Prioritizing test cases for
regression testing. TSE (2001)
39. Rummel, M.J., Kapfhammer, G.M., Thall, A.: Towards the prioritization of re-
gression test suites with data flow information. In: SAC (2005)
40. Saha, R.K., Zhang, L., Khurshid, S., Perry, D.E.: An information retrieval approach
for regression test prioritization based on program changes. In: ICSE (2015)
41. Spieker, H., Gotlieb, A., Marijan, D., Mossige, M.: Reinforcement learning for
automatic test case prioritization and selection in continuous integration. In: ISSTA
(2017)
42. Srivastava, A., Thiagarajan, J.: Effectively prioritizing tests in development envi-
ronment. In: ISSTA (2002)
43. Stanley, R.P.: Enumerative Combinatorics, Volume 1. Cambridge University Press
(2011)
44. A Theoretical Analysis of Regression Test Prioritization website (2022), https://sites.google.com/view/theoretical-analysis-of-rtp
45. Wang, Z., Chen, L.: Improved metrics for non-classic test prioritization problems.
In: SEKE (2015)
46. Wei, A., Yi, P., Xie, T., Marinov, D., Lam, W.: Probabilistic and systematic cov-
erage of consecutive test-method pairs for detecting order-dependent flaky tests.
In: TACAS (2021)
47. Wong, W., Horgan, J., London, S., Agrawal, H.: A study of effective regression
testing in practice. In: ISSRE (1997)
48. Yoo, S., Harman, M.: Regression testing minimization, selection and prioritization:
A survey. STVR (2012)
49. Zhai, K., Jiang, B., Chan, W.: Prioritizing test cases for regression testing of
location-based services: Metrics, techniques, and case study. IEEE TSC (2012)
50. Zhang, S., Saff, D., Bu, Y., Ernst, M.D.: Combined static and dynamic automated
test generation. In: ISSTA (2011)
Verified First-Order Monitoring with Recursive Rules
Sheila Zingg¹, Srđan Krstić¹, Martin Raszyk¹, Joshua Schneider¹, and Dmitriy Traytel²

¹ Institute of Information Security, Department of Computer Science, ETH Zürich, Zurich, Switzerland, {srdan.krstic,martin.raszyk,joshua.schneider}@inf.ethz.ch
² Department of Computer Science, University of Copenhagen, Copenhagen, Denmark, traytel@di.ku.dk
Abstract.
First-order temporal logics and rule-based formalisms are two popular
families of specification languages for monitoring. Each family has its advantages, and only a few monitoring tools support their combination. We extend metric
first-order temporal logic (MFOTL) with a recursive let construct, which enables
interleaving rules with temporal logic formulas. We also extend VeriMon, an
MFOTL monitor whose correctness has been formally verified using the Isabelle
proof assistant, to support the new construct. The extended correctness proof
covers the interaction of the new construct with the existing verified algorithm,
which is subtle due to the presence of the bounded future temporal operators. We
demonstrate the recursive let’s usefulness on several example specifications and
evaluate our verified algorithm’s performance against the DejaVu monitoring tool.
Keywords: Rule-based specifications · Monitoring · Formal verification.
1 Introduction
In runtime verification, a monitor observes events generated by a running system and
analyzes the event streams for compliance with a given specification. Temporal spec-
ification languages for monitoring are often classified as operational or declarative [10].
Operational languages explicitly describe how the monitor’s input should be transformed
to obtain an output. Two important subclasses of operational languages are rule-based for-
malisms [2,13] and stream runtime verification (SRV) languages [6,8,11,20]. Both formu-
late the transformations as recursive equations. In contrast, declarative languages, such as
first-order temporal logics [4,15], describe the output by composing high-level operators.
Operational and declarative languages have complementary advantages: declarative
languages let specification authors focus on the “what” and not the “how”, whereas
operational languages offer the authors more control over the evaluation. Most runtime
verification tools do not support mixing the paradigms, especially when it comes to
parametric, i.e., first-order, specification languages. A notable exception is the recent
addition of recursive rules to past-time first-order temporal logic (PFLTL), implemented
in the DejaVu monitoring tool [14]. As another important benefit, recursive rules can
express operations like transitive closure that are not expressible in first-order logics.
In this paper, we introduce recursion in metric first-order temporal logic (MFOTL) [4]
in the form of a recursive let construct. We develop and implement an evaluation al-
gorithm for MFOTL with recursion in VeriMon [3,21], an MFOTL monitor whose
correctness has been formally verified in the Isabelle proof assistant. To this end, we
extend the formal correctness proof to cover the recursive let construct.
Unlike PFLTL, MFOTL supports bounded future temporal operators and aggrega-
tions (Section 2). The interaction of recursion with bounded future operators is subtle.
To avoid non-termination, DejaVu requires all recursive occurrences to be guarded by
a previous operator. We similarly require the recursive occurrences to be guarded in our
monitor, but we relax the requirement on the guard to other past-time operators which
ensure that their subformulas are evaluated strictly in the past. Moreover, we allow future
operators in the recursive let construct, as long as no recursion takes place in the future op-
erator’s arguments. These restrictions ensure that the fixpoint given by the recursive let op-
erator is well-defined. At the same time, they are permissive and allow us to formulate in-
teresting examples, several of which are beyond what PFLTL with recursion can express.
Consider a specification that aims to secure hosts in a network that communicate with
each other and with the outside world. A host is tainted by an address range iff there is a
chain of communication from the address to the host and all hosts on the chain trigger an
intrusion detection alert within one hour after communicating with the previous host. This
specification can be expressed directly using our recursive let construct (to model chains
of communication) and future temporal operators (to specify “within one hour after”).
We start by extending MFOTL with a non-recursive let operator (Section 3). This spe-
cial case is mainly of pedagogical value: aspects common to both let operators are easier
to explain on the simpler non-recursive variant. Yet, this construct is useful in practice
to structure complex formulas and improve monitoring performance by sharing common
subformulas. Thus we extend VeriMon’s algorithms and proofs with the non-recursive let.
We then introduce the recursive let operator (Section 4.1), exemplify its semantics
with several specifications (Section 4.2), and develop the monitoring algorithm and sketch
its correctness (Section 4.3). VeriMon’s repository [24] contains complete formal proofs.
This work is part of the long-term effort to develop a trustworthy monitor that
surpasses in expressiveness and efficiency other non-verified tools. In this work, our focus
is on expressiveness (and trustworthiness). Nonetheless, we evaluate our algorithmic
additions to VeriMon on a micro-benchmark and observe that even without further
optimizations it exhibits an incomparable performance to DejaVu (Section 5). Moreover,
we detected a problem in DejaVu’s handling of variable names in recursive subformulas.
In summary, our main contribution is the extension of MFOTL with a recursive let
operator and the design of an evaluation algorithm for it. Along the way, we introduce a
non-recursive let operator, which proved essential when writing complex specifications.
Our contributions are implemented as part of VeriMon and proved correct using Isabelle.
Related Work. Our work adds rule-based specification features [13] to a first-order spec-
ification language [16]. Above we describe our contribution’s relationship to DejaVu and
VeriMon, two monitors for first-order temporal specifications. VeriMon’s algorithm [21],
which we extend, is based on the algorithm used in the MonPoly monitor [5], although Ve-
riMon has optimizations that are not present in MonPoly and vice versa [3]. VeriMon sup-
ports a more expressive specification language than MonPoly, and our introduction of the
recursive let has increased the gap between the two. VeriMon’s and MonPoly’s algorithms
work with finite relations. These tools are thus restricted to MFOTL’s monitorable frag-
ment [4], which ensures that all subformulas evaluate to finite results. In contrast, DejaVu
finitely represents infinite relations using BDDs and thus supports the full PFLTL (but
only closed formulas). Both DejaVu and our work restrict the recursive let syntactically.
datatype data = Int int | Flt double | Str string
type_synonym db = string ⇀ data list set
datatype trm = V nat | C data | trm + trm | …
type_synonym ts = nat
typedef trace = {s :: (db × ts) stream. trace s}
typedef I = {(a :: nat, b :: enat). a ≤ b}
datatype frm = string(trm list) | trm ◦ trm | ¬frm | ∃frm | frm ∨ frm | frm ∧ frm
  | ●_I frm | ○_I frm | frm S_I frm | frm U_I frm | nat ← agg_op (trm; nat) frm

fun etrm :: data list ⇒ trm ⇒ data where
  etrm v (V x) = v!x | etrm v (C x) = x | etrm v (t1 + t2) = etrm v t1 + etrm v t2 | …

fun sat :: trace ⇒ data list ⇒ nat ⇒ frm ⇒ bool where
  sat σ v i (p(as)) = (map (etrm v) as ∈ Γσi p)
| sat σ v i (t1 ◦ t2) = (etrm v t1 ◦ etrm v t2)
| sat σ v i (¬ϕ) = (¬ sat σ v i ϕ)
| sat σ v i (∃ϕ) = (∃z. sat σ (z#v) i ϕ)
| sat σ v i (α ∨ β) = (sat σ v i α ∨ sat σ v i β)
| sat σ v i (α ∧ β) = (sat σ v i α ∧ sat σ v i β)
| sat σ v i (●_I ϕ) = (case i of 0 ⇒ False | j+1 ⇒ Tσi − Tσj ∈_I I ∧ sat σ v j ϕ)
| sat σ v i (○_I ϕ) = (Tσ(i+1) − Tσi ∈_I I ∧ sat σ v (i+1) ϕ)
| sat σ v i (α S_I β) = (∃j ≤ i. Tσi − Tσj ∈_I I ∧ sat σ v j β ∧ (∀k ∈ {j<..i}. sat σ v k α))
| sat σ v i (α U_I β) = (∃j ≥ i. Tσj − Tσi ∈_I I ∧ sat σ v j β ∧ (∀k ∈ {i..<j}. sat σ v k α))
| sat σ v i (y ←Ω (t; b) ϕ) = (let M = {(x, card Z) | x Z. Z = {z. length z = b ∧ sat σ (z@v) i ϕ ∧ etrm (z@v) t = x} ∧ Z ≠ {}}
    in (M = {} ⟶ fv ϕ ⊆ {0..<b}) ∧ v!y = eval_agg_op Ω M)

Fig. 1. Formal syntax and semantics of MFOTL with aggregations, where ◦ ∈ {=, <, ≤}
Other rule-based [2,13] and SRV-based monitors [6,8,11,20] can express the temporal
operators present in LTL, but struggle with extensions that introduce parameters. Even
for the operators they can express, specialized algorithms that are carefully tuned for the
operators tend to exhibit better performance. Instead of encoding temporal operators,
we take the opposite approach and enrich a monitor that uses specialized algorithms for
temporal operators with general-purpose recursion.
Datalog [1] adds recursion to first-order logic, similarly to our addition of recursion to
temporal logic. However, Datalog has no built-in notion of time and hence other measures
must be taken to ensure that the fixpoints are well-defined, e.g., by restricting negation.
Restricting the recursive occurrences to be strictly in the past is a natural and expressive
alternative for monitoring, as we do not restrict negation beyond of what the monitorable
fragment requires. Works on Datalog extensions with metric temporal operators [7,19,22]
mostly study the decidability and complexity of computational problems related to these
extensions, whereas we design, implement, and formally verify an executable algorithm.
2 Metric First-Order Temporal Logic
MFOTL extends linear temporal logic with first-order quantification, past-time operators,
and interval bounds on the temporal operators [4]. The VeriMon monitor [3] supports
a fragment of this logic. It also adds new features, specifically regular matching oper-
ators as in linear dynamic logic [9], which results in metric first-order dynamic logic
(MFODL), as well as aggregations. Our extension of VeriMon with recursive rules retains
the additional features of MFODL. However, the additional features are orthogonal to our
extension and hence we base our presentation in this paper on MFOTL with aggregations.
We summarize MFOTL’s syntax and semantics, as well as the monitorable fragment.
The presentation generally follows the Isabelle formalization; however, we sometimes
deviate from Isabelle's concrete syntax for simplicity. We begin by defining some auxiliary types (top of Fig. 1). The logic's universe (type data) is fixed and infinite: it is a disjoint sum of integers, 64-bit IEEE floats, and strings of 8-bit characters. Databases (type db) encode first-order structures as functions from predicate names to relations over data. Relations are represented as sets of lists. A trace is a stream (an infinite sequence) of time-stamped databases. Time-stamps (type ts) are modeled as natural numbers (type nat). We write Γσi for the i-th database in σ, and Tσi for its time-stamp. The predicate trace enforces monotone and eventually increasing time-stamps, i.e., ∀i ≤ j. Tσi ≤ Tσj and ∀x. ∃i. x < Tσi. Non-empty intervals (type I) are represented by their end-points. We write [a,b] for the unique interval satisfying n ∈_I [a,b] iff a ≤ n ≤ b, where n ∈_I I denotes that I contains the natural number n. The interval is unbounded from above if b = ∞, which the type enat adds to the natural numbers.
Terms (type trm) are constructed recursively from variables (represented by De Bruijn indices), constants, and arithmetic operators. We use named variables in examples and omit the V and C constructors. There are two kinds of atomic formulas (type frm): flexible predicates of the form p(as), where as is a list of terms, and rigid predicates t1 ◦ t2 for ◦ ∈ {=, <, ≤}, which have a fixed interpretation. Formally, the existential quantifier ∃ does not carry a variable name because of the De Bruijn encoding. We use fv α to denote the set of De Bruijn indices of α's free variables.
The semantics is given by the functions etrm and sat (Fig. 1). Both depend on a valuation, which is a data list assigning a value to each variable. The satisfaction function sat for formulas additionally depends on a trace σ and a time-point i, which is an index into the trace. Indexing into lists is denoted by v!x, the operation z#v prepends the value z to the list v, and @ concatenates two lists. The notation {x..<y} and {x<..y} is shorthand for the sets {x, x+1, …, y−1} and {x+1, x+2, …, y} of natural numbers, respectively.
An aggregation formula y ←Ω (t; b) ϕ binds b variables in the subformula ϕ; the remaining free variables of ϕ are used for grouping. Each group is assigned an aggregate value y, which is computed by first evaluating the term t on each valuation that matches the group and that satisfies ϕ, and then aggregating the results using the operator Ω (e.g., MIN for minimum). To this end, eval_agg_op Ω M (not shown) applies Ω to a set M of value–multiplicity pairs [3]; card Z is the cardinality of Z, or ∞ if Z is infinite. The conjunct M = {} ⟶ fv ϕ ⊆ {0..<b} ensures that the formula is satisfied by the aggregate value of an empty M only if there are no grouping variables. Otherwise, infinitely many groups would be labeled with that value, rendering such aggregations non-monitorable.
The decidable predicate mon :: frm ⇒ bool specifies the monitorable fragment. We omit its formal definition and refer to the earlier descriptions of VeriMon [3,21] for details. Intuitively, mon places restrictions on the formula's structure to ensure that all subformulas have finitely many satisfying valuations. Also, the interval I of every U_I operator must be bounded. A monitor for a monitorable formula can thus compute a finite set of satisfying valuations for every time-point after observing a sufficiently long trace prefix.
3 Non-Recursive Let Operator
We first introduce a non-recursive let operator Let string := frm in frm to the frm datatype. The formula Let p := α in β associates the formula α with the predicate named p, which may be used in the formula β. We call such a predicate let-bound. The operator is
non-recursive: p has the same meaning within α as in the surrounding context (unless it is bound by a nested let in α). Although the non-recursive let operator does not enhance MFOTL's expressiveness, it improves readability (by using descriptive let-bound predicate names), as well as modularity and evaluation efficiency (by sharing subformulas).
Intuitively, the meaning of Let p := α in β is the same as that of β after replacing all its predicates of the form p(as) with the formula α, whose free variables have been replaced with the terms as in a capture-avoiding way. The formal syntax does not specify explicitly how α's free variables map to p's arguments. The mapping is induced by the De Bruijn indices: the variable with index 0 becomes the first argument, and so forth. We list the arguments explicitly in examples that use named variables. For instance, the formula Let p(x) := p(x) ∨ ∃y. q(x,y) in ●_[0,2] p(y) should be equivalent to ●_[0,2] (p(y) ∨ ∃z. q(y,z)). We achieve this by defining Let's semantics as follows.

    sat σ v i (Let p := α in β) = sat (σ[p ↦ λj. satrel σ j α]) v i β
satrel σjα
as an abbreviation for
{v.sat σv j αlength v=nfv α}
, i.e.,
the relation containing the valuations that satisfy
α
. The function
nfv α
returns the
minimum length of
v
needed to cover all of
α
’s free variables, i.e.,
0
if
α
is closed and
Max (fv α) + 1
otherwise. The trace
σ[pVR]
is the same as the trace
σ
except that for
every time-point
i
, the database at
i
maps the predicate name
p
to
R i
, where
R
has type
nat data list set
and is called a temporal relation. Note that the subformula
α
is not
necessarily evaluated at time-point
i
. Instead, the choice of the time-point is deferred
until the predicate
p
is used within
β
, which we achieve by updating the entire trace.
This supports the intuition behind unfolding the let operator
Let p:=αin β
described
above, especially as subformulas p(as)may occur under temporal operators in β.
Implementation. To evaluate an MFOTL formula on a trace, VeriMon computes a finite set of satisfying valuations (represented by the type table) recursively for each subformula. It applies standard table operations such as the natural join (⋈) and union. Tables are sets of tuples, which are lists of optional data values (with missing values denoted by ⊥) and thus refine valuations. This representation allows us to use lists of the same length for subformulas with different free variables. As with valuations, the variables' De Bruijn indices are used to look up their values in a tuple.
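To illustrate the join on such tuples, here is a naive C++ sketch (ours; VeriMon's verified implementation is generated from Isabelle and is considerably more refined): two tuples join iff they agree on every position where both are defined, and the result keeps the defined value from either side.

  #include <optional>
  #include <set>
  #include <vector>

  using Tuple = std::vector<std::optional<int>>;  // data simplified to int; nullopt = missing
  using Table = std::set<Tuple>;

  Table join(const Table& a, const Table& b) {
    Table out;
    for (const Tuple& u : a)
      for (const Tuple& v : b) {
        if (u.size() != v.size()) continue;       // tables share one tuple length
        Tuple w(u.size());
        bool ok = true;
        for (std::size_t x = 0; ok && x < u.size(); x++) {
          if (u[x] && v[x] && *u[x] != *v[x]) ok = false;  // clash on variable x
          else w[x] = u[x] ? u[x] : v[x];
        }
        if (ok) out.insert(w);
      }
    return out;
  }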
VeriMon processes an unbounded trace incrementally. Its interface consists of two functions, init :: frm ⇒ state and step :: dbs × ts list ⇒ state ⇒ (nat × table) list × state. The function init initializes the monitor's state (type state), and step updates it with a batch of new time-stamped databases to produce a list of new satisfactions. Instead of db list, step uses the type dbs = (string ⇀ table list) (a partial mapping from string to table list) to efficiently retrieve all relations (encoded as tables) associated with a predicate name at once. Besides some auxiliary data, state stores an inductive state of type sfrm that mirrors the inductive representation of formulas, augmented with data structures for evaluating temporal operators and buffering intermediate results. Internally, step (dbs, tss) st calls eval j n tss dbs sϕ, where j is the combined length of the trace prefix including the new batch, n = nfv ϕ for the monitored formula ϕ, and sϕ is the inductive state, all stored in st. The function eval returns a list of tables with new satisfactions, as well as the updated inductive state. Satisfactions are reported for every time-point in order. They may be delayed if the formula contains future operators.
To evaluate Let p := α in β, we use the tables with α's satisfactions to evaluate p within β, which requires that the tuples in these tables have no missing values. Therefore, we require that let operators satisfy mon (Let p := α in β) = ({0..<nfv α} ⊆ fv α ∧ mon α ∧ mon β). Specifically, the (indices of) α's free variables must not have gaps. We add the constructor SLet p m sα sβ to the inductive state, which stores p, the number m = nfv α of free variables in α, and the states for the subformulas α and β. It is initialized by initializing sα and sβ recursively. The function eval evaluates it as follows.

    eval j n tss dbs (SLet p m sα sβ) =
      let (xs, s′α) = eval j m tss dbs sα; (ys, s′β) = eval j n tss (dbs[p ↦ xs]) sβ
      in (ys, SLet p m s′α s′β)

We write dbs[p ↦ xs] for the partial mapping dbs updated at p with xs. The recursive call of eval on sα may return multiple tables in the list xs. Note that step generalizes the original VeriMon interface [3] as it consumes multiple time-stamped databases at once. The generalized interface of eval allows us to pass all tables at once to the recursive call for sβ.
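A C++ rendering of the SLet case of eval follows; it is our simplification (the types are placeholders, and the returned inductive state is updated in place through sa and sb rather than returned), not the verified Isabelle code.

  #include <map>
  #include <memory>
  #include <string>
  #include <vector>

  using Table = std::vector<std::vector<int>>;            // stand-in for VeriMon's table type
  using Dbs = std::map<std::string, std::vector<Table>>;  // string -> table list

  struct SFrm {                                           // one state node per subformula
    virtual std::vector<Table> eval(std::size_t j, std::size_t n,
                                    const std::vector<long>& tss, const Dbs& dbs) = 0;
    virtual ~SFrm() = default;
  };

  struct SLet : SFrm {
    std::string p;                 // the let-bound predicate name
    std::size_t m;                 // m = nfv(alpha)
    std::unique_ptr<SFrm> sa, sb;  // states for alpha and beta
    std::vector<Table> eval(std::size_t j, std::size_t n,
                            const std::vector<long>& tss, const Dbs& dbs) override {
      std::vector<Table> xs = sa->eval(j, m, tss, dbs);   // alpha's new satisfactions
      Dbs dbs2 = dbs;
      dbs2[p] = xs;                                       // dbs[p |-> xs]
      return sb->eval(j, n, tss, dbs2);                   // beta sees p bound to xs
    }
  };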
Correctness. We relate the outputs of step and sat to prove our monitor correct. As mentioned earlier, the monitor may delay its output. We precisely characterize its progress for a given formula and trace prefix. Intuitively, the progress is the number of time-points that the monitor is able to evaluate given a trace prefix. Progress is a useful tool in the correctness proof as it helps us describe the output at every time-point. Moreover, we show below that progress can be made arbitrarily large, which is important for completeness.
Formally, prog σ P ϕ j is ϕ's progress i_ϕ after reading the first j databases of trace σ. We added the partial mapping P that assigns to every let-bound predicate its own progress, i.e., the progress of the formula defining the predicate. For example, the progress of a predicate p that is not let-bound is j. Otherwise, it is equal to the progress of the formula it is bound to (stored in P p). The progress of α U_[a,b] β is the smallest i such that τ σ i ≥ τ σ (Min {i_α, i_β, j−1}) − b. The progress of both α ∧ β and α ∨ β is Min {i_α, i_β}.
The invariant invar σ j P n sϕ ϕ relates an inductive state sϕ to the formula ϕ. The inductive state must reflect the monitor's state after processing the first j databases in the trace σ, assuming that P specifies the let-bound predicates' progress. The parameter n is the length of the tuples stored within sϕ. The invariant is defined inductively over sϕ; we reuse VeriMon's definition for the MFOTL operators and add a case for Let:

  invar σ j P m sα α
  invar (σ[p ↦ λi. satrel σ i α]) j (P[p ↦ prog σ P α j]) n sβ β
  m = nfv α      {0..<m} ⊆ fv α
  ──────────────────────────────────────────────────
  invar σ j P n (SLet p m sα sβ) (Let p := α in β)

The first two premises restrict the subformula states sα and sβ, where sβ reflects the evaluation of β on the modified trace, and p's progress is that of α. The premise m = nfv α enforces that m is equal to p's arity, and {0..<m} ⊆ fv α is the constraint from mon.
Our extensions preserve the monitor's correctness: we formally proved the theorem below, which characterizes the monitor's eval function. The theorem is stated here for the empty progress mapping ∅, which must be generalized in the proof (as P changes in the above rule). Let δ be a natural number and ϕ be a monitorable formula with n = nfv ϕ. The function the maps the optional value ⟨x⟩ to x and ⊥ to some unspecified value.
Theorem 1. (a) invar σ 0 ∅ n s⁰ϕ ϕ holds for the initial state s⁰ϕ. (b) Suppose that sϕ satisfies invar σ j ∅ n sϕ ϕ and that dbs contains all relations from σ for the indices in the list js = [j..<j+δ]. Then (xs, s′ϕ) = eval (j+δ) n (map (τ σ) js) dbs sϕ satisfies invar σ (j+δ) ∅ n s′ϕ ϕ, and the i-th table in the list xs, for prog σ ∅ ϕ j ≤ i < prog σ ∅ ϕ (j+δ), contains (only) all tuples v of length n satisfying sat σ (map the v) i ϕ.
Soundness follows immediately from Thm. 1, whereas completeness additionally requires the aforementioned property that any progress can be reached by making the trace prefix long enough, which we also proved for our modified progress function:

Theorem 2. If mon ϕ, then for all i there exists a j such that prog σ ∅ ϕ j ≥ i.
4 Past-Recursive Let Operator
It is well-known that first-order logic (FOL) cannot express certain queries, notably the transitive closure of a binary relation. This remains true when FOL is restricted to finite structures [18]. Although MFOTL is rather different from ordinary FOL, we conjecture that it cannot express transitive closure either. This hampers its ability to model hierarchies of unbounded depth. Moreover, recursive patterns are sometimes the most natural way to express certain specifications. We describe an extension of MFOTL that can encode a "temporally directed" form of transitive closure and other recursive patterns.
Specifically, we introduce another let operator in which the predicate may refer to itself recursively. The intended semantics is that of a fixpoint, i.e., the predicate p defined by a formula α should be interpreted by a temporal relation that is equal to the evaluation of α under that interpretation of p. The fixpoint might not always exist or it might not be unique. Therefore, different fixpoint operators have been studied in the context of nontemporal logics and query languages [1]. For instance, it is common to require that all recursive occurrences of p in its defining formula are positive, i.e., under an even number of negations. This ensures monotonicity and hence the existence of a least fixpoint.
MFOTL's future operators are interpreted over infinite traces. This poses a new challenge for monitoring recursively defined predicates, even if we restrict our attention to positive formulas. Consider the recursive definition of p by q ∨ ○[0,∞] p, where q is a predicate from the trace. Although q ∨ ○[0,∞] p is monitorable (at most one additional time-point must be known to evaluate it), the recursive definition of p is equivalent to ◇[0,∞] q under the least fixpoint semantics. However, ◇[0,∞] q is not monitorable, as one might need the entire, infinite trace to evaluate it. Therefore, we focus on a fragment where every recursive occurrence of p must be strictly in the past. This guarantees a unique fixpoint even if the defining formula is not monotone, so the predicate may occur negatively as well.
The syntax of our past-recursive let operator is similar to that of Let: we add the constructor LetPast string := frm in frm to the frm datatype. However, the semantics is different (Section 4.1). The restriction to strictly past recursion is enforced by a syntactic monitorability condition that is checked by mon. Consider the formula LetPast p := α in β. Intuitively, every recursive occurrence of p in α must be guarded by at least one strictly past operator, and there must be no future operator on the path from the occurrence to α's root. We do allow future operators in the other parts of α, though. We give examples of LetPast in Section 4.2. The evaluation of LetPast requires an extension of VeriMon's algorithm (Section 4.3), which we also formally prove correct.
datatype recSafety = U | P | NF | A

fun (∘) :: recSafety ⇒ recSafety ⇒ recSafety where
  U ∘ _ = U
| _ ∘ U = U
| A ∘ _ = A
| _ ∘ A = A
| P ∘ _ = P
| _ ∘ P = P
| NF ∘ NF = NF

fun slp :: string ⇒ frm ⇒ recSafety where
  slp p (q(as)) = (if p = q then NF else U)
| slp p (Let q := α in β) =
    (slp q β ∘ slp p α) ⊔ (if p = q then U else slp p β)
| slp p (LetPast q := α in β) =
    (if p = q then U else (slp q β ∘ slp p α) ⊔ slp p β)
| slp p (t1 ≈ t2) = U | slp p (y ← ω(t; b) ϕ) = slp p ϕ
| slp p (¬ϕ) = slp p ϕ | slp p (∃ϕ) = slp p ϕ
| slp p (α ∧ β) = slp p α ⊔ slp p β
| slp p (α ∨ β) = slp p α ⊔ slp p β
| slp p (●I ϕ) = P ∘ slp p ϕ | slp p (○I ϕ) = A ∘ slp p ϕ
| slp p (α SI β) = slp p α ⊔ ((if 0 ∈ I then NF else P) ∘ slp p β)
| slp p (α UI β) = A ∘ (slp p α ⊔ slp p β)

Fig. 2. Auxiliary definitions for the syntactic restriction on LetPast
4.1 Semantics
The semantics of the past-recursive let operator is defined by the equation

sat σ v i (LetPast p := α in β) = sat (σ[p ↦ recp (λR j. satrel (σ[p ↦ R]) j α)]) v i β

We evaluate β at the same time-point i as the recursive let operator using an appropriately updated trace. The temporal relation assigned to p is computed by the combinator recp:

fun recp :: ((nat ⇒ data list set) ⇒ nat ⇒ data list set) ⇒ nat ⇒ data list set where
  recp f i = f (λj. if j < i then recp f j else {}) i
The argument f is a function that transforms temporal relations, and recp f returns again a temporal relation. Intuitively, recp f evaluates to the fixpoint f (recp f), except that f R i can only access time-points of R before i. For all other time-points j ≥ i, the relation R j is empty. The combinator recp is well-defined because i is a natural number; the recursive call recp f j affects the result only if j < i, and hence we can prove termination using i as a variant. For the semantics of LetPast, we choose f R i = satrel (σ[p ↦ R]) i α, i.e., the satisfactions of α with p mapped to f's argument R, to which recp supplies the result of the recursive evaluation (up to but excluding i).
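The combinator translates almost verbatim into Python; the sketch below (ours, without the memoization a real implementation would use) makes the termination argument tangible: the recursive relation is queried only at strictly smaller time-points, so the recursion bottoms out at 0.

def recp(f, i):
    def earlier(j):                       # R, truncated at i
        return recp(f, j) if j < i else set()
    return f(earlier, i)

# Example: f encodes p(i) = q(i) union p(i-1), i.e., "once q".
q = {0: {(1,)}, 1: {(2,)}, 2: set()}
f = lambda R, i: q.get(i, set()) | (R(i - 1) if i > 0 else set())
print([recp(f, i) for i in range(3)])
# [{(1,)}, {(1,), (2,)}, {(1,), (2,)}]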
Our definition of sat is total: it gives meaning to every formula. This includes formulas LetPast p := α in β where p occurs in α without a past guard or under a future operator. However, the semantics behaves unexpectedly in such cases. For example, LetPast p := (q ∨ ○[0,∞] p) in p is equivalent to q. Our monitor therefore requires properly guarded formulas. Not only does this avoid confusion about the semantics, it also simplifies the implementation because the monitor need not eliminate unguarded occurrences.
Next, we describe the formalization of the syntactic restriction. The idea is to determine for every predicate whether it is used strictly in the past by analyzing the formula recursively. The datatype recSafety (Fig. 2) represents the possible outcomes. U(nused) means that a predicate does not occur in the formula. P(ast) means that it is evaluated at strictly earlier time-points, whereas NF (Non-Future) additionally allows the current time-point. A(ny) covers all remaining cases. The linear order < on recSafety is induced by U < P < NF < A. Its reflexive closure ≤ corresponds to implication. For example, if the predicate p is unused (U), it is clearly evaluated at earlier time-points only (P). The least upper bound x ⊔ y with respect to ≤ corresponds to logical disjunction.
The function slp p ϕ (Fig. 2) analyzes the past-guardedness of a predicate p in a formula ϕ. It uses a composition operator y ∘ x on recSafety. The patterns in the definition of ∘ should be matched sequentially from top to bottom; e.g., A ∘ U is equal to U. Intuitively, y ∘ x describes the guardedness of a predicate that is x-used in some subformula, which is then y-used. For example, slp p (●I ϕ) = P ∘ slp p ϕ because ϕ and all occurrences of p therein are evaluated at time-points that are strictly in the past relative to ●I ϕ. Note that we make a case distinction for α SI β: if the interval I excludes zero, β is always evaluated strictly in the past. Future operators always result in A if p is used in an operand.
Finally, we define the mon predicate for the recursive let operator:

mon (LetPast p := α in β) = (slp p α ≤ P ∧ {0..<nfv α} ⊆ fv α ∧ mon α ∧ mon β)

The only difference to Let is the restriction of p's occurrences in α via slp, which is generally an over-approximation. For example, slp p (●I ●I ○I p) = A even though p is evaluated at strictly earlier time-points. Therefore, some instances of LetPast that our algorithm could evaluate correctly are not considered to satisfy mon. We plan to replace recSafety with a more precise lattice in future work.
4.2 Examples
Temporal Operators. We first show that the non-metric S operator can be reduced to LetPast and ●. (We omit the interval subscripts if the interval is [0,∞].) Using the special ts(t) predicate, which is true iff t is the current time-stamp, we can also express the metric version. This example serves to gently illustrate the semantics of LetPast. In general, formulas are more readable if they are directly expressed in terms of S, and monitoring can be more efficient. Below we give further examples in which LetPast adds expressiveness.
Let α and β be two monitorable MFOTL formulas with free variables fv α and fv β, respectively. The formula α S β is monitorable only if fv α ⊆ fv β, so let us assume that, too. The following unfolding of S's semantics is well-known:
sat σ v i (α S β) ⟷ sat σ v i β ∨ (sat σ v i α ∧ i > 0 ∧ sat σ v (i−1) (α S β))    (1)
As the unfolding recursively evaluates the formula at the previous time-point, we can directly translate it into a recursive let operator: ϕS ≡ LetPast s(x) := ψ in s(x), where ψ ≡ β ∨ (α ∧ ● s(x)). The predicate name s must be fresh, i.e., it must not occur in α nor β. The variable list x enumerates fv β. The formula ϕS is monitorable because s(x) is clearly past-guarded, and hence slp s ψ = P. (We also need fv β = {0..<nfv β}, which can be achieved by renaming variables in α and β.) Let us analyze the semantics of ϕS:
sat σ v i ϕS ⟷ sat (σ[s ↦ recp fψ]) v i (s(x))      (abbreviating λR j. satrel (σ[s ↦ R]) j ψ as fψ)
  ⟷ v ∈ recp fψ i
  ⟷ sat (σ[s ↦ λj. if j < i then recp fψ j else {}]) v i ψ
  ⟷(∗) sat σ v i β ∨ (sat σ v i α ∧ i > 0 ∧ v ∈ (if i−1 < i then recp fψ (i−1) else {}))
  ⟷ sat σ v i β ∨ (sat σ v i α ∧ i > 0 ∧ sat σ v (i−1) ϕS)
These equations hold for all valuations v of length nfv β and if the variables x are ordered by their De Bruijn indices. Step (∗) exploits the freshness of s with respect to α and β, which allows us to replace σ[s ↦ ...] by σ. The equations result in the same unfolding as (1). Hence, we can prove the semantic equivalence of ϕS and α S β by induction on i.
The following SinceLet formula encodes α S[a,b] β. Other encodings exist, however.

LetPast s(x, t) := (β ∧ ts(t)) ∨ (α ∧ ● s(x, t)) in ∃t, u. s(x, t) ∧ ts(u) ∧ a ≤ u − t ∧ u − t ≤ b

Here, t and u are fresh variables, where t records the time-stamp of the past satisfaction of β, whereas u is the time-stamp at which we evaluate SinceLet. The subformula a ≤ u − t ∧ u − t ≤ b corresponds to τ σ i − τ σ j ∈ [a, b], which is part of S[a,b]'s semantics (Fig. 1).
Temporally-Directed Transitive Closure. We proceed by showing that LetPast can compute a temporally-directed transitive closure over events observed at a sequence of distinct time-points. Hence, we assume that the trace contains a single event at every time-point. The closure is directed in the sense that the transitive chains can only be extended by newer events. We consider the following two types of events from [14]: r(y, x, d) denotes that process y reports some data d to another process x, and s(x, y) denotes that process x spawns process y. The Spawn formula

LetPast p(u, v) := s(u, v) ∨ (● p(u, v)) ∨ (∃t. (● p(u, t)) ∧ s(t, v)) in r(y, x, d) ∧ ¬p(x, y)

encodes violations of the property that whenever process y sends some data d to a process x, denoted as r(y, x, d), then there was a chain of process spawns s(x, x1), s(x1, x2), ..., s(xk, y), occurring in this order in the trace. In other words, a process may only send data to its "ancestors". To check this property, a monitor needs to compute the (temporally-directed) transitive closure p(u, v) of the relation s. The definition of the closure has two recursive predicate instances with different arguments. The Spawn formula is inspired by a similar one used to evaluate the DejaVu monitor [14]. Unlike DejaVu, we do not require the formula to be closed and thus leave the variables x, y, and d free.
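To see the computation this specification induces, the following Python sketch (ours, not the monitor's actual algorithm) replays the temporally-directed closure on an explicit event list: chains are only ever extended by the newest spawn event, exactly as in the Spawn formula, and a report is a violation iff no chain connects the receiver to the sender.

def spawn_violations(trace):
    # trace: list of ("s", x, y) spawn events and ("r", y, x, d) reports,
    # one event per time-point
    p = set()                                    # closure of s so far
    violations = []
    for ev in trace:
        if ev[0] == "s":
            _, u, v = ev
            # extend existing chains only at their newer end
            p |= {(u, v)} | {(a, v) for (a, b) in p if b == u}
        else:
            _, y, x, d = ev
            if (x, y) not in p:                  # no spawn chain from x to y
                violations.append(ev)
    return violations

trace = [("s", 1, 2), ("s", 2, 3), ("r", 3, 1, "d"), ("r", 1, 3, "d")]
print(spawn_violations(trace))                   # [('r', 1, 3, 'd')]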
The Trans formula

LetPast p(u, v) := s(u, v) ∨ (● p(u, v)) ∨ (∃t. (● p(u, t)) ∧ s(t, v))
    ∨ (∃t. s(u, t) ∧ (● p(t, v)))
    ∨ (∃t, t′. (● p(u, t)) ∧ s(t, t′) ∧ (● p(t′, v))) in r(y, x, d) ∧ ¬p(x, y)

encodes violations of the same property as Spawn even if s(x, x1), s(x1, x2), ..., s(xk, y) are received by the monitor out of order, i.e., they do not occur in this order in the trace.
We can interpret the events s(x, y) as edges in a directed graph and the predicate p(x, y) in Trans as computing the reachability of vertices in the directed graph. We also extend the directed edges s(x, y) with a weight w to s+(x, y, w). Then the Trans+ formula

LetPast p(u, v, w) := s+(u, v, w) ∨ (● p(u, v, w))
    ∨ (∃t, w1, w2. (● p(u, t, w1)) ∧ s+(t, v, w2) ∧ w = w1 + w2)
    ∨ (∃t, w1, w2. s+(u, t, w1) ∧ (● p(t, v, w2)) ∧ w = w1 + w2)
    ∨ (∃t, t′, w1, w2, w3. (● p(u, t, w1)) ∧ s+(t, t′, w2) ∧ (● p(t′, v, w3))
        ∧ w = w1 + w2 + w3) in
Let m(u, v, w) := w ← MIN(w; u, v). p(u, v, w) in m(x, y, w) ∧ ¬(● m(x, y, w))

yields all pairs of vertices x, y and the length w of the shortest path from x to y whenever y becomes reachable from x or the length of the shortest path changes. The relation
s+(x, y, w) can itself be obtained by evaluating a more complex temporal formula, e.g., s+(x, y, w) ≡ e(x, y, w) ∧ ¬◇[0,10] d(x, y), with the following two types of events: e(x, y, w) denotes an edge from x to y with weight w; d(x, y) denotes deletion of the edge from x to y. The eventually operator ◇I ϕ abbreviates (∃x. x = x) UI ϕ. Such a relation s+(x, y, w) contains all edges that are not revoked within 10 time units after receiving e(x, y, w). We could use the non-recursive let operator Let s+(x, y, w) := e(x, y, w) ∧ ¬◇[0,10] d(x, y) to precompute the relation and use it when evaluating the recursive let operator in Trans+.
As another application of future operators under LetPast, recall our introductory example. Suppose that hosts in a network communicate with each other and with the outside world: comm(src, dest) indicates that host src sends a message to host dest; in(r, h) and out(h, r) indicate that the host h receives or sends traffic from or to an IP address in the range r, respectively. The hosts are equipped with an intrusion detection system (IDS), whose alerts are denoted by ids(h). We say that a host h is tainted by an address range r iff there is a chain of communication from r to h and all hosts on the chain (including h) trigger an IDS alert within one hour after communicating with the previous host. The formula

LetPast taint(r, h) := in(r, h) ∨ (∃h′. (● taint(r, h′)) ∧ comm(h′, h) ∧ ◇[0,1h] ids(h))
    ∨ (● taint(r, h)) in taint(r, h) ∧ out(h, r)

is true whenever a host communicates back to the IP range by which it was tainted.
Periodic Behavior. Suppose that we monitor a boolean signal b(x), parametrized by an integer parameter x, between the user's start(x) and stop(x) commands. An arbitrary amount of time may pass between these two commands. Our task is to detect periodic activations of b(x), with a fixed period t > 0 and error tolerance 0 ≤ ε < t. We shall ignore positive noise in b(x), i.e., additional activations besides the periodic ones.
Let us make the task more precise. An alarm must be raised at time-point i_n iff there exist time-points i_0 < i_1 < ··· < i_n such that start(x) holds at i_0, stop(x) holds at i_n, and b(x) holds at all i_k for 1 ≤ k ≤ n−1. Moreover, the difference of time-stamps for adjacent time-points i_k and i_{k+1}, where 1 ≤ k ≤ n−2, must be in the interval [t−ε, t+ε]; the differences for the pairs i_0, i_1 and i_{n−1}, i_n must each be at most t+ε.
Our first attempt PB to formalize the alarm condition without recursion is

stop(x) ∧ ◆I (start(x) ∨ b(x)) ∧ ((b(x) → (◆I start(x)) ∨ (◆J b(x))) S start(x))

where I = [0, t+ε], J = [t−ε, t+ε], and ◆K ϕ abbreviates (∃x. x = x) SK ϕ. This formula
follows an inductive approach: every b(x) between start(x) and stop(x) must be preceded by b(x) or start(x), with the appropriate time difference. However, PB does not ignore noise, as adding b(x) events to the trace may silence an alarm. For example, let t = 10, ε = 0, and σ be a trace starting with ({start(1)}, 0), ({b(1)}, 10), ({stop(1)}, 20). We write {p(1), p(2)} for the database where the predicate p holds for 1 and 2. On σ, PB is true at the third time-point. Inserting a database {b(1)} with time-stamp 15 falsifies PB at the now fourth time-point, although the trace still satisfies the natural language description.
The following PBLet formula expresses the intended condition using LetPast:

LetPast periodic(x) := start(x) ∨ b(x) ∧ ((◆I start(x)) ∨ (◆J periodic(x))) in
stop(x) ∧ ◆I periodic(x)

This example depends crucially on the flexible past guards we support: here, the recursion goes through ◆ with an interval constraint. Note that 0 ∉ J because we assumed ε < t.
As another example of periodic behavior, we analyze an integer-valued signal(y) between the (now non-parametric) commands start and stop. We aim to discover whether signal(y) is piecewise constant, with the constant segments being exactly t time units long. Moreover, the signal's values for subsequent segments must differ by at most δ. The next formula uses the general S operator as the recursion guard to capture this property.

LetPast segment(y) := ∃z. signal(y) ∧ ((● signal(z)) S[0,t] (signal(z) ∧ ● start)
    ∨ (● signal(z)) S[t,t] segment(z) ∧ −δ ≤ y − z ∧ y − z ≤ δ) in
stop ∧ ∃y. ((● signal(y)) S[0,t] segment(y))
Turing Machines. Every MFOTL formula can be viewed as a function on traces, where the function's output is the set of satisfying valuations, either at a fixed or at all time-points. VeriMon's monitorable fragment guarantees that one can compute the valuation at every time-point. Thus, monitorable formulas correspond to computable functions. If we give up on the requirement that the function's output must be available at a fixed time-point, the past-recursive let operator is expressive enough to simulate arbitrary Turing machines (TM). This is not a contradiction: we simulate a single TM step at every time-point, and there is an infinite supply of time-points. Running the monitor on a configuration that does not halt will never produce an output, i.e., a nonempty set of satisfying valuations.

Let M = ⟨Σ, b, Q, q0, qf, δ⟩ be a deterministic TM with tape alphabet Σ, blank symbol b ∈ Σ, control states Q, initial state q0 ∈ Q, final state qf ∈ Q, and transition function δ ∈ (Q × Σ → Q × Σ × {−1, 0, 1}). Whenever the machine is in state q1 and reads the symbol s1, it enters state q2, writes the symbol s2, and moves the head by m tape cells to the right, where δ(q1, s1) = ⟨q2, s2, m⟩. Without loss of generality, we assume that Σ and Q are finite subsets of the integers. We simulate M using the formula ϕM shown below.
LetPast cfg(q, i, s) :=
  Let cfg(q, i, s) := ● cfg(q, i, s) in
  Let head(q, s) := cfg(q, 0, s) ∨ ¬(∃x, z. cfg(x, 0, z)) ∧ (∃y, z. cfg(q, y, z)) ∧ s = b in
    input(i, s) ∧ q = q0
    ∨ ⋁_{δ(q1,s1)=⟨q2,s2,m⟩} (head(q1, s1) ∧ q = q2 ∧ ((i = −m ∧ s = s2)
        ∨ (∃j. cfg(q1, j, s) ∧ j ≠ 0 ∧ i = j − m)))
in cfg(qf, i, s)
The idea is that cfg represents the current configuration of the TM. Specifically, cfg(q, i, s) holds if the machine is in control state q and the tape contains the symbol s in the i-th cell to the right of the head (i may be negative). Note that we use nested, non-recursive let operators to abbreviate repeated subformulas. In the body of Let cfg(q, i, s) := ● cfg(q, i, s) in ..., the predicate cfg refers to the previous configuration. The predicate head provides the current state and the symbol under the head. Its definition extends the tape by a blank symbol if necessary. The simulation is started at time-point 0 by providing the tape's initial content in the predicate input, which must include the cell input(0, s0) with the symbol s0 under the head's initial position. If and only if M halts on this input, there exists a time-point i at which ϕM is satisfied by at least one valuation (i, s). Moreover, the satisfying valuations at i represent the final state of the tape.
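The encoding can be cross-checked against a direct simulation. The Python sketch below (ours; the toy transition function delta is an assumption, not from the paper) performs one TM step per iteration with the same head-relative tape addressing as cfg(q, i, s): cell 0 is under the head, the written symbol lands at coordinate −m after a move by m, and old cell j moves to j − m.

def simulate(delta, q0, qf, blank, tape):
    cfg = {i: s for i, s in enumerate(tape)}      # head-relative addressing
    q = q0
    while q != qf:                                # one step per "time-point"
        s = cfg.get(0, blank)                     # extend tape with blanks
        q, s2, m = delta[(q, s)]
        cfg[0] = s2                               # write under the head
        cfg = {j - m: c for j, c in cfg.items()}  # shift: i = j - m
    return q, cfg

delta = {("go", 1): ("go", 1, 1), ("go", 0): ("halt", 0, 0)}
print(simulate(delta, "go", "halt", 0, [1, 1, 0]))
# ('halt', {-2: 1, -1: 1, 0: 0})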
4.3 Algorithm
The restriction to past-guarded recursion allows for an efficient evaluation algorithm for LetPast formulas. It is efficient because no fixpoint iteration is required at individual time-points. To evaluate LetPast p := α in β, we first try to evaluate α for as many time-points as possible and then use the results to interpret p in β. This part is the same as for the non-recursive Let, but the evaluation of α itself differs. The syntactic monitorability condition guarantees that α at time-point i depends on the predicate p only for time-points strictly less than i. Specifically, we have defined mon (LetPast p := α in β) such that the progress of α's evaluation does not depend on p's progress beyond time-point i − 1. Therefore, we can evaluate α at time-point 0 without providing any table for p, then use the result to evaluate α at time-point 1, and so forth.
There are two cases that require care. First, if α contains future operators, multiple time-points may be evaluated at once. The above process must then be repeated within a single monitor step. Second, if α contains no future operators, α is evaluated at all time-points i < j, where j is the current trace prefix length. We could then attempt to evaluate α once more at time-point j using the table computed at j − 1 for p. However, this would not yield any further tables because all occurrences of p are below at least one past operator that tries to access the time-stamp at time-point j, which is not yet known. Therefore, this last evaluation attempt would needlessly traverse the formula state. We optimize this case and buffer α's result at time-point j − 1 until the next input database arrives.
It is crucial that the evaluation of a recursive let does not get stuck waiting for tables that it needs to produce itself. Therefore, all operators that are strictly past-guarding as defined by slp (Fig. 2) must be well-behaved: the evaluation algorithm must compute a result at time-point i < j even if the operands' results are available only for time-points i′ < i. In particular, SI without 0 in the interval is considered strictly past-guarding. We have modified VeriMon's evaluation algorithm for α SI β to achieve this behavior.
The inductive state SLetPast p m sα sβ i buf for a recursive let operator extends SLet with a counter i :: nat, which tracks the progress of p as observed by sα, and an optional buffer buf :: table option. The meaning of the other arguments is the same as for SLet. In the initial state, i is zero and buf is ⊥. Let the function list_opt map ⊥ to [] and ⟨x⟩ to [x], where ⟨x⟩ is the embedding of x into the option type. A single monitor step updates the state as follows (see Section 3 for a description of eval's interface):

eval j n tss dbs (SLetPast p m sα sβ i buf) =
  let (xs, s′α, i′, buf′) = evalLP j m tss dbs p [] sα i (list_opt buf);
      (ys, s′β) = eval j n tss (dbs[p ↦ xs]) sβ
  in (ys, SLetPast p m s′α s′β i′ buf′)
The heavy lifting is performed by evalLP, which is mutually recursive with eval. We forward relevant variables from eval. The accumulator xs :: table list collects sα's results.

evalLP j m tss dbs p xs sα i buf =
  let (xs′, s′α) = eval j m tss (dbs[p ↦ buf]) sα; i′ = i + length buf
  in case xs′ of
       [] ⇒ (xs, s′α, i′, ⊥)
     | x # _ ⇒ (if i′ + 1 ≥ j then (xs @ xs′, s′α, i′, ⟨x⟩)
                 else evalLP j m [] (clear_dbs dbs) p (xs @ xs′) s′α i′ xs′)
First, evalLP evaluates sα with dbs updated at p using the current buffer, which may be empty. Since i tracks p's progress, we obtain its new value i′ by increasing i by the length of buf. The evaluation results in a list xs′ of tables and a new state s′α. We continue to iterate evalLP only if two conditions are met: xs′ must be nonempty, as otherwise there is no new data to evaluate s′α on, and i′ + 1 must be less than the current input prefix length. The latter condition serves as an obvious termination criterion, although it is stricter than necessary. We could perform an additional iteration in the case that i′ + 1 = j. However, such an iteration would never produce new results because the past operators guarding p can only be evaluated further if there are new time-stamps. Therefore, we optimize this case by choosing the stricter condition. If we continue the iteration, we append xs′ to the accumulator xs. Moreover, we clear tss and dbs because all tables from the new input database have already been processed by the first call to eval. Specifically, the function clear_dbs dbs updates dbs at all points at which it is defined to an empty list.
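Abstracting from buffering and batched input, the feedback loop of evalLP can be sketched in Python as follows (ours; the evaluation of α is specialized to the recursive definition p(x) := q(x) ∨ ● p(x), so evaluating time-point i only needs p's table at i − 1):

def evalLP(q_tables, j):
    xs = []                                   # accumulated tables for p
    i = 0                                     # p's progress as seen by alpha
    while True:
        prev = xs[i - 1] if i > 0 else set()  # p's table at i - 1
        xs.append(q_tables[i] | prev)         # alpha at time-point i
        if i + 1 >= j:                        # the termination criterion
            return xs
        i += 1

print(evalLP([{(1,)}, {(2,)}], j=2))          # [{(1,)}, {(1,), (2,)}]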
We illustrate our algorithm with an example, tracing the computations of eval and evalLP. We evaluate LetPast p(x) := q(x) ∨ ● p(x) in p(x), which has the same semantics as ◆[0,∞] q(x), on a prefix with two time-points at time-stamps 0 and 3. We omit details about the subformulas' states, as well as brackets around singleton lists, i.e., [1] is displayed as 1. Let dbs0 = {q ↦ [{1}, {2}]} be the content of the trace prefix.

eval j:2 n:1 tss:[0,3] dbs:dbs0 sϕ:(SLetPast p 1 α0 β0 0 ⊥)
| evalLP j:2 m:1 tss:[0,3] dbs:dbs0 p:p xs:[] sα:α0 i:0 buf:[]
| | eval j:2 n:1 tss:[0,3] dbs:(dbs0[p ↦ []]) sϕ:α0 = ([{1}], α1)
| | evalLP j:2 m:1 tss:[] dbs:{q ↦ []} p:p xs:[{1}] sα:α1 i:0 buf:[{1}]
| | | eval j:2 n:1 tss:[] dbs:{p ↦ [{1}], q ↦ []} sϕ:α1 = ([{1,2}], α2)
| | | iteration stops because i′ = 1 and hence i′ + 1 = 2 ≥ j = 2
| | = ([{1}, {1,2}], α2, 1, ⟨{1,2}⟩)
| = ([{1}, {1,2}], α2, 1, ⟨{1,2}⟩)
| eval j:2 n:1 tss:[0,3] dbs:(dbs0[p ↦ [{1}, {1,2}]]) sϕ:β0 = ([{1}, {1,2}], β2)
= ([{1}, {1,2}], SLetPast p 1 α2 β2 1 ⟨{1,2}⟩)
Correctness. We extended the correctness proof of eval (Thm. 1) to cover the new state constructor SLetPast. The added case differs from the one for the non-recursive let in that evalLP is used to evaluate the first subformula. The proof also required additional invariants for the i and buf arguments of SLetPast, as well as a characterization of LetPast's progress. Recall that progress describes the number of time-points that the monitor is able to evaluate given a trace prefix of length j. We express the progress of the let-bound predicate p, which is defined in terms of α, as a least fixpoint:

progLP σ P p α j = ⨅ {i. i = prog σ (P[p ↦ i]) α j}
prog σ P (LetPast p := α in β) j = prog σ (P[p ↦ progLP σ P p α j]) β j
(We do not update σ in these definitions as progress depends only on the time-stamp sequence but not on the databases in σ.) The above characterization follows the iteration in evalLP: since prog is pointwise monotone in P and at most j (both facts we prove in the formalization), the fixpoint can be reached by iteratively computing prog σ (P[p ↦ i]) α j starting with i = 0. Similarly, evalLP starts by evaluating α with no data for p and feeds the results back into the evaluation until no further results can be obtained. Theorem 2 remains true after adding the above equation to prog.
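The fixpoint iteration is easy to state operationally; in the Python sketch below (ours), prog stands in for the map i ↦ prog σ (P[p ↦ i]) α j, which by the monotonicity and boundedness facts above makes the iteration from 0 terminate in the least fixpoint:

def progLP(prog):
    i = 0
    while True:
        i2 = prog(i)              # prog sigma (P[p |-> i]) alpha j
        if i2 == i:
            return i              # least fixpoint, reached from below
        i = i2

# e.g., alpha looks one time-point into p's past, with prefix length j = 5:
print(progLP(lambda i: min(i + 1, 5)))        # 5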
The state invariant for SLetPast is given by the rule

  invar (σ[p ↦ recp (λR k. satrel (σ[p ↦ R]) k α)]) j (P[p ↦ i]) m sα α
  invar (σ[p ↦ recp (λR k. satrel (σ[p ↦ R]) k α)]) j (P[p ↦ progLP σ P p α j]) n sβ β
  buf = ⊥ ⟶ i = progLP σ P p α j
  ∀Z. buf = ⟨Z⟩ ⟶ i + 1 = progLP σ P p α j ∧
      table m (fv α) (recp (λR k. satrel (σ[p ↦ R]) k α) i) Z
  m = nfv α      slp p α ≤ P      {0..<m} ⊆ fv α
  ──────────────────────────────────────────────────────────
  invar σ j P n (SLetPast p m sα sβ i buf) (LetPast p := α in β)

The first two premises use the same updated trace as in the semantics of LetPast (Section 4.1). The updated progress for p differs slightly between the premise for sα and that for sβ. For the latter it is given by progLP, as expected. The predicate p's progress within sα is equal to the state variable i, which is one less than progLP σ P p α j if the buffer buf is nonempty. This reflects the optimization discussed in Section 4.3. The predicate table m A R Z is true iff the table Z contains tuples of length m that assign values to the variables in A, and these are exactly the tuples v of this kind satisfying map the v ∈ R.
5 Evaluation
We have used Isabelle/HOL's code generator [12] to export a certified implementation of VeriMon's core init and step functions and every function they depend on (e.g., operations on red-black trees), which amounts to about 10 000 lines of OCaml code. VeriMon augments this generated code with unverified parsers and pretty-printers. We evaluate this implementation to answer the following research questions: (1) How does VeriMon perform when monitoring formulas with the recursive let operator? and (2) How does it compare to existing monitors for temporal first-order specifications with recursive rules?
To answer these questions, we run VeriMon and DejaVu and benchmark some of the example formulas introduced in Section 4.2. Instead of SinceLet, we opt for the simpler OnceLet ≡ LetPast o(u, v) := s(u, v) ∨ ● o(u, v) in filter(x, y) ∧ o(x, y), encoding the non-metric ◆ operator. We also include Once ≡ filter(x, y) ∧ ◆ s(x, y) for comparison. The predicate filter(x, y) keeps the output size small. The OnceLet formula uses only one recursive predicate instance, whose variable order matches the one in the predicate's definition. Other formulas have more than one instance with different variable orders.
For the PBLet formula, we use an existing random trace generator [17] configured to pick parameters from a small integer domain, which increases the probability of producing satisfactions. For the other formulas, we generate traces using a strategy similar to the one used in DejaVu's benchmarks on the Spawn formula [14]. Namely, edges of a tree of spawned processes with a configurable branching factor are linearized into a trace, level by level. In the final level all edges converge to a single node for the formulas Trans and Trans+. We define the edges by Let s+(x, y, w) := e(x, y, w) ∧ ¬◇[0,10] d(x, y) in the Trans+ formula and revoke one half of the edges on the second level of the branching.
We have executed our experiments on an Intel Core i5-4200U CPU using 8 GB
RAM. Initially, DejaVu crashed on the OnceLet and Spawn formulas. We investigated
the issue and found that its formula’s abstract syntax tree was disconnected in these cases.
We assume that this is caused by naming variables in the recursive rules’ definitions
Trace    Once            OnceLet         Spawn           Trans           Trans+   PBLet
length   VeriMon DejaVu  VeriMon DejaVu  VeriMon DejaVu  VeriMon DejaVu  VeriMon  VeriMon
100      0.0     1.1     0.0     1.1     0.6     1.5     1.3     3.7     5.6      0.0
200      0.0     1.2     0.0     1.2     3.1     2.1     6.1     8.1     25.9     0.0
400      0.0     1.3     0.0     1.3     14.0    3.4     28.3    23.6    117.4    0.0
800      0.0     1.5     0.0     1.4     64.8    8.2     TO      83.4    TO       0.0
4000     0.2     41.3    0.1     40.5    TO      TO      TO      TO      TO       0.1
8000     0.4     TO      0.2     TO      TO      TO      TO      TO      TO       0.1
10000    0.5     TO      0.3     TO      TO      TO      TO      TO      TO       0.2

Fig. 3. Execution times of the monitors in seconds (TO = timeout of 120 seconds)
differently from those in the rules’ usages. After renaming the variables in the let-bound
predicates of these two formulas, the issue was fixed and we restarted the experiments.
The evaluation results (Figure 3) show that DejaVu's performance is incomparable to VeriMon's. VeriMon outperforms DejaVu on the formulas Once and OnceLet and scales well on PBLet, which, together with the Trans+ formula, we could not express in PFLTL with recursion. DejaVu outperforms VeriMon on the Spawn and Trans formulas, for which VeriMon's time complexity of processing one event is linear in the trace length: the number N of valuations satisfying the recursive predicates grows linearly in the trace length, and the time complexity of updating the recursive predicate is linear in N. We conjecture, based on some preliminary experiments, that VeriMon's performance can be significantly improved by optimizing the representation of sets of tuples in two ways: (a) using tuples of a fixed length with a fixed assignment of variables to positions in a tuple (i.e., no De Bruijn indices); (b) using a collection of indices to optimize the computation of joins on various sets of shared columns. Nevertheless, it is unlikely that processing one event can be made trace-length independent: Trans encodes the incremental dynamic transitive closure graph problem, for which the best known algorithm processes every new edge in amortized linear time in the graph's maximum out-degree [23].
6 Conclusion
We have presented an extension of a monitor for MFOTL with non-recursive and past-recursive let operators. The presence of bounded future temporal operators complicates both the semantics and the evaluation algorithms for the new constructs, compared to earlier unverified extensions of past-only monitors [14]. Yet, the formal correctness proofs that we have carried out ensure the trustworthiness of our development.

As future work, we plan to improve the performance of evaluating expensive joins by introducing indices, as used in database management systems. Expressiveness-wise, we will consider further relaxing the requirements on the recursive let. We can omit the past guard if we define a Datalog-style fragment for which the fixpoint is well-defined. Beyond relaxing guards, we may want to allow recursion through future operators in certain situations. The main challenge is that this would make the progress notion data-dependent (unlike currently, where it depends only on the time-stamps).
Acknowledgments We thank David Basin for supporting this work and the anonymous
TACAS reviewers for their helpful comments. Dmitriy Traytel is supported by a Novo
Nordisk Fonden Start Package Grant (NNF20OC0063462).
Verified First-Order Monitoring with Recursive Rules 251
References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
2. Barringer, H., Goldberg, A., Havelund, K., Sen, K.: Rule-based runtime verification. In: Steffen, B., Levi, G. (eds.) VMCAI 2004. LNCS, vol. 2937, pp. 44–57. Springer (2004). https://doi.org/10.1007/978-3-540-24622-0_5
3. Basin, D., Dardinier, T., Heimes, L., Krstić, S., Raszyk, M., Schneider, J., Traytel, D.: A formally verified, optimized monitor for metric first-order dynamic logic. In: Peltier, N., Sofronie-Stokkermans, V. (eds.) IJCAR 2020. LNCS, vol. 12166, pp. 432–453. Springer (2020). https://doi.org/10.1007/978-3-030-51074-9_25
4. Basin, D., Klaedtke, F., Müller, S., Zălinescu, E.: Monitoring metric first-order temporal properties. J. ACM 62(2), 15:1–15:45 (2015). https://doi.org/10.1145/2699444
5. Basin, D., Klaedtke, F., Zălinescu, E.: The MonPoly monitoring tool. In: Reger, G., Havelund, K. (eds.) RV-CuBES 2017. Kalpa Publications in Computing, vol. 3, pp. 19–28. EasyChair (2017). https://doi.org/10.29007/89hs
6. Convent, L., Hungerecker, S., Leucker, M., Scheffel, T., Schmitz, M., Thoma, D.: TeSSLa: Temporal stream-based specification language. In: Massoni, T., Mousavi, M.R. (eds.) SBMF 2018. LNCS, vol. 11254, pp. 144–162. Springer (2018). https://doi.org/10.1007/978-3-030-03044-5_10
7. Cucala, D.J.T., Walega, P.A., Grau, B.C., Kostylev, E.V.: Stratified negation in Datalog with metric temporal operators. In: AAAI 2021. pp. 6488–6495. AAAI Press (2021)
8. D'Angelo, B., Sankaranarayanan, S., Sánchez, C., Robinson, W., Finkbeiner, B., Sipma, H.B., Mehrotra, S., Manna, Z.: LOLA: Runtime monitoring of synchronous systems. In: TIME 2005. pp. 166–174. IEEE Computer Society (2005). https://doi.org/10.1109/TIME.2005.26
9. De Giacomo, G., Vardi, M.Y.: Linear temporal logic and linear dynamic logic on finite traces. In: Rossi, F. (ed.) IJCAI 2013. pp. 854–860. IJCAI/AAAI (2013)
10. Falcone, Y., Krstić, S., Reger, G., Traytel, D.: A taxonomy for classifying runtime verification tools. Int. J. Softw. Tools Technol. Transf. 23(2), 255–284 (2021). https://doi.org/10.1007/s10009-021-00609-z
11. Gorostiaga, F., Sánchez, C.: Stream runtime verification of real-time event streams with the Striver language. Int. J. Softw. Tools Technol. Transf. 23(2), 157–183 (2021). https://doi.org/10.1007/s10009-021-00605-3
12. Haftmann, F.: Code generation from specifications in higher-order logic. Ph.D. thesis, Technical University Munich (2009)
13. Havelund, K.: Rule-based runtime verification revisited. Int. J. Softw. Tools Technol. Transf. 17(2), 143–170 (2015). https://doi.org/10.1007/s10009-014-0309-2
14. Havelund, K., Peled, D.: An extension of LTL with rules and its application to runtime verification. In: Finkbeiner, B., Mariani, L. (eds.) RV 2019. LNCS, vol. 11757, pp. 239–255. Springer (2019). https://doi.org/10.1007/978-3-030-32079-9_14
15. Havelund, K., Peled, D., Ulus, D.: First-order temporal logic monitoring with BDDs. Formal Methods Syst. Des. 56(1), 1–21 (2020). https://doi.org/10.1007/s10703-018-00327-4
16. Havelund, K., Reger, G., Thoma, D., Zălinescu, E.: Monitoring events that carry data. In: Bartocci, E., Falcone, Y. (eds.) Lectures on Runtime Verification: Introductory and Advanced Topics, LNCS, vol. 10457, pp. 61–102. Springer (2018). https://doi.org/10.1007/978-3-319-75632-5_3
17. Krstić, S., Schneider, J.: A benchmark generator for online first-order monitoring. In: Deshmukh, J., Ničković, D. (eds.) RV 2020. LNCS, vol. 12399, pp. 482–494. Springer (2020). https://doi.org/10.1007/978-3-030-60508-7_27
18. Libkin, L.: Elements of Finite Model Theory. Springer (2004)
19. Ronca, A., Kaminski, M., Grau, B.C., Motik, B., Horrocks, I.: Stream reasoning in temporal Datalog. In: McIlraith, S.A., Weinberger, K.Q. (eds.) AAAI 2018. pp. 1941–1948. AAAI Press (2018)
20. Sánchez, C.: Online and offline stream runtime verification of synchronous systems. In: Colombo, C., Leucker, M. (eds.) RV 2018. LNCS, vol. 11237, pp. 138–163. Springer (2018). https://doi.org/10.1007/978-3-030-03769-7_9
21. Schneider, J., Basin, D., Krstić, S., Traytel, D.: A formally verified monitor for metric first-order temporal logic. In: Finkbeiner, B., Mariani, L. (eds.) RV 2019. LNCS, vol. 11757, pp. 310–328. Springer (2019). https://doi.org/10.1007/978-3-030-32079-9_18
22. Walega, P.A., Kaminski, M., Grau, B.C.: Reasoning over streaming data in metric temporal Datalog. In: AAAI 2019. pp. 3092–3099. AAAI Press (2019). https://doi.org/10.1609/aaai.v33i01.33013092
23. Yellin, D.M.: Speeding up dynamic transitive closure for bounded degree graphs. Acta Informatica 30(4), 369–384 (1993). https://doi.org/10.1007/BF01209711
24. Zingg, S., Krstić, S., Raszyk, M., Schneider, J., Traytel, D.: VeriMon's development repository. https://bitbucket.org/jshs/monpoly/src/887b996966/thys/ (2021)
Maximizing Branch Coverage with Constrained Horn Clauses

Ilia Zlatkin, Grigory Fedyukovich
Florida State University, Tallahassee, FL, USA
iz20e@fsu.edu, grigory@cs.fsu.edu
Abstract. State-of-the-art solvers for constrained Horn clauses (CHC) are successfully used to generate reachability facts from symbolic encodings of programs. In this paper, we present a new application to test-case generation: if a block of code is provably unreachable, no test case can be generated for it, and the generator can safely move on to other blocks of code. Our new approach uses CHC to incrementally construct different program unrollings and extract test cases from models of satisfiable formulas. At the same time, a CHC solver keeps track of CHCs that represent unreachable blocks of code, which makes the unrolling process more efficient. In practice, this lets our approach terminate early while guaranteeing maximal coverage. Our implementation, called Horntinuum, exhibits promising performance: it generates high coverage in the majority of cases and spends less time on average than state-of-the-art tools.
1 Introduction
Branch coverage is a method for testing that aims to maximize the number of
program branches to be collectively visited by a set of test cases. Branches in the
code are commonly attributed to the conditional statements or loops. For testing
a loop-free program, possible test cases for all the branches can be identified by
symbolic execution, powered by efficient solvers for Boolean Satisfiability (SAT)
or Satisfiability Modulo Theories (SMT). If a conditional is placed inside or after
a loop, test-case generation immediately becomes challenging because the cost
of exploration of every next iteration grows exponentially in the worst case.
Many verification problems can be reduced to synthesizing interpretations of
predicates in systems of SMT formulas, also known as constrained Horn clauses
(CHC), that provide a modular encoding for programs with arbitrary control
flow. In this paper, we propose to use CHC also for test-case generation. Solutions to CHC, also called inductive invariants, carry reachability information
and are useful in pruning the search space explored by test-case generators. If an
invariant shows that a branch can never be taken, then it is guaranteed that no
test can ever reach the branch, and thus a test-case generator can safely proceed
to discovery of the next test case.
We contribute a new approach to test-case generation that aims at maximizing branch coverage using inductive invariants. In essence, our approach gradually enumerates different unrollings and uses an off-the-shelf SMT solver to get
values for program variables that represent test cases. Unrollings are constructed on-the-fly by exploring the CHC encoding of programs. Concurrently, an incremental CHC solver determines a subset of unreachable CHCs, which allows the algorithm to explore fewer unrollings in the next iterations. The algorithm terminates when test cases have been generated for all reachable branches and all the remaining branches are provably unreachable.
These features distinguish our approach from other white-box test generators [1,8,9], which consider reachability information only in a bounded context. That is, in the presence of unreachable branches and loops, they may continue iterating forever, even if all possible test cases have already been generated. Reliance on invariants lets our tool terminate early while still guaranteeing the maximal possible coverage.
The approach has been implemented on top of the FreqHorn CHC solver [14] and the Z3 SMT solver [27]. It enables test-case generation for C programs, converted to CHCs by the SeaHorn [21] tool. Experiments conducted on a range of public benchmarks demonstrate the strengths of our approach compared to state-of-the-art tools: SMT-based incremental test-case generation detects high-quality solutions in the majority of cases and is on average less expensive.
2 Related Work
Automated test generation has two main approaches: fuzzing (e.g., [7,20,25,26,29,31,33,34]) and symbolic/concolic execution (e.g., [3,8,11,22,23,28,32]). The former group uses user-given seed inputs and further mutates them based on various heuristics (sometimes using the source code as well). The latter group, which also includes our approach, proceeds by enumerating paths and generating test cases, often using constraint solvers. Recent algorithms, including FuSeBMC [1] and VeriFuzz [9], follow both approaches: they begin with symbolic execution (namely, some bounded model checking [10,19]) and then proceed to fuzzing.
The closest related work [22] suggests accelerating testing using interpolation. Although they aim at the same goal as us, i.e., pruning unreachable paths, they do not generate inductive invariants, which limits the generality of their method. Earlier attempts to combine static analysis techniques and testing [11] were tailored to particular frameworks and languages. With the rise of SMT solvers, approaches became more scalable, goal-oriented [3], and at the same time more agnostic to programming languages. Recent works, e.g. [33], offer great flexibility in applying static analyzers to test-case generation, e.g., to direct fuzzers to specific blocks of code. Following this trend, our approach continues bridging the gap between state-of-the-art in automated reasoning and testing.
While we are not aware of any specific applications of CHC solvers to test-case generation, we are largely inspired by the work in model checking, e.g., [6,21], which can both discover invariants and find counterexamples (from which a test case can be extracted). The main difference is in the application: model checkers often focus on a single property/bug, while our goal is to cover the maximal number of branches. Furthermore, while many practical approaches including [1,9]
 1  int x = 0;
 2  int y = nondet();
 3  int z = nondet();
 4  while (1) {
 5    if (x >= 5)
 6      y++;      // needs at least 6 iterations to reach
 7    else
 8      x++;      // x ∈ [0,5] always holds
 9    if (y <= 5)
10      z++;
11    else
12      if (x > y)
13        y++;    // this is unreachable
14      else
15        x = 0;
16    if (z == 0)
17      break;
18  }

Fig. 1: Loopy program with control-flow divergence and unreachable branches.
are based on existing model checkers (that typically use constraint solvers as a black box), the CHC formulation allows us to build tools modularly and directly on top of an SMT solver, thus allowing its incremental use for both counterexample finding and invariant generation.
3 Motivating Example
Fig. 1 gives a program with a single loop. It has three variables: x is assigned zero before the loop, which we cannot change, and the remaining y and z could be taken from the user. The loop has four if-then-elses (including one nested), and it terminates when the value of z at the end of an iteration equals zero. To completely cover all the branches, we need to consider seven cases, in particular:
line 6: In order to reach the first then-branch, the loop needs to iterate at least six times and must not reach lines 15 and 17 in the first five iterations. Thus, line 8 should be visited the first five times. A possible scenario for that would be if initially y = 0 and z = 0.
line 8: The loop always reaches the else-branch of the first conditional because initially x is zero, and the guard trivially does not hold.
line 10: The guard of the second conditional might hold even at the first iteration if y is sufficiently small. Since we know that the increment at line 6 does not happen at the first iteration, y might initially be 5 (and z arbitrary). Then the branch is reachable.
line 13: The branch is never reachable because 0 ≤ x ≤ 5 is a loop invariant (and thus holds at each iteration) and the path condition x > y ∧ y > 5 is unsatisfiable.
line 15: Because line 13 is unreachable, we know that line 15 is always reached if the guard of the second conditional does not hold, e.g., when y is initially greater than 5.
line 17: If initially z = 0 and y is greater than 5, the loop executes the break statement at the end of its only iteration.
line 19: We have already seen a test case (for line 6) that gives a possible condition for the loop to continue iterating. In fact, for any value of z greater than zero (and any value of y), the loop does not terminate at all.

All these make the program quite interesting and its analysis challenging.
4 Background
This paper approaches the problem of automated test-case generation by reduction to the Satisfiability Modulo Theories (SMT) problem. Automated SMT solvers determine the existence of a satisfying assignment to the variables (also called a model) of a first-order logic formula. Formula ϕ is logically stronger than formula ψ (denoted ϕ ⟹ ψ) if every model of ϕ also satisfies ψ. The unsatisfiability of formula ϕ is denoted ϕ ⟹ ⊥, and we also write ∄M to indicate that no model M of the formula (which is clear from the context) exists. By writing ψ(x), we denote a predicate over free variables x.

Constrained Horn clauses (CHC) are used as an intermediate verification language by both verification frontends and backend SMT solvers. This allows splitting efforts while designing a verification tool for a new language: while focusing on encoding programs to CHCs, researchers rely on advances in CHC solvers that will solve these CHCs. Thus, by demonstrating our algorithms at the level of CHCs, we allow for many particular instantiations of them for various programming languages (that support CHC encoding).
Definition 1. A linear constrained Horn clause (CHC) over a set of uninterpreted relation symbols R is a first-order logic formula having the form of either:

ϕ(x1) ⟹ inv1(x1)
inv_i(x_i) ∧ ϕ(x_i, x_j) ⟹ inv_j(x_j)
inv_n(x_n) ∧ ϕ(x_n) ⟹ ⊥

where all inv_i ∈ R are uninterpreted symbols, all x_i are vectors of variables, and ϕ is a fully interpreted formula called the constraint.
These types of implications are called, respectively, a fact, an inductive clause, and a query. Note that the constraint ϕ_C of each CHC C does not have applications of any predicates from R. Further, by body(C) we denote the premise of C, and by src(C) an application of inv ∈ R in body(C) (but if C is a fact, we write src(C) ≝ ⊤). Similarly, by head(C) we denote the conclusion of C, and by dst(C) an application of inv ∈ R in head(C) (and if C is a query, we write dst(C) ≝ ⊥).

Intuitively, CHCs allow generating program encodings with "holes" that represent unrollings of unknown lengths. Possible instantiations of these holes can then be used in the discovery of meaningful information about the program, such as loop invariants or function summaries.
Maximizing Branch Coverage with Constrained Horn Clauses 257
Definition 2. Given a set R of uninterpreted predicates and a set S of CHCs over R, we say that S is satisfiable if there exists an interpretation for every inv ∈ R that makes all implications in S valid.
CHCs are also useful when there is a need to access various pieces of a program encoding and pose reachability queries. In particular, it is straightforward to design a Bounded Model Checking (BMC) [5] tool on top of CHCs and use it for test-case generation. Specifically, by traversing the graph structure imposed on the CHCs, we can access all possible program traces and create the corresponding unrollings.
Definition 3. Given a system S of CHCs over R, an unrolling of S of length k is a conjunction π_{C0,...,Ck} ≝ ⋀_{0≤i≤k} ϕ_{Ci}(x_i, x_{i+1}), such that 1) C0 is a fact, 2) each Ci ∈ S, 3) for each pair Ci and Ci+1, rel(dst(Ci)) = rel(src(Ci+1)), and the variables of each x_i are shared only between ϕ_{Ci−1}(x_{i−1}, x_i) and ϕ_{Ci}(x_i, x_{i+1}).
For bug finding, it is essential to enumerate various unrollings and check their satisfiability. Once a satisfiable formula π_{C0,...,Ck} is found for some query Ck, a bug is found (and its counterexample can be obtained from the model), and thus no interpretation for the predicates in R exists.
Lemma 1. Given a system of CHCs S, let π_{C0,...,Ck} be one of its unrollings, such that C0 is a fact and Ck is the query. Then if π_{C0,...,Ck} is satisfiable, then S is unsatisfiable.
In the next section, we expand on the notions of CHCs and unrollings, give
examples, and present the application to test-case generation.
5 Test-case Generation for Branch Coverage
The concept of constrained Horn clauses is convenient for formulating the problem of constructing a maximal branch coverage (MBC) of a given program. At the highest level, the problem of finding an MBC is concerned with finding a set of program executions that visit all reachable program branches. Given the CHC encoding of the program, this can be reduced to the problem of finding a set of satisfiable unrollings that involve the maximal number of CHCs. However, to guarantee maximality, this requires a special property of the CHC encoding: the constraint ϕ in each CHC should represent a straight-line code sequence with no branches (a.k.a. a basic block). Technically, this can be formulated as the requirement for each CHC to have a conjunction of literals (a.k.a. a cube), i.e., no disjunctions, in its body.
Example 1. Fig. 2 gives a CHC encoding of the program in Fig. 1. There are eight CHCs over four uninterpreted predicates A, B, C, and D. The program entry is encoded in the first CHC (i.e., the only fact, with the dst-predicate A), and its exit in the last CHC (i.e., with the dst-predicate D). All other CHCs encode
(1) x = 0 ⟹ A(x, y, z)
(2) A(x, y, z) ∧ x ≥ 5 ∧ x′ = x ∧ y′ = y + 1 ∧ z′ = z ⟹ B(x′, y′, z′)
(3) A(x, y, z) ∧ x < 5 ∧ x′ = x + 1 ∧ y′ = y ∧ z′ = z ⟹ B(x′, y′, z′)
(4) B(x, y, z) ∧ y ≤ 5 ∧ x′ = x ∧ y′ = y ∧ z′ = z + 1 ⟹ C(x′, y′, z′)
(5) B(x, y, z) ∧ y > 5 ∧ x > y ∧ x′ = x ∧ y′ = y + 1 ∧ z′ = z ⟹ C(x′, y′, z′)
(6) B(x, y, z) ∧ y > 5 ∧ x ≤ y ∧ x′ = 0 ∧ y′ = y ∧ z′ = z ⟹ C(x′, y′, z′)
(7) C(x, y, z) ∧ z ≠ 0 ⟹ A(x, y, z)
(8) C(x, y, z) ∧ z = 0 ⟹ D(x, y, z)

[Graph over the nodes init, A, B, C, D]

Fig. 2: CHCs of the motivating example (left) and src/dst-dependency graph (right).
the loop, with a total of six symbolic paths, each following A → B → C → A (as can be seen from the graphical representation) but involving different CHCs. Each CHC has no disjunctions in its body: it contains the conjunction of the (possibly negated) guard and the encoding of the program instructions following the corresponding branch until either the next conditional or the join occurs. Note that there are no queries in this system since there are no assertions in the program.
To formulate the MBC problem at the level of CHCs, it is convenient to introduce the concept of a src/dst-dependency graph for a system of CHCs.

Definition 4. Given a system S of CHCs over a set of uninterpreted predicate symbols R, its src/dst-dependency graph ⟨R, E⟩ is a directed graph with edges labeled by CHCs from S:

E ≝ {⟨rel(src(C)), C, rel(dst(C))⟩ | C ∈ S}.
Because we are bound in this paper to use only disjunction-free CHCs, the points of control-flow divergence in a program encoded in these CHCs are captured by vertices in the src/dst-dependency graph that have more than one outgoing edge.¹ To generate a test case visiting some block of code encoded in a CHC Ck, it is enough to find an unrolling π_{C0,...,Ck} and show that this unrolling is satisfiable. In this case, the CHC is called reachable: i.e., the satisfying assignment naturally corresponds to a program trace beginning at the program entry point and reaching the code in that branch. Furthermore, if the execution depends on some input values, these values can also be extracted from the satisfying assignment.

¹ Thus, in this case, the src/dst-dependency graph can be seen as a control-flow graph (CFG) of the encoded program. In practice, many verification tools that are based on CHC do not generate CHCs in such a form but apply some generalization and compression to the CFG during preprocessing. This results in CHCs with disjunctive bodies that are unsuitable for our approach. In these cases, we explicitly convert the body of each CHC to disjunctive normal form (DNF) and clone the CHC for each cube in the DNF (see the sketch below). The CHCs after this transformation are still a correct encoding of the original program, and its src/dst-dependency graph is suitable for our approach, but it may not exactly match the CFG of the original program.
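The footnote's preprocessing step is straightforward; the following Python sketch (ours; the nested-tuple representation of CHC bodies is an assumption made for illustration) converts a body to DNF and clones the clause once per cube:

def dnf(phi):
    tag = phi[0]
    if tag == "lit":
        return [[phi[1]]]                        # one cube, one literal
    if tag == "or":
        return dnf(phi[1]) + dnf(phi[2])
    if tag == "and":                             # distribute and over or
        return [a + b for a in dnf(phi[1]) for b in dnf(phi[2])]
    raise ValueError(tag)

def clone_chc(body, head):
    return [(cube, head) for cube in dnf(body)]

body = ("and", ("lit", "A(x,y,z)"),
        ("or", ("lit", "x >= 5"), ("lit", "x < 5")))
for cube, head in clone_chc(body, "B(x',y',z')"):
    print(" & ".join(cube), "=>", head)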
Example 2. According to Fig. 2, the first point of control-flow divergence is predicate A. To show that CHC 3 is reachable, we create the following unrolling from the bodies of CHCs 1 and 3:

x = 0 ∧ x < 5 ∧ x′ = x + 1 ∧ y′ = y ∧ z′ = z.

This formula is satisfiable, and there exists a model M = {x ↦ 0, y ↦ 0, z ↦ 0, . . .}, thus giving us two values for the input variables y and z (both zeroes).
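The satisfiability check of Example 2 can be reproduced directly with an SMT solver; below is a minimal sketch using the z3 Python bindings (our own illustration; the names x1, y1, z1 stand for the primed variables x′, y′, z′):

from z3 import Ints, Solver, sat

x, y, z, x1, y1, z1 = Ints('x y z x1 y1 z1')
s = Solver()
# unrolling built from the bodies of CHCs (1) and (3)
s.add(x == 0, x < 5, x1 == x + 1, y1 == y, z1 == z)
assert s.check() == sat
m = s.model()
# extract input values for the test case (both default to 0 here)
print(m.eval(y, model_completion=True), m.eval(z, model_completion=True))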
It can also be seen that some CHCs cannot be visited by any trace. To find
them, we can pose additional safety verification queries and aim at generating
an appropriate invariant.
Lemma 2. Let S be a system of CHCs over some R, and let C be some CHC from S. If the extended CHC system S ∪ {src(C) ∧ φ_C ⟹ ⊥} is satisfiable, then C is unreachable.
The proof of the lemma follows directly from Lemma 1.
Example 3. In the CHC system in Fig. 2, CHC 5 is never reachable. We introduce a new query CHC Q as follows:

B(x, y, z) ∧ y > 5 ∧ x > y ∧ x′ = x ∧ y′ = y + 1 ∧ z′ = z ⟹ ⊥

The system S ∪ {Q} is satisfiable, with the following interpretation M:

M(A) = M(B) = M(C) = λx, y, z . x ≤ 5

Because x ≤ 5 ∧ y > 5 ∧ x > y is unsatisfiable, CHC 5 is unreachable.
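The same conclusion can be reached with an off-the-shelf CHC solver. The following sketch (our own illustration, not part of the tool described in this paper) encodes the eight CHCs, as reconstructed in Fig. 2, in Z3's Fixedpoint engine and queries the body of CHC 5; an unsat answer corresponds to the satisfiability of S ∪ {Q} and hence to the unreachability of CHC 5:

from z3 import Ints, Function, IntSort, BoolSort, Fixedpoint, And

x, y, z, x1, y1, z1 = Ints('x y z x1 y1 z1')
I = IntSort()
A, B, C, D = (Function(n, I, I, I, BoolSort()) for n in 'ABCD')

fp = Fixedpoint()
fp.set(engine='spacer')
for r in (A, B, C, D):
    fp.register_relation(r)
fp.declare_var(x, y, z, x1, y1, z1)

fp.rule(A(x, y, z), x == 0)                                        # (1)
fp.rule(B(x, y1, z), And(A(x, y, z), x >= 5, y1 == y + 1))         # (2)
fp.rule(B(x1, y, z), And(A(x, y, z), x < 5, x1 == x + 1))          # (3)
fp.rule(C(x, y, z1), And(B(x, y, z), y <= 5, z1 == z + 1))         # (4)
fp.rule(C(x, y1, z), And(B(x, y, z), y > 5, x > y, y1 == y + 1))   # (5)
fp.rule(C(x1, y, z), And(B(x, y, z), y > 5, x <= y, x1 == 0))      # (6)
fp.rule(A(x, y, z), And(C(x, y, z), z != 0))                       # (7)
fp.rule(D(x, y, z), And(C(x, y, z), z == 0))                       # (8)

# query the body of CHC (5): unsat means CHC (5) is unreachable,
# and Spacer's certificate contains an invariant such as x <= 5
print(fp.query(And(B(x, y, z), y > 5, x > y)))   # expected: unsat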
These ingredients let us state the MBC problem formally.
Definition 5 (MBC). Given a system S of CHCs over some R, the problem of maximizing branch coverage of S is concerned with 1) determining a subset S_u ⊆ S of CHCs which are provably unreachable (i.e., Lemma 2 applies), and 2) finding satisfiable unrollings for all CHCs from S ∖ S_u.
The practical significance of the MBC problem lies in allowing test-generation tools that are based on bounded model checking, e.g., [1], to terminate earlier. The invariants discovered while iteratively applying Lemma 2 can serve as annotations of various nodes of the program CFG, which further enables pruning the search space of the test cases. In particular, for our running example in Fig. 1, line 13 is provably unreachable, so it makes no sense to search for a test case for it.
Furthermore, with an invariant that blocks a branch at hand, the tools can explore fewer unrollings leading to other branches in the next iterations of the loop. Specifically, to reach line 6, five iterations of the loop will provably skip line 13, so instead of (2 · 3)⁵ = 7776 unrollings, the tool should only explore (2 · 2)⁵ = 1024 unrollings.
6 Solving the MBC problem
In this section, we introduce our novel approach to constructing maximal branch coverage using a system of disjunction-free CHCs. We begin by outlining the key ideas, which can be implemented on top of existing test-case generators and invariant generators, and then proceed to describe our efficient implementation.
6.1 Key Insights
The approach has a simple high-level structure. Because the number of CHCs
in a program encoding is always finite, we can pose a safety verification query
for each of them.
Existing CHC solvers are equipped with the functionality to generate both counterexamples and safety invariants. However, a recent evaluation [17] shows that bounded-model-checking implementations often outperform general-purpose solvers on unsatisfiable CHC instances (likely because they do not invest effort in generating invariants). This suggests that, for performance reasons, it makes sense to alternate between separate runs of a counterexample generator (via enumerating the unrollings) and an invariant generator. This allows for two main benefits, outlined in the next two paragraphs.
A counterexample generator, in the MBC setting, should handle a large number of unrollings. Many of the unrollings are unsatisfiable, since some sequentially aligned branches might be incompatible, and some other branches might be waiting for a certain loop iteration. It is thus essential to share information about conflicting path segments (e.g., unsatisfiable prefixes, as in our implementation) to accelerate the search. Dually, satisfiable unrollings can often be extended to unrollings for other reachable CHCs, and this information can be exploited in the enumerative search for the remaining branches.
An invariant generator, invoked multiple times throughout the process, deals with many largely similar safety verification instances (since all CHCs are the same, and only the queries differ). Thus, a lot of information can be reused between verification runs, opening opportunities for incremental verification [13]. Formally, all invariants that are discovered while proving the unreachability of a CHC remain valid after switching to another CHC. Moreover, solvers that target conjunctive invariant generation, e.g., [15,24], can output “partial” invariants (i.e., some lemmas) even for unsatisfiable CHC instances, which can then be reused/completed in the next runs of the solver.
These observations let us conclude that although using off-the-shelf tools for bounded model checking and invariant generation is possible, an MBC procedure will likely perform better with new algorithms designed around the aforementioned insights.
6.2 General Driver
The pseudocode of our approach is given in Alg. 1. The algorithm begins by identifying a subset cur of CHCs that need to be considered in its iterations.
Algorithm 1: CHC-based test-case generator.
Input: S: a CHC system over R
Output: T: a set of satisfying assignments to variables in S
Data: invs: mapping from R to invariants; G = ⟨R, E⟩: an edge-labeled graph; cur ⊆ S: a subset of CHCs to consider; length: counter representing the length of the current unrollings; traces: a (global) set of traces to consider
 1  ⟨R, E⟩ ← src/dst-dependency graph of S;
 2  cur ← {C | ⟨u, C, v_1⟩ ∈ E and ∃⟨u, ·, v_2⟩ ∈ E where v_1 ≠ v_2};
 3  if cur = ∅ then cur ← {C | src(C) = ⊤};
 4  length ← 1;
 5  while cur ≠ ∅ do
 6      for chc ∈ cur do
 7          ⟨res, invs, cex⟩ ← solveCHCs(S ∪ {body(chc) ⟹ ⊥}, invs);
 8          if res = sat then
 9              cur ← cur ∖ {chc};
10              E ← {⟨u, C, v⟩ | ⟨u, C, v⟩ ∈ E and C ≠ chc};
11          else if res = unsat then
12              cur ← cur ∖ {chc};
13              T ← T ∪ {cex};
14          else
15              traces ← ∅;
16              GetTraces(E, ⊤, chc, length, nil, prefixes, traces);
17              for t ∈ traces do
18                  ⟨res, M⟩ ← checkSAT(unroll(S, t));
19                  if res = sat then
20                      T ← T ∪ {M};
21                      cur ← cur ∖ {chc};
22                      break;
23                  else
24                      prefixes ← prefixes ∪ {t};
25      length ← length + 1;
We say that a CHC C opens a branch if the outdegree of rel(src(C)) in the src/dst-dependency graph is greater than one (line 2). Thus, to generate a test case visiting a branch, it is enough to find an unrolling π = C_0, . . . , C_k where C_k opens that branch and show that this unrolling is satisfiable. If, however, there are no branches in the given program at all, then cur gets all facts of the CHC system (line 3), and the remaining coverage generation is straightforward.
The rest of the algorithm is organized as a big loop that decides whether the CHCs from cur are (un)reachable and terminates when cur is empty. At each iteration of the loop, all CHCs from cur are enumerated, and the algorithm seeks to apply Lemma 2, i.e., it extends S with one query and solves the resulting CHC system (line 7). The algorithm can use any CHC solving algorithm that decides the satisfiability
Algorithm 2: GetTraces: trace enumerator.
Input: E ⊆ R × S × R: labeled edges; u ∈ R; chc ∈ S; length: length of trace; t: trace prefix; prefixes: prefixes to avoid
Output: traces: global set of traces of the given length beginning with relation u and ending with chc
 1  if ∃p ∈ prefixes . ∀i ∈ [0, |p|) . p_i = t_i then
 2      return;
 3  if length = 1 then
 4      if ⟨u, chc, ·⟩ ∈ E then
 5          traces ← traces ∪ {t@chc}
 6  else
 7      for ⟨u, C, v⟩ ∈ E do
 8          GetTraces(E, v, chc, length − 1, t@C, prefixes, traces);
of CHCs and returns inductive invariants (line 8) or (optionally²) a counterexample (line 11). In both cases, the CHC is excluded from cur. Additionally, if satisfiable, this CHC cannot be used in any unrolling, and it is also excluded from the auxiliary graph (line 10, to prune the search space of the remaining test cases). If a counterexample is returned, the branch is reachable, and the test case is extracted from this counterexample (line 13).
It is also possible (and in practice, very likely) that the CHC solver returns unknown (because the problem is undecidable, and invariant generators are often limited to either a fixed shape of invariants or a certain timeout). In this case (lines 16-22), the algorithm proceeds with an explicit enumeration of unrollings of a predetermined length (line 16). Each trace t = t_0, t_1, . . . , t_{length−1} has an associated unrolling π = C_{t_0}, . . . , C_{t_{length−1}}, which is checked for satisfiability (line 18) with an off-the-shelf SMT solver. If satisfiable (line 19), the branch opened by the current CHC is reachable, the test case is generated from the model, and the CHC is excluded from cur. If unsatisfiable (line 23), the algorithm registers this t as an unsatisfiable prefix to be avoided in the trace generation in the next iterations (see Alg. 2).
Theorem 1. When Alg. 1 terminates, the resulting set T contains all the variable assignments needed for maximal coverage.
In the next two paragraphs we discuss two important design choices that do
not affect the correctness of our implementation, but optimize it.
² In fact, the counterexample detection in some CHC solvers, e.g., [24], proceeds in a similar fashion as described in our algorithm; but if invoked multiple times throughout the algorithm, the CHC solver is likely to perform many redundant actions. We thus do not use this functionality in our experiments (and in our Alg. 3), but leave it in the pseudocode for the sake of completeness of presentation.
Algorithm 3: solveCHCs.
Input: S: a CHC system over R; invs: mapping from R to invariants
Output: res ∈ {sat, unsat}; invs: updated mapping; [cex: counterexample]
 1  S′ ← ∅;
 2  for chc ∈ S do
 3      S′ ← S′ ∪ {src(chc) ∧ body(chc)[R ↦ invs] ⟹ dst(chc)};
 4  if λx̄.⊤ is a solution for S′ then
 5      return ⟨sat, invs⟩;
 6  return ⟨res, invs⟩ ← FreqHorn(S′);
6.3 Incremental Trace Enumeration
Our algorithm allows for sharing the information obtained during its iterations using two global data structures: the set of unsatisfiable prefixes discovered during the trace enumeration and the graph structure ⟨R, E⟩ representing potentially reachable CHCs (line 10 of Alg. 1). Intuitively, the latter is constructed by an iterative removal of edges from the src/dst-dependency graph, thus allowing for a more focused search of suitable traces. Both data structures are used in Alg. 2, which is called at the next algorithm iteration.
Conceptually, Alg. 2 is a dynamic-programming implementation of a path finder in an arbitrary directed graph. Given a length of path and its starting and ending points, the algorithm recursively visits the graph edges and stores them in vectors³. In our setting, the algorithm is optimized in two ways. First, at line 1, it skips paths with unsatisfiable prefixes (because the corresponding unrollings will be unsatisfiable too). Second, at lines 4 and 7, it excludes all the unreachable CHCs that have been previously excluded from the graph.
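For concreteness, here is a direct Python rendering of Alg. 2, a sketch under our own assumptions (edges are triples (u, C, v) over predicate names, and traces and prefixes are tuples of CHC identifiers; these representations are not taken from the tool):

def get_traces(edges, u, chc, length, t, prefixes, traces):
    # line 1 of Alg. 2: skip any trace extending a known-unsatisfiable prefix
    if any(t[:len(p)] == p for p in prefixes):
        return
    if length == 1:
        # lines 3-5: the last step must be an edge from u labeled by chc
        if any(src == u and c == chc for (src, c, _) in edges):
            traces.add(t + (chc,))
    else:
        # lines 6-8: extend the prefix along every outgoing edge of u
        for (src, c, v) in edges:
            if src == u:
                get_traces(edges, v, chc, length - 1, t + (c,), prefixes, traces)

Since unreachable CHCs are removed from edges by Alg. 1 (line 10), the enumeration automatically avoids them, exactly as described above.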
Example 4. Recall our running example, the program encoded as CHCs in Fig. 2. For length = 2 and CHC (2), Alg. 2 constructs a single trace ⟨(1), (2)⟩, which corresponds to an unsatisfiable unrolling, found by Alg. 1, and is thus added to prefixes. Consequently, for length = 3, the traces ⟨(1), (2), (4)⟩ and ⟨(1), (2), (6)⟩ are not generated. Furthermore, because (5) is never reachable, the edge ⟨B, (5), C⟩ is excluded from E permanently.
6.4 Incremental Invariant Discovery
Alg. 3 gives the main idea of our CHC solver, which relies on the FreqHorn [15] algorithm to synthesize invariants (any other CHC solver could be used as well). In addition, however, it recycles the invariants invs generated in all previous runs. Specifically, it substitutes their interpretations for each r ∈ R in the body of each CHC (line 3). Because each such formula represents an over-approximation
³ We use the notation t@C to represent the “push back” operation over a vector t and an element C.
of the set of reachable states at a particular program location, this substitution
is sound.
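As a small illustration of this substitution (our own sketch, reusing the lemma x ≤ 5 from Example 3), replacing the uninterpreted call B(x, y, z) in the body of CHC 5 by the previously discovered invariant renders the body unsatisfiable without invoking the CHC solver again:

from z3 import Ints, And, Solver, unsat

x, y, z = Ints('x y z')
inv_B = x <= 5                     # invs[B], found in an earlier solver run
s = Solver()
s.add(And(inv_B, y > 5, x > y))    # body of CHC (5) with B(x,y,z) -> invs[B]
assert s.check() == unsat          # CHC (5) remains provably unreachable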
If it turns out that, after the substitution, the trivial interpretation λx̄.⊤ is a solution for the strengthened system (line 4), then invs is already a solution, and the CHC solver is not needed. Otherwise, the remaining invariants are generated by the external CHC solver (line 6).
While the pseudocode of FreqHorn is omitted from Alg. 3 for simplicity, we list its distinguishing features here. The approach is driven by Syntax-Guided Synthesis (SyGuS) [2], and it supports (possibly non-linear) arithmetic and arrays [16]. It automatically constructs formal grammars G(inv) for each inv ∈ R based on either the source code [14] or program behaviors [15,30]. Importantly, these grammars are conjunction-free, and they allow for only a finite number of candidates. FreqHorn iteratively attempts to apply production rules of each G(inv) to sample a candidate and checks it with an SMT solver (a successfully checked candidate is then called a lemma). The process continues either until a conjunction of lemmas is sufficient, or until the search space is exhausted. To make the process less dependent on the order in which candidates are considered, FreqHorn uses batching [12] (i.e., checks several candidates at the same time) and effectively filters them using the well-known Houdini algorithm [18].
These features make FreqHorn especially useful for test-case generation. Behaviors and counterexamples can be obtained from traces as outlined in Sect. 6.3. Each new counterexample potentially contributes a new data candidate to be considered in the next invocations of the algorithm. Then, following our incremental schema, new candidates are used in conjunction with previously generated invariants and are either added to invs or dropped. Note that even if FreqHorn returns unknown, indicating that it is unable to find a strong enough invariant, it almost always finds some lemmas that might be useful for the next iterations of our main algorithm.
7 Evaluation
We have implemented the approach in a tool called Horntinuum⁴. The backend of Horntinuum is developed on top of FreqHorn [14] and uses it for CHC solving. All the symbolic reasoning in our backend is performed by the Z3 [27] SMT solver, v4.8.10. For encoding C benchmarks to CHCs in our frontend, we use the SeaHorn [21] verification framework, v10.0.0-rc0, via its Docker image⁵.
Implementation details. The success of our approach largely depends on
the preprocessing performed by SeaHorn while producing the CHC encoding.
Since our algorithm works on disjunction-free CHCs (recall Sect. 6.1), we con-
figure SeaHorn to perform a small-step encoding, i.e., introducing a CHC per
⁴ The source code of the tool is publicly available at https://github.com/izlatkin/HornLauncher with the CHC-based backend at https://github.com/izlatkin/aeval/tree/tg.
⁵ https://hub.docker.com/r/seahorn/seahorn-llvm10.
each basic block (via the --step=small option). However, the encoder, based on LLVM, additionally performs several LLVM transformations⁶ and auxiliary SeaHorn passes that may introduce disjunctions to CHCs. Since this recipe is not configurable in SeaHorn yet, we additionally get rid of disjunctions by performing a DNF-ization over the CHCs received from SeaHorn.
We also had to overcome a relatively minor engineering obstacle to allow recognizing multiple nondet() function calls (see an example in Fig. 1). The CHC representation is in some sense declarative, i.e., it is not always possible to detect the order of function calls from the formulas that represent program unrollings. Thus, we rename each invocation of nondet() in each input C file, e.g., to nondet_i(), which lets Horntinuum associate each function invocation with a sequence of static-single-assignment (SSA) variables that encode the (possibly many, if nondet_i() is called in a loop) outputs of nondet_i() occurring in an unrolling. Further, it gives a sequence of concrete values obtained for each of the SSA variables from the SMT solver. In a generated test case, the sequences of SSA values of each nondet_i are stored in separate arrays (to capture values in each loop iteration) and accessed by an automatically generated body of the corresponding unique nondet_i() function.

In a sense, the final output of our tool is a set of context-specific implementations of the function nondet() written in different header files. To reproduce the detected test case, the initial C file should include a header from this set, and then be compiled and run.⁷
Experimental setup. To evaluate Horntinuum, we configured the gcov tool, v9.3.0, a code coverage analysis and profiling tool that tracks all statements visited in a single run of the program. Running gcov for each of our generated test cases and merging the statistics gives the final coverage: we ultimately aim to maximize the amount of code visited by at least one test case.⁸
We compared Horntinuum with the state-of-the-art tools FuSeBMC [1], Verifuzz [9], and KLEE [8]⁹, which exhibited decent performance in TestComp 2021.
Our experiments were run on a “Dell OptiPlex 7090 Tower” desktop computer
with 2.5 GHz Intel Core i7 8-Core (11th Gen), 16GB 3200 MHz DDR4 RAM,
and Ubuntu 20.04.1 LTS installed on it.
For the experimentation, we considered 316 benchmarks from TestComp (from the loop-* tracks, excluding the programs with floating points that our CHC solver
⁶ One transformation, for instance, removes redundant branches from the code, e.g., replaces if (nondet()) foo(); else foo(); by just foo(). Technically, the CHC encoding received by our tool then does not represent all branches of the original program, which thus leads to a smaller detected coverage. We have not seen many such examples in our benchmark set, however.
⁷ Note the difference with the TestComp format [4], which keeps all values in the same XML file. Our proposed format is more general and easily convertible to TestComp's.
⁸ The full logs and tables are available at https://www.cs.fsu.edu/grigory/horntinuum.zip.
⁹ All the binaries were downloaded from https://test-comp.sosy-lab.org/2021/systems.php.
[Three scatter plots comparing coverage: vs Verifuzz, vs FuSeBMC, vs KLEE; both axes range over 0–100%.]
Fig. 3: Coverage comparison: each point in a plot represents a pair of the coverages (% × %) of Horntinuum (x-axis) and a competitor (y-axis) for the same benchmarks.
does not support yet). The largest considered benchmark has >5K LoC. The performance of all three competitors (using the timeout of 15 minutes) on our machine was consistent with that exhibited in TestComp 2021: Verifuzz slightly outperforms FuSeBMC, and both outperform KLEE.
Expectations and results. We aim to answer two main questions:
Q1 Is it possible to develop a competitive test-case generator based purely on formal verification and SMT solving, i.e., not relying on dynamic analysis and fuzzing?
Q2 In the cases when a CHC-based test-case generator yields a similar (or better) coverage than a competitor, is it possible to achieve this result faster¹⁰?
The plots in Fig. 3 and Fig. 4 attempt to answer these questions, respectively. We first give a pairwise comparison between the coverage percentages reported by the tools (Fig. 3). If a tool was unable to analyze a program, the corresponding
10 We believe the ability to successfully terminate the test-case generation early is
of great interest to software engineers. However, unfortunately, it is not the main
determining factor in testing competitions.
[Three log-log scatter plots: vs Verifuzz, vs FuSeBMC, vs KLEE; both axes range over 10^-2 to 10^4 seconds.]
Fig. 4: Runtime needed to get 1% of coverage (sec × sec) for Horntinuum (x-axis) and a competitor (y-axis). Solid triangles represent runs (green: Horntinuum, orange: the competitor) in which the corresponding tool detected larger coverage and took less time. Blank triangles are the remaining (non-representative) runs. Triangles on the boundaries represent runs in which one of the tools detected zero coverage.
point is placed on the boundary. The experiments revealed that, given the same timeout, Horntinuum generates test cases with larger or equal coverage than KLEE on 241 programs, FuSeBMC on 178 programs, and Verifuzz on 177 programs. These numbers include cases when the competitor crashed or did not return any coverage, but exclude cases when Horntinuum did so.
A pairwise comparison of the “runtime/coverage” ratios of the tools is shown in Fig. 4. For this experiment, for every plot, we only considered benchmarks on which one of the tools generated test cases with larger coverage and terminated before the competitor. Specifically:
– 177 (resp. 44) on which Verifuzz (resp. Horntinuum) was outperformed.
– 128 (resp. 44) on which FuSeBMC (resp. Horntinuum) was outperformed.
– 124 (resp. 26) on which KLEE (resp. Horntinuum) was outperformed.
These numbers let us conclude that Horntinuum is much more likely to return larger coverage in a shorter amount of time than its competitors are.
[Log-log scatter plot of runtimes; legend: Horntinuum w/o invariants vs. Horntinuum with invariants.]
Fig. 5: Impact of invariants: pairs of the runtimes (sec × sec) of Horntinuum with and without invariants.
The remaining benchmarks (e.g., those on which Horntinuum generates more test cases but takes more time than a competitor) are still shown in the plots but are excluded from the statistics: in these cases it is impossible to draw a consistent conclusion on the tools’ performance.
Controlled experiment. Lastly, we present an interesting statistic on the effect of invariant generation on the runtime of test-case generation (Fig. 5). For the sake of the experiment, we modified Alg. 1 such that it skips invariant generation but still enumerates traces and exploits the unsatisfiable prefixes. It turns out that this negatively affects 184 benchmarks, on which the modified version takes more time. These include 12 benchmarks on which Horntinuum with invariants terminates before the timeout, but Horntinuum without invariants does not terminate (represented as points on the right boundary). These benchmarks demonstrate a possible scenario where programs under test have unreachable branches that can be identified by a CHC solver, allowing the test-case generator to terminate earlier.
8 Conclusion
We have shown that CHCs are a promising vehicle that test-case generators can use to improve both the quality of solutions and the runtime. Specifically, using CHC encodings of programs, various program unrollings are enumerated, and test cases are extracted from models of satisfiable formulas. Our novel CHC-based approach and its implementation in Horntinuum use SMT solvers incrementally. In the future, we are going to extend our support for data types and optimize the algorithm for searching deep counterexamples à la [6].
Acknowledgments. The work is supported in part by a gift from Amazon Web Services and a grant from FSU’s Council on Research & Creativity.
References
1. Alshmrany, K.M., Aldughaim, M., Bhayat, A., Cordeiro, L.C.: FuSeBMC: An
Energy-Efficient Test Generator for Finding Security Vulnerabilities in C Pro-
grams. In: TAP. Lecture Notes in Computer Science, vol. 12740, pp. 85–105.
Springer (2021)
2. Alur, R., Bodík, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A.,
Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-Guided Synthesis. In:
FMCAD. pp. 1–17. IEEE (2013)
3. Anand, S., Godefroid, P., Tillmann, N.: Demand-driven compositional symbolic
execution. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS. Lecture Notes in
Computer Science, vol. 4963, pp. 367–381. Springer (2008)
4. Beyer, D., Lemberger, T.: Testcov: Robust test-suite execution and coverage mea-
surement. In: ASE. pp. 1074–1077. IEEE (2019)
5. Biere, A., Cimatti, A., Clarke, E.M., Zhu, Y.: Symbolic Model Checking without
BDDs. In: TACAS. LNCS, vol. 1579, pp. 193–207. Springer (1999)
6. Blicha, M., Fedyukovich, G., Hyvärinen, A.E.J., Sharygina, N.: Transition Power
Abstractions for Deep Counterexample Detection. In: Fisman, D., Rosu, G. (eds.)
Tools and Algorithms for the Construction and Analysis of Systems. Springer
Berlin Heidelberg (2022)
7. Böhme, M., Pham, V., Roychoudhury, A.: Coverage-based greybox fuzzing as
Markov chain. IEEE Trans. Software Eng. 45(5), 489–506 (2019)
8. Cadar, C., Dunbar, D., Engler, D.R.: KLEE: unassisted and automatic generation
of high-coverage tests for complex systems programs. In: Draves, R., van Renesse,
R. (eds.) OSDI. pp. 209–224. USENIX Association (2008)
9. Chowdhury, A.B., Medicherla, R.K., Venkatesh, R.: Verifuzz: Program aware
fuzzing - (competition contribution). In: Beyer, D., Huisman, M., Kordon, F., Stef-
fen, B. (eds.) TACAS, Part III. Lecture Notes in Computer Science, vol. 11429,
pp. 244–249. Springer (2019)
10. Clarke, E., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In:
TACAS. LNCS, vol. 2988, pp. 168–176. Springer (2004)
11. Csallner, C., Smaragdakis, Y.: Check ’n’ crash: combining static checking and
testing. In: Roman, G., Griswold, W.G., Nuseibeh, B. (eds.) ICSE. pp. 422–431.
ACM (2005)
12. Fedyukovich, G., Bodík, R.: Accelerating Syntax-Guided Invariant Synthesis. In:
TACAS, Part I. LNCS, vol. 10805, pp. 251–269. Springer (2018)
13. Fedyukovich, G., Gurfinkel, A., Sharygina, N.: Property directed equivalence via
abstract simulation. In: CAV. LNCS, vol. 9780, Part II, pp. 433–453. Springer
(2016)
14. Fedyukovich, G., Kaufman, S., Bodík, R.: Sampling Invariants from Frequency
Distributions. In: FMCAD. pp. 100–107. IEEE (2017)
15. Fedyukovich, G., Prabhu, S., Madhukar, K., Gupta, A.: Solving Constrained Horn
Clauses Using Syntax and Data. In: FMCAD. pp. 170–178. IEEE (2018)
16. Fedyukovich, G., Prabhu, S., Madhukar, K., Gupta, A.: Quantified Invariants via
Syntax-Guided Synthesis. In: CAV, Part I. LNCS, vol. 11561, pp. 259–277. Springer
(2019)
17. Fedyukovich, G., Rümmer, P.: Competition report: CHC-COMP-21. In: Hojjat,
H., Kafle, B. (eds.) HCVS@ETAPS. EPTCS, vol. 344, pp. 91–108 (2021)
18. Flanagan, C., Leino, K.R.M.: Houdini: an Annotation Assistant for ESC/Java. In:
FME. LNCS, vol. 2021, pp. 500–517. Springer (2001)
270 I. Zlatkin and G. Fedyukovich
19. Gadelha, M.Y.R., Monteiro, F.R., Cordeiro, L.C., Nicole, D.A.: ESBMC v6.0: Ver-
ifying C programs using k-induction and invariant inference - (competition contri-
bution). In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) TACAS, Part
III. LNCS, vol. 11429, pp. 209–213. Springer (2019)
20. Godefroid, P., Kiezun, A., Levin, M.Y.: Grammar-based whitebox fuzzing. In:
Gupta, R., Amarasinghe, S.P. (eds.) PLDI. pp. 206–215. ACM (2008)
21. Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn Verification
Framework. In: CAV. LNCS, vol. 9206, pp. 343–361. Springer (2015)
22. Jaffar, J., Murali, V., Navas, J.A.: Boosting concolic testing via interpolation. In:
Meyer, B., Baresi, L., Mezini, M. (eds.) ESEC/FSE. pp. 48–58. ACM (2013)
23. King, J.C.: Symbolic execution and program testing. Commun. ACM 19(7), 385–
394 (1976)
24. Komuravelli, A., Gurfinkel, A., Chaki, S.: SMT-Based Model Checking for Recur-
sive Programs. In: CAV. LNCS, vol. 8559, pp. 17–34 (2014)
25. Le, H.M.: Llvm-based hybrid fuzzing with libkluzzer (competition contribution).
In: Wehrheim, H., Cabot, J. (eds.) FASE. LNCS, vol. 12076, pp. 535–539. Springer
(2020)
26. Mathis, B., Gopinath, R., Mera, M., Kampmann, A., Höschele, M., Zeller, A.:
Parser-directed fuzzing. In: McKinley, K.S., Fisher, K. (eds.) PLDI. pp. 548–560.
ACM (2019)
27. de Moura, L.M., Bjørner, N.: Z3: An Efficient SMT Solver. In: TACAS. LNCS,
vol. 4963, pp. 337–340. Springer (2008)
28. Sen, K., Marinov, D., Agha, G.: CUTE: a concolic unit testing engine for C. In:
Wermelinger, M., Gall, H.C. (eds.) FSE. pp. 263–272. ACM (2005)
29. Serebryany, K.: Continuous fuzzing with libfuzzer and addresssanitizer. In: SecDev.
p. 157. IEEE Computer Society (2016)
30. Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Liang, P., Nori, A.V.: A data
driven approach for algebraic loop invariants. In: ESOP. LNCS, vol. 7792, pp.
574–592. Springer (2013)
31. Vikram, V., Padhye, R., Sen, K.: Growing A test corpus with bonsai fuzzing. In:
ICSE. pp. 723–735. IEEE (2021)
32. Visser, W., Pasareanu, C.S., Khurshid, S.: Test input generation with java
pathfinder. In: Avrunin, G.S., Rothermel, G. (eds.) ISSTA. pp. 97–107. ACM
(2004)
33. Wüstholz, V., Christakis, M.: Targeted greybox fuzzing with static lookahead anal-
ysis. In: Rothermel, G., Bae, D. (eds.) ICSE. pp. 789–800. ACM (2020)
34. Zalewski, M.: American Fuzzy Lop, https://lcamtuf.coredump.cx/afl/
Efficient Analysis of Cyclic Redundancy
Architectures via Boolean Fault Propagation
Marco Bozzano, Alessandro Cimatti,
Alberto Griggio, and Martin Jonáš (✉)
Fondazione Bruno Kessler, Trento, Italy
{cimatti,bozzano,griggio,mjonas}@fbk.eu
Abstract. Many safety critical systems guarantee fault-tolerance by us-
ing several redundant copies of their components. When designing such
redundancy architectures, it is crucial to analyze their fault trees, which
describe combinations of faults of individual components that may cause
malfunction of the system. State-of-the-art techniques for fault tree com-
putation use first-order formulas with uninterpreted functions to model
the transformations of signals performed by the redundancy system and
an AllSMT query for computation of the fault tree from this encoding.
Scalability of the analysis can be further improved by techniques such as
predicate abstraction, which reduces the problem to Boolean case.
In this paper, we show that as far as fault trees of redundancy archi-
tectures are concerned, signal transformation can be equivalently viewed
in a purely Boolean way as fault propagation. This alternative view has
important practical consequences. First, it applies also to general re-
dundancy architectures with cyclic dependencies among components, to
which the current state-of-the-art methods based on AllSMT are not
applicable, and which currently require expensive sequential reasoning.
Second, it allows for a simpler encoding of the problem and usage of
efficient algorithms for analysis of fault propagation, which can signif-
icantly improve the runtime of the analyses. A thorough experimental
evaluation demonstrates the superiority of the proposed techniques.
1 Introduction
Fault-tolerance is a fundamental property of safety critical systems that enables
their safe operation even in the presence of faults. There are many ways to
ensure fault-tolerance, often based on redundancy: spare parts are available for
backup and are ready to take over with different degrees of promptness (e.g.,
hot/warm/cold standby), or with multiple replicas running in parallel. The latter
is a common approach to fault-tolerance in computer-based control systems,
where the results computed by the independent replicas are combined together
by means of voters. The idea dates back to the pioneering space application in the Saturn Launch Vehicle [12], and has since been adopted in the Primary Flight Computer [19] of the Boeing 777. The idea is becoming prominent with the
advent of modern Integrated Modular Avionics [16], a cost-effective solution for
the management of highly intensive software control systems.
[Figure: (a) Reference non-redundant system with modules M1, M2, M3. (b) TMR redundant system with three replicas of modules M1, M2, whose results are combined by a voter.]
Fig. 1: Network of computational modules with cyclic dependencies, extended by
triple modular redundancy.
[Figure: five TMR wiring schemas for a single module M, labeled (a) V111, (b) V001, (c) V011, (d) V122, (e) V123.]
Fig. 2: Selected ways of extending a single reference module M with triple modular redundancy (using 1, 2, and 3 voters) [6].
One of the most used instances of the approach to redundancy via module replicas is the triple modular redundancy (tmr) schema, in which the computational modules are replaced by three redundant copies, whose results can be combined by one to three voters. An example of using tmr to add redundancy to a reference non-redundant architecture is shown in Figure 1. Note that there are multiple ways of combining the results of a single triplicated computational module by voters, some of which are shown in Figure 2 [6].
Assessing the actual degree of fault-tolerance of a redundant architecture
is directly related to the construction and analysis of the corresponding fault
tree [17]. A fault tree describes the combinations of failures of individual com-
ponents that may cause higher-level malfunction, e.g., bring the system into a
dangerous state. Such combinations are traditionally called cut sets. Given the
set of all cut sets of the system, a fault tree can be reconstructed. Subsequently,
from the fault tree expressed as a Binary Decision Diagram, it is possible to
compute the reliability of the system from the reliability measures of the com-
ponents, and to synthesize the analytical form of the reliability function [6].
In this paper, we tackle the problem of automatically analyzing the reliabil-
ity of redundancy architectures with parallel replicas and voting. We propose a
general framework that encompasses also redundancy architectures with cyclic
dependencies among components, such as the system from Figure 1, to which
current state-of-the-art approaches [6] are not applicable. The modeling is based
on symbolic transition systems over the quantifier-free theory of linear real arith-
metic and uninterpreted functions (UFLRA). In particular, real numbers are
used to represent the signals of the architecture and multiple instances of the
same uninterpreted function symbol are used to represent component replicas.
The modeling framework is a strict generalization of the combinational approach
proposed in [4,5], that only allows for acyclic architectures.
As the main contribution, we propose an analysis technique based on the
reduction to fault propagation graphs over Boolean structures [7]. We prove that
the reduction is correct: the signal transformation performed by a redundancy
architecture can be equivalently viewed in a Boolean way as fault propagation.
We carry out a systematic experimental evaluation on the set of redundancy
architectures with cyclic dependencies to evaluate scalability of the proposed so-
lution. Moreover, we perform evaluation on acyclic redundancy architectures to
compare the performance against the state-of-the-art approach based on pred-
icate abstraction [5,6], which can be applied only to redundancy architectures
without cycles. The proposed approach proves to be very scalable, being able to
analyze cyclic architectures with thousands of nodes, and is dramatically more
efficient than a direct reduction to model checking of symbolic transition systems
over UFLRA. In the restricted set of acyclic benchmarks, the proposed approach
provides better performance even over the optimized method proposed in [5] and
extended in [6] that adopts a structural form of predicate abstraction to improve
over basic AllSMT [14].
The paper is structured as follows. In Section 2, we present logical preliminar-
ies and basic notions of fault propagation graphs. In Section 3, we describe the
framework of redundancy architectures with cycles. In Section 4, we present the
reduction to fault propagation and prove its correctness. In Section 5, we discuss
the related work. The experiments are presented in Section 6. In Section 7, we
draw some conclusions and discuss some directions for future work.
2 Preliminaries
2.1 General Background
In this section, we explain the basic mathematical conventions that are used in
the paper. We assume that the reader is familiar with standard first-order logic
and the basic ideas of Satisfiability Modulo Theories (smt), as presented e.g.
in [1]. A theory in the smt sense is a pair (Σ, C), where Σ is a first-order signature and C is a class of models over Σ. We use the standard notions of interpretation, assignment, model, satisfiability, validity, and logical consequence. We refer to 0-arity predicates as Boolean variables, and to 0-arity uninterpreted functions as (theory) variables. We denote variables with x, y, . . . , formulas with φ, ψ, . . . , and uninterpreted functions with f, g, . . . , possibly with subscripts. We denote vectors with a bar (e.g., x̄), and individual components with subscripts (e.g., x_j). We denote the domain of Booleans with B = {⊤, ⊥}. If x_1, . . . , x_n are variables and
φ is a formula, we write φ(x_1, . . . , x_n) to indicate that all the variables occurring free in φ are in x_1, . . . , x_n. If φ is a formula without uninterpreted functions and µ is a function that maps each free variable of φ to a value of the corresponding sort, [[φ]]_µ denotes the result of the evaluation of φ under this assignment. A Boolean formula is called positive if it uses no logical connectives other than conjunctions and disjunctions.
In this paper, we shall use the theory of linear real arithmetic (LRA), in which the numeric constants and the arithmetic and relational operators have their standard meaning, extended with uninterpreted functions (UF), whose interpretation is not fixed in C, and with voters (V), which are k-ary functions whose interpretation is the majority function defined below. For simplicity, we consider only voters with odd arity, as even-arity voters are rarely used in practice. However, our approach can be extended to support even-arity voters.
Definition 1. The k-ary majority function majority : ℝ^k → ℝ for an odd k > 0 is defined by majority(x̄) = y if there is y such that y = x_j for at least ⌈k/2⌉ distinct j, and majority(x̄) = x_1 otherwise.
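A direct reading of Definition 1 (our own illustrative code, not part of any tool discussed here) is the following Python function, which returns the value shared by at least ⌈k/2⌉ of the k inputs and falls back to the first input when no such value exists:

from collections import Counter

def majority(xs):
    # the most frequent input value and its multiplicity
    value, count = Counter(xs).most_common(1)[0]
    # for odd k, ceil(k/2) == (k + 1) // 2
    return value if count >= (len(xs) + 1) // 2 else xs[0]

assert majority([3.0, 5.0, 3.0]) == 3.0   # two of three inputs agree
assert majority([1.0, 2.0, 3.0]) == 1.0   # no majority: first input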
Given a set of variables x̄, we denote with x̄′ the set {x′ | x ∈ x̄}. A symbolic transition system S is a triple (x̄, I(x̄), T(x̄, x̄′)), where x̄ is a set of variables, and I(x̄), T(x̄, x̄′) are formulae over some signature. An assignment to the variables in x̄ is a state of S. A state s is initial iff it is a model of I(x̄), i.e., s |= I(x̄). The states s, s′ denote a transition iff s ∪ s′ |= T(x̄, x̄′), also written T(s, s′). A trace is a sequence of states s_0, s_1, . . . such that s_0 is initial and T(s_i, s_{i+1}) for all i. We denote traces with π, and with π_j the j-th element of π. A state s is reachable in S iff there exists a trace π such that π_i = s for some i.
2.2 Fault Propagation Graphs
In this section we briefly introduce the necessary notions of fault propagation,
and in particular the formalism of symbolic fault propagation graphs. Intuitively,
fault propagation graphs can be used to describe how failures of some compo-
nents of a given system can cause the failure of other components of a system.
In an explicit (hyper)graph representation, components can be represented by nodes, and dependencies by edges among them, with the meaning that an edge from component c_1 to component c_2 states that the failure of c_1 can cause the failure (propagation) of c_2. In the symbolic representation adopted here, we model components as Boolean variables (where ⊥ means “not failed” and ⊤ means “failed”), and express the dependencies as Boolean formulae encoding the conditions that can lead to the failure of each component. The basic concepts are formalized in the following definitions. For more information, we refer to [7].
Definition 2 (Fault propagation graph). A symbolic fault propagation graph (fpg) is a pair (C, canFail), where C is a finite set of system components and canFail is a function that assigns to each component c a Boolean formula canFail(c) over the set of variables C.
Definition 3 (Trace of FPG). Let G be a fault propagation graph (C, canFail). A state of G is a function from C to B. A trace of G is a sequence of states π = π_0 π_1 . . . ∈ (B^C)^ω such that all i > 0 and c ∈ C satisfy (i) π_i(c) = π_{i−1}(c) or (ii) π_{i−1}(c) = ⊥ and π_i(c) = [[canFail(c)]]_{π_{i−1}}.
Example 1 ([7]). Consider a system with components control on ground (g), hydraulic control (h), and electric control (e) such that g can fail if both h and e have failed, h can fail if e has failed, and e can fail if h has failed. This system can be modeled by a fault propagation graph ({g, e, h}, canFail), where canFail(g) = h ∧ e, canFail(h) = e, and canFail(e) = h.
One of the traces of this system is {g ↦ ⊥, h ↦ ⊤, e ↦ ⊥} {g ↦ ⊥, h ↦ ⊤, e ↦ ⊤} {g ↦ ⊤, h ↦ ⊤, e ↦ ⊤}^ω, where h is failed initially, which causes the failure of e in the second step, and the failures of h and e together cause a failure of g in the third step.
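The trace above can be computed by iterating the step relation of Definition 3; the following sketch (our own code, not taken from [7]) implements the variant in which every component fails as soon as it can:

can_fail = {
    'g': lambda s: s['h'] and s['e'],
    'h': lambda s: s['e'],
    'e': lambda s: s['h'],
}

def max_step(state):
    # a failed component stays failed; a healthy one fails iff canFail holds
    return {c: state[c] or can_fail[c](state) for c in state}

s = {'g': False, 'h': True, 'e': False}   # only h has failed initially
s = max_step(s)   # e fails:  {'g': False, 'h': True, 'e': True}
s = max_step(s)   # g fails:  {'g': True,  'h': True, 'e': True}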
Fault propagation graphs are often used to identify sets of initial faults that
can lead the system to a dangerous or unwanted state (usually called a top level
event). Such sets of initial faults are called cut sets.
Definition 4 (Cut set). Let G be a fault propagation graph G = (C, canFail) and φ a positive Boolean formula, called a top level event. The assignment cs : C → B is called a cut set of G for φ if there is a trace π of G that starts in the state cs and there is some k ≥ 0 such that π_k |= φ. A cut set cs is called a minimal cut set if it is minimal with respect to the pointwise ordering of functions B^C, i.e., there is no other cut set cs′ such that {c ∈ C | cs′(c) = ⊤} ⊊ {c ∈ C | cs(c) = ⊤}.

For brevity, when talking about cut sets, we often mention only the components that are set to ⊤ by the cut set.
Example 2 ([7]). The minimal cut sets of the fpg from Example 1 for the top level event φ = g are {g}, {h}, and {e}. These three cut sets are witnessed by the following traces:
1. {g ↦ ⊤, h ↦ ⊥, e ↦ ⊥}^ω,
2. {g ↦ ⊥, h ↦ ⊤, e ↦ ⊥} {g ↦ ⊥, h ↦ ⊤, e ↦ ⊤} {g ↦ ⊤, h ↦ ⊤, e ↦ ⊤}^ω,
3. {g ↦ ⊥, h ↦ ⊥, e ↦ ⊤} {g ↦ ⊥, h ↦ ⊤, e ↦ ⊤} {g ↦ ⊤, h ↦ ⊤, e ↦ ⊤}^ω.
Note that the fpg also has other cut sets, such as {g, e}, {h, e}, and {g, h, e}, which are not minimal.
In the following, we work with fault propagation graphs all of whose canFail formulas are positive. Such fault propagation graphs are called monotone. Note that the definition of trace ensures that in each trace, if a component c is set to ⊤ in a state π_i, it is ⊤ in all the subsequent states π_j for j > i. This ensures that each trace eventually reaches a fixed point. Moreover, before reaching this fixed point, the trace can contain at most |C| distinct states.
For monotone fpgs, there is an efficient algorithm for minimal cut set enumeration [7]. This approach consists in enumerating the minimal models of a specific LRA formula, in which theory constraints are used only if the input fpg contains cycles (and which is therefore purely Boolean for acyclic fpgs).
3 Cyclic Redundancy Architectures
In this section, we describe the framework adopted to model redundancy architectures, in the form of a restricted class of symbolic transition systems modulo UFLRA. We call this restricted class transition systems with uninterpreted functions and voters (UF+V TS).¹ This modeling framework is more expressive than mere smt formulas modulo UFLRA, which were used in the previous works on the analysis of redundancy architectures [6], as it can express architectures that contain cyclic dependencies among the modules.
Definition 5 (UF+V transition system). A transition system with uninterpreted functions and voters is a tuple (V_S, V_in, V_init, T_next, T_init), where
– V_S is a finite set of real-valued signal variables;
– V_in with V_S ∩ V_in = ∅ is a finite set of real-valued input variables;
– V_init is a finite set of real-valued initial value variables;
– T_next : V_S → Expr is a transition function, where Expr is the set of all expressions of the form f(x_1, x_2, . . . , x_k) for k ≥ 0, x_i ∈ (V_S ∪ V_in), and where f is either an uninterpreted function symbol of arity k or the function symbol voter_k with an odd k > 0;
– T_init is an initial value mapping that assigns an initial value variable T_init(v) ∈ V_init to each signal v ∈ V_S for which T_next(v) = f(x̄) for an uninterpreted f.

A UF+V transition system is called well formed if it does not contain cyclic dependencies among voters, i.e., there is no sequence v_1 . . . v_n of signal variables such that v_1 = v_n and each v_i with i > 0 satisfies T_next(v_i) = voter_k(x_1, . . . , x_k) with x_j = v_{i−1} for some 1 ≤ j ≤ k. For a well formed UF+V TS, we can define the voter depth vd : V_S ∪ V_in → ℕ as the unique solution to the following set of equations: vd(in) = 0 for each in ∈ V_in, vd(v) = 0 for each v ∈ V_S such that T_next(v) = f(x_1, x_2, . . . , x_k) with an uninterpreted f, and vd(v) = max{vd(x_i) | 1 ≤ i ≤ k} + 1 for each v ∈ V_S such that T_next(v) = voter_k(x_1, x_2, . . . , x_k).
In the rest of the paper, we assume that all UF+V TS are well formed. In the rest of this section, let us fix an arbitrary well formed UF+V transition system S = (V_S, V_in, V_init, T_next, T_init).
We now give a formal definition of the behavior of a UF+V system in the presence of faults. Intuitively, we are given the set Faults of faulty signal-producing components of the system, which do not have to behave correctly: a faulty component neither has to start in its specified initial value nor has to respect its transition function.
Definition 6 (Trace of UF+V TS). A state of a UF+V transition system S is an arbitrary assignment of real numbers to signal and input variables, s : (V_S ∪ V_in) → ℝ.

¹ Note that although UF+V TS and the related concepts can be defined directly in terms of UFLRA symbolic transition systems, we chose to make the definition explicit to simplify the presentation and proofs.
The sequence of states π = π_0 π_1 . . . ∈ (ℝ^{V_S ∪ V_in})^ω is called a trace of the system S for the fault set Faults ⊆ V_S, input stream ι = ι_0 ι_1 . . . ∈ (ℝ^{V_in})^ω, initial value assignment Init : V_init → ℝ, and interpretation [[·]], which to each uninterpreted function symbol of arity k assigns a function [[f]] : ℝ^k → ℝ, if:
– π_i(in) = ι_i(in) for all i ≥ 0 and in ∈ V_in.
– For v ∈ V_S ∖ Faults such that T_next(v) = f(x_1, . . . , x_k) with an uninterpreted function symbol f, it is the case that π_0(v) = Init(T_init(v)) and all i > 0 satisfy π_i(v) = [[f]](π_{i−1}(x_1), . . . , π_{i−1}(x_k)).
– For all i ≥ 0 and v ∈ V_S ∖ Faults such that T_next(v) = voter_k(x_1, . . . , x_k), it is the case that π_i(v) = majority(π_i(x_1), . . . , π_i(x_k)).
Traces for the fault set Faults = ∅ are called nominal.
Note that each uninterpreted module needs one time step to compute its result, while the results of voters are instantaneous. The time delay for modules allows cyclic dependencies among modules, while the absence of delay for voters gives the expected semantics to architectures where some replicas of a module are guarded by a voter and others are not, such as in the schemas from Figures 2b and 2c.
Example 3. Consider the example from Figure 1, where the reference system with 3 modules M1, M2, and M3 is extended with tmr such that the modules M1 and M2 are replaced by three replicas whose results are combined by a voter.
We can represent the redundant version of the system as a UF+V TS as follows. The nominal behavior of the modules M1, M2, and M3 is represented by binary uninterpreted functions f_1, f_2, and f_3, respectively. Further, we represent the initial values of M1, M2, M3 by variables init_m1, init_m2, and init_m3, respectively. Finally, we represent the output of the i-th replica of each module Mj by a signal variable x_j^i and the output of the voter corresponding to the module Mj by a signal variable x_j^v.
This gives the UF+V transition system S = (V_S, {in_1, in_2}, V_init, T_next, T_init), with V_S = {x_1^1, x_1^2, x_1^3, x_1^v, x_2^1, x_2^2, x_2^3, x_2^v, x_3^1}, V_init = {init_mj | j ∈ {1, 2, 3}}, and
– T_next(x_1^i) = f_1(in_1, x_2^v) for 1 ≤ i ≤ 3, T_init(x_1^i) = init_m1 for 1 ≤ i ≤ 3,
– T_next(x_2^i) = f_2(in_2, x_1^v) for 1 ≤ i ≤ 3, T_init(x_2^i) = init_m2 for 1 ≤ i ≤ 3,
– T_next(x_3^1) = f_3(x_1^v, x_2^v), T_init(x_3^1) = init_m3,
– T_next(x_j^v) = voter_3(x_j^1, x_j^2, x_j^3) for j ∈ {1, 2}.
We define the class of redundancy transition systems, where the only pur-
pose of all voters is to recognize and repair outputs of failed components; more
specifically, if all components behave correctly, the voters are not necessary.
Definition 7 (Redundancy UF+V TS). We call the system S a redundancy UF+V transition system if in all its nominal traces, all inputs of each voter are always identical. Formally, if π is any nominal trace of S and if v is a variable for which T_next(v) = voter_k(x̄), then |{π_i(x_j) | 1 ≤ j ≤ k}| = 1 for all i ≥ 0.
Similarly to fpgs, a cut set is a set of faults that leads to undesired behavior of the system. In particular, given a set of signals that are considered as output signals (or outputs) of the system, a cut set of the given UF+V TS is a set of faults that can cause an incorrect value of at least one output.
Definition 8 ((Minimal) cut set). A fault set Faults ⊆ V_S is called a cut set of S for a set of output signals V_out ⊆ V_S if there exist an input stream, initial value assignment, and an interpretation such that the values of the output signals of some trace π for the fault set Faults differ from the outputs of the nominal trace π^nom with the same input stream, initial values, and interpretation, i.e., there is c ≥ 0 and o ∈ V_out for which π_c(o) ≠ π^nom_c(o). A cut set is called minimal (mcs) if it is minimal in terms of set inclusion.
Since the redundancy UF+V TS form a subclass of UFLRA transition systems,
there is a straightforward procedure for minimal cut set enumeration. As in
the case of combinational systems [6], one can construct a miter system, which
consists of two copies of the architecture: the first is allowed to fail and the second
is constrained to behave nominally. Minimal cut sets can then be obtained by
using a technique based on symbolic model checking [3] to enumerate all minimal
assignments to fault variables under which it is possible to reach some state in
which the outputs of the two copies differ.
4 Reducing Redundancy UF+V TS to Fault Propagation Graphs
In this section, we show the main result of the paper, which is that minimal
cut set enumeration of redundancy UF+V transition systems can be reduced to
minimal cut set enumeration of Boolean fault propagation graphs, which is more
efficient than mcs enumeration based on miter construction and model checking.
4.1 Reduction
For each UF+V system S, we define a corresponding fpg S_B. The components of S_B correspond to the signal variables of the original system S. With a slight abuse of notation, we use the same names for the original real-valued signal variables of S and the components of S_B, although they have different types. Intuitively, the reduction ensures that each component v of S_B can fail if and only if there is a trace of S in which the value of the signal variable v deviates from its nominal value.
Definition 9. Let S = (V_S, V_in, V_init, T_next, T_init) be a UF+V TS. We define a corresponding fpg S_B = (V_S, canFail), where canFail(v) = ⋁_{v′ ∈ x̄ ∩ V_S} v′ if T_next(v) = f(x̄) and canFail(v) = atLeast_{⌈k/2⌉}(x̄ ∩ V_S) if T_next(v) = voter_k(x̄), using the definition atLeast_m(X) = ⋁_{Y ⊆ X, |Y| = m} ⋀_{y ∈ Y} y.²

² Note that there are more efficient and compact encodings for the atLeast constraint [18]; we use the simplest one for presentation purposes.
Example 4. Consider the transition system S from Example 3. The corresponding fault propagation graph is S_B = ({x_1^1, x_1^2, x_1^3, x_1^v, x_2^1, x_2^2, x_2^3, x_2^v, x_3^1}, canFail), where
– canFail(x_1^i) = x_2^v for all 1 ≤ i ≤ 3, canFail(x_2^i) = x_1^v for all 1 ≤ i ≤ 3,
– canFail(x_3^1) = x_1^v ∨ x_2^v,
– canFail(x_1^v) = atLeast_2(x_1^1, x_1^2, x_1^3), canFail(x_2^v) = atLeast_2(x_2^1, x_2^2, x_2^3).
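The construction of Definition 9 is mechanical; the following sketch (our own illustration, with component names of our choosing) builds the canFail map of Example 4, representing each formula as a Python predicate over a state dictionary:

from itertools import combinations

def at_least(m, comps):
    # atLeast_m(X): some m-element subset of comps is entirely failed
    return lambda s: any(all(s[c] for c in Y)
                         for Y in combinations(comps, m))

def disj(comps):
    # plain disjunction over the signal inputs of an uninterpreted module
    return lambda s: any(s[c] for c in comps)

can_fail = {}
for i in (1, 2, 3):
    can_fail[f'x1_{i}'] = disj(['x2_v'])      # Tnext(x1_i) = f1(in1, x2_v)
    can_fail[f'x2_{i}'] = disj(['x1_v'])      # Tnext(x2_i) = f2(in2, x1_v)
can_fail['x3_1'] = disj(['x1_v', 'x2_v'])     # Tnext(x3_1) = f3(x1_v, x2_v)
can_fail['x1_v'] = at_least(2, ['x1_1', 'x1_2', 'x1_3'])
can_fail['x2_v'] = at_least(2, ['x2_1', 'x2_2', 'x2_3'])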
4.2 Correctness
We show that the reduction preserves cut sets. In the rest of the section, let S = (V_S, V_in, V_init, T_next, T_init) be an arbitrary redundancy UF+V TS, Faults ⊆ V_S an arbitrary fault set, and V_out ⊆ V_S an arbitrary set of output signals. First, we show that each cut set of S corresponds to a cut set of S_B.

Lemma 1. If Faults is a cut set of S for the set of outputs V_out, then cs defined as cs(v) = ⊤ iff v ∈ Faults is a cut set of S_B for the top level event ⋁_{o ∈ V_out} o.
Proof. Let Faults be a cut set of S for some trace π, for some ι, Init, and [[·]]. Let π^nom be the corresponding nominal trace. Define the trace π^B of S_B by π^B_0 = cs and for all i > 0 define π^B_i by π^B_i(v) = ⊤ if π^B_{i−1}(v) = ⊤ and π^B_i(v) = [[canFail(v)]]_{π^B_{i−1}} if π^B_{i−1}(v) = ⊥. In other words, π^B is the unique trace starting in cs in which all the components fail as soon as possible. By monotonicity, the trace π^B has a fixed point, i.e., there is n such that π^B_n = π^B_{n′} for all n′ > n.

We show that π^B satisfies π^B_n(o) = ⊤ for some o ∈ V_out and thus cs is a cut set for the top level event ⋁_{o ∈ V_out} o. To do this, we prove by induction on i and on the voter depth vd(v)³ that for all v ∈ V_S and i ≥ 0, π_i(v) ≠ π^nom_i(v) implies π^B_n(v) = ⊤. We distinguish three cases:

– If v ∈ Faults, then π^B_0(v) = ⊤. From the definition of π^B, this implies that π^B_l(v) = ⊤ for all l ≥ 0. In particular, π^B_n(v) = ⊤.
– If v ∉ Faults and T_next(v) = f(x_1, . . . , x_k), we distinguish two cases:
  • If i = 0: since π_0(v) ≠ π^nom_0(v), it must be the case that π_0(v) ≠ Init(T_init(v)), therefore v ∈ Faults. This is a contradiction.
  • If i > 0: then π_i(v) ≠ π^nom_i(v) by definition implies
    [[f]](π_{i−1}(x_1), . . . , π_{i−1}(x_k)) ≠ [[f]](π^nom_{i−1}(x_1), . . . , π^nom_{i−1}(x_k))
    and hence π_{i−1}(x_j) ≠ π^nom_{i−1}(x_j) for some 1 ≤ j ≤ k because [[f]] is a function. Since π_{i−1}(in) = π^nom_{i−1}(in) holds for all in ∈ V_in, we know that x_j ∈ V_S. Therefore the induction hypothesis implies π^B_n(x_j) = ⊤ and thus π^B_{n+1}(v) = ⊤ because π^B_n satisfies canFail(v). Since π^B_n was chosen as the fixed point of π^B, this implies π^B_n(v) = π^B_{n+1}(v) = ⊤.
³ Induction on the voter depth is employed because UF+V transition systems propagate results of voters instantaneously.
If v∈ Faults and Tnext(v) = voter k(x1, . . . , xk), then πi(v)=πnom
i(v) for
any i0 by definition implies
majority(πi(x1), . . . , πi(xk)) =majority(πnom
i(x1), . . . , πnom
i(xk)).(1)
Since Sis a redundancy TS, all πnom
i(xj) are equal and the disequality (1)
implies that πi(xj)=πnom
i(xj) for at least k/2of xj. All these xjare not in
Vin and must therefore be in VS. By definition of voter depth, vd(xj)< vd(v)
for all these xj. Therefore by the induction hypothesis πB
n(xj) = for at
least k/2of xjand thus πB
n+1(v) = because πB
nsatisfies canFail (v). This
again implies πB
n(v) = πB
n+1(v) = because πB
nis the fixed point of πB.
This finishes the proof: if Faults is a cut set, πc(o)=πnom
c(o) for some c0
and oVout, and thus πB
n(o) = . Therefore we know that πB
n|=WoVout oand
thus cs is a cut set of SB.
For the converse direction, for each fault set we devise a trace of the UF+V TS S that propagates all the possible deviations from the nominal values. We call this trace maximally fault-propagating. In this trace, all signal values are from the set {0, 1}; all nominal signal values are 0 and become 1 only as a result of a fault. Moreover, if there is a trace for the given fault set in which a signal deviates from its nominal value, the value of the corresponding signal in the maximally fault-propagating trace will be 1.
Definition 10 (Maximally fault-propagating trace). Let S be a UF+V TS. Define
– ι_i(v_in) = 0 for all i ≥ 0 and v_in ∈ V_in, i.e., ι is a stream of constant zero inputs;
– Init(v_init) = 0 for each v_init ∈ V_init; and
– [[f]](x_1, . . . , x_k) = 1 − ∏_{1 ≤ i ≤ k}(1 − x_i) for each uninterpreted f, i.e., the output is 0 if all inputs are 0; it is 1 if at least one input is 1.
The maximally fault-propagating trace of S for a fault set Faults, denoted π^fp, is the unique trace of S for the above input stream, initial values, interpretation, and the given fault set that for all i ≥ 0 and v satisfies π^fp_i(v) = 1 whenever v ∈ Faults.
Observe that the trace π^fp is monotone, i.e., once a signal gets set to 1, it stays set to 1 for the rest of the trace. This is formalized by the following lemma, which can be proven by induction on i, j − i, and the voter depth of v.

Lemma 2. Let S be a UF+V TS, Faults a fault set, and π^fp the corresponding maximally fault-propagating trace. Then π^fp_i(v) = 1 for each i ≥ 0 and v ∈ V_S implies π^fp_j(v) = 1 for all j > i.
We can now show that if a trace of the fpg version S_B of a UF+V TS S triggers the top level event for some initial fault assignment, there is a trace in the original system S for the corresponding fault set whose output deviates from the nominal one; namely, it is the trace π^fp.
282 M. Bozzano et al.
Lemma 3. If cs defined as cs(v) = iff vFaults is a cut set of SBfor the
top level event WoVout o, then Faults is a cut set of Sfor the set of outputs Vout.
Proof. Suppose that the trace π^B of S^B with the initial state cs satisfies π^B_c(o) = ⊥ for some c ≥ 0 and o ∈ V_out. We show that Faults is a cut set of S for the set of output signals V_out. Let π^fp be the maximally fault-propagating trace of S for Faults and π^nom the corresponding nominal trace.

We show that for each i ≥ 0 and v ∈ V_S, the condition π^B_i(v) = ⊥ implies π^fp_i(v) ≠ π^nom_i(v). We proceed by induction on i:
– For i = 0: If cs = π^B_0(v) = ⊥, then v ∈ Faults and thus π^fp_0(v) ≠ π^nom_0(v) because π^fp_0(v) = 1 and π^nom_0(v) = 0.
– For i > 0: Assume that π^B_i(v) = ⊥. We distinguish four cases:
  • If v ∈ Faults, then π^fp_i(v) = 1 and so π^fp_i(v) ≠ π^nom_i(v) = 0.
  • If π^B_{i−1}(v) = ⊥, then we get that π^fp_{i−1}(v) ≠ π^nom_{i−1}(v) from the induction hypothesis, and thus π^fp_i(v) ≠ π^nom_i(v) by Lemma 2.
  • If v ∉ Faults, π^B_{i−1}(v) ≠ ⊥, and T_next(v) = f(x_1, ..., x_k), then π^B_i(v) = ⊥ implies that π^B_{i−1}(x_j) = ⊥ for at least one x_j ∈ V_S. From the induction hypothesis, we get that π^fp_{i−1}(x_j) ≠ π^nom_{i−1}(x_j) and since π^nom_{i−1}(x_j) = 0, we know that π^fp_{i−1}(x_j) = 1. By the definition of [[f]] in π^fp, we know that also π^fp_i(v) = 1, which is not equal to π^nom_i(v) = 0.
  • If v ∉ Faults, π^B_{i−1}(v) ≠ ⊥, and T_next(v) = voter_k(x_1, ..., x_k), then π^B_i(v) = ⊥ implies that at least ⌈k/2⌉ of the x_j ∈ V_S satisfy π^B_{i−1}(x_j) = ⊥. From the induction hypothesis we get that π^fp_{i−1}(x_j) ≠ π^nom_{i−1}(x_j) for these x_j, and since π^nom_{i−1}(x_j) = 0, we know that π^fp_{i−1}(x_j) = 1 for at least ⌈k/2⌉ of the x_j. By the definition of the majority function, we know that also π^fp_{i−1}(v) = 1 and thus, by Lemma 2, also π^fp_i(v) = 1 ≠ 0 = π^nom_i(v).

Therefore π^B_c(o) = ⊥ implies π^fp_c(o) ≠ π^nom_c(o) and Faults is a cut set of S.
Theorem 1. For each fault set Faults, the following two claims are equivalent:
1. The set Faults is a cut set of S for the set of output signals V_out.
2. The assignment cs defined as cs(v) = ⊥ iff v ∈ Faults is a cut set of S^B for the top level event ⋁_{o ∈ V_out} o.
5 Related Work
Approaches to the analysis of redundant architectures include [6], which ad-
dresses the generation of the reliability function for a class of generic architec-
tures including tree- and dag-like structures. The computation of the reliability
is based on predicate abstraction and bdds. Our work extends and improves
the approach of [6] in several directions. First, it supports cyclic architectures,
to which predicate abstraction as defined in [6] cannot be applied. Second, it
does not require that the redundancy is localized within small blocks (manually
defined by the user or in a library), to which the predicate abstraction can be
applied. In contrast, our approach applies the abstraction directly on the level of
individual modules and voters. Moreover, the approach of [6] needs to compute
the abstracted versions of the specified blocks upfront by quantifier elimination.
Finally, our approach outperforms the approach of [6].
Other works on redundant architecture analysis are either based on ad-hoc
algorithms [13] which are not fully automated, and require discretization and
additional input data from the user, or use simulation techniques such as Monte
Carlo analysis [15], which do not examine the system behaviors exhaustively.
A classification of fault tolerant architectures is presented in [10]. The clas-
sification is based on three different patterns, namely comparison, voting, and
sparing, that can be composed to define generic and possibly cyclic architectures.
A follow-up work [11] builds upon these patterns and introduces strategies to
evaluate several architectures at once (family-based analysis of redundant ar-
chitectures) by reduction to Discrete Time Markov Chains. Our techniques are
orthogonal, and could be applied on top of the approach proposed in [11].
The concept of the maximally fault-propagating trace used to prove Lemma 3 is similar to the concept of maximally diverse interpretations [8], which can be used to efficiently reduce a formula in the positive fragment of EUF logic to a sat
formula. Both concepts restrict the interpretations of uninterpreted functions to
a specific subclass, which exhibits all the relevant behaviors.
6 Experimental Evaluation
We have performed an experimental evaluation of the proposed approach for
minimal cut set enumeration in order to answer the following research questions:
RQ1 How does the new approach scale on redundancy architectures with cycles?
RQ2 On redundancy architectures with cycles, how do the run-times compare
against the approach based on the enumeration of minimal cut sets of the
miter system by a model checker?
RQ3 On redundancy architectures without cycles, how do the run-times com-
pare against the approach based on predicate abstraction (pa) and bdd-
based enumeration [6]?
RQ4 On redundancy architectures without cycles, what part of the runtime
difference is caused by the different reduction to a Boolean problem (fpg vs
pa) and what part is caused by a different solving approach of the resulting
Boolean problem (sat-based vs bdd-based)?
6.1 Benchmarks and Setup
To answer these research questions, we used four sets of redundancy systems:
Scalable cyclic systems This benchmark set contains two kinds of bench-
marks.

Fig. 3: Scalable architectures used in the experimental evaluation: (a) Ladder, (b) Radiator, (c) Linear, and (d) Rectangular, each built from a sequence of modules M_1, M_2, ....

For evaluation on redundancy architectures with a linear number of cycles, we have generated ladder-shaped (Figure 3a) architectures of all
lengths between 1 and 100. For evaluation on redundancy architectures with
a large number of cycles, we have generated radiator-shaped (Figure 3b) ar-
chitectures of all lengths between 1 and 50. For each of the architectures, we
have generated its three redundant versions by replacing each module by a tmr block with one to three voters, using the schemas from Figures 2b, 2d, and 2e. This yields systems with 2 · length · (3 + numVoters) signals.
Random cyclic systems We have generated 250 random cyclic redundancy
UF+V systems with 1 to 150 modules of arity between 1 and 3, randomly
generated 1 to 6 replicas of each module, and 1 to 6 voters of arity 3 or 5,
randomly connected to the replicas.
Scalable acyclic systems This benchmark set contains linear-shaped (Fig-
ure 3c) and rectangular-shaped (Figure 3d) architectures of all lengths be-
tween 1 and 200 that were used for the evaluation of the predicate abstraction tech-
nique [6]. As in the original paper, we have used redundant versions of the
systems with the modules replaced by a tmr block with one to three voters.
Random acyclic systems We have used randomly generated acyclic architec-
tures composed of randomly chosen tmr blocks that were also used in [6].
We have evaluated the following approaches for minimal cut set enumeration:
For the systems with cycles, we have generated their fpg version as described in Section 4 and also the UFLRA transition system implementing the miter construction in the smv format. For enumeration of the minimal cut sets
of the fault propagation graphs, we have used the tool SMT-PGFDS [7]
(denoted as fpg in the experiments); for enumeration of the minimal cut
sets of miter systems, we have used the tool xSAP [2], which internally uses
an algorithm based on parametric IC3 [3] (denoted as ParamIC3).
For the systems without cycles, we have generated both their fpg version and
the description in the format of the tool OCRA [9] as used in [6]. Although
the fpgs could be solved by the tool SMT-PGFDS and the OCRA systems
can be solved by predicate abstraction, which is implemented in xSAP, and its bdd-based engine [6], this would compare not only the effect of the reduction to the Boolean case, but also the confounding factor of the underlying backend (sat-based in SMT-PGFDS and bdd-based in xSAP). To answer RQ4, we have thus performed a more fine-grained analysis as follows.

Fig. 4: Solving time on ladder-shaped benchmarks, divided according to the number of voters (1 to 3) per one reference module: solving time (s, up to T/O) against the size of the architecture, for fpg and ParamIC3.

Fig. 5: Solving time on radiator-shaped benchmarks, divided according to the number of voters (1 to 3) per one reference module: solving time (s, up to T/O) against the size of the architecture, for fpg and ParamIC3.
From each fpg, we generated the corresponding Boolean formula, which
is possible since the graph is acyclic [7]. We also generated the Boolean
formula obtained by predicate abstraction from each OCRA encoding. We
thus obtained two Boolean formulas for each system: one by reduction to
fault propagation (fp), and one by reduction by predicate abstraction (pa).
We have then used the sat-based enumeration algorithm of SMT-PGFDS and also the bdd-based enumeration algorithm of xSAP on both of these Boolean formulas. This gives four combinations: fp-sat, fp-bdd, pa-sat, and pa-bdd.
All experiments were executed on a cluster of 9 computational nodes, each with an Intel Xeon CPU X5650 @ 2.67 GHz, 12 CPU cores, and 96 GiB of RAM. We have used a time limit of 1 hour of wall-clock time and a memory limit of 16 GB for each benchmark–solver pair. The detailed experimental results can be found at https://es-static.fbk.eu/people/mjonas/papers/tacas22_redarchs/.
6.2 Results for Cyclic Benchmarks
The comparison of the running times of the fpg-based and model-checking-based approaches on the scalable cyclic benchmarks is shown in Figures 4 and 5. Figure 4 shows a significant benefit of the technique based on fault propagation on the ladder-shaped benchmarks: not only can it enumerate cut sets of all the used benchmarks, but its run-times are also dramatically better. However, as can be seen in Figure 5, the situation is different on the radiator-shaped benchmarks, which
contain a large number of cycles. Although the performance of the technique based on fault propagation is still superior to the model-checking-based technique, it scales poorly on the systems with 2 and 3 voters per one tmr block. The answer to RQ1 is thus that the proposed approach scales well if the number of cycles in the system is not too large; if the number of cycles is large, the technique scales worse, but nevertheless significantly better than the state-of-the-art technique based on miter construction and model checking [3].

Fig. 7: Solving time on scalable acyclic benchmarks: solving time (s, up to T/O) against the size of the architecture, divided by the architecture (linear, rectangular) and the number of voters per one reference module, for fp-bdd, fp-sat, pa-bdd, and pa-sat.
Fig. 6: Solving time on random cyclic benchmarks: scatter plot of the solving time (s, up to T/O) of ParamIC3 against that of fpg.
The run-times on random cyclic benchmarks are shown in Figure 6. The figure shows that the performance of the proposed technique is better by several orders of magnitude and that it can enumerate minimal cut sets of 59 random systems that are out of reach for the technique based on model checking. Note that some of the systems are hard for both approaches: both timed out on 66 of the 250 benchmarks. Together with the results for the ladder-shaped and radiator-shaped systems, this answers RQ2: the technique proposed in this paper has significantly better performance than the state-of-the-art technique based on model checking.
There are two reasons for the observed performance difference. The first is the reduction of the UFLRA transition system to a Boolean one, which has also been observed to bring a significant benefit on acyclic systems in the case of predicate abstraction [6]. The second is the underlying mcs-enumeration technique applied to the resulting fpg. This technique reduces the expensive sequential reasoning to an enumeration of minimal models of a single smt formula, which can significantly improve performance [7].
6.3 Results for Acyclic Benchmarks
The comparison of the performance on acyclic scalable benchmarks is shown in
Figure 7. The results are divided according to the method used to reduce the problem to the Boolean case (fp vs. pa) and the technique used to enumerate the minimal cut sets of the Boolean system (sat vs. bdd). Scatter plots of solving times on random acyclic benchmarks can be seen in Figure 8.

Fig. 8: Solving time on random acyclic benchmarks: scatter plots of the solving times (s) of pa-bdd and pa-sat against fp-sat.
The results show that reducing the problem to fault propagation and using an off-the-shelf solver for the enumeration of minimal cut sets of the resulting Boolean system (i.e., fp-sat) is clearly superior to the state-of-the-art approach based on predicate abstraction and bdd-based mcs enumeration (i.e., pa-bdd). The difference between these two approaches reaches several orders of magnitude on the scalable benchmarks and grows with the size and complexity of the system. The performance is also significantly better on the random benchmarks. This answers RQ3 in favor of the technique proposed in this paper.
As for RQ4, Figures 7 and 8 show that both the different reduction technique (fp vs. pa) and the solving technique (sat vs. bdd) play a role in this difference. However, the larger part of the runtime difference between the proposed approach (fp-sat) and the state-of-the-art approach (pa-bdd) [6] is due to the better performance of sat-based enumeration. This insight is an additional interesting outcome of our experiments. Nevertheless, for both of the enumeration approaches, the proposed reduction based on fault propagation provides better performance than the state-of-the-art reduction by predicate abstraction.
7 Conclusions and Future Work
We have presented a framework for modeling redundancy architectures with
possible cyclic dependencies among the computational modules and we have
developed an efficient approach for enumeration of minimal cut sets of such
architectures. The experimental evaluation has shown that this approach dra-
matically outperforms the state-of-the-art approach based on model checking on
cyclic redundancy architectures and has a better performance than the state-of-
the-art approach based on predicate abstraction on acyclic architectures.
In the future, we plan to extend the approach to a more general class of voters
than majority voters. We also plan to extend the approach to support common
cause analysis for different component faults and possibly to synthesize an opti-
mal distribution of the modules of the architecture between the computational
nodes of a system such as Integrated Modular Avionics.
References
1. Barrett, C.W., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo the-
ories. In: Biere, A., Heule, M., van Maaren, H., Walsh, T. (eds.) Handbook of
Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185, pp.
825–885. IOS Press (2009). https://doi.org/10.3233/978-1-58603-929-5-825
2. Bittner, B., Bozzano, M., Cavada, R., Cimatti, A., Gario, M., Griggio, A., Mattarei,
C., Micheli, A., Zampedri, G.: The xSAP safety analysis platform. In: Chechik, M.,
Raskin, J. (eds.) Tools and Algorithms for the Construction and Analysis of Sys-
tems - 22nd International Conference, TACAS 2016, Held as Part of the European
Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven,
The Netherlands, April 2-8, 2016, Proceedings. Lecture Notes in Computer Science,
vol. 9636, pp. 533–539. Springer (2016). https://doi.org/10.1007/978-3-662-49674-9_31
3. Bozzano, M., Cimatti, A., Griggio, A., Mattarei, C.: Efficient anytime techniques
for model-based safety analysis. In: Kroening, D., Pasareanu, C.S. (eds.) Computer
Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA,
USA, July 18-24, 2015, Proceedings, Part I. Lecture Notes in Computer Science,
vol. 9206, pp. 603–621. Springer (2015). https://doi.org/10.1007/978-3-319-21690-4_41
4. Bozzano, M., Cimatti, A., Mattarei, C.: Automated analysis of reliability architec-
tures. In: 2013 18th International Conference on Engineering of Complex Computer
Systems, Singapore, July 17-19, 2013. pp. 198–207. IEEE Computer Society (2013).
https://doi.org/10.1109/ICECCS.2013.37
5. Bozzano, M., Cimatti, A., Mattarei, C.: Efficient analysis of reliability architectures
via predicate abstraction. In: Bertacco, V., Legay, A. (eds.) Hardware and Software:
Verification and Testing - 9th International Haifa Verification Conference, HVC
2013, Haifa, Israel, November 5-7, 2013, Proceedings. Lecture Notes in Computer
Science, vol. 8244, pp. 279–294. Springer (2013). https://doi.org/10.1007/978-3-319-03077-7_19
6. Bozzano, M., Cimatti, A., Mattarei, C.: Formal reliability analysis of
redundancy architectures. Formal Aspects Comput. 31(1), 59–94 (2019).
https://doi.org/10.1007/s00165-018-0475-1
7. Bozzano, M., Cimatti, A., Pires, A.F., Griggio, A., Jonáš, M., Kimberly, G.: Ef-
ficient SMT-Based Analysis of Failure Propagation. In: Silva, A., Leino, K.R.M.
(eds.) Computer Aided Verification - 33rd International Conference, CAV 2021,
Virtual Event, July 20-23, 2021, Proceedings, Part II. Lecture Notes in Computer
Science, vol. 12760, pp. 209–230. Springer (2021). https://doi.org/10.1007/978-3-030-81688-9_10
8. Bryant, R.E., German, S.M., Velev, M.N.: Exploiting positive equality in a logic
of equality with uninterpreted functions. In: Halbwachs, N., Peled, D.A. (eds.)
Computer Aided Verification, 11th International Conference, CAV ’99, Trento,
Italy, July 6-10, 1999, Proceedings. Lecture Notes in Computer Science, vol. 1633,
pp. 470–482. Springer (1999). https://doi.org/10.1007/3-540-48683-6_40
9. Cimatti, A., Dorigatti, M., Tonetta, S.: OCRA: A tool for checking the refine-
ment of temporal contracts. In: Denney, E., Bultan, T., Zeller, A. (eds.) 2013 28th
IEEE/ACM International Conference on Automated Software Engineering, ASE
2013, Silicon Valley, CA, USA, November 11-15, 2013. pp. 702–705. IEEE (2013).
https://doi.org/10.1109/ASE.2013.6693137
10. Ding, K., Morozov, A., Janschek, K.: Classification of hierarchical fault-tolerant
design patterns. In: 15th IEEE Intl Conf on Dependable, Autonomic and Se-
cure Computing, 15th Intl Conf on Pervasive Intelligence and Computing,
3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science
and Technology Congress, DASC/PiCom/DataCom/CyberSciTech 2017, Orlando,
FL, USA, November 6-10, 2017. pp. 612–619. IEEE Computer Society (2017).
https://doi.org/10.1109/DASC-PICom-DataCom-CyberSciTec.2017.108
11. Dubslaff, C., Ding, K., Morozov, A., Baier, C., Janschek, K.: Breaking the limits
of redundancy systems analysis. CoRR abs/1912.05364 (2019), http://arxiv.org/abs/1912.05364
12. Haeussermann, W.: Description and Performance of the Saturn Launch Vehicle’s
Navigation, Guidance, and Control System. IFAC Proceedings Volumes 3(1), 275–
312 (1970). https://doi.org/10.1016/S1474-6670(17)68785-8, 3rd
International IFAC Conference on Automatic Control in Space, Toulouse, France,
March 2-6, 1970
13. Hamamatsu, M., Tsuchiya, T., Kikuno, T.: On the Reliability of Cascaded TMR
Systems. In: Ishikawa, Y., Tang, D., Nakamura, H. (eds.) 16th IEEE Pacific
Rim International Symposium on Dependable Computing, PRDC 2010, Tokyo,
Japan, December 13-15, 2010. pp. 184–190. IEEE Computer Society (2010).
https://doi.org/10.1109/PRDC.2010.45
14. Lahiri, S.K., Nieuwenhuis, R., Oliveras, A.: SMT techniques for fast predicate
abstraction. In: Ball, T., Jones, R.B. (eds.) Computer Aided Verification, 18th
International Conference, CAV 2006, Seattle, WA, USA, August 17-20, 2006, Pro-
ceedings. Lecture Notes in Computer Science, vol. 4144, pp. 424–437. Springer
(2006). https://doi.org/10.1007/11817963_39
15. Lee, S., Jung, J., Lee, I.: Voting structures for cascaded triple modu-
lar redundant modules. IEICE Electronic Express 4(21), 657–664 (2007).
https://doi.org/10.1587/elex.4.657
16. Prisaznuk, P.J.: Integrated modular avionics. In: Proceedings of the IEEE 1992
National Aerospace and Electronics Conference, NAECON 1992. pp. 39–45.
IEEE (1992)
17. Ruijters, E., Stoelinga, M.: Fault tree analysis: A survey of the state-of-the-
art in modeling, analysis and tools. Comput. Sci. Rev. 15, 29–62 (2015).
https://doi.org/10.1016/j.cosrev.2015.03.001
18. Wynn, E.: A comparison of encodings for cardinality constraints in a SAT solver.
CoRR abs/1810.12975 (2018), http://arxiv.org/abs/1810.12975
19. Yeh, Y.: Triple-triple redundant 777 primary flight computer. In: 1996 IEEE
Aerospace Applications Conference. Proceedings. vol. 1, pp. 293–307 vol.1 (1996).
https://doi.org/10.1109/AERO.1996.495891
Tools | Optimizations, Repair and Explainability
Adiar
Binary Decision Diagrams in External Memory
Steffan Christ Sølvsten, Jaco van de Pol, Anna Blume Jakobsen, and Mathias Weller Berg Thomasen
Aarhus University, Denmark {soelvsten,jaco}@cs.au.dk
Abstract. We follow up on the idea of Lars Arge to rephrase the Reduce
and Apply operations of Binary Decision Diagrams (BDDs) as iterative
I/O-efficient algorithms. We identify multiple avenues to simplify and
improve the performance of his proposed algorithms. Furthermore, we
extend the technique to other common BDD operations, many of which
are not derivable using Apply operations alone. We provide asymptotic
improvements to the few procedures that can be derived using Apply.
Our work has culminated in a BDD package named Adiar that is able
to efficiently manipulate BDDs that outgrow main memory. This makes
Adiar surpass the limits of conventional BDD packages that use recur-
sive depth-first algorithms. It is able to do so while still achieving a sat-
isfactory performance compared to other BDD packages: on instances larger than 9.5 GiB, Adiar, in parts using the disk, is only 1.47 to 3.69 times slower than CUDD and Sylvan, which exclusively use main memory.
Yet, Adiar is able to obtain this performance at a fraction of the main
memory needed by conventional BDD packages to function.
Keywords: Time-forward Processing · External Memory Algorithms ·
Binary Decision Diagrams
1 Introduction
A Binary Decision Diagram (BDD) provides a canonical and concise representa-
tion of a boolean function as an acyclic rooted graph. This turns manipulation
of boolean functions into manipulation of graphs [10,11].
Their ability to compress the representation of a boolean function has made
them widely used within the field of verification. BDDs have especially found use
in model checking, since they can efficiently represent both the set of states and
the state-transition function [11]. Examples are the symbolic model checkers
NuSMV [14,15], MCK [17], LTSmin [19], and MCMAS [24] and the recently
envisioned symbolic model checking algorithms for CTL* in [3] and for CTLK
in [18]. Hence, continuous research effort is devoted to improve the performance
of this data structure. For example, despite the fact that BDDs were initially
envisioned back in 1986, BDD manipulation was first parallelised in 2014 by
Velev and Gao [35] for the GPU and in 2016 by Van Dijk and Van de Pol [16]
for multi-core processors [12].
The most widely used implementations of decision diagrams make use of
recursive depth-first algorithms and a unique node table [16,23,34]. Lookup of
nodes in this table and following pointers in the data structure during recursion
both pause the entire computation while missing data is fetched [21,26]. For large
enough instances, data has to reside on disk and the resulting I/O-operations
that ensue become the bottleneck. So in practice, the limit of the computer's
main memory becomes the upper limit on the size of the BDDs.
Related Work. Prior work has been done to overcome the I/Os spent while
computing on BDDs. David Long [25] achieved a performance increase of a fac-
tor of two by blocking all nodes in the unique node table based on their time
of creation, i.e. with a depth-first blocking. But, in [6] this was shown to only
improve the worst-case behaviour by a constant. Ochi, Yasuoka, and Yajima [28]
designed in 1993 breadth-first BDD algorithms that exploit a levelwise locality
on disk. Their technique was improved by Ashar and Cheong [8] in 1994 and
by Sanghavi et al. [31] in 1996. The fruits of their labour was the BDD library
CAL capable of manipulating BDDs larger than available main memory. Kun-
kle, Slavici and Cooperman [22] extended in 2010 the breadth-first approach to
distributed BDD manipulation.
The breadth-first algorithms in [8,28,31] are not optimal in the I/O-model,
since they still use a single hash table for each level. This works well in practice,
as long as a single level of the BDD can fit into main memory. If not, they still
exhibit the same worst-case I/O behaviour as other algorithms [6].
In 1995, Arge [5,6] proposed optimal I/O algorithms for the basic BDD
operations Apply and Reduce. To this end, he dropped all use of hash tables.
Instead, he exploited a total and topological ordering of all nodes within the
graph. This is used to store all recursion requests in priority queues, so they
get synchronized with the iteration through the sorted input stream of nodes.
Martin Šmerek implemented these algorithms in 2009 as they were described,
but the performance was disappointing, since the intermediate unreduced BDD
grew too large to handle in practice [personal communication, Sep 2021].
Contributions. Our work directly follows up on the theoretical contributions
of Arge in [5,6]. We simplified and improved on his I/O-optimal Apply and
Reduce algorithms. In particular, we modified and pruned the intermediate rep-
resentation, to prevent data duplication and to save on the number of sorting
operations. We also provide I/O-efficient versions of several other standard BDD
operations, where we obtain asymptotic improvements for the operations that
are derivable from Apply.
Our proposed algorithms and data structures have been implemented to cre-
ate a new easy-to-use and open-source BDD package, named Adiar. Our experi-
mental evaluation shows that Adiar is able to manipulate BDDs larger than the
given main memory available, with only an acceptable slowdown compared to a
conventional BDD library running exclusively in main memory.
1.1 Overview
The rest of the paper is organised as follows. Section 2 covers preliminaries on the I/O-model and Binary Decision Diagrams. We present our algorithms for I/O-efficient BDD manipulation in Section 3. Section 4 provides an overview of the resulting BDD package, Adiar, and Section 5 contains an experimental evaluation of it. Our conclusions and future work are in Section 6.
2 Preliminaries
2.1 The I/O-Model
The I/O-model [1] allows one to reason about the number of data transfers be-
tween two levels of the memory hierarchy, while abstracting away from technical
details of the hardware, to make a theoretical analysis manageable.
An I/O-algorithm takes inputs of size N, residing on the higher level of the two, i.e. in external storage (e.g. on a disk). The algorithm can only do computations on data that reside on the lower level, i.e. in internal storage (e.g. main memory). This internal storage can only hold a smaller and finite number of M elements. Data is transferred between these two levels in blocks of B consecutive elements [1]. Here, B is a constant size not only encapsulating the page size or the size of a cache-line but more generally how expensive it is to transfer information between the two levels. The cost of an algorithm is the number of data transfers, i.e. the number of I/O-operations, or just I/Os, it uses.

For all realistic values of N, M, and B we have that N/B < sort(N) ≪ N, where sort(N) ≜ N/B · log_{M/B}(N/B) [1,7] is the sorting lower bound, i.e. it takes Ω(sort(N)) I/Os in the worst case to sort a list of N elements [1]. With an M/B-way merge sort algorithm, one can obtain an optimal O(sort(N)) I/O sorting algorithm [1], and with the addition of buffers to lazily update a tree structure, one can obtain an I/O-efficient priority queue capable of inserting and extracting N elements in O(sort(N)) I/Os [4].
TPIE. The TPIE library [36] provides an implementation of I/O-efficient algorithms and data structures such that the use of B-sized buffers is completely transparent to the programmer. Elements can be stored in files that act like lists. One can push new elements to the end of a file and read the next elements from the file in either direction, provided has_next returns true. One can also peek the next element without moving the read head. TPIE provides an optimal O(sort(N)) external memory merge sort algorithm for its files. Furthermore, it provides an implementation of the I/O-efficient priority queue of [30] as developed in [29], which supports the push, top and pop operations.
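As an illustration, here is a minimal sketch of streaming elements through a TPIE file; the calls mirror the operations named above (push, peek, read), but the exact names and signatures may differ between TPIE versions:

  #include <tpie/tpie.h>
  #include <tpie/file_stream.h>

  int main() {
    tpie::tpie_init();
    {
      tpie::file_stream<int> xs;
      xs.open("numbers.tpie");          // a file that acts like a list
      for (int i = 0; i < 1000; ++i)
        xs.push(i);                     // append to the end of the file
      xs.seek(0);                       // move the read head back to the front
      while (xs.can_read())             // the 'has_next' of the description
        static_cast<void>(xs.read());   // elements arrive in B-sized blocks
    }
    tpie::tpie_finish();
    return 0;
  }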
2.2 Binary Decision Diagrams
A Binary Decision Diagram (BDD) [10], as depicted in Fig. 1, is a rooted directed acyclic graph (DAG) that concisely represents a boolean function B^n → B, where B = {⊥, ⊤}.

Fig. 1: Examples of Reduced Ordered Binary Decision Diagrams: (a) x2, (b) x0 ∧ x1, (c) x0 ⊕ x1, (d) x1 ∨ x2. Leaves are drawn as boxes with the boolean value and internal nodes as circles with the decision variable. Low edges are drawn dashed while high edges are solid.

The leaves contain the boolean values ⊥ and ⊤ that define the output of the function. Each internal node contains the label i of the input variable x_i it represents, together with two outgoing arcs: a low arc for when x_i = ⊥ and a high arc for when x_i = ⊤. We only consider Ordered Binary Decision Diagrams (OBDD), where each unique label may only occur once and the labels must occur in sorted order on all paths. The set of all nodes with label j is said to belong to the j-th level in the DAG.
If one exhaustively (1) skips all nodes with identical children and (2) removes
any duplicate nodes, then one obtains the Reduced Ordered Binary Decision Di-
agram (ROBDD) of the given OBDD. If the variable order is fixed, this reduced
OBDD is a unique canonical form of the function it represents [10].
The two primary algorithms for BDD manipulation are called Apply and Reduce. The Apply computes the OBDD h = f ⊙ g, where f and g are OBDDs and ⊙ is a function B × B → B. This is essentially done by recursively computing the product construction of the two BDDs f and g and applying ⊙ when recursing to pairs of leaves. The Reduce applies the two reduction rules on an OBDD bottom-up to obtain the corresponding ROBDD [10].
Common implementations of BDDs use recursive depth-first procedures that
traverse the BDD and the unique nodes are managed through a hash table [9,
16,20,23,34]. The latter allows one to directly incorporate the Reduce algorithm
of [10] within each node lookup [9,27]. They also use a memoisation table to minimise the number of duplicate computations [16,23,34]. If the sizes N_f and N_g of two BDDs are considerably larger than the memory M available, each recursion request of the Apply algorithm will in the worst case result in an I/O, caused by looking up a node within the memoisation table and following the low and high arcs [6,21]. Since there are up to N_f · N_g recursion requests, this results in up to O(N_f · N_g) I/Os in the worst case. The Reduce operation transparently built into the unique node table with a find-or-insert function can also cause an I/O for each lookup within this table [21]. This adds yet another O(N) I/Os, where N is the number of nodes in the unreduced BDD.
Lars Arge provided in [5,6] a description of an Apply algorithm that is capable
of only using O(sort(N_f · N_g)) I/Os and a Reduce that uses O(sort(N)) I/Os
(see [6] for a detailed description). He also proved this to be optimal for both
algorithms, assuming a levelwise ordering of nodes on disk [6]. Our algorithms,
implemented in Adiar, differ from Arge’s in subtle non-trivial ways. We will not
elaborate further on his original proposal, since our algorithms are simpler and
better at conveying the time-forward processing technique he used. Instead, we
will mention where our Reduce and Apply algorithms differ from his.
3 BDD Manipulation by Time-forward Processing
Our algorithms exploit the total and topological ordering of the internal nodes in the BDD depicted in (1) below, where parents precede their children. It is topological by ordering a node by its label i : N, and total by secondly ordering on a node's identifier id : N. This identifier only needs to be unique on each level, as nodes are still uniquely identifiable by the combination of their label and identifier.

    (i_1, id_1) < (i_2, id_2)  ≡  i_1 < i_2 ∨ (i_1 = i_2 ∧ id_1 < id_2)    (1)

We write the unique identifier (i, id) : N × N for a node as x_{i,id}.

BDD nodes do not contain an explicit pointer to their children but instead the children's unique identifier. Following the same notion, leaf values are stored directly in the leaf's parents. This makes a node a triple (uid, low, high) where uid : N × N is its unique identifier and low and high : (N × N) + B are its children. The ordering in (1) is lifted to compare the uids of two nodes, and so a BDD is represented by a file with its nodes in sorted order. For example, the BDDs in Fig. 1 would be represented as the lists depicted in Fig. 2.
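The following is a minimal C++ sketch of this representation; the type and member names are illustrative and do not match Adiar's actual sources:

  #include <cstdint>
  #include <tuple>
  #include <variant>

  struct uid_t {
    uint32_t label; // i : the decision variable of the level
    uint64_t id;    // id: only unique within its level
  };

  // The ordering (1): topological on the label, made total by the identifier.
  inline bool operator<(const uid_t &a, const uid_t &b) {
    return std::tie(a.label, a.id) < std::tie(b.label, b.id);
  }

  // A child is either the uid of an internal node or a boolean leaf value,
  // i.e. an element of (N x N) + B.
  using ptr_t = std::variant<uid_t, bool>;

  struct node_t {
    uid_t uid;       // x_{i,id}
    ptr_t low, high; // children for x_i = false, resp. x_i = true
  };

  // Lifting (1) to nodes: a BDD is a file of node_t sorted by this order.
  inline bool operator<(const node_t &a, const node_t &b) { return a.uid < b.uid; }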
The Apply algorithm in [6] produces an unreduced OBDD, which is turned into an ROBDD with Reduce. The original algorithms of Arge solely work on a node-based representation. Arge briefly notes that with an arc-based representation, the Apply algorithm is able to output its arcs in the order needed by the following Reduce, and vice versa. Here, an arc is a triple (source, is_high, target), written as source −is_high→ target, where source : N × N, is_high : B, and target : (N × N) + B, i.e. source and target contain the level and identifier of internal nodes. We have further pursued this idea of an arc-based representation and can conclude that the algorithms indeed become simpler and more efficient with an arc-based output from Apply. On the other hand, we see no such benefit over the more compact node-based representation in the case of Reduce. Hence, as depicted in Fig. 3, our algorithms work in tandem by cycling between the node-based and arc-based representation.
1a: [ (x_{2,0}, ⊥, ⊤) ]
1b: [ (x_{0,0}, ⊥, x_{1,0}), (x_{1,0}, ⊥, ⊤) ]
1c: [ (x_{0,0}, x_{1,0}, x_{1,1}), (x_{1,0}, ⊥, ⊤), (x_{1,1}, ⊤, ⊥) ]
1d: [ (x_{1,0}, x_{2,0}, ⊤), (x_{2,0}, ⊥, ⊤) ]

Fig. 2: In-order representation of the BDDs of Fig. 1
Fig. 3: The Apply–Reduce pipeline of our proposed algorithms: Apply reads the nodes of f and g and outputs the internal arcs and leaf arcs of f ⊙ g, which Reduce turns back into the nodes of f ⊙ g.
Fig. 4: Unreduced output of Apply when computing x2 → (x0 ∧ x1): (a) the semi-transposed graph, where pairs indicate nodes of Fig. 1a and 1b, respectively; (b) its in-order arc-based representation, split into internal arcs and leaf arcs.
Notice that our Apply outputs two files containing arcs: arcs to internal nodes and arcs to leaves. Internal arcs are output at the time their targets are processed, and since nodes are processed in ascending order, internal arcs end up being sorted with respect to the unique identifier of their target. This groups all in-going arcs to the same node together and effectively reverses internal arcs. Arcs to leaves, on the other hand, are output when their source is processed, which groups all out-going arcs to leaves together. These two outputs of Apply represent a semi-transposed graph, which is exactly the form needed by the following Reduce. For example, the Apply on the node-based ROBDDs in Fig. 1a and 1b with logical implication as the operator will yield the arc-based unreduced OBDD depicted in Fig. 4.

For simplicity, we will ignore any cases of leaf-only BDDs in our presentation of the algorithms. They are easily extended to also deal with those cases.
3.1 Apply
Our Apply algorithm works by a single top-down sweep through the input DAGs. Internal arcs are reversed due to this top-down nature, since an arc between two internal nodes can first be resolved and output at the time of the arc's target. These arcs are placed in the file F_internal. Arcs from nodes to leaves are placed in the file F_leaf.

The algorithm itself essentially works like the standard Apply algorithm. Given a recursion request for a pair of input nodes v_f from f and v_g from g, a single node is created with label min(v_f.uid.label, v_g.uid.label), and recursion requests r_low and r_high are created for its two children. If the labels of v_f.uid and v_g.uid are equal, then r_low = (v_f.low, v_g.low) and r_high = (v_f.high, v_g.high). Otherwise, r_low, resp. r_high, contains the uid of the low child, resp. the high child, of min(v_f, v_g), whereas max(v_f.uid, v_g.uid) is kept as is.

 1  Apply(f, g, ⊙)
 2    F_internal ← [] ; F_leaf ← [] ; Q_app:1 ← ∅ ; Q_app:2 ← ∅
 3    v_f ← f.next() ; v_g ← g.next() ; id ← 0 ; label ← undefined
 4
 5    /* Insert request for root (v_f, v_g) */
 6    Q_app:1.push(NIL −undefined→ (v_f.uid, v_g.uid))
 7
 8    /* Process requests in topological order */
 9    while Q_app:1 ≠ ∅ ∨ Q_app:2 ≠ ∅ do
10      (s −is_high→ (t_f, t_g), low, high) ← TopOf(Q_app:1, Q_app:2)
11
12      t_seek ← if low, high = NIL then min(t_f, t_g) else max(t_f, t_g)
13      while v_f.uid < t_seek ∧ f.has_next() do v_f ← f.next() od
14      while v_g.uid < t_seek ∧ g.has_next() do v_g ← g.next() od
15
16      if low = NIL ∧ high = NIL ∧ t_f ∉ {⊥, ⊤} ∧ t_g ∉ {⊥, ⊤}
17           ∧ t_f.label = t_g.label ∧ t_f.id ≠ t_g.id
18      then /* Forward information of min(t_f, t_g) to max(t_f, t_g) */
19        v ← if t_seek = v_f.uid then v_f else v_g
20        while Q_app:1.top() matches (_ −_→ (t_f, t_g)) do
21          (s −is_high→ (t_f, t_g)) ← Q_app:1.pop()
22          Q_app:2.push(s −is_high→ (t_f, t_g), v.low, v.high)
23        od
24      else /* Process request (t_f, t_g) */
25        id ← if label ≠ t_seek.label then 0 else id + 1
26        label ← t_seek.label
27
28        /* Forward or output outgoing arcs */
29        r_low, r_high ← RequestsFor((t_f, t_g), v_f, v_g, low, high, ⊙)
30        (if r_low ∈ {⊥, ⊤} then F_leaf else Q_app:1).push(x_{label,id} −⊥→ r_low)
31        (if r_high ∈ {⊥, ⊤} then F_leaf else Q_app:1).push(x_{label,id} −⊤→ r_high)
32
33        /* Output ingoing arcs */
34        while Q_app:1 ≠ ∅ ∧ Q_app:1.top() matches (_ −_→ (t_f, t_g)) do
35          (s −is_high→ (t_f, t_g)) ← Q_app:1.pop()
36          if s ≠ NIL then F_internal.push(s −is_high→ x_{label,id})
37        od
38        while Q_app:2 ≠ ∅ ∧ Q_app:2.top() matches (_ −_→ (t_f, t_g), _, _) do
39          (s −is_high→ (t_f, t_g), _, _) ← Q_app:2.pop()
40          if s ≠ NIL then F_internal.push(s −is_high→ x_{label,id})
41        od
42    od
43    return F_internal, F_leaf

Fig. 5: The Apply algorithm
The pseudocode for the Apply procedure is shown in Fig. 5, where the RequestsFor function computes r_low and r_high for the pair of nodes (t_f, t_g). The goal of the rest of the algorithm is to obtain the information that RequestsFor needs in an I/O-efficient way. To this end, the two priority queues Q_app:1 and Q_app:2 are used to synchronise recursion requests for a pair of nodes (t_f, t_g) with the sequential order of reading nodes in f and g. Q_app:1 has elements of the form (s −is_high→ (t_f, t_g)) and Q_app:2 has elements (s −is_high→ (t_f, t_g), low, high). The boolean is_high and the unique identifier s, being the request's origin, are used on lines 33–41 to output all ingoing arcs when the request is resolved.

Elements in Q_app:1 are sorted in ascending order by min(t_f, t_g), i.e. the node encountered first from f and g. Requests to the same (t_f, t_g) are grouped together by secondarily sorting the tuple lexicographically. Q_app:2 is sorted in ascending order by max(t_f, t_g), i.e. the second of the two to be visited, and ties are again broken lexicographically. This second priority queue is used in the case where t_f.label = t_g.label but t_f.id ≠ t_g.id, i.e. when both nodes are needed to resolve the request but they are not necessarily available at the same time. To this end, the given request is moved from Q_app:1 into Q_app:2 on lines 19–23. Here, the request is extended with the unique identifiers low and high of min(v_f, v_g), which makes the children of min(v_f, v_g) available at max(v_f, v_g).

The next request to process from Q_app:1 or Q_app:2 is dictated by the TopOf function on line 10. In the case that both Q_app:1 and Q_app:2 are non-empty, let r_1 = (s_1 −is_high_1→ (t_f:1, t_g:1)) be the top element of Q_app:1 and let the top element of Q_app:2 be r_2 = (s_2 −is_high_2→ (t_f:2, t_g:2), low, high). TopOf(Q_app:1, Q_app:2) returns (r_1, Nil, Nil) if min(t_f:1, t_g:1) < max(t_f:2, t_g:2), and r_2 otherwise. If either queue is empty, it equivalently returns the top request of the other.

The arc-based output greatly simplifies the algorithm compared to the original proposal of Arge in [6]. Our algorithm only uses two priority queues rather than four. Arge's algorithm, like ours, resolves a node before its children, but due to the node-based output it has to output this entire node before its children. Hence, it has to identify all nodes by the tuple (t_f, t_g), doubling the space used. Instead, the arc-based output allows us to output the information at the time of the children and hence we are able to generate the label and its new identifier for both parent and child. Arge's algorithm also did not forward a request's source s, so repeated requests to the same pair of nodes were merely discarded upon retrieval from the priority queue, since they carried no relevant information. Our arc-based output, on the other hand, makes every element placed in the priority queue forward the source s, vital for the creation of the semi-transposed graph.
Proposition 1 (Following Arge 1996 [6]). The Apply algorithm in Fig. 5 has I/O complexity O(sort(N_f · N_g)) and time complexity O((N_f · N_g) · log(N_f · N_g)), where N_f and N_g are the respective sizes of the BDDs for f and g.
See the full paper [33] for the proof.
Pruning by shortcutting the operator. The Apply procedure above, like Arge's original algorithm, follows recursion requests until a pair of leaves is met. Yet, for example in Fig. 4, the node for the request (x_{2,0}, ⊤) is unnecessary to resolve, since all leaves of this subgraph will trivially be ⊤ due to the implication operator. The subsequent Reduce will remove this node and its children in favour of the ⊤ leaf. Hence, the RequestsFor function can instead immediately create a request for the leaf. We implemented this in Adiar, since it considerably decreases the size of Q_app:1, Q_app:2, and of the output.
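A minimal sketch of such a shortcut check for the implication operator is given below; std::nullopt stands for a target that is still an internal node, and the names are illustrative rather than Adiar's API:

  #include <optional>

  // Returns the leaf that the request trivially resolves to, or std::nullopt
  // if Apply still has to recurse on the pair of targets.
  std::optional<bool> implies_shortcut(std::optional<bool> f,
                                       std::optional<bool> g) {
    if (f.has_value() && !*f) return true;                // bot -> g  is top
    if (g.has_value() && *g)  return true;                // f -> top  is top
    if (f.has_value() && g.has_value()) return !*f || *g; // both are leaves
    return std::nullopt;                                  // cannot shortcut yet
  }

Whenever the shortcut resolves, RequestsFor can push an arc to F_leaf immediately instead of generating further product requests.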
3.2 Reduce
Our Reduce algorithm in Fig. 6 works like other explicit variants with a single bottom-up sweep through the OBDD. Since the nodes are resolved and output in a bottom-up descending order, the output is exactly in the reverse order of how it is needed for any following Apply. We have so far ignored this detail, but the only change necessary to the Apply algorithm in Section 3.1 is for it to read the lists of nodes of f and g in reverse.

The priority queue Q_red is used to forward the reduction result of a node v to its parents in an I/O-efficient way. Q_red contains arcs from unresolved sources s in the given unreduced OBDD to already resolved targets t′ in the ROBDD under construction. The bottom-up traversal corresponds to resolving all nodes in descending order. Hence, arcs s −is_high→ t′ in Q_red are first sorted on s and secondly on is_high; the latter simplifies retrieving the low and high arcs on lines 8 and 9. The base cases for the Reduce algorithm are the arcs to leaves in F_leaf, which follow the exact same ordering. Hence, on lines 8 and 9, arcs in Q_red and F_leaf are merged using the PopMax function that retrieves the arc that is maximal with respect to this ordering.

Since nodes are resolved in descending order, F_internal follows this ordering on the arcs' targets when its elements are read in reverse. The reversal of arcs in F_internal makes the parents of a node v, to which the reduction result is to be forwarded, readily available on lines 26–32.

The algorithm otherwise proceeds similarly to the standard Reduce algorithm [10]. For each level j, all nodes v of that level are created from their high and low arcs, e_high and e_low, taken out of Q_red and F_leaf. The nodes are split into the two temporary files F_j:1 and F_j:2 that contain the mapping [uid ↦ uid′] from a node in the given unreduced OBDD to its equivalent node in the output. F_j:1 contains the nodes v removed due to the first reduction rule and is populated on lines 7–12: if both children of v are the same, then [v.uid ↦ v.low] is pushed to this file. F_j:2 contains the mappings for the second rule and is populated on lines 15–24. Nodes not placed in F_j:1 are placed in an intermediate file F_j and sorted by their children. This makes duplicate nodes immediate successors. Every unique node encountered in F_j is output to F_out before mapping itself and all its duplicates to it in F_j:2. Since nodes are output out-of-order compared to the input and it is unknown how many will be output for said level, they are given new decreasing identifiers starting from the maximal possible value MAX_ID. Finally, F_j:2 is sorted back into the order of F_internal to forward the results
 1  Reduce(F_internal, F_leaf)
 2    F_out ← [] ; Q_red ← ∅
 3    while Q_red ≠ ∅ do
 4      j ← Q_red.top().source.label ; id ← MAX_ID
 5      F_j ← [] ; F_j:1 ← [] ; F_j:2 ← []
 6
 7      while Q_red.top().source.label = j do
 8        e_high ← PopMax(Q_red, F_leaf)
 9        e_low ← PopMax(Q_red, F_leaf)
10        if e_high.target = e_low.target
11        then F_j:1.push([e_low.source ↦ e_low.target])
12        else F_j.push((e_low.source, e_low.target, e_high.target))
13      od
14
15      sort v ∈ F_j by v.low and secondly by v.high
16      v′ ← undefined
17      for each v ∈ F_j do
18        if v′ is undefined or v.low ≠ v′.low or v.high ≠ v′.high
19        then
20          id ← id − 1
21          v′ ← (x_{j,id}, v.low, v.high)
22          F_out.push(v′)
23        F_j:2.push([v.uid ↦ v′.uid])
24      od
25
26      sort [uid ↦ uid′] ∈ F_j:2 by uid in descending order
27      for each [uid ↦ uid′] ∈ MergeMaxUid(F_j:1, F_j:2) do
28        while F_internal.peek() matches (_ −_→ uid) do
29          (s −is_high→ uid) ← F_internal.next()
30          Q_red.push(s −is_high→ uid′)
31        od
32      od
33    od
34    return F_out

Fig. 6: The Reduce algorithm
in both F_j:1 and F_j:2 to their parents on lines 26–32. Here, MergeMaxUid merges the mappings [uid ↦ uid′] in F_j:1 and F_j:2 by always taking the mapping with the largest uid from either file.

Since the original algorithm of Arge in [6] takes a node-based OBDD as an input and internally uses node-based auxiliary data structures, his Reduce algorithm had to create two copies of the input to reverse all internal arcs: one copy sorted by the nodes' low children and one sorted by their high children. Since F_internal already has its arcs reversed, our design eliminates two expensive sorting steps and more than halves the memory used.
Another consequence of Arge’s node-based representation is that his algo-
rithm had to move all arcs to leaves into Qred rather than merging requests from
Qred with the base-cases from Fleaf . The semi-transposed input allows us to de-
crease the number of I/Os due to Qred by Θ(sort(N`)) where N`are the number
of arcs to leaves (see [33] for the proof). In practice, together with pruning the
recursion during Apply, this can provide up to a factor 2 speedup [33].
Proposition 2 (Following Arge 1996 [6]). The Reduce algorithm in Fig. 6 has an O(sort(N)) I/O complexity and an O(N log N) time complexity.

See the full paper [33] for the proof. Arge proved in [6] that this O(sort(N)) I/O complexity is optimal for the input, assuming a levelwise ordering of nodes.
3.3 Other BDD Algorithms
By applying the above algorithmic techniques, one can obtain all other singly-recursive BDD algorithms; see [33] for the details. We now design asymptotically better variants of Negation and Equality Checking than what is possible by deriving them using Apply.

Negation. A BDD is negated by inverting the value in its nodes' leaf children. This is an O(1) I/O-operation if a negation flag is used to mark whether the nodes should be negated on-the-fly as they are read from the stream.

Proposition 3. Negation has I/O, space, and time complexity O(1).

This is an improvement over the O(sort(N)) I/Os spent by Apply to compute f ⊕ ⊤, where ⊕ is exclusive or. Furthermore, disk space is shared between BDDs.
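A minimal sketch of such a negation flag, reusing the illustrative node_t and ptr_t types from the sketch in Section 3 (again, not Adiar's actual classes):

  #include <memory>
  #include <variant>
  #include <vector>

  struct bdd_handle {
    std::shared_ptr<const std::vector<node_t>> nodes; // shared node file
    bool negated = false;
  };

  // O(1) in I/Os, space, and time: no node is read or written.
  inline bdd_handle bdd_negate(bdd_handle f) {
    f.negated = !f.negated;
    return f;
  }

  // While streaming the nodes, a leaf child is flipped on-the-fly iff the
  // handle's flag is set; internal children are untouched.
  inline ptr_t resolve_child(const bdd_handle &f, const ptr_t &child) {
    if (f.negated && std::holds_alternative<bool>(child))
      return ptr_t{!std::get<bool>(child)};
    return child;
  }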
Equality Checking. To check whether f ≡ g, one has to check that the DAG of f is isomorphic to the one for g [10]. This makes f and g trivially inequivalent when the number of nodes, the number of levels, or the label or size of each of the L levels do not match. This can be checked in O(1) and O(L/B) I/Os if the Reduce algorithm in Fig. 6 is made to also output the relevant meta-information.

If f ≡ g, the isomorphism relates the roots of the BDDs for f and g. For any node v_f of f and v_g of g, if (v_f, v_g) is uniquely related by the isomorphism, then so should (v_f.low, v_g.low) and (v_f.high, v_g.high) be. Hence, one can check for equality by traversing the product of both BDDs (as in Apply) and checking for one of the following two conditions being violated:
– The children of the given recursion request (t_f, t_g) should either both be the same leaf or internal nodes with the same label.
– On level i, exactly N_i unique recursion requests should be retrieved from the priority queues, where N_i is the number of nodes on level i.
If the first condition is never violated, it is guaranteed that f ≡ g, and so ⊤ is output. The second condition ensures that the algorithm terminates earlier on negative cases and lowers the provable complexity bound; see [33] for the proof.

Proposition 4. Equality Checking has I/O complexity O(sort(N)) and time complexity O(N log N), where N = min(N_f, N_g) is the minimum of the respective sizes of the BDDs for f and g.
If the ordering (1) is extended such that ⊥ and ⊤ succeed all unique identifiers and ⊥ < ⊤, then the Reduce in Fig. 6 actually enforces a much stricter ordering; it outputs nodes in an order purely based on their label and the unique identifiers of their children.

Proposition 5. If G_f and G_g are outputs of the Reduce in Fig. 6, then f ≡ g if and only if the i-th nodes of G_f and G_g match numerically.

See the full paper [33] for the proof. The negation operation breaks this property by changing the leaf values without changing their order. So, in the case where f or g, but not both, have their negation flag set, one still has to use the O(sort(N)) algorithm above; otherwise a simple linear scan of both BDDs suffices.

Corollary 1. If the negation flags of the BDDs for f and g are equal, then Equality Checking can be done in 2 · N/B I/Os and O(N) time, where N = min(N_f, N_g) is the minimum of the respective sizes of the BDDs for f and g.

Both Proposition 4 and Corollary 1 are an asymptotic improvement on the O(sort(N^2)) equality checking algorithm that computes f ↔ g with Apply and Reduce and then tests whether the output is the ⊤ leaf.
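Under the conditions of Corollary 1, the check degenerates to one synchronized linear scan over the two sorted node files. A minimal sketch, again over the illustrative node_t type from Section 3:

  #include <variant>
  #include <vector>

  inline bool uid_eq(const uid_t &a, const uid_t &b) {
    return a.label == b.label && a.id == b.id;
  }

  inline bool ptr_eq(const ptr_t &a, const ptr_t &b) {
    if (a.index() != b.index()) return false;
    return std::holds_alternative<bool>(a)
             ? std::get<bool>(a) == std::get<bool>(b)
             : uid_eq(std::get<uid_t>(a), std::get<uid_t>(b));
  }

  // By Proposition 5, equivalent functions yield numerically identical node
  // sequences; streamed from disk this costs 2 * N/B I/Os and O(N) time.
  bool bdd_equal_linear(const std::vector<node_t> &f,
                        const std::vector<node_t> &g) {
    if (f.size() != g.size()) return false;
    for (std::size_t i = 0; i < f.size(); ++i)
      if (!uid_eq(f[i].uid, g[i].uid) || !ptr_eq(f[i].low, g[i].low) ||
          !ptr_eq(f[i].high, g[i].high))
        return false;
    return true;
  }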
4 Adiar: An Implementation
The algorithms and data structures described in Section 3 have been implemented in a new BDD package, named Adiar^1,^2. The most important operations are shown in Table 1. Interaction with the BDD package is done through C++ programs that include the <adiar/adiar.h> header file and are built and linked with CMake. Its two dependencies are the Boost library and the TPIE library; the latter is included as a submodule of the Adiar repository, leaving it to CMake to build TPIE and link it to Adiar.

Adiar is initialised with the adiar_init(memory, temp_dir) function, where memory is the memory (in bytes) dedicated to Adiar and temp_dir is the directory where temporary files will be placed, e.g. a dedicated harddisk. The BDD package is deinitialised by calling the adiar_deinit() function.

The bdd object in Adiar is a container for the underlying files of each BDD, while a __bdd object is used for possibly unreduced arc-based OBDDs. Reference counting on the underlying files is used to reuse the same files and to immediately delete them when the reference count decrements to 0. Files are deleted as early as possible by use of implicit conversions between the bdd and __bdd objects and an overloaded assignment operator, making the concurrently occupied space on disk minimal.

^1 adiar ⟨Portuguese⟩ (verb): to defer, to postpone
^2 Source code is publicly available at github.com/ssoelvsten/adiar
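A minimal usage sketch based on the interface described above follows; the function names are those of Adiar 1.x to the best of our knowledge and may differ in other versions:

  #include <adiar/adiar.h>

  int main() {
    adiar::adiar_init(1024 * 1024 * 1024, "/tmp"); // 1 GiB for Adiar, temp files in /tmp
    {
      adiar::bdd x0 = adiar::bdd_ithvar(0);
      adiar::bdd x1 = adiar::bdd_ithvar(1);
      adiar::bdd f  = adiar::bdd_and(x0, x1);      // underlying files are reference counted
    } // the files behind x0, x1, and f are deleted here
    adiar::adiar_deinit();
    return 0;
  }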
Adiar function         Operation          I/O complexity              Justification
bdd_apply(f, g, ⊙)     f ⊙ g              O(sort(N_f · N_g))          Prop. 1, 2
bdd_ite(f, g, h)       f ? g : h          O(sort(N_f · N_g · N_h))    [33], Prop. 2
bdd_restrict(f, i, v)  f|_{x_i = v}       O(sort(N_f))                [33], Prop. 2
bdd_exists(f, i)       ∃v : f|_{x_i = v}  O(sort(N_f^2))              [33], Prop. 2
bdd_forall(f, i)       ∀v : f|_{x_i = v}  O(sort(N_f^2))              [33], Prop. 2
bdd_not(f)             ¬f                 O(1)                        Prop. 3
bdd_satcount(f)        #x : f(x)          O(sort(N_f))                [33]
bdd_nodecount(f)       N_f                O(1)                        Section 3.3
f == g                 f ≡ g              O(sort(min(N_f, N_g)))      Prop. 4

Table 1: Some of the operations supported by Adiar and their I/O-complexity.
5 Experimental Evaluation
While time-forwarding may be an asymptotic improvement over the recursive approach in the I/O-model, its usability in practice is another question entirely. We have compared Adiar 1.0.1 to the recursive BDD packages CUDD 3.0.0 [34] and Sylvan 1.5.0 [16] (in single-core mode). We constructed BDDs for some benchmarks in all tools in a similar manner, ensuring the same variable ordering.

The experimental results^3 were obtained on server nodes of the Grendel cluster at the Centre for Scientific Computing Aarhus. Each node has two 48-core 3.0 GHz Intel Xeon Gold 6248R processors, 384 GiB of RAM, and 3.5 TiB of available SSD disk, runs CentOS Linux, and compiles code with GCC 10.1.0. We report the minimum measured running time, since it minimises any error caused by the CPU, memory, and disk [13]; using the average or median does not significantly change any of our results. For comparability, all compute nodes are set to use 350 GiB of the available RAM, while each BDD package is given 300 GiB of it.

Sylvan was set to not use any parallelisation, given a ratio between the node table and the cache of 64:1, and set to start its data structures 2^12 times smaller than the final 262 GiB they may occupy, i.e. at first with a table and cache that occupy 66 MiB. The size of the CUDD cache was set such that it would have the same node table to cache ratio when reaching 300 GiB.
5.1 Queens
The solution to the Queens problem is the number of arrangements of N queens on an N × N board, such that no queen is threatened by another. Our benchmark follows the description in [22]: the variable x_ij represents whether a queen is placed on the i-th row and the j-th column, and the solution to the problem then corresponds to the number of satisfying assignments of the formula

    ⋀_{i=0}^{N−1} ⋁_{j=0}^{N−1} (x_ij ∧ ¬has_threat(i, j)),

where has_threat(i, j) is true if a queen is placed on a tile (k, l) that would be in conflict with a queen placed on (i, j).
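As a sketch, this formula can be built directly with the operations from Table 1; has_threat_bdd is a hypothetical helper constructing the BDD of has_threat(i, j), and variable x_ij is mapped to the label i·N + j:

  #include <adiar/adiar.h>

  adiar::bdd has_threat_bdd(int i, int j, int N); // assumed given

  adiar::bdd queens(int N) {
    adiar::bdd board = adiar::bdd_true();
    for (int i = 0; i < N; ++i) {
      // row_i = OR_j (x_ij AND NOT has_threat(i, j))
      adiar::bdd row = adiar::bdd_false();
      for (int j = 0; j < N; ++j) {
        adiar::bdd cell = adiar::bdd_and(adiar::bdd_ithvar(i * N + j),
                                         adiar::bdd_not(has_threat_bdd(i, j, N)));
        row = adiar::bdd_or(row, cell);
      }
      board = adiar::bdd_and(board, row); // AND over all rows
    }
    return board;
  }

The number of solutions is then the satisfying-assignment count of the returned BDD (bdd_satcount in Table 1).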
^3 Available at Zenodo [32] and at github.com/ssoelvsten/bdd-benchmark
Fig. 7: Running time solving N-Queens for N = 12, ..., 17 (lower is better): total time (s, log scale) and µs per processed BDD node, for Adiar, CUDD, and Sylvan.
The ROBDD of the innermost conjunction can be directly constructed, without
any BDD operations.
The current version of Adiar is implemented purely using external memory
algorithms. These perform poorly when given small amounts of data. Hence, it
is not meaningful to compare performance for N < 12, where the BDDs involved
are 23.5 MiB or smaller. For N ≥ 12, Fig. 7 shows how the gap in running time
between Adiar and the other BDD packages shrinks as instances grow. At N = 15,
which is the largest instance solved by Sylvan and CUDD, Adiar is 1.47 times
slower than CUDD and 2.15 times slower than Sylvan.
The largest instance solved by Adiar is N = 17, where the largest BDD
constructed is 719 GiB in size. In contrast, Sylvan only constructed a 12.9 GiB
sized BDD for N = 15. Even though Adiar has to use the disk, it only becomes
1.8 times slower per processed node compared to its highest performance at
N = 13. Conversely, Adiar is able to solve the N = 15 problem with much less
main memory than both Sylvan and CUDD. Fig. 8 shows the running time on
the same machine with its memory, including its file system cache, limited with
cgroups to be 1 GiB more than given to the BDD package. Yet, Adiar is only 1.39
times slower when decreasing its memory down to 2 GiB, while Sylvan cannot
function with less than 56 GiB of memory available.
Fig. 8: Running time of 15-Queens with variable memory (lower is better); running
time in seconds over the memory available (0-250 GiB) for Adiar, CUDD and
Sylvan.
We also ran experiments on counting the number of draw positions in a 3D-
version of Tic-Tac-Toe, derived from [22]. Our results [33] paint a similar picture:
Adiar is only 2.50 times slower than Sylvan for Sylvan’s largest solved instance;
Sylvan only creates BDDs of up to 34.4 GiB in size, whereas Adiar constructs
a 902 GiB sized BDD; Adiar only slows down by a factor of 2.49 per processed
node when using the disk extensively to solve the larger instances.
5.2 Combinatorial Circuit Verification
The EPFL Combinational Benchmark Suite [2] consists of 23 combinatorial cir-
cuits designed for logic optimisation and synthesis. 20 of these are split into the
two categories random/control and arithmetic, and each of these original cir-
cuits C_o is distributed together with one circuit optimised for size C_s and one
circuit optimised for depth C_d. The last three are the More than a Million Gates
benchmarks, which we will ignore as they come without optimised versions.
Based on the approach of the Nanotrav program as distributed with CUDD,
we verify the functional equivalence between each output gate of C_o and the
corresponding gate in each optimised circuit C_d and C_s. The BDDs are com-
puted by representing every input gate by a decision variable and computing the
BDD of all other gates from the BDDs of their input wires. Finally, the BDDs
for every pair of corresponding output gates are tested for equality. Memoisation
ensures that the same gate is not computed twice, while a reference counter is
maintained for each gate such that dead BDDs in the memoisation table may
be garbage collected. Recall that Adiar stores each BDD in a separate file, while
Sylvan and CUDD share nodes between different BDDs in a forest.
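The following sketch outlines this gate-by-gate construction with memoisation, under the same assumptions about Adiar's API as before; the netlist representation is illustrative, and the per-gate reference counting for garbage collection is omitted.

#include <adiar/adiar.h>
#include <unordered_map>
#include <vector>

// Illustrative netlist: 'i' = input (in0 is the decision variable),
// '!' = NOT (in0), '&' = AND and '|' = OR (in0 and in1).
struct Gate { char op; int in0; int in1; };

adiar::bdd gate_bdd(int g, const std::vector<Gate> &circuit,
                    std::unordered_map<int, adiar::bdd> &memo) {
  if (auto it = memo.find(g); it != memo.end()) { return it->second; }
  const Gate &gate = circuit[g];
  const adiar::bdd res = [&]() -> adiar::bdd {
    switch (gate.op) {
      case 'i': return adiar::bdd_ithvar(gate.in0);
      case '!': return ~gate_bdd(gate.in0, circuit, memo);
      case '&': return gate_bdd(gate.in0, circuit, memo)
                     & gate_bdd(gate.in1, circuit, memo);
      default : return gate_bdd(gate.in0, circuit, memo)
                     | gate_bdd(gate.in1, circuit, memo);
    }
  }();
  memo.insert({g, res}); // memoisation: each gate is computed only once
  return res;
}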
Table 2 shows the number of verified instances with each BDD package within
a 15-day time limit. Adiar is able to verify three more benchmarks than both
other BDD packages. This is despite the fact that most instances include hun-
dreds of concurrent BDDs, while the disk is only 12 times larger than main
memory. For example, the largest verified benchmark, mem_ctrl, has up to 1231
BDDs existing at the same time.
Table 3 shows the time it took Adiar to verify equality between the original
and each of the optimised circuits, for the three largest cases verified. The table
also shows the sum of the sizes of the output BDDs that represent each circuit.
Throughout all solved benchmarks, equality checking took less than 1.47% of
the total construction time, and the O(N/B) algorithm could be used in 71.6%
of all BDD comparisons. The voter benchmark with its single output shows that
the O(N/B) algorithm is about 10 times faster than the O(sort(N)) algorithm
and can compare at least 2 · 5.75 MiB / 0.006 s = 1.86 GiB/s.

         # solved   # out-of-space   # time-out
Adiar       23             6             11
CUDD        20            19              1
Sylvan      20            13              7

Table 2: Number of verified arithmetic and random/control circuits from [2].

(a) mem_ctrl
             depth    size
Time (s)      5862    5868
O(sort(N))     496     476
O(N/B)         735     755
N (MiB)         614313

(b) sin
             depth    size
Time (s)      3.89    3.27
O(sort(N))      22      22
O(N/B)           3       3
N (MiB)           3589

(c) voter
             depth    size
Time (s)     0.058   0.006
O(sort(N))       1       0
O(N/B)           0       1
N (MiB)           5.74

Table 3: Running time for equivalence testing. O(sort(N)) and O(N/B) give the
number of times the respective algorithm from Section 3.3 was used.
6 Conclusions and Future Work
Adiar provides an I/O-efficient implementation of BDDs. The iterative BDD
algorithms exploit a topological ordering of the BDD nodes in external memory,
by use of priority queues and sorting algorithms. All recursion requests for a
single node are processed together, eliminating the need for a memoisation table.
The performance of Adiar is very promising in practice for instances larger
than a few hundred MiB. As the size of the BDDs increases, the performance of
Adiar gets closer to that of conventional recursive BDD implementations: for
BDDs larger than a few GiB, the use of Adiar has at most resulted in a 3.69 factor
slowdown. Simultaneously, the design of our algorithms allows us to compute on
BDDs that outgrow main memory with only a 2.49 factor slowdown, which is
negligible compared to the use of swap memory with conventional BDD packages.
This performance comes at the cost of Adiar not being able to share nodes
between BDDs. Yet, this increase in space usage is not a problem in practice,
and it makes garbage collection a trivial and cheap deletion of files on disk. On
the other hand, the lack of sharing makes it impossible to check for functional
equivalence with a mere pointer comparison. Instead, one has to explicitly check
whether the two DAGs are isomorphic. We have improved the asymptotic and
practical performance of equality checking such that it is negligible in practice.
This lays the foundation on which we intend to develop external memory ver-
sions of the BDD algorithms that are still missing for symbolic model checking.
Specifically, we intend to improve the performance of quantifying multiple vari-
ables and to design a relational product operation. Furthermore, we will improve
performance for small instances that fit entirely into internal memory.
Acknowledgements
Thanks to the late Lars Arge, to Gerth S. Brodal, and to Mathias Rav for
their inputs. Furthermore, thanks to the Centre for Scientific Computing Aarhus
(phys.au.dk/forskning/cscaa/) for running our experiments on their cluster.
References
1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting
and related problems. Communications of the ACM 31(9), 1116–1127 (1988).
https://doi.org/10.1145/48529.48535
2. Amarú, L., Gaillardon, P.E., De Micheli, G.: The EPFL combinational benchmark
suite. In: 24th International Workshop on Logic & Synthesis (2015)
3. Amparore, E., Donatelli, S., Gallà, F.: A CTL* model checker for Petri nets. In:
Application and Theory of Petri Nets and Concurrency. Lecture Notes in Computer
Science, vol. 12152, pp. 403–413. Springer (2020). https://doi.org/10.1007/978-3-030-51831-8_21
4. Arge, L.: The buffer tree: A new technique for optimal I/O-algorithms. In:
Workshop on Algorithms and Data Structures (WADS). Lecture Notes in
Computer Science, vol. 955, pp. 334–345. Springer, Berlin, Heidelberg (1995).
https://doi.org/10.1007/3-540-60220-8_74
5. Arge, L.: The I/O-complexity of ordered binary-decision diagram manipu-
lation. In: 6th International Symposium on Algorithms and Computations
(ISAAC). Lecture Notes in Computer Science, vol. 1004, pp. 82–91 (1995).
https://doi.org/10.1007/BFb0015411
6. Arge, L.: The I/O-complexity of ordered binary-decision diagram manipulation. In: BRICS RS
preprint series. vol. 29. Department of Computer Science, University of Aarhus
(1996). https://doi.org/10.7146/brics.v3i29.20010
7. Arge, L.: External geometric data structures. In: 10th International Computing
and Combinatorics Conference (COCOON). Lecture Notes in Computer Science,
vol. 3106 (2004). https://doi.org/10.1007/978-3-540-27798-9_1
8. Ashar, P., Cheong, M.: Efficient breadth-first manipulation of binary de-
cision diagrams. In: IEEE/ACM International Conference on Computer-
Aided Design (ICCAD). pp. 622–627. IEEE Computer Society Press (1994).
https://doi.org/10.1109/ICCAD.1994.629886
9. Brace, K.S., Rudell, R.L., Bryant, R.E.: Efficient implementation of a BDD pack-
age. In: 27th Design Automation Conference (DAC). pp. 40–45. Association for
Computing Machinery (1990). https://doi.org/10.1109/DAC.1990.114826
10. Bryant, R.E.: Graph-based algorithms for Boolean function manip-
ulation. IEEE Transactions on Computers C-35(8), 677–691 (1986).
https://doi.org/10.1109/TC.1986.1676819
11. Bryant, R.E.: Symbolic Boolean manipulation with ordered binary-
decision diagrams. ACM Computing Surveys 24(3), 293–318 (1992).
https://doi.org/10.1145/136035.136043
12. Bryant, R.E.: Binary decision diagrams. In: Clarke, E.M., Henzinger, T.A., Veith,
H., Bloem, R. (eds.) Handbook of Model Checking, pp. 191–217. Springer Inter-
national Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-10575-8
13. Chen, J., Revels, J.: Robust benchmarking in noisy environments. arXiv (2016),
https://arxiv.org/abs/1608.04295
14. Cimatti, A., Clarke, E., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M.,
Sebastiani, R., Tacchella, A.: NuSMV 2: An opensource tool for symbolic model
checking. In: International Conference on Computer Aided Verification (CAV).
Lecture Notes in Computer Science, vol. 2404, pp. 359–364. Springer, Berlin,
Heidelberg (2002). https://doi.org/10.1007/3-540-45657-0_29
15. Cimatti, A., Clarke, E., Giunchiglia, F., Roveri, M.: NuSMV: a new symbolic model
checker. International Journal on Software Tools for Technology Transfer 2, 410–
425 (2000). https://doi.org/10.1007/s100090050046
16. Van Dijk, T., Van de Pol, J.: Sylvan: multi-core framework for decision diagrams.
International Journal on Software Tools for Technology Transfer 19, 675–696
(2016). https://doi.org/10.1007/s10009-016-0433-2
17. Gammie, P., Van der Meyden, R.: MCK: Model checking the logic of knowledge.
In: Computer Aided Verification. Lecture Notes in Computer Science, vol. 3114,
pp. 479–483. Springer, Berlin, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27813-9_41
18. He, L., Liu, G.: Petri net based symbolic model checking for computation tree logic
of knowledge. arXiv (2020), https://arxiv.org/abs/2012.10126
19. Kant, G., Laarman, A., Meijer, J., Van de Pol, J., Blom, S., Van Dijk, T.:
LTSmin: High-performance language-independent model checking. In: Tools and
Algorithms for the Construction and Analysis of Systems (TACAS). Lecture Notes
in Computer Science, vol. 9035, pp. 692–707. Springer, Berlin, Heidelberg (2015).
https://doi.org/10.1007/978-3-662-46681-0_61
20. Karplus, K.: Representing Boolean functions with if-then-else DAGs. Tech. rep.,
University of California at Santa Cruz, USA (1988)
21. Klarlund, N., Rauhe, T.: BDD algorithms and cache misses. In: BRICS Report
Series. vol. 26 (1996). https://doi.org/10.7146/brics.v3i26.20007
22. Kunkle, D., Slavici, V., Cooperman, G.: Parallel disk-based computa-
tion for large, monolithic binary decision diagrams. In: 4th International
Workshop on Parallel Symbolic Computation (PASCO). pp. 63–72 (2010).
https://doi.org/10.1145/1837210.1837222
23. Lind-Nielsen, J.: BuDDy: A binary decision diagram package. Tech. rep., Depart-
ment of Information Technology, Technical University of Denmark (1999)
24. Lomuscio, A., Qu, H., Raimondi, F.: MCMAS: an open-source model checker for
the verification of multi-agent systems. International Journal on Software Tools for
Technology Transfer 19, 9–30 (2017). https://doi.org/10.1007/s10009-015-0378-x
25. Long, D.E.: The design of a cache-friendly BDD library. In: Proceedings of the
1998 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
pp. 639–645. Association for Computing Machinery (1998)
26. Minato, S.i., Ishihara, S.: Streaming BDD manipulation for large-scale combinato-
rial problems. In: Design, Automation and Test in Europe Conference and Exhi-
bition. pp. 702–707 (2001). https://doi.org/10.1109/DATE.2001.915104
27. Minato, S.i., Ishiura, N., Yajima, S.: Shared binary decision diagram with at-
tributed edges for efficient Boolean function manipulation. In: 27th Design Au-
tomation Conference (DAC). pp. 52–57. Association for Computing Machinery
(1990). https://doi.org/10.1145/123186.123225
28. Ochi, H., Yasuoka, K., Yajima, S.: Breadth-first manipulation of very
large binary-decision diagrams. In: International Conference on Computer
Aided Design (ICCAD). pp. 48–55. IEEE Computer Society Press (1993).
https://doi.org/10.1109/ICCAD.1993.580030
29. Petersen, L.H.: External Priority Queues in Practice. Master’s thesis, Department
of Computer Science, University of Aarhus (2007)
30. Sanders, P.: Fast priority queues for cached memory. ACM Journal of Experimental
Algorithmics 5, 7–32 (2000). https://doi.org/10.1145/351827.384249
31. Sanghavi, J.V., Ranjan, R.K., Brayton, R.K., Sangiovanni-Vincentelli, A.: High
performance BDD package by exploiting memory hierarchy. In: 33rd Design Au-
tomation Conference (DAC). pp. 635–640. Association for Computing Machinery
(1996). https://doi.org/10.1145/240518.240638
32. Sølvsten, S.C., Van de Pol, J.: Adiar v1.0.1 : TACAS 2022 artifact. Zenodo (2021).
https://doi.org/10.5281/zenodo.5638335
33. Sølvsten, S.C., Van de Pol, J., Jakobsen, A.B., Thomasen, M.W.B.: Efficient binary
decision diagram manipulation in external memory. arXiv (2021), https://arxiv.
org/abs/2104.12101
34. Somenzi, F.: CUDD: CU decision diagram package, 3.0. Tech. rep., University of
Colorado at Boulder (2015)
35. Velev, M.N., Gao, P.: Efficient parallel GPU algorithms for BDD ma-
nipulation. In: 19th Asia and South Pacific Design Automation Con-
ference (ASP-DAC). pp. 750–755. IEEE Computer Society Press (2014).
https://doi.org/10.1109/ASPDAC.2014.6742980
36. Vengroff, D.E.: A Transparent Parallel I/O Environment. In: DAGS Symposium
on Parallel Computation. pp. 117–134 (1994)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Forest GUMP: A Tool for Explanation
Alnis Murtovi, Alexander Bainczyk, and Bernhard Steffen
Chair for Programming Systems, TU Dortmund University, Dortmund, Germany
{alnis.murtovi,alexander.bainczyk,bernhard.steffen}@tu-dortmund.de
Abstract. In this paper, we present Forest GUMP (for Generalized,
Unifying Merge Process), a tool for providing tangible experience with
three concepts of explanation. Besides the well-known model explanation
and outcome explanation, Forest GUMP also supports class character-
ization, i.e., the precise characterization of all samples with the same
classification. Key technology to achieve these results is algebraic aggre-
gation, i.e., the transformation of a Random Forest into a semantically
equivalent, concise white-box representation in terms of Algebraic Deci-
sion Diagrams (ADDs). The paper sketches the method and illustrates
the use of Forest GUMP along an illustrative example taken from the
literature. This way readers should acquire an intuition about the tool
and the way it should be used to increase the understanding not only
of the considered dataset, but also of the character of Random Forests
and the ADD technology, here enriched to comprise infeasible path
elimination.
Keywords: Random Forest, Binary/Algebraic Decision Diagram, Ag-
gregation, Infeasible Paths, Explainability, Random Seed
1 Introduction
Random Forests are one of the most widely known classifiers in machine learn-
ing [3,17]. The method is easy to understand and implement, and at the same
time achieves impressive classification accuracies in many applications. Com-
pared to other methods, Random Forests are fast to train and they are clearly
more suitable for smaller datasets. In contrast to a single decision tree, Random
Forests, a collection of many trees, do not overfit as easily on a dataset and their
variance decreases with their size. On the other hand, Random Forests are con-
sidered black-box models because of their highly parallel nature: following the
execution of Random Forests means, in particular, following the execution in all
the involved trees. Such black-box executions are hard to explain to a human
user even for very small examples.
In contrast, decision trees are considered white-box models because of their
sequential evaluation nature. Even if a tree is large in size, a human can easily
follow its computation step by step by evaluating (simple) decisions at each node
from the root to a leaf. Indeed, the set of decisions along such an execution path
precisely explains why a certain choice has been taken.
Popular methods towards explainability try to establish some user intuition.
For example, they may hint at the most influential input data, like highlighting
or framing the area of a picture where a face has been identified. Such informa-
tion is very helpful, and it helps in particular to reveal some of the “popular”
drastic mismatches incurred by neural networks: if the framed area of the image
does not contain the “tagged” object, the identification is clearly questionable.
However, even in a correct classification, the tag by itself gives no reason why
the identification is indeed correct.
More ambitious are methods that try to turn black-box models into white-
box models, ideally preserving the semantics of the classification function. For
Random Forests this has been achieved for the first time in [10,14] using the ‘ag-
gregating power’ of Algebraic Decision Diagrams (ADDs) and Binary Decision
Diagrams (BDDs). ADDs are essentially decision trees whose leaves are labelled
with elements of some algebra, whereas BDDs are the special case for the al-
gebra of Boolean values. Lifting the algebraic operations from the leaves to the
entire ADDs/BDDs allows one to aggregate entire Random Forests into single
semantically equivalent ADDs, the precondition for solving three explainability
problems:
– The Model Explanation Problem [15], i.e. the problem of making the model
as a whole interpretable, is solved in terms of an ADD that specifies pre-
cisely the same classification function as the original Random Forest (cf.
Section 6.2).
– The Class Characterization Problem, i.e. the problem of characterizing, for
a given class c, the set of all samples that are classified by the Random Forest
as c. This problem is solved in terms of a BDD which precisely characterizes
this set of samples (cf. Section 6.3).
– The Outcome Explanation Problem [15], i.e. the problem of explaining a con-
crete classification, is solved in terms of a minimal conjunction of (negated)
decisions that are sufficient to guide the sample into the considered class (cf.
Section 6.4).
In this paper, we present Forest GUMP (for Generalized, Unifying Merge Pro-
cess), a tool for providing a tangible experience with the described concepts of
explanation. Experimentation with Forest GUMP not only yields semantically
equivalent, concise white-box representations for a given Random Forest
which reveal characteristics of the underlying datasets, but it also allows one to
experience, e.g., the impact of random seeds on both the quality of prediction
and the size of the explaining models (cf. Section 6). Our implementation relies
on the standard Random Forest implementation in Weka [28] and on the ADD
implementation of the ADD-Lib [9,12,26]. For a more detailed description of the
transformations and a quantitative analysis we refer the reader to [10,11,14].
Related Work: Various methods for making Random Forests interpretable exist,
such as extracting decision rules from the considered black-box model [6], meth-
ods that are agnostic to the black-box model under consideration [20,24], or
deriving a single decision tree from the black-box model [5,7,16,27,29]. In this
context, single decision trees are considered key to a solution of both the model
explanation and the outcome explanation problem. State-of-the-art solutions to de-
rive a single decision tree from a Random Forest are approximative [5,7,16,27,29].
Thus, their derived explanations are not fully faithful to the original semantics
of the considered Random Forest. This is in contrast to our ADD-based aggre-
gation, which precisely reflects the semantics of the original Random Forest.
After a short introduction to Random Forests in Section 2, we present our ap-
proach to their aggregation in Section 3, which is followed by an elimination
of redundant predicates from the decision diagrams in Section 4 and a non-
compositional abstraction in Section 5. Section 6 introduces Forest GUMP and
solutions to the three explainability problems. In the end, we summarize the
lessons we have learned using Forest GUMP in Section 7, which is followed by a
conclusion and directions for future work in Section 8.
2 Random Forests
Learning Random Forests is a quite popular and algorithmically relatively sim-
ple classification technique that yields good results for many real-world appli-
cations. Its decision model generalises a training dataset that holds examples
of input data labelled with the desired output, also called class. As its name
suggests, an ensemble of decision trees constitutes a Random Forest. Each of
these trees is itself a classifier that was learned from a random sample of the
training dataset. Consequently, all trees are different in structure, they represent
different decision functions, and can yield different decisions for the very same
input data.
To apply a Random Forest to previously unseen input data, every decision
tree is evaluated separately: Tracing the trees from their root down to one of the
leaves yields one decision per tree, i.e. the predicted class. The overall decision
of the Random Forest is then derived as the most frequently chosen class, an
aggregation commonly referred to as majority vote. The key advantage of this
approach is, compared to single decision trees, the reduced variance. A detailed
introduction to Random Forests, decision trees, and their learning procedures
can be found in [3,17,23].
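As a minimal sketch of this evaluation scheme (with an illustrative flattened tree representation, not Weka's):

#include <vector>

struct Node { int feature; double threshold; int low; int high; int label; };
using Tree = std::vector<Node>; // node 0 is the root; label >= 0 marks a leaf

int evaluate(const Tree &t, const std::vector<double> &sample) {
  int n = 0;
  while (t[n].label < 0) { // trace the tree from its root down to a leaf
    n = sample[t[n].feature] < t[n].threshold ? t[n].low : t[n].high;
  }
  return t[n].label; // the predicted class of this tree
}

int majority_vote(const std::vector<Tree> &forest,
                  const std::vector<double> &sample, int num_classes) {
  std::vector<int> votes(num_classes, 0);
  for (const Tree &t : forest) { ++votes[evaluate(t, sample)]; }
  int best = 0;
  for (int c = 1; c < num_classes; ++c) {
    if (votes[c] > votes[best]) { best = c; }
  }
  return best; // the most frequently chosen class
}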
In this paper, we use Weka [28] as our reference implementation of Random
Forests. However, our approach does not depend on implementation details and
can be easily adapted to other implementations.
Figure 1 shows a small Random Forest that was learned from the popular
Iris dataset [8]. The dataset lists dimensions of Iris flowers’ sepals and petals
for three different species. Using this forest to decide the species on the basis
of given measurements requires one to first evaluate the three trees individually
and to subsequently determine the majority vote. This effort clearly grows linearly
with the size of the forest. In the following we use this example to illustrate our
approach of forest aggregation for explainability.
Fig. 1. Random Forest learned from the Iris dataset [8] (39 nodes): three decision
trees branching on thresholds over petallength, petalwidth, sepallength and
sepalwidth, with leaves Iris-setosa, Iris-versicolor and Iris-virginica.
The key idea behind our approach is to partially evaluate the Random Forest
at construction time, which, in particular, eliminates redundancies between the
individual trees of a Random Forest. E.g., in our accompanying Iris flower exam-
ple (cf. Fig. 1) the predicate petalwidth < 1.65 is used in all three trees. This
can easily lead to cases where the same predicate is evaluated many times in the
classification process. The partial evaluation proposed in this paper transforms
Random Forests into decision structures where such redundancies are totally
eliminated.
An adequate data structure to achieve this goal for binary decisions is the
Binary Decision Diagram [1,4,19] (BDD): For a given predicate ordering, BDDs
constitute a normal form where each predicate is evaluated at most once, and
only if required to determine the final outcome.
Algebraic Decision Diagrams (ADDs) [2] generalise BDDs to capture func-
tions of the type B^P → C^n, which are exactly what we need to specify the
semantics of Random Forests for a classification domain C. Moreover, in analogy
to BDDs, which inherit the algebraic structure of their co-domain B, ADDs also
inherit the algebraic structure of their co-domains if available.
We exploit this property during the partial evaluation of Random Forests by
considering the class vector co-domain (cf. Sect. 3). The aggregation to achieve
the corresponding optimised decision structures is then a straightforward conse-
quence of the used ADD technology.
3 Class Vector Aggregation
Class vectors faithfully represent the information about how many trees of the
original Random Forest voted for a certain outcome. Obviously, this informa-
tion is sufficient to obtain the precise results of a corresponding majority vote.
Formally, the domain of class vectors forms a monoid

V := (ℕ^|C|, +, 0)

where addition + is defined component-wise and 0 is the neutral element.
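In code, this monoid can be sketched as follows (types and names are illustrative, not Forest GUMP's actual implementation):

#include <cstddef>
#include <vector>

using ClassVector = std::vector<unsigned>; // one counter per class in C

// The monoid operation '+': component-wise addition of class vectors.
ClassVector operator+(const ClassVector &a, const ClassVector &b) {
  ClassVector sum(a.size());
  for (std::size_t c = 0; c < a.size(); ++c) { sum[c] = a[c] + b[c]; }
  return sum;
}

// The neutral element 0: the all-zero class vector.
ClassVector zero(std::size_t num_classes) { return ClassVector(num_classes, 0); }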
Fig. 2. Class vector aggregation of the Random Forest (83 nodes): an ADD over
the forest’s threshold predicates whose terminals are class vectors such as
(Iris-setosa=2; Iris-versicolor=0; Iris-virginica=1).
Fig. 3. Class vector aggregation of the Random Forest without semantically redundant
nodes (43 nodes).
With the compositionality of the algebraic structure V and the corresponding
ADDs D_V, we can transform any Random Forest incrementally into a semanti-
cally equivalent ADD. Starting with the empty Random Forest, i.e. the neutral
element 0, we consider one tree after the other, aggregating a growing sequence
of decision trees until the entire forest is entailed in the new decision diagram.
The details of this transformation are described in [14]. Figure 2 shows the result
of this transformation for our running example.
4 Infeasible Path Elimination
When aggregating the trees of a Random Forest, one has to account for the
fact that they all use varying sets of predicates. In contrast to simple Boolean
variables, predicates are not independent of one another, i.e. the evaluation of one
predicate may yield some degree of knowledge about other predicates. E.g., the
predicate petallength < 2.45 induces knowledge about other predicates that
reason about petallength: When the petal length is smaller than 2.45, it cannot
possibly be greater than or equal to 2.7 at the same time. This is not taken care
of by the symbolic treatment of predicates we followed until now. In fact,
predicates are typically considered independent in the ADD/BDD community.
Infeasible path elimination, as illustrated by the difference between Figure 2
and Figure 3 for our running example, leverages the potential of a semantic
treatment of predicates, with significant effect on the size of the resulting ADDs.
In fact, the experiments with thousands of trees reported in [14] would not have
been successful without infeasible path elimination.
Please note that infeasible path elimination
– is only required after aggregation: The trees in the original Random Forest
have no infeasible paths by construction. They are introduced in the course
of our symbolic aggregation, which is insensitive to semantic properties.
– is compositional and can therefore be applied during the stepwise transfor-
mation, before the final most frequent label abstraction (cf. Sect. 5), and at
the very end.
– does not support normal forms: Whereas class vector abstraction is canon-
ical for a given variable ordering, infeasible path elimination is not! Thus
our approach may yield different decision diagrams depending on the order
of tree aggregation. It is guaranteed, however, that the resulting decision
diagrams are minimal.
Infeasible path elimination is a hard problem in general.1 Our corresponding
implementation uses SMT solving [21] to eliminate all infeasible paths. An in-
depth discussion of infeasible path elimination is a topic in its own right and
beyond the scope of this paper.
Class vector aggregation and infeasible path elimination are both compositional
and can therefore be applied in arbitrary order without changing the seman-
tics. The majority vote at compile time described in the next section is not
compositional and must therefore be applied at the very end.
5 Majority Vote at Compile Time
As mentioned above, maintaining the information about the result of the major-
ity votes is not compositional. In fact, knowing the result of the majority votes
for two Random Forests gives no clue about the majority vote of the combined
forest. Thus the majority vote abstraction can only be applied at the very end,
after the entire aggregation has been computed compositionally.
The result of the compositional aggregation process, including infeasible path
elimination, is a decision diagram d ∈ D_V with class vectors in its terminal nodes.
The majority vote abstraction ∆_C : D_V → D_C can now be defined as the lifted
version of the majority vote abstraction δ_C on class vectors v ∈ ℕ^|C| (cf. [14]):

δ_C(v) := arg max_{c ∈ C} v_c.

Note that δ_C does not project into the same carrier set but rather from one
algebraic structure V into another, C. However, these transformations can be
applied to the corresponding decision diagrams in the very same way. Fig. 4
shows the result of the most frequent class abstraction for our running example.
1 For the cases considered here it is polynomial, but there are of course theories for
which it becomes exponentially hard or even undecidable.
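In code, δ_C is a simple argmax over a class vector; the following illustrative sketch applies it to a single vector, and applying it to every terminal of d yields ∆_C(d):

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// delta_C: the index of the most frequent class in a class vector.
// std::max_element breaks ties towards the smaller class index.
std::size_t delta_C(const std::vector<unsigned> &v) {
  return std::distance(v.begin(), std::max_element(v.begin(), v.end()));
}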
Fig. 4. Most frequent label abstraction of the aggregated Random Forest (majority
vote) without semantically redundant nodes (18 nodes).
6 Forest GUMP and Three Problems of Explainability
Forest GUMP2 (Generalized, Unifying Merge Process) is a tool we developed to
illustrate the power of algebraic aggregation for the optimization and explanation
of Random Forests. It is designed to allow everyone, in particular people without
IT or machine learning knowledge, to experience the nature of Random Forests.
To avoid unnecessary entry hurdles, we decided to implement Forest GUMP as
a simple-to-use web application. It allows the user to experience the methods
described in the previous sections and the proposed solutions to the explainability
problems, which will be illustrated in the following sections. We will first give a
brief overview of Forest GUMP and then showcase its potential in the following
sections.
Forest GUMP’s user interface (see Figure 5) is essentially divided into two
parts. On the left side the user can input the necessary data to learn a Random
Forest and subsequently visualize it, while the currently chosen representation
is visualized on the right side. First, the user has to upload a dataset or
choose one of six datasets that we provide (cf. (1) in Fig. 5) on which the Random
Forest will be learned. Next, the hyperparameters necessary for the learning
procedure have to be selected, such as the number of trees to be learned (cf.
(2) in Fig. 5). Then, one can choose different aggregation methods, i.e. the ones
mentioned in the previous sections and further ones which will be explained in
the following sections (cf. (3) in Fig. 5). It is also possible to input a sample,
classify it with the ADD, and highlight the path from the root to the leaf (satisfied
predicates are highlighted in green, unsatisfied predicates are highlighted in red).
In the end, the currently visualized ADD can be exported, as Forest GUMP
provides code generators for Java, C++, Python and GraphViz’s dot format (cf.
(4) in Fig. 5). Additionally, the currently visualized ADD can be exported as an
SVG to be viewed locally (cf. (4) in Fig. 5).
The grey rectangle (cf. (6) in Fig. 5) points to the root of the currently
visualized ADD. One can zoom in and out, which can be helpful when the ADDs
are rather large (cf. (6) in Fig. 5). On the top left, the number of nodes and
the length of the currently highlighted path are displayed (cf. (7) in Fig. 5). On
the bottom right, one can open a history of all the representations one chose to
visualize (cf. (8) in Fig. 5).

2 A link to a running instance of Forest GUMP is available at https://gitlab.com/scce/forest-gump.

Fig. 5. Overview of Forest GUMP. The visualized ADD is our solution to the class
characterization problem (cf. Sect. 6.3) for the class Iris-Setosa.

Fig. 6. The execution history in Forest GUMP.
Figure 6 shows the expanded execution history. For each visualized ADD, the
execution history lists the aggregation variant, the hyperparameters used to learn
the Random Forest, the size (i.e. the number of nodes), and the maximum
depth, which is the longest path from root to leaf. The execution history also
allows one to replay an experiment by clicking on the button on the right side of
a row, which makes it possible to compare different ADD variants. One can also
delete individual entries or the whole history, and export the history to a CSV.

Fig. 7. The user can either choose to upload their own dataset or select one of six
exemplary datasets.
6.1 A Walkthrough of Forest GUMP
In the following we will see how hard it is to understand how a Random Forest
comes to its decision, and provide methods for solving the three explainability
problems with absolute precision.
Learning a Random Forest To begin, we need a Random Forest, which re-
quires a dataset on which it will be learned. In Forest GUMP, the user can
upload their own dataset in the Attribute-Relation File Format (ARFF) [28].
Alternatively, we provide six exemplary datasets from which a user can select
one to directly start using the tool. Figure 7 illustrates how this looks in
Forest GUMP. Having chosen a dataset, the hyperparameters necessary for
the learning procedure of the Random Forest have to be specified next (see Figure 8).
The inputs are the following:
– the number of trees to be learned,
– the bagging size, i.e. the fraction of samples to be used to learn each tree, and
– a seed to be able to reproduce the setting.3
Additionally, the user can decide to eliminate the infeasible paths, as this can
strongly reduce the size of the ADDs (see Section 4). While the predicate order is
fixed by default, the user can decide to let Forest GUMP optimize the predicate
order, as the order can also greatly impact the size of the ADDs. A more in-
depth discussion on the interplay between the infeasible path elimination and
the predicate order will follow. Figure 9 shows a Random Forest that was learned
on the Iris dataset, consisting of 20 trees4, a bagging size of 100% and 58 as the
seed. If we now want to classify a given input, for each tree we would have to
traverse from the root to the leaf and receive one predicted class per tree. The
class which was predicted most often is the final result. Trying to understand
why the Random Forest predicted this specific class is seemingly impossible. In
the following we will show how we can do better.

3 One can generate a random seed by clicking on the button next to the input field.

Fig. 8. The user has to specify the necessary hyperparameters to be able to learn
a Random Forest. While the first three hyperparameters are needed for the learning
procedure, the elimination of the infeasible paths and the optimization of the predicate
order are specific to our aggregation method.
6.2 Model Explanation Problem
The canonical white-box model corresponding to the Random Forest of Figure 9
can be constructed through the most frequent label abstraction (see Sect. 5) of
the aggregated Random Forest (see Sect. 3), whose infeasible paths are elimi-
nated (see Sect. 4). This solves the Model Explanation Problem.
Figure 10 sketches the result of this construction: a canonical white-box
model with 310 nodes. Admittedly, this model is still frightening, but given a
sample, it allows one to easily follow the corresponding classification process,
evaluating along a single path at most 19 decisions on the petal and sepal
characteristics. This decision set is our set of predicates.

4 Note that each decision tree is represented as an ADD.

Fig. 9. A Random Forest consisting of 20 individual decision trees (191 nodes; the
longest path consists of 9 nodes). Note that each decision tree is represented as
an ADD and that all ADDs share common subfunctions, i.e. it is essentially a shared
ADD forest. The actual Random Forest, where nothing is shared, contains 284 nodes.
Fig. 10. An extract of the model explanation. The ADD is constructed from the most
frequent label abstraction of the aggregated Random Forest following an elimination
of all infeasible paths (310 nodes, longest path of length 19; the highlighted path
has length 9).

The conjunction of these predicates is a solution to the Outcome Explanation Problem.
However, more concise explanations are derived from the class characterization
BDD discussed in the following section.
Given the sample petallength = 2.4, petalwidth = 1.8, sepallength = 5.9,
sepalwidth = 2.5, the outcome explanation given by the model explanation con-
sists of the following 9 predicates (in Figure 10 satisfied predicates are highlighted
in green, unsatisfied predicates are highlighted in red):

¬(petalwidth < 0.75) ∧ ¬(petalwidth < 1.7) ∧ (petallength < 4.95) ∧
(sepalwidth < 2.65) ∧ (petallength < 4.85) ∧ (sepallength < 5.95) ∧
¬(petalwidth < 1.75) ∧ (petallength < 2.6) ∧ (petallength < 2.45)
Fig. 11. The class characterization for the class Iris-Setosa (10 nodes, the highlighted
path is also the longest path with length 5). The leaf corresponding to Iris-Setosa
is highlighted in green, the leaf representing all other classes (i.e. Iris-Virginica and
Iris-Versicolor) is highlighted in red.
While this is already an improvement compared to the Random Forest, where
one would have to traverse all 20 decision trees, we will see in the following how
we can improve even more.
6.3 Class Characterization Problem
The class characterization problem is particularly interesting because it allows
one to ‘reverse’ the classification process. While the direct problem is ‘given
a sample, provide its classification’, the reverse problem reads ‘given a class,
what are the characteristics of all the samples belonging to this class?’
BDD-based Class Characterisation can be defined via the following simple
transformation function: Given a class c ∈ C, we define a corresponding projec-
tion function δ_B(c) : C → B on the co-domain as

δ_B(c)(c′) := 1 if c′ = c, and 0 otherwise,

for c′ ∈ C. Again, the function δ_B(c) can be lifted to operate on ADDs, yielding
∆_B(c) : D_C → D_B.
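Sketched in code, with classes represented as indices (again illustrative, not Forest GUMP's actual implementation):

#include <cstddef>

// delta_B(c): the projection C -> B that marks exactly the class c. Applied
// to every terminal of the class-labelled ADD it yields the characterizing BDD.
bool delta_B(std::size_t c, std::size_t c_prime) { return c_prime == c; }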
The BDD shown in Figure 11 is a minimal characterization of the set of all
the samples that are guaranteed to be classified as Iris-Setosa.
Fig. 12. The outcome explanation for the input petallength = 2.4, petalwidth = 1.8,
sepallength = 5.9, sepalwidth = 2.5 (10 nodes, highlighted path of length 5).
Being able to reverse a learned classification function has major practi-
cal importance. Think, e.g., of a marketing research scenario where data have
been collected with the aim to propose best-fitting product offers to customers
according to their user profile. This scenario can be considered as a classification
problem where the offered product plays the role of the class. Now, being able to
reverse the customer-to-product classification function provides the marketing
team with a tailored product-to-customer promotion process: for a given prod-
uct, it addresses all customers considered to favor this very product, as in the
corresponding patent [18].
The path highlighted in Figure 11 is the path from the root to the leaf
for the same sample petallength = 2.4, petalwidth = 1.8, sepallength = 5.9,
sepalwidth = 2.5. Compared to the path of length 9 in the model explanation,
we now have a path of length 5 with the following predicates:

¬(petalwidth < 0.75) ∧ (petallength < 4.95) ∧ (petallength < 4.85) ∧
(petallength < 2.6) ∧ (petallength < 2.45)
6.4 Outcome Explanation Problem
The previous classification formula expresses the collection of ‘conditions’ that
this sample satisfies, and it therefore provides a precise justification why it is
classified in this class. Despite the fact that the class characterization BDD is
canonical, it is easy to see that there are some redundancies in the formula. For
example, a petallength < 2.45 is also inherently smaller than 2.6, 4.85 and 4.95;
therefore, for this specific sample those three predicates are redundant. This is
the result of the imposed predicate ordering in BDDs: all the BDD predicates are
listed, and they are listed in a fixed order. After eliminating these redundancies,
we are left with the following precise minimal outcome explanation: this sample
is recognized as belonging to the class Iris-Setosa because it has the properties

¬(petalwidth < 0.75) ∧ (petallength < 2.45).
In Forest GUMP we make these redundant predicates explicit by highlighting
them in blue (see Figure 12). From 9 predicates in the model explanation and 5
predicates in the class characterization, we have now arrived at an explanation
that consists of only 2 predicates.
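This suppression of redundant predicates is a per-feature tightest-bound computation; the following illustrative sketch (names and types are ours, not Forest GUMP's) keeps, for each feature, only the smallest satisfied threshold and the largest violated one:

#include <map>
#include <optional>
#include <string>
#include <vector>

// A threshold predicate "feature < threshold" on a path, and whether the
// considered sample satisfies it.
struct Pred { std::string feature; double threshold; bool satisfied; };

std::vector<Pred> minimise(const std::vector<Pred> &path) {
  std::map<std::string, std::optional<double>> upper, lower;
  for (const Pred &p : path) {
    // Satisfied "x < t": keep the smallest t (tightest upper bound).
    // Violated "x < t", i.e. "x >= t": keep the largest t (tightest lower bound).
    auto &bound = p.satisfied ? upper[p.feature] : lower[p.feature];
    const bool tighter =
        !bound || (p.satisfied ? p.threshold < *bound : p.threshold > *bound);
    if (tighter) { bound = p.threshold; }
  }
  std::vector<Pred> result;
  for (const auto &[f, t] : upper) { result.push_back({f, *t, true}); }
  for (const auto &[f, t] : lower) { result.push_back({f, *t, false}); }
  return result;
}

For the path of Section 6.3 this yields exactly ¬(petalwidth < 0.75) ∧ (petallength < 2.45).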
7 Lessons Learned
Playing with Forest GUMP led to interesting observations not only concerning
the analyzed data domains but also concerning Random Forest Learning and
the applied ADD technology.
Random Forest Learning. Changing the random seed for the learning process
had a significant impact on the size of the explanation models and the class
characterizations. The observed sizes of the explanation models ranged from 138
to 519. Interestingly, the larger sizes did not necessarily imply a better
prediction quality. The same also applied to the class characterizations. In fact,
we observed a 100% prediction quality for a class characterization of only 3
nodes, while a class characterization for the same species with 40 nodes scored
only 33%.
Analyzed Data Domain. The class characterizations for the three iris species
differed quite a bit. For two species the observed sizes were much bigger than
those of the third species, independently of the chosen random seed and bagging
size. In fact, for Iris-Setosa we observed a class characterization with only 3
nodes, implying an outcome explanation for our chosen sample with only one
predicate. Figure 13 serves for the corresponding explanation. Put differently,
class characterizations seem to be good indications for ‘tightness’: the closer the
samples lie, the more criteria are required for separation.
ADD Technology. ADDs are canonical as soon as one has chosen a predi-
cate/variable ordering. Although we could observe the effect of corresponding
optimization heuristics5, the impact was moderate and helpful mainly for model
explanation and class characterization. Figure 14 shows the outcome ex-
planation for the same problem, but where the ADD representing the class
characterization for the class Iris-Setosa is reordered.6

5 CUDD [25] provides a number of heuristics for optimizing variable orders.
6 The used reordering method is CUDD_REORDER_GROUP_SIFT_CONV, as it was
both fast and effective in our experiments.
Fig. 13. Visualization of the iris dataset using only the petal length and petal width:
a scatter plot of petal length (cm) against petal width (cm) for Iris-Setosa,
Iris-Versicolor and Iris-Virginica.
While the reordering reduces the class characterization size from 10 to 8 nodes,
the length of the outcome explanation is unchanged. For the model explanation
of Figure 10, the size can be reduced from 310 nodes to 196 nodes, while the
path for the sample petallength = 2.4, petalwidth = 1.8, sepallength = 5.9,
sepalwidth = 2.5 actually increased by 1 (from 9 to 10). Thus the outcome
explanation may even be impaired. This is not too surprising, as these
optimizations aim at a size reduction and not a depth reduction of the considered
ADDs. We are currently investigating good heuristics for depth reduction.
More striking was the impact of infeasible path elimination. In fact, this opti-
mization can be regarded as key for scalability when increasing the forest size. [14]
reports results about forests with 10,000 trees. Without infeasible path reduction,
already 100 trees are problematic.
Standard ADD frameworks work on Boolean variables rather than predicates.
Thus in their setting infeasible paths do not occur. The problem of infeasible
path reduction in ADDs was first discussed in [13,14]. Our current corresponding
solution is still basic. We are currently generalizing our solution using more
involved SMT technology.
Of course, these observations were made on rather small datasets, and it has
to be seen how well they transfer to more complex scenarios. We believe, how-
ever, that they indicate general phenomena whose essence remains true in larger
settings.
Fig. 14. The outcome explanation for the input petallength = 2.4, petalwidth = 1.8,
sepallength = 5.9, sepalwidth = 2.5 (8 nodes, highlighted path of length 5), where the
class characterization from Figure 11 is reordered.

8 Conclusion and Perspectives

We have presented Forest GUMP (for Generalized, Unifying Merge Process), a
tool for providing tangible experience with three concepts of explanation: model
explanation, outcome explanation, and class characterization. Key technology to
achieve model explanation is algebraic aggregation, i.e. the transformation of a
Random Forest into a semantically equivalent, concise white-box representation
in terms of Algebraic Decision Diagrams. Class characterization is then achieved
in terms of BDDs where the structure unnecessary to distinguish the considered
class is collapsed. This abstraction is not only interesting in itself to better
understand how easily the classes can be separated, but it also leads to highly
optimized outcome explanations. Together with infeasible path elimination and
the suppression of redundant predicates on a path, we observe reductions of
outcome explanations by more than an order of magnitude. Forest GUMP allows
even newcomers to easily experience these phenomena without much training.
Of course, these are first steps in a very ambitious new direction and it remains
to be seen how far the approach carries. Scalability will probably require decom-
position methods, perhaps in a similar fashion as illustrated by the difference
between model explanation and the considerably smaller class characterization.
More work is also needed on techniques that aim at limiting the number of
involved predicates.
Data Availability Statement: The artifact is available in the Zenodo
repository [22].
References
1. Akers, S.B.: Binary decision diagrams. IEEE Trans. Comput. 27(6), 509–516 (1978)
2. Bahar, R., Frohm, E., Gaona, C., Hachtel, G., Macii, E., Pardo, A., Somenzi, F.:
Algebraic decision diagrams and their applications. In: Proceedings of 1993 Inter-
national Conference on Computer Aided Design (ICCAD). pp. 188–191 (1993).
https://doi.org/10.1109/ICCAD.1993.580054
3. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (Oct 2001).
https://doi.org/10.1023/A:1010933404324
4. Bryant, R.E.: Graph-based algorithms for boolean function manipulation. IEEE
Trans. Comput. 35(8), 677–691 (1986). https://doi.org/10.1109/TC.1986.1676819
5. Chipman, H.A., George, E.I., McCulloch, R.E.: Making sense of a forest of trees
(1999)
6. Deng, H.: Interpreting tree ensembles with intrees. Int. J. Data Sci. Anal. 7(4),
277–287 (2019). https://doi.org/10.1007/s41060-018-0144-8
7. Domingos, P.M.: Knowledge discovery via multiple models. Intell. Data Anal. 2(1-
4), 187–202 (1998). https://doi.org/10.1016/S1088-467X(98)00023-7
8. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of
eugenics 7(2) (1936)
9. Gossen, F., Margaria, T., Murtovi, A., Naujokat, S., Steffen, B.: Dsls for decision
services: A tutorial introduction to language-driven engineering. In: Margaria, T.,
Steffen, B. (eds.) Leveraging Applications of Formal Methods, Verification and Val-
idation. Modeling - 8th International Symposium, ISoLA 2018, Limassol, Cyprus,
November 5-9, 2018, Proceedings, Part I. Lecture Notes in Computer Science,
vol. 11244, pp. 546–564. Springer (2018). https://doi.org/10.1007/978-3-030-03418-4_33
10. Gossen, F., Margaria, T., Steffen, B.: Towards explainability in ma-
chine learning: The formal methods way. IT Prof. 22(4), 8–12 (2020).
https://doi.org/10.1109/MITP.2020.3005640
11. Gossen, F., Margaria, T., Steffen, B.: Formal methods boost experi-
mental performance for explainable AI. IT Prof. 23(6), 8–12 (2021).
https://doi.org/10.1109/MITP.2021.3123495
12. Gossen, F., Murtovi, A., Linden, J., Steffen, B.: The Java library for algebraic
decision diagrams. https://add-lib.scce.info, accessed: 2022-01-13
13. Gossen, F., Steffen, B.: Large random forests: Optimisation for rapid evaluation.
CoRR abs/1912.10934 (2019), http://arxiv.org/abs/1912.10934
14. Gossen, F., Steffen, B.: Algebraic aggregation of random forests: towards explain-
ability and rapid evaluation. International Journal on Software Tools for Technol-
ogy Transfer (Sep 2021). https://doi.org/10.1007/s10009-021-00635-x
15. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.:
A survey of methods for explaining black box models. ACM Comput. Surv. 51(5),
93:1–93:42 (2019). https://doi.org/10.1145/3236009
16. Hara, S., Hayashi, K.: Making tree ensembles interpretable: A Bayesian model selec-
tion approach. In: Storkey, A.J., Pérez-Cruz, F. (eds.) International Conference on
Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca,
Lanzarote, Canary Islands, Spain. Proceedings of Machine Learning Research,
vol. 84, pp. 77–85. PMLR (2018), http://proceedings.mlr.press/v84/hara18a.html
17. Ho, T.K.: Random decision forests. In: Proceedings of 3rd International Confer-
ence on Document Analysis and Recognition. vol. 1, pp. 278–282 vol.1 (1995).
https://doi.org/10.1109/ICDAR.1995.598994
18. Hungar, H., Steffen, B., Margaria, T.: Methods for generating selection structures,
for making selections according to selection structures and for creating selection de-
scriptions. https://patents.justia.com/patent/9141708 (Sep 2015), USPTO Patent
number: 9141708
19. Lee, C.Y.: Representation of switching circuits by binary-decision programs. Bell
System Technical Journal 38(4), 985–999 (1959)
20. Lou, Y., Caruana, R., Gehrke, J.: Intelligible models for classification and
regression. In: Yang, Q., Agarwal, D., Pei, J. (eds.) The 18th ACM
SIGKDD International Conference on Knowledge Discovery and Data Min-
ing, KDD ’12, Beijing, China, August 12-16, 2012. pp. 150–158. ACM (2012).
https://doi.org/10.1145/2339530.2339556
21. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) Tools and Algorithms for the Construction and Analysis of
Systems. pp. 337–340. Springer Berlin Heidelberg, Berlin, Heidelberg (2008).
https://doi.org/10.1007/978-3-540-78800-3_24
22. Murtovi, A., Bainczyk, A., Steffen, B.: Forest GUMP: A tool for explanation (TACAS
2022 artifact) (Nov 2021). https://doi.org/10.5281/zenodo.5733107
23. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
24. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining
the predictions of any classifier. In: Krishnapuram, B., Shah, M., Smola, A.J.,
Aggarwal, C.C., Shen, D., Rastogi, R. (eds.) Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining,
San Francisco, CA, USA, August 13-17, 2016. pp. 1135–1144. ACM (2016).
https://doi.org/10.1145/2939672.2939778
25. Somenzi, F.: CUDD: CU Decision Diagram package release 3.0 (2015)
26. Steffen, B., Gossen, F., Naujokat, S., Margaria, T.: Language-Driven Engineering:
From General-Purpose to Purpose-Specific Languages, pp. 311–344. Springer Inter-
national Publishing, Cham (2019). https://doi.org/10.1007/978-3-319-91908-9_17
27. Van Assche, A., Blockeel, H.: Seeing the forest through the trees: Learning a
comprehensible model from an ensemble. In: Kok, J.N., Koronacki, J., Mantaras,
R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds.) Machine Learning: ECML
2007. pp. 418–429. Springer Berlin Heidelberg, Berlin, Heidelberg (2007)
28. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining, Fourth Edition: Prac-
tical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc.,
San Francisco, CA, USA, 4th edn. (2016)
29. Zhou, Y., Hooker, G.: Interpreting models via single tree approximation (2016)
Alpinist: an Annotation-Aware GPU Program Optimizer⋆
Ömer Şakar1(✉), Mohsen Safari1, Marieke Huisman1, and Anton Wijs2
1Formal Methods and Tools, University of Twente, Enschede, The Netherlands
{o.f.o.sakar,m.safari,m.huisman}@utwente.nl
2Software Engineering & Technology, Eindhoven University of Technology,
Eindhoven, The Netherlands
a.j.wijs@tue.nl
Abstract. GPU programs are widely used in industry. To obtain the
best performance, a typical development process involves the manual or
semi-automatic application of optimizations prior to compiling the code.
To avoid the introduction of errors, we can augment GPU programs
with (pre- and postcondition-style) annotations to capture functional
properties. However, keeping these annotations correct when optimizing
GPU programs is labor-intensive and error-prone.
This paper introduces Alpinist, an annotation-aware GPU program op-
timizer. It applies frequently-used GPU optimizations, but besides trans-
forming code, it also transforms the annotations. We evaluate Alpinist,
in combination with the VerCors program verifier, to automatically op-
timize a collection of verified programs and reverify them.
Keywords: GPU · Optimization · Deductive verification · Annotation-aware ·
Program transformation
1 Introduction
Over the course of roughly a decade, graphics processing units (GPUs) have
been pushing the computational limits in fields as diverse as computational biol-
ogy [64], statistics [35], physics [7], astronomy [24], deep learning [29], and formal
methods [17,43,44,65,67]. Dedicated programming languages such as CUDA [34]
and OpenCL [42] can be used to write GPU source code. To achieve the best
performance on GPUs, developers should apply incremental optimizations,
tailored to the GPU architecture. Unfortunately, this is to a large extent a man-
ual activity. The fact that for different GPU devices, the same code tends to
require a different sequence of transformations [21] makes this procedure even
more time consuming and error-prone. Recently, automating this has received
some attention, for instance by applying machine learning [3].
⋆ This work is supported by NWO grant 639.023.710 for the Mercedes project and by
NWO TTW grant 17249 for the ChEOPS project.
© The Author(s) 2022
D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 332–352, 2022.
https://doi.org/10.1007/978-3-030-99527-0_18
[TACAS 2022 Artifact Evaluation badge: Artifacts Available, v1.1]
[Figure: an Annotated Program, together with User-Selected Transformations, enters the Annotation-Aware Program Transformer; the resulting Transformed Annotated Program is passed to a Deductive Program Verifier.]
Fig. 1: Annotation-Aware Program Transformation.
Reasoning about the correctness of GPU software is hard, but necessary. Mul-
tiple verification techniques and tools have been developed to aid in this task,
aimed at detecting data races [8,10,14,32,33]; for a recent overview, see [22].
Some of these techniques apply deductive program verification, which
requires a program to be manually augmented with pre- and postcondition an-
notations. However, annotating a program is time consuming. The more complex
a program is, the more challenging it becomes to annotate it. In particular, as a
program is being optimized repeatedly, its annotations tend to change frequently.
This paper presents Alpinist, a tool that can apply annotation-aware trans-
formations [26] on annotated GPU programs. It can be used with the deductive
program verifier VerCors [9]. VerCors can verify the functional correctness of
GPU programs [10]. It allows the verification of many typical GPU computa-
tions, see e.g., [48,50,51]. The purpose of Alpinist is twofold (see Fig. 1): First, it
automates the optimization of GPU code, to the extent that the developer only
needs to indicate which optimization is to be applied where, and the tool performs
the transformation. Interestingly, Alpinist exploits the presence of annotations
to determine whether an optimization is actually applicable, and in
doing so, it can sometimes apply an optimization where a compiler cannot. Second,
as it applies a code transformation, it also transforms the related annotations,
which means that once the developer has annotated the unoptimized, simpler
code, any further optimized version of that code is automatically annotated with
updated pre- and postconditions, making it reverifiable. This avoids having to
re-annotate the program every time it is optimized for a specific GPU device.
Alpinist supports GPU code optimizations that are used frequently in prac-
tice, namely loop unrolling, tiling, kernel fusion, iteration merging, matrix lin-
earization and data prefetching. In the current paper, we discuss how Alpinist
has been implemented, how it can be applied on annotated GPU code, and how
some of the more complex optimizations work. In addition, we evaluate the ef-
fect of applying several of these optimizations, both in terms of annotation size
and time needed to verify a program, to a collection of examples including the
verified case studies in [48,49,51].
Outline. Section 2 demonstrates how Alpinist optimizes a verified GPU pro-
gram while preserving its provability. Section 3 discusses the architecture of
Alpinist. Section 4 discusses the most complex optimizations supported by
1/*@ context_everywhere N > 0 && N < a.length;
2req (\forall* int i; 0 <= i < a.length; Perm(a[i], 1));
3ens (\forall* int i; 0 <= i < a.length; i != a.length-1 ==> Perm(a[i+1], 1));
4ens (\forall* int i; 0 <= i < a.length; i == a.length-1 ==> Perm(a[0], 1));
5ens (\forall int i; 0 <= i < a.length-1; a[i+1] == N*i);
6ens a[0] == N*(a.length-1); @*/
7void Host(int[] a, int size, int N) {
8par Kernel1 (int tid = 0 .. a.length)
9/*@ context Perm(a[tid], 1);
10 ens a[tid] == 0; @*/
11 { a[tid] = 0; }
12 par Kernel2 (int tid = 0 .. a.length)
13 /*@ context tid != a.length-1 ? Perm(a[tid+1], 1) : Perm(a[0], 1);
14 req tid != a.length-1 ? a[tid+1] == 0 : a[0] == 0;
15 ens tid != a.length-1 ? a[tid+1] == N*tid : a[0] == N*tid; @*/
16 {/*@ inv k >= 0 && k <= N;
17 inv tid != a.length-1 ? Perm(a[tid+1], 1) : Perm(a[0], 1);
18 inv tid != a.length-1 ? a[tid+1] == k*tid : a[0] == k*tid;@*/
19 for(int k = 0; k < N; k++) {
20 if (tid != a.length-1) { a[tid+1] = a[tid+1] + tid; }
21 else { a[0] = a[0] + tid; }
22 }}}
Fig. 2: A verified GPU-style program
Alpinist in detail, namely loop unrolling, tiling and kernel fusion, and briefly
discusses the remaining three. Section 5 presents the results of experiments in
which the tool has been applied on a collection of programs. Section 6 discusses
related work, and Section 7 concludes the paper and discusses future work.
2 Annotation-Aware Optimization using Alpinist
This section illustrates how Alpinist can optimize a verified GPU program while
preserving its provability. Fig. 2 shows a GPU program with annotations [10] that
is verified by VerCors. The example is written in a simplified version of VerCors’
own language PVL. The program initializes an array a, and subsequently updates
the values in a, N times. The workflow of a GPU program in general is that the
host (i.e., CPU) invokes a kernel, i.e., a GPU function, executed by a specified
number of GPU threads. These threads are organized in one or more thread
blocks. In this program, there are two kernels, both executed by one thread
block of a.length threads (l.8 and l.12)3. Each thread has a unique
identifier, in the example called tid. In the first kernel (l.8-l.11), each thread
initializes a[tid] to 0. In the second kernel (l.12-l.22), each thread updates
a[tid+1] (modulo a.length) N times, by adding tid to it. In the main Host
function, Kernel1 is called, followed by Kernel2.
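For orientation, the following sketch (ours; the names and the explicit bounds check are illustrative, and all annotations are omitted) shows roughly how this program would look in plain CUDA, with the implicit global synchronisation between the two kernel launches:

// Hypothetical CUDA rendering of the program of Fig. 2 (annotations omitted).
__global__ void kernel1(int *a, int len) {
  int tid = threadIdx.x;                 // one thread block of `len` threads
  if (tid < len) a[tid] = 0;
}

__global__ void kernel2(int *a, int len, int N) {
  int tid = threadIdx.x;
  if (tid < len) {
    int idx = (tid != len - 1) ? tid + 1 : 0;  // a[tid+1] modulo len
    for (int k = 0; k < N; k++) a[idx] += tid;
  }
}

void host(int *a, int len, int N) {      // a: device pointer
  kernel1<<<1, len>>>(a, len);           // implicit global sync between launches
  kernel2<<<1, len>>>(a, len, N);
}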
The kernels, the for-loop and the host function are annotated for verification
(in blue), using permission-based separation logic [6,11,12]. Permissions capture
which memory locations may be accessed by which threads; they are fractional
values in the interval (0, 1] (cf. Boyland [12]): any fraction in the interval (0,
3 In practice, the size of a block cannot exceed a specific upper bound, but for this
example, we assume that a.length is sufficiently small.
1/*@ context_everywhere N > 0 && N < a.length;
2req (\forall* int i; 0 <= i < a.length; Perm(a[i], 1));
3ens (\forall* int i; 0 <= i < a.length; i != a.length-1 ==> Perm(a[i+1], 1));
4ens (\forall* int i; 0 <= i < a.length; i == a.length-1 ==> Perm(a[0], 1));
5ens (\forall int i; 0 <= i < a.length-1; a[i+1] == N*i);
6ens a[0] == N*(a.length-1); @*/
7void Host(int[] a,int size,int N){
8par Fused_Kernel(int tid = 0 .. a.length)
9/*@ req Perm(a[tid], 1);
10 ens tid != a.length-1 ? Perm(a[tid+1], 1) : Perm(a[0], 1);
11 ens tid != a.length-1 ? a[tid+1] == N*tid : a[0] == N*tid; @*/
12 {
13 a[tid] = 0;
14 /*@ req Perm(a[tid], 1);
15 req a[tid] == 0;
16 ens tid != a.length-1 ? Perm(a[tid+1], 1) : Perm(a[0], 1);
17 ens tid != a.length-1 ? a[tid+1] == 0 : a[0] == 0; @*/
18 barrier(Fused_Kernel)
19
20 int a_reg_0, a_reg_1;
21 if (tid != a.length-1) { a_reg_1 = a[tid+1] } else { a_reg_0 = a[0] }
22 int k = 0;
23 if (tid != a.length-1) { a_reg_1 = a_reg_1 + tid; }
24 else { a_reg_0 = a_reg_0 + tid; }
25 k ++;
26 /*@ inv k >= 0 + 1 && k <= N;
27 inv tid != a.length-1 ? Perm(a[tid+1], 1) : Perm(a[0], 1);
28 inv tid != a.length-1 ? a_reg_1 == k*tid : a_reg_0 == k*tid; @*/
29 for(k; k < N; k++) {
30 if (tid != a.length-1) { a_reg_1 = a_reg_1 + tid; }
31 else { a_reg_0 = a_reg_0 + tid; }
32 }
33 if (tid != a.length-1) { a[tid+1] = a_reg_1 } else { a[0] = a_reg_0 };
34 } }
Fig. 3: An optimized GPU-style program, annotated for verification
1) indicates a read permission, while 1 indicates a write permission. A write
permission can be split into multiple read permissions and read permissions can
be added up, and transformed into a write permission if they add up to 1. The
soundness of the logic ensures that for each memory location, the total number
of permissions among all threads does not exceed 1.
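As a minimal sketch of this accounting (our toy model, not VerCors' implementation), permissions can be viewed as rationals that are halved on splitting and summed on merging:

// Toy model (ours) of fractional-permission accounting: a permission is a
// rational p with 0 < p <= 1; p == 1 encodes write permission.
struct Perm { int num, den; };                    // p = num/den

bool is_write(Perm p) { return p.num == p.den; }

Perm split(Perm p)         { return {p.num, 2 * p.den}; }  // halve p
Perm merge(Perm a, Perm b) {                               // a + b
  return {a.num * b.den + b.num * a.den, a.den * b.den};
}

// Example: splitting a write permission {1,1} yields two read halves {1,2};
// merging them gives {4,4}, i.e., write again. Soundness requires that, per
// location, the permissions held by all threads never sum to more than 1.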
To specify permissions, predicates are used of the form Perm(L, π), where L
is a heap location and π a fractional value in the interval (0, 1] (e.g., 1\3). Pre-
and postconditions, denoted by keywords req and ens, should hold at the begin-
ning and the end of an annotated function, respectively. The keyword context
abbreviates both req and ens (l.9, l.13). The keyword context_everywhere is
used to specify a property that must hold throughout the function (l.1). Note
that \forall* is used to express a universal separating conjunction over permis-
sion predicates (l.2-l.4) and \forall is used as standard universal conjunction
over logical predicates (l.5). For logical conjunction, && is used, and ** is used
as the separating conjunction of separation logic.
In the example, write permissions are required for all locations in a (l.2).
The pre- and postconditions of the first kernel specify that each thread needs
write permission for a[tid] (l.9). The postcondition states that a[tid] is set
to 0 (l.10). In the second kernel, all threads have write permission for a[tid+1],
except thread a.length-1 which has write permission for a[0] (l.13). Moreover,
it is required that a[tid+1] (modulo a.length) is 0 (l.14). For the for-loop (l.19-
l.22), loop invariants are specified: k is in the range [0,N] (l.16), each thread has
write permission for a[tid+1] (modulo a.length) (l.17) and this location always
has the value k*tid (l.18). The postconditions of the second kernel and the host
function are similar to this latter invariant.
Fig. 3 shows an optimized version of the program, with updated annotations
to make it verifiable. Alpinist has applied three optimizations (their combined
effect is sketched in plain CUDA after the list):
1. Fusing the two kernels: in GPU programs, the only global synchronisation
points (used, for instance, to avoid data races) exist implicitly between ker-
nel launches. However, if such a global synchronisation point is not really
needed between two specific kernels, then fusing them gives several benefits,
in particular the ability to store intermediate results in (fast) thread-local
register memory as opposed to (slow) GPU global memory, and it has a
positive effect on power consumption [62]. In the example, the kernels are
combined into Fused Kernel, and a thread block-local barrier is introduced
(l.18) to avoid data races within the single thread block executing the code.
2. Using register memory: register variables can be used to reduce the number
of global memory accesses. Here, the use of a_reg_0 and a_reg_1 has been
enabled by kernel fusion.
3. Unrolling the for-loop; the for-loop has been unrolled once here (l.20-l.25).
Since GPU threads are very light-weight, compared to CPU threads, any
checking of conditions that can be avoided benefits performance. When un-
rolling a loop, this means that fewer checks of the loop-condition are needed.
Note that here, Alpinist benefits from the knowledge that N>0 (l.1), so it
knows that the for-loop can be unrolled at least once.
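Stripping the annotations, the combined effect of the three optimizations corresponds roughly to the following CUDA sketch (ours; it assumes a launch of exactly a.length threads in a single block, e.g. fused_kernel<<<1, len>>>(a, len, N), so no bounds check is needed before the barrier):

// Hypothetical CUDA counterpart of Fig. 3 (annotations omitted).
__global__ void fused_kernel(int *a, int len, int N) {
  int tid = threadIdx.x;
  a[tid] = 0;
  __syncthreads();                      // replaces the inter-kernel sync point
  int idx = (tid != len - 1) ? tid + 1 : 0;
  int reg = a[idx];                     // register instead of global memory
  reg += tid;                           // first iteration unrolled (N > 0)
  for (int k = 1; k < N; k++)
    reg += tid;
  a[idx] = reg;                         // single write back to global memory
}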
To preserve provability of the optimized program, Alpinist changed the
annotations, in particular the pre- and postcondition of the fused kernel and
the loop invariants (highlighted in Fig. 3). Moreover, Alpinist introduced an
annotated barrier (l.14-l.18). Since threads synchronize at a barrier, it is possible
to redistribute the permissions. In the rest of the paper, we discuss how Alpinist
performs these annotation-aware transformations.
3 The Design of Alpinist
This section gives a high-level overview of the design of Alpinist. The opti-
mizations supported by Alpinist are discussed in Section 4. To understand the
design of Alpinist, we first explain the architecture of the VerCors verifier.
3.1 VerCors’ Architecture
VerCors is a deductive program verifier, which is designed to work for different in-
put languages (e.g., Java and OpenCL). It takes as input an annotated program,
which is then transformed in several steps into an annotated Silver program. Sil-
ver is an intermediate verification language, used as input for Viper [37,60].
Viper then generates proof obligations, which can be discharged by an auto-
mated theorem prover, such as Z3 [36].
The internal transformations in VerCors are defined over our internal AST
representation (written in the Common Object Language or COL [52]), which
captures the features of all input languages. Some of the transformations are
generic (e.g., splitting composite variable declarations) and others are specific
to verification (e.g., transforming contracts). The transformations implemented
as part of Alpinist are also applied on the COL AST, but they are developed
with a different goal in mind; in particular, several of the transformations are
specific to the supported optimizations.
Using VerCors and its architecture to implement Alpinist gives us some ben-
efits. First, existing helper functions can be reused, which simplifies tasks such
as gathering information regarding specific AST nodes. Second, some generic
transformations of VerCors can be reused, such as splitting composite variable
declarations or simplifying expressions. This helps to simplify the implementa-
tion of the optimizations. Third, using the architecture of VerCors allows us to
prove the assertions we generate relatively easily, by invoking VerCors internally.
3.2 Alpinist's Architecture
Alpinist takes a verified file as its input, annotated with special optimiza-
tion annotations that indicate where specific optimizations should be applied.
Alpinist is written in Java and Scala and runs on Windows, Linux and macOS.
Fig. 4gives a high-level overview of the internal design of Alpinist. The input
program goes through four phases: the parsing phase, the applicability checking
phase, the transformation phase and the output phase.
The parsing phase transforms the input file into a COL AST, after which
the applicability checking phase checks if the optimization can be applied. Some
optimizations, such as tiling (see Section 4.2), are always applicable, hence their
applicability check always passes. For other optimizations, prerequisites must be
established. Sometimes, a syntactical analysis of the AST suffices, e.g., kernel
fusion (see Section 4.3). For this optimization, it must be determined whether
there is any data dependency between two selected kernels. When analysis of the
AST is not enough, VerCors can be used to perform more complex reasoning.
An example of this is loop unrolling (see Section 4.1). Its prerequisite is that, for
the loop to be unrollable k times, it is guaranteed that the loop executes at least
k times. This prerequisite is encoded as an assertion to be proven by VerCors.
The applicability checking phase is one of the strengths of Alpinist. It ex-
ploits the fact that the input program is annotated to determine whether an
optimization is applicable, and relies on the fact that VerCors can perform com-
plex reasoning. Moreover, this approach allows us to distinguish failures due to
unsatisfied prerequisites from failures due to mistakes in the transformation procedure.
[Figure: Input File → Parsing Phase → Applicability Checking Phase → Transformation Phase → Output Phase → Output File.]
Fig. 4: The internal design of Alpinist.
If the applicability check passes (i.e., the optimization is applicable), the
transformation phase is next; otherwise, a message is generated stating that the
prerequisites could not be proven.
The transformation phase applies the optimizations to the input AST. The
output phase either prints the optimized program in the same language as the
input program, or prints a message signifying either a failure during optimization
or a verification failure in the applicability checking phase.
4 GPU Optimizations
Alpinist supports six frequently-used GPU optimizations, namely loop un-
rolling, tiling, kernel fusion, iteration merging, matrix linearization and data
prefetching. This section discusses loop unrolling, tiling, and kernel fusion in
detail. The other optimizations follow the same approach in spirit and are only
discussed briefly; their details can be found in the Alpinist implementation [16].
Each optimization is introduced in the context of GPU programs. Then, we
discuss how to apply them. Interesting insights are discussed where relevant.
4.1 Loop Unrolling
Loop unrolling is a frequently-used optimization technique that is applicable
to both GPU and CPU programs. It unrolls some iterations of a loop, which
increases the code size, but can have a positive impact on program performance;
e.g., see [21,38,46,59,63] for its impact, specifically on GPU programs. Fig. 5
shows an example of unrolling an (annotated) loop twice: the body of the loop is
duplicated twice before the loop. This has the following effect on the annotations:
the loop invariant bounding the loop variable (l.5) changes in the optimized
program (l.14). Note that the other loop invariants (i.e., Inv(i)) remain the
same. Moreover, after each unrolling part, we add all invariants as assertions
(l.8-l.10) except after the last unroll. This captures that the code produced by
unrolling the loop should still satisfy the original loop invariants.
Our approach to loop unrolling is more general than optimization techniques
applied during compilation. For instance, the unroll pragma in CUDA [55] and the
unroll function in Halide [56] unroll loops by calculating the number of iterations
to see if unrolling is possible, i.e., the count should be computable at compile time.
This difference is illustrated in Fig. 5, where N (i.e., the number of iterations)
is unknown at compile time. Their approach cannot automatically handle this
1/*@ context_everywhere N > 1; @*/
2void Host(int[] arr, int size, int N){
3par Kernel(tid=0..size){
4int i = 0;
5/*@ inv i >= 0 && i <= N;
6inv N > 1;
7inv Inv(i); @*/
8loop (i < N){
9int newInt = i;
10 arr[tid] = arr[tid] + newInt;
11 i=i+1;}
12 } }
1/*@ context_everywhere N > 1; @*/
2void Host(int[] arr, int size, int N){
3par Kernel(tid=0..size){
4int i = 0;
5int newInt = i;
6arr[tid] = arr[tid] + newInt;
7i = i + 1;
8//@ assert i >= 1 && i <= N;
9//@ assert N > 1;
10 //@ assert Inv(i);
11 newInt = i;
12 arr[tid] = arr[tid] + newInt;
13 i = i + 1;
14 /*@ inv i >= 2 && i <= N;
15 inv N > 1;
16 inv Inv(i); @*/
17 loop (i < N){
18 newInt = i;
19 arr[tid] = arr[tid] + newInt;
20 i=i+1;}
21 } }
Fig. 5: An example of unrolling a loop 2 times.
1void Host(int[] array, int size){
2par Kernel(tid=0..size){
3int i = init; // The loop variable
4    ...
5//@ assert (i == a) || (i == b); // Depending on initialization of i only one
6// of the conditions is specified
7/*@ inv i >= a && i <= b; // The lowerbound of i (a), The upperbound of i (b)
8inv Inv(i); @*/ // Additional loop invariants
9loop (cond(i)) { // The loop condition
10 body(i); // The loop body, a sequence of statements in the ith iteration.
11 i = upd(i); } // The update function of i, restricted to (i+c), (i−c),
12 } }           // (i×c) or (i/c), where c is a positive integer constant4.
Fig. 6: A general template of a loop inside a kernel.
case, while our approach can automatically unroll the loop, since the annotations
(l.1, l.6) specify the lower bound of N (provided by the programmer, who knows
that this is a valid lower bound). VerCors verifies that the unrolling is valid.
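For comparison, the compile-time route looks as follows in plain CUDA (sketch ours); the pragma requests an unroll factor of two, but with N unknown at compile time the compiler must keep the loop guard or may ignore the hint altogether:

__global__ void add_iota(int *arr, int size, int N) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < size) {
    #pragma unroll 2                // compiler hint, no annotation reasoning
    for (int i = 0; i < N; i++)
      arr[tid] = arr[tid] + i;
  }
}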
Fig. 6 shows a loop template in a verified GPU program. We would like
to automatically unroll the loop k times and preserve the provability of the
program. To accomplish this, we follow a procedure consisting of three parts:
the main, checking and updating parts. In the main part, an annotated (verified)
GPU program and a positive k are given as input. Next, we go to the checking
part, to see if it is possible to unroll the loop k times. This part corresponds
to the applicability checking phase. Thus, we statically calculate the number
of loop iterations, by counting how many times the condition (cond(i)) holds
starting from either a (the lower bound of i) or b (the upper bound of i),
depending on the operation of upd(i). If k is greater than the total number of
loop iterations at the end of the checking part, then we report an error. Otherwise,
4 If c were negative, then for multiplication and division, i would oscillate between
positive and negative values and hence would not always be useful as an array index.
Hence we consider c to be positive.
[Figure: thread-to-cell assignments — baseline t0 t1 ... t11 (one thread per location); inter-tiling t0 t1 t2 t3 | t0 t1 t2 t3 | t0 t1 t2 t3; intra-tiling t0 t0 t0 t0 | t1 t1 t1 t1 | t2 t2 t2 t2.]
Fig. 7: Inter- and intra-tiling of an array with T = 12, N = 4 and ⌈T/N⌉ = 3.
1void Host(int[] a, int T){
2par Kernel(tid = 0..T)
3/*@ // Preconditions related to permissions and functional correctness
4req prePerm(a[tid]) ** preFunc(a[tid]);
5// Postconditions related to permissions and functional correctness
6ens postPerm(a[tid]) ** postFunc(a[tid]); @*/
7{ body(a[tid]); } }
Fig. 8: A general unoptimized GPU program to apply for tiling.
we go to the updating part, in which we update either a or b according to the
operation in upd(i). If the operation is addition or multiplication, then the loop
variable i (in the unoptimized program) goes from a to b. That means that, after
unrolling, a should be updated according to the constant c from the update
expression and k. If the operation is subtraction or division, i goes from b to a.
Thus, after unrolling, b should be updated. After the updating part, we return
to the main part to unroll the loop k times.
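A host-side sketch of this checking logic, under the template of Fig. 6 and with our own naming (not Alpinist's actual code), could look as follows:

enum Upd { ADD, SUB, MUL, DIV };   // upd(i) in { i+c, i-c, i*c, i/c }, c > 0

// Count loop iterations, starting from a (lower bound) or b (upper bound)
// depending on the update operation; return -1 if the count is unclear.
long iterations(long a, long b, Upd op, long c) {
  long n = 0;
  long i = (op == ADD || op == MUL) ? a : b;       // starting point
  while ((op == ADD || op == MUL) ? i < b : i > a) {
    switch (op) {
      case ADD: i += c; break;
      case SUB: i -= c; break;
      case MUL: if (i <= 0 || c <= 1) return -1; i *= c; break;
      case DIV: if (i <= 0 || c <= 1) return -1; i /= c; break;
    }
    n++;
  }
  return n;
}

// Unrolling k times is admissible only if k <= iterations(a, b, op, c);
// otherwise an error is reported.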
4.2 Tiling
Tiling is another well-known optimization technique for GPU programs. It in-
creases the workload of the threads to fully utilize GPU resources by assigning
more data to each thread. Concretely, we assume there are T threads and a one-
dimensional array of size T in the unoptimized GPU program, where each thread
is responsible for one location in that array (Fig. 8). To apply the optimization,
we first divide the array into ⌈T/N⌉ chunks, each of size N (1 ≤ N ≤ T)5. There
are two different ways to create and assign threads to array cells (as in Fig. 7):
Inter-Tiling We define N threads and assign each of them to one specific location
in each chunk. That means each thread serially iterates over all chunks and is
responsible for a specific location in each chunk.
Intra-Tiling We define ⌈T/N⌉ threads and assign one thread to one chunk
(i.e., a 1-to-1 mapping) to serially iterate over all cells in that chunk.
Both forms of tiling can have a positive impact on GPU program performance;
e.g., see [25,28,47,69] for the impact of this optimization.
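In plain CUDA, and ignoring annotations, the two schemes correspond roughly to the following sketch (ours; body stands in for the per-cell work):

__device__ void body(int *cell) { *cell += 1; }   // stands in for body(a[...])

// Inter-tiling: N threads; thread tid handles a[tid], a[tid+N], a[tid+2N], ...
__global__ void inter_tiled(int *a, int T) {      // launched with N threads
  int tid = threadIdx.x, N = blockDim.x;
  for (int j = 0; tid + j * N < T; j++)           // loop condition tid + j*N < T
    body(&a[tid + j * N]);
}

// Intra-tiling: ceil(T/N) threads; thread tid handles the contiguous chunk
// a[tid*N .. tid*N+N-1] (the last chunk may be shorter).
__global__ void intra_tiled(int *a, int T, int N) {
  int tid = threadIdx.x;                          // launched with ceil(T/N) threads
  for (int j = tid * N; j < (tid + 1) * N && j < T; j++)
    body(&a[j]);
}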
Fig. 9 shows the optimized version of Fig. 8 after applying inter-tiling. Regard-
ing program optimization, two major changes happen: 1) the total number of
threads has been reduced (l.2), and 2) the body is encapsulated inside a loop (l.16-
l.18). As mentioned, in inter-tiling, we define N threads instead of T. The number
5 Since N is in the range 1 ≤ N ≤ T, the last chunk might have fewer cells.
1void Host(int[] a, int T){
2par Kernel(tid = 0..N)
3/*@ req (\forall* int i; 0 <= i && i < ceiling(T, N) && tid+i×N < T;
4pre(a[tid+i×N]));
5ens (\forall* int i; 0 <= i && i < ceiling(T, N) && tid+i×N < T;
6post(a[tid+i×N])); @*/
7{
8int j = 0;
9/*@ inv j >= 0 && j <= ceiling(T, N);
10 inv (\forall* int i; 0 <= i && i < ceiling(T, N) && tid+i×N < T;
11 prePerm(a[tid+i×N]));
12 inv (\forall int i; j <= i && i < ceiling(T, N) && tid+i×N < T;
13 preFunc(a[tid+i×N]));
14 inv (\forall* int i; 0 <= i && i < j && tid+i×N < T;
15 postFunc(a[tid+i×N])); @*/
16 loop (tid+j×N < T){
17 body(a[tid+j×N]);
18 j=j+1;}
19 } }
Fig. 9: Optimized version of the GPU program of Fig. 8 after applying inter-tiling.
of chunks is indicated by the function ceiling(T, N). Each thread in the newly
added loop iterates over all chunks (in the range 0 to ceiling(T, N)-1) and is
responsible for a specific location. This happens via the loop variable j and the
loop condition tid+j×N < T. This means that each thread tid can access its own
location at index tid in each chunk. To preserve verifiability, we add invariants
to the loop (l.9-l.17). Therefore, we specify:
– the boundaries of the loop variable j, which iterates over all chunks.
– a permission-related invariant for each thread in each chunk (l.10). This
comes from the precondition of the kernel and is quantified over all chunks.
– an invariant to indicate functional properties of the locations that have not
yet been updated by threads in the body of the loop (l.12). This comes from
the functional precondition of the kernel and is quantified over all chunks.
– an invariant to specify how each thread updates the array in each chunk
(l.14). This comes from the functional property in the postcondition of the
kernel and is quantified over all chunks.
Moreover, we modify the specification of the kernel (l.3-l.6). Note that we have
the condition tid+j×N < T in all universally quantified invariants, because the
last chunk might have fewer cells than N. We quantify the pre- and postcondi-
tion of the kernel over the chunks in the same way as the invariants.
Intra-tiling is in essence similar to inter-tiling, with two major differences: 1)
the total number of threads is ceiling(T, N), and 2) each thread in the loop
iterates over the cells within its own chunk. Therefore, we have different conditions
in the loop and the quantified invariants. Alpinist also supports this.
Above, each thread is assigned to one cell. This can easily be generalized
to have each thread assigned to one or more consecutive cells (i.e., a task). A
similar procedure can be applied as long as the tasks do not overlap, i.e., each
cell is assigned to at most one thread.
4.3 Kernel Fusion
Kernel fusion is a GPU optimization where we merge two or more consecutive
kernels into one. It increases the potential to use thread-local registers to store
intermediate results (see Section 2) and can lead to less power consumption.
See [2,19,61,62,68] for the impact of kernel fusion on GPU programs. We pro-
vide a generalized procedure to fuse an arbitrary number of consecutive kernels
while considering data dependencies between them. The idea is to fuse them by
repeatedly fusing the first two kernels (i.e., kernel reduction). In each iteration,
if there is no data dependency between the two kernels, we safely fuse them.
Otherwise, if there is only one thread block, we fuse the two kernels by inserting
a barrier between their bodies; else, fusion fails.
A benefit of this approach is that it only considers two kernels at a time.
In this way, it can be determined whether a barrier is necessary between two
specific kernels, and we do not miss any possible fusion optimization. Another
benefit of this approach is that when a data dependency between two kernels P
and P+ 1 (1 < P < #kernels1) is detected, the output of the approach is the
fusion of the first Pkernels, and the remaining unfused kernels after P. This
allows the user to not only find out that there is a data dependency between P
and P+ 1, but also to obtain fused kernels where possible.
There are multiple challenges in this transformation: (1) how to detect data
dependency between two kernels? (2) how to collect the pre- and postconditions
for the fused kernel? and (3) how to deal with permissions so that in the fused
kernel the permission for a location does not exceed 1? The main difficulty in
addressing these challenges is that we have to consider many different possible
scenarios. Fortunately, we can use the information from the contract of the two
kernels. The permission patterns in the contract indicate for each thread which
locations it reads from and writes to. We provide procedures to separately collect
pre- and postconditions related to permissions and to functional correctness. Due
to space limitations, we only discuss the essential steps to collect the precondition
related to permissions for array accesses of the fused kernel in Alg. 1. Collecting
the rest of the contract uses a similar procedure.
Alg. 1 requires kernels k1 and k2 not to lose any permissions, but only possibly
redistribute them (using a barrier). Furthermore, for ease of presentation, we
assume that in both k1 and k2, each thread accesses at most one cell of array a,
and that the expressions used to compute array indices only combine constants
and thread ID variables, using standard arithmetic operators.
We compare the postcondition of k1 and the precondition of k2 (l.2) to
understand how to add permissions of the preconditions of k1 and k2 to the
precondition of the fused kernel. Note that prePerm and postPerm correspond
to a permission-related pre- and postcondition, respectively. We use the post-
condition of k1 for this comparison since the permission at the end of k1 needs
to be sufficient to satisfy the precondition of k2. If the index expressions e1 and
e2 to access an array aare syntactically the same, then they refer to the same
array cell. In that case, we first add to the precondition of the fused kernel the
original permission from the precondition of k1 that corresponds to the permis-
Algorithm 1 Kernel fusion procedure for collecting precondition permissions.
1: Add all precondition permissions related to non-shared arrays (i.e., accessed by only one of the
   two kernels) into the contract of the fused kernel kf.
2: for each shared array a with a permission postPerm(a[e1], p1) in the postcondition of the first
   kernel k1 and a permission prePerm(a[e2], p2) in the precondition of the second kernel k2 do
3:   if patterns e1 and e2 are syntactically the same then
4:     Add pre. of k1 corresponding to postPerm(a[e1], p1) as pre. to kf
5:     if p1 < p2 then
6:       Add prePerm(a[e2], p2 − p1) as pre. to kf
7:   else if patterns e1 and e2 are not syntactically the same then
8:     if p1 + p2 ≤ 1 then
9:       Add pre. of k1 corresp. to postPerm(a[e1], p1) and prePerm(a[e2], p2) as pre. in kf
10:    else if p1 + p2 > 1 && p1 < 1 && p2 < 1 then
11:      Add pre. of k1 corresp. to postPerm(a[e1], p1) with permission p3 and prePerm(a[e2],
12:      p4) as pre. s.t. p3 + p4 == 1
13:    else if p1 == 1 (i.e., write) then           ▷ Data dependency, add barrier
14:      Add pre. of k1 corresponding to postPerm(a[e1], p1) as pre. to kf
15:    else (p2 == 1)                               ▷ Data dependency, add barrier
16:      Add pre. of k1 corresponding to postPerm(a[e1], p1) as pre. to kf
17:      Add prePerm(a[e2], 1 − p1) as pre. to kf
sion for a[e1] in the postcondition of k1 (remember that the latter permission
may have been obtained in k1 after permission redistribution). Second, if p1 is
not sufficient for the precondition of k2 (l.5), we add additional permission to
the precondition of the fused kernel to satisfy the precondition of k2 (l.6).
The remaining cases in the algorithm correspond to the different edge cases
that we should consider when e1 and e2 are not syntactically the same. In
particular, a data dependency occurs when the accumulated permission (in both
kernels) for one location is greater than 1, and there is at least one write
permission. Therefore, we have to distinguish multiple cases: 1) p1 + p2 does not
exceed 1 (l.8), 2) p1 + p2 exceeds 1, but no write permission is involved (l.10),
or 3) and 4) at least one write is involved (l.13 and l.15). In the latter two cases,
a barrier must be introduced to take care of distributing permissions from the
access in k1 to the access in k2, and possibly additional permission for the latter
must be added to the precondition of the fused kernel (l.17). After constructing
the contract of the fused kernel, we check for data dependency.
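The case split of Alg. 1 can be summarized by the following host-side sketch (ours; permissions are modelled as doubles and the "Add ..." steps are abbreviated to comments):

// Returns whether a barrier must be inserted between the two kernels,
// for one shared array with index patterns e1, e2 and permissions p1, p2
// (1.0 encodes write permission).
bool collect_pre(bool same_pattern, double p1, double p2) {
  if (same_pattern) {                        // e1 and e2 name the same cell
    // add k1's pre. corresponding to postPerm(a[e1], p1)      (Alg. 1, l.4)
    if (p1 < p2) { /* add prePerm(a[e2], p2 - p1)              (l.5-l.6) */ }
    return false;
  }
  if (p1 + p2 <= 1.0) { /* add both preconditions              (l.8-l.9) */
    return false;
  }
  if (p1 < 1.0 && p2 < 1.0) { /* split shares s.t. p3+p4 == 1  (l.10-l.12) */
    return false;
  }
  // at least one write permission: data dependency            (l.13-l.17)
  // add k1's pre.; if p2 == 1, also add prePerm(a[e2], 1 - p1)
  return true;                               // a barrier must be inserted
}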
Fig. 10 shows an example of fusing two kernels. We only present the per-
mission precondition expressions, which are collected with Alg. 1. There are two
shared arrays a and b. To collect permission preconditions in the fused kernel,
we follow steps {l.2 → l.3 → l.4} for array a and steps {l.2 → l.3 → l.4 → l.5 → l.6} for
array b. As there is no data dependency, we can safely fuse the two kernels.
Implementing Data Dependency Detection. One of the implementation chal-
lenges of kernel fusion is to check data dependency in the applicability checking
phase. Our idea for detecting kernel dependencies is similar to detecting loop
iteration dependencies, see [1]. To detect a data dependency for a specific shared
array, the function SV is used. Fig. 11 shows an example of the output of SV. The
kernel has 1\2 permission for a[tid+1], and 1\3 permission for a[0] if tid+1 is
out of bounds. SV takes an array name and the pre- and postconditions of a ker-
nel (of the form cond(tid) => Perm(a[patt(tid)], p)) on l.3-l.6, and returns
a mapping from indices patt(tid) to the permissions p (Fig. 11, right).
1void Host(...){
2par Kernel1(tid1 = 0..T)
3/*@ context Perm(a[tid1], 1);
4context Perm(b[tid1], 1\2);@*/
5{ a[tid1] = 2*b[tid1]; }
6par Kernel2(tid2 = 0..T)
7/*@ context Perm(a[tid2], 1\2);
8context Perm(b[tid2], 1);@*/
9{ b[tid2] = a[tid2]+1; } }
⇒
1void Host(...){
2par Fused_Kernel(tid = 0..T)
3/*@ req Perm(a[tid], 1);
4req Perm(b[tid], 1\2);
5req Perm(b[tid], 1\2);@*/
6{ a[tid] = 2*b[tid];
7b[tid] = a[tid]+1; } }
Fig. 10: An example of collecting preconditions in fusing two kernels.
1void Host(...){
2par Kernel1(tid1 = 0..T)
3/*@ context (tid != a.length-1 =>
4Perm(a[(tid + 1)], 1\2));
5context (tid == a.length-1 =>
6Perm(a[0], 1\3)); @*/
7{... } }
⇒
Output SV(a, spec kernel):
index        0    1    2    3    4
permission   1/3  1/2  1/2  1/2  1/2
Fig. 11: Example output of the SV function for array a.
If the function SV is executed for two kernels to fuse with the same shared
array a, the results SV1(a) and SV2(a) can be compared to determine whether
there is data dependency between the two kernels. This comparison is described
generally at l.8-l.16 in Algorithm 1. For each corresponding location in SV1(a)
and SV2(a), we can determine, for example, whether both permissions combined
do not exceed 1 (l.8) or whether the location in k1 has write permission (l.12).
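A minimal sketch of this comparison (ours; SV results are modelled as maps from indices to permissions) could look as follows:

#include <map>

// SV maps each array index to the permission a kernel's contract assigns
// to it (cf. Fig. 11); 1.0 encodes write permission.
using SV = std::map<int, double>;

// A barrier is needed iff some shared index accumulates more than 1
// permission with at least one write involved (cf. Alg. 1, l.8-l.16).
bool needs_barrier(const SV &sv1, const SV &sv2) {
  for (const auto &entry : sv1) {
    auto it = sv2.find(entry.first);
    if (it == sv2.end()) continue;           // index not shared
    double p1 = entry.second, p2 = it->second;
    if (p1 + p2 > 1.0 && (p1 == 1.0 || p2 == 1.0))
      return true;                           // data dependency detected
  }
  return false;
}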
4.4 Other Optimizations
We briefly discuss the three remaining optimizations supported by Alpinist.
Iteration merging is an optimization technique related to loop unrolling that
is applicable to both GPU and CPU programs6. Iteration merging reduces the
number of loop iterations by extending the loop body with multiple copies of it,
as opposed to creating copies of it outside the loop, as is done in loop unrolling.
Iteration merging can have a positive performance impact; see [38,46,53] for the
effectiveness of this optimization on GPU programs.
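As a sketch (ours), merging two iterations keeps the copies of the body inside the loop; here we assume the annotations guarantee an even iteration count, so no remainder iterations are left over:

__device__ void body(int i) { /* original loop body */ }

__global__ void merged(int N) {              // assumes N % 2 == 0
  for (int i = 0; i < N; i += 2) {           // half as many loop iterations
    body(i);
    body(i + 1);
  }
}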
Matrix linearization is an optimization where we transform two-dimensional
arrays into one-dimensional ones. This optimization can result in better memory
access patterns, thereby improving caching. See [5,13,54] for the impact of matrix
linearization on GPU programs.
The last optimization implemented in Alpinist is data prefetching. Suppose
there is a verified GPU program where each thread accesses an array location
in global memory multiple times. In this optimization, we prefetch the values
of those locations in global memory into registers, which are local to
each thread. A similar optimization, in which intermediate results are stored in
register memory, is applied in Section 2. Therefore, instead of multiple accesses
to high-latency global memory, we benefit from low-latency registers. Data
prefetching can have a positive performance impact; see [4,58,70].
6 Iteration merging is also referred to as loop unrolling/vectorization in the literature.
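In plain CUDA, the last two optimizations correspond roughly to the following sketches (ours; names are illustrative):

// Matrix linearization: m[i][j] becomes m[i * cols + j].
__global__ void scale(float *m, int rows, int cols, float c) {
  int i = blockIdx.x, j = threadIdx.x;
  if (i < rows && j < cols)
    m[i * cols + j] = m[i * cols + j] * c;   // one flat, cache-friendlier index

}

// Data prefetching: pull a repeatedly accessed global-memory cell into a
// register, update the register, and write back once.
__global__ void prefetched(int *a, int size, int N) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < size) {
    int reg = a[tid];                        // single read from global memory
    for (int k = 0; k < N; k++)
      reg += tid;                            // register-only updates
    a[tid] = reg;                            // single write back
  }
}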
Table 1: A summary of the optimization and verification times for all optimizations.
Optimization Optim. time (s) Verif. time (orig.) (s) Verif. time (opt.) (s)
min. max. avg. med. min. max. avg. med. min. max. avg. med.
Loop unrolling 0.067 0.238 0.116 0.098 7.6 50.7 18.2 14.3 7.6 57.5 20.8 17.3
Tiling 0.044 0.052 0.048 0.047 16.7 21.5 18.7 18.1 19.3 31.4 24.7 20.8
Kernel fusion 0.099 0.338 0.173 0.137 16.7 54.5 24.6 20.0 14.9 22.3 19.0 19.5
Iteration merging 0.042 0.592 0.152 0.097 6.9 51 17.0 12.7 7.3 64 20.0 13.8
Matrix linearization 0.011 0.044 0.022 0.017 11.6 16 14.3 14.1 11.5 16.8 14.4 15.1
Data prefetching 0.010 0.068 0.051 0.053 9.7 23 14.0 13.4 10.4 23 13.5 12.7
5 Evaluation
This section describes the evaluation of Alpinist. The goal is to
Q1 test whether Alpinist works on GPU programs.
Q2 investigate how long it takes for Alpinist to transform GPU programs and
how this affects the verification time.
Q3 investigate the usability of Alpinist on real-world complex examples.
5.1 Experiment Setup
Alpinist is evaluated on examples from three different sources. The first source
consists of hand-made examples that cover different scenarios for each optimiza-
tion. The second source is a collection of verified programs from VerCors’ ex-
ample repository7. The third source consists of complex case studies that are
already verified in VerCors: two parallel prefix sum algorithms [51], parallel
stream compaction and summed-area table algorithms [48], a variety of sort-
ing algorithms [49], a solution [27] to the VerifyThis 2019 challenge 1 [18] and a
Tic-Tac-Toe example [57] based on [23]. In total, we applied the optimizations
30 times in the first category, 23 times in the second category and 17 times in
the third category (in total 70 experiments). All the examples are annotated
with special optimization annotations such that Alpinist can apply those op-
timizations automatically. All these examples are publicly available at [15]. All
the experiments were conducted on a MacBook Pro 2020 (macOS 11.3.1) with
a 2.0GHz Intel Core i5 CPU. Each experiment was performed ten times, af-
ter which the average times, i.e., optimization and verification times, of those
executions were recorded for the experiment.
5.2 Results & Discussion
Q1 To test whether Alpinist works on GPU programs, we applied the six
optimizations in all 70 experiments and used VerCors to reverify all the resulting
programs. All these tests were successful.
Q2 To investigate how long it takes for Alpinist to transform GPU programs,
we recorded the transformation time for each optimization applied to all the
7 The example repository of VerCors is available at https://github.com/utwente-fmt/
vercors/tree/dev/examples.
Alpinist: an Annotation-Aware GPU Program Optimizer 345
Table 2: An overview of optimizing case studies, where # is the unroll factor (for
loop unrolling) or the merge factor (for iteration merging), OT the time it takes to
optimize, VB the original verification time (Verification Before) and VA the optimized
verification time (Verification After). All times are in seconds.
Case Loop unrolling Iter. merging Matrix lin. Data pref.
# OT VB VA # OT VB VA OT VB VA OT VB VA
BubbleSort [49] 1 0.101 25.4 27.3 4 0.170 29.8 34.1 N/A N/A N/A N/A N/A N/A
InsertionSort [49] 1 0.134 25.6 25.8 3 0.225 24.1 28.0 N/A N/A N/A N/A N/A N/A
SelectionSort [49] 1 0.107 23.5 25.7 2 0.592 22.8 27.7 N/A N/A N/A N/A N/A N/A
TimSort [49] 2 0.216 29.3 38.5 3 0.182 29.1 37.9 N/A N/A N/A N/A N/A N/A
Blelloch [51] 1 0.129 50.7 57.5 3 0.355 51.0 64.0 N/A N/A N/A N/A N/A N/A
Kogge-Stone [51] 1 0.238 23.0 25.6 2 0.082 21.8 25.6 N/A N/A N/A 0.103 23.0 23.0
TicTacToe [57] 3 0.106 19.8 21.0 2 0.076 17.3 19.6 N/A N/A N/A N/A N/A N/A
VerifyThis [27] 1 0.144 26.2 28.7 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
Transpose [48] N/A N/A N/A N/A N/A N/A N/A N/A 0.022 16.0 16.0 N/A N/A N/A
examples. Table 1 summarizes the best and worst optimization times for the
six optimizations (as reported by Alpinist). To investigate the impact on the
verification time, the table also shows the (best and worst) verification times of
the original and optimized programs (as reported by VerCors). The table shows
the minimum, maximum, average and median times over all examples. It can be
observed that Alpinist takes negligible time to apply each optimization to
all the examples. Moreover, the verification time after optimizing generally in-
creases. For loop unrolling, tiling and iteration merging, the verification time
increases, which can be attributed to the additional code that is generated. For
kernel fusion, the verification time decreases, due to verifying fewer ker-
nels. For matrix linearization and data prefetching, the verification time slightly
increases, which can be attributed to the linear index expressions in matrix
linearization and the extra statements to read from/write to the registers in
data prefetching.
Q3 To investigate the usability of Alpinist on real-world examples, we suc-
cessfully applied it on the third category with the complex case studies. Table 2
shows the optimization and verification times of applying loop unrolling, iter-
ation merging, matrix linearization and data prefetching to these case studies.
Note that in the case studies only these four optimizations could be applied. In
the table, N/A indicates that the optimization is not applicable to the example.
6 Related Work
To the best of our knowledge, this is the first paper to showcase a tool that
implements annotation-aware transformations. We categorize the related work
into three parts, covering both tools and optimizations.
Automatic Optimizations without Correctness. There is a large body of related
work, see e.g., [2,4,19,25,28,47,61,62,68–70], that shows the impact of auto-
mated optimizations on GPU programs, but does not consider correctness, or
the preservation of it. Our tool can potentially complement these approaches by
preserving the provability of the optimized programs.
Correctness Proofs for Transformations. Another body of related work focuses
on different approaches to preserve provability not specific to GPU programs.
CompCert [30,31] is a formally verified C compiler, which preserves semantic
equivalence of the source and compiled program, by proving correctness of each
transformation in the compilation process. Wijs and Engelen [66] and De Putter
and Wijs [45] prove the preservation of functional properties over transformations
on models of concurrent systems. They prove preservation of model-independent
properties. This approach differs from ours as they work on models instead of
concrete programs.
Compiler Optimization Correctness. Finally, there is related work that focuses
on the compilation of sequential programs, performing transformations from
high-level source code to lower-level machine code while preserving the seman-
tics. These approaches neither consider parallelization nor target different ar-
chitectures. In GPU programming, the optimizations often need to be applied
manually rather than during the compilation process.
Namjoshi and Xu [41] use a proof checker to show equivalence between an
original WebAssembly program and optimized program. An equivalence proof is
generated based on the transformations. Namjoshi and Singhania [40] created a
semi-automatic loop optimizer driven by user directives. The loops are verified
during compilation. For each transformation, semantics are defined to guarantee
semantic equivalence to the original program. Namjoshi and Pavlinovic [39] focus on
recovering from precision loss due to semantics-preserving program transforma-
tions and propose systematic approaches to simplify analysis of the transformed
program. Finally, Gjomemo et al. [20] help compiler optimizations by supplying
high-level information gathered by external static analysis (e.g., Frama-C). This
information is used by the compiler for better reasoning.
7 Conclusion
In this paper, we presented Alpinist, an annotation-aware GPU program opti-
mizer. Given an unoptimized, annotated GPU program, we showed how Alpin-
ist transforms both the code and the annotations, with the goal of preserving the
provability of the optimized GPU program. Alpinist supports loop unrolling,
tiling, kernel fusion, iteration merging, matrix linearization and data prefetch-
ing, of which the first three are discussed in detail. We discussed the design and
implementation of Alpinist, and we validated it by verifying a set of examples
and reverifying their optimized counterparts.
For future work, there are other optimizations that could be supported, such
as data prefetching for all memory patterns as mentioned by Ayers et al. [4].
Another open question is if and how this approach can be used in program
compilation. We also plan to extend this approach to preserve the provability
of transpiled code, e.g., CUDA to OpenCL conversions. Moreover, we plan to
investigate how Alpinist can be combined with techniques such as autotuning
that automatically detect the potential for applying specific optimizations and
identify optimal parameter configurations [3,63].
References
1. Allen, R., Kennedy, K.: Automatic translation of Fortran programs to vector form.
ACM Transactions on Programming Languages and Systems (TOPLAS) 9(4), 491–
542 (1987)
2. Ashari, A., Tatikonda, S., Boehm, M., Reinwald, B., Campbell, K., Keenleyside,
J., Sadayappan, P.: On optimizing machine learning workloads via kernel fusion.
ACM SIGPLAN Notices 50(8), 173–182 (2015)
3. Ashouri, A., Killian, W., Cavazos, J., Palermo, G., Silvano, C.: A Survey on Com-
piler Autotuning using Machine Learning. ACM Computing Surveys 51(5), 96:1–
96:42 (2018)
4. Ayers, G., Litz, H., Kozyrakis, C., Ranganathan, P.: Classifying memory access pat-
terns for prefetching. In: Proceedings of the Twenty-Fifth International Conference
on Architectural Support for Programming Languages and Operating Systems. pp.
513–526 (2020)
5. Bell, N., Garland, M.: Efficient sparse matrix-vector multiplication on CUDA.
Tech. rep., Citeseer (2008)
6. Berdine, J., Calcagno, C., O’Hearn, P.: Smallfoot: Modular Automatic Asser-
tion Checking with Separation Logic. In: de Boer, F., Bonsangue, M., Graf, S.,
de Roever, W. (eds.) FMCO. LNCS, vol. 4111, pp. 115–137. Springer (2005)
7. Bertolli, C., Betts, A., Mudalige, G., Giles, M., Kelly, P.: Design and Perfor-
mance of the OP2 Library for Unstructured Mesh Applications. In: Proceed-
ings of the 1st Workshop on Grids, Clouds and P2P Programming (CGWS).
Lecture Notes in Computer Science, vol. 7155, pp. 191–200. Springer (2011).
https://doi.org/10.1007/978-3-642-29737-3_22
8. Betts, A., Chong, N., Donaldson, A., Qadeer, S., Thomson, P.: GPUVerify: a ver-
ifier for GPU kernels. In: OOPSLA. pp. 113–132. ACM (2012)
9. Blom, S., Darabi, S., Huisman, M., Oortwijn, W.: The VerCors Tool Set: Verifi-
cation of Parallel and Concurrent Software. In: iFM. LNCS, vol. 10510, pp. 102–110.
Springer (2017)
10. Blom, S., Huisman, M., Mihelčić, M.: Specification and Verification of GPGPU
programs. Science of Computer Programming 95, 376–388 (2014)
11. Bornat, R., Calcagno, C., O’Hearn, P., Parkinson, M.: Permission accounting in
separation logic. In: Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium
on Principles of programming languages (POPL). pp. 259–270 (2005)
12. Boyland, J.: Checking Interference with Fractional Permissions. In: SAS. LNCS,
vol. 2694, pp. 55–72. Springer (2003)
13. Catanzaro, B., Keller, A., Garland, M.: A decomposition for in-place matrix trans-
position. ACM SIGPLAN Notices 49(8), 193–206 (2014)
14. Collingbourne, P., Cadar, C., Kelly, P.H.: Symbolic testing of OpenCL code. In:
Haifa Verification Conference. pp. 203–218. Springer (2011)
15. Şakar, Ö., Safari, M., Huisman, M., Wijs, A.: The repository for the examples used
in Alpinist, https://github.com/OmerSakar/Alpinist-Examples.git
16. Şakar, Ö., Safari, M., Huisman, M., Wijs, A.: The repository for the
implementations of Alpinist, https://github.com/utwente-fmt/vercors/tree/
gpgpu-optimizations/src/main/java/vct/col/rewrite/gpgpuoptimizations
17. DeFrancisco, R., Cho, S., Ferdman, M., Smolka, S.: Swarm Model Checking on
the GPU. International Journal on Software Tools for Technology Transfer 22,
583–599 (2020). https://doi.org/10.1007/s10009-020-00576-x
18. Dross, C., Furia, C.A., Huisman, M., Monahan, R., Müller, P.: VerifyThis 2019:
a program verification competition. International Journal on Software Tools for
Technology Transfer pp. 1–11 (2021)
19. Filipovič, J., Madzin, M., Fousek, J., Matyska, L.: Optimizing CUDA code by
kernel fusion: application on BLAS. The Journal of Supercomputing 71(10), 3934–
3957 (2015)
20. Gjomemo, R., Namjoshi, K.S., Phung, P.H., Venkatakrishnan, V., Zuck, L.D.: From
verification to optimizations. In: International Workshop on Verification, Model
Checking, and Abstract Interpretation. pp. 300–317. Springer (2015)
21. Grauer-Gray, S., Xu, L., Searles, R., Ayalasomayajula, S., Cavazos, J.:
Auto-tuning a High-Level Language Targeted to GPU Codes. In: Proc.
2012 Innovative Parallel Computing (InPar). pp. 1–10. IEEE (2012).
https://doi.org/10.1109/InPar.2012.6339595
22. van den Haak, L., Wijs, A., van den Brand, M.G.J., Huisman, M.: Formal
Methods for GPGPU Programming: Is The Demand Met? In: Proceedings of
the 16th International Conference on Integrated Formal Methods (IFM 2020).
Lecture Notes in Computer Science, vol. 12546, pp. 160–177. Springer (2020).
https://doi.org/10.1007/978-3-030-63461-2_9
23. Hamers, R., Jongmans, S.S.: Safe sessions of channel actions in Clojure: a tour of
the discourje project. In: International Symposium on Leveraging Applications of
Formal Methods. pp. 489–508. Springer (2020)
24. Herrmann, F., Silberholz, J., Tiglio, M.: Black Hole Simulations with CUDA. In:
GPU Computing Gems Emerald Edition, chap. 8, pp. 103–111. Morgan Kaufmann
(2011)
25. Hong, C., Sukumaran-Rajam, A., Nisa, I., Singh, K., Sadayappan, P.: Adaptive
sparse tiling for sparse matrix multiplication. In: Proceedings of the 24th Sympo-
sium on Principles and Practice of Parallel Programming. pp. 300–314 (2019)
26. Huisman, M., Blom, S., Darabi, S., Safari, M.: Program correctness by transfor-
mation. In: 8th International Symposium On Leveraging Applications of Formal
Methods, Verification and Validation (ISoLA). LNCS, vol. 11244. Springer (2018)
27. Huisman, M., Joosten, S.: A solution to VerifyThis 2019
challenge 1, https://github.com/utwente-fmt/vercors/blob/
97c49d6dc1097ded47a5ed53143695ace6904865/examples/verifythis/2019/
challenge1.pvl
28. Konstantinidis, A., Kelly, P.H., Ramanujam, J., Sadayappan, P.: Parametric GPU
code generation for affine loop programs. In: International Workshop on Languages
and Compilers for Parallel Computing. pp. 136–151. Springer (2013)
29. Le, Q., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.: On Optimization
Methods for Deep Learning. In: Proceedings of the 28th International Conference
on Machine Learning (ICML). pp. 265–272. Omnipress (2011)
30. Leroy, X.: Formal certification of a compiler back-end or: programming a compiler
with a proof assistant. In: Conference record of the 33rd ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages. pp. 42–54 (2006)
31. Leroy, X.: A formally verified compiler back-end. Journal of Automated Reasoning
43(4), 363–446 (2009)
32. Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel func-
tions. In: SIGSOFT FSE 2010, Santa Fe, NM, USA. pp. 187–196. ACM (2010)
33. Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE:
concolic verification and test generation for GPUs. In: ACM SIGPLAN Notices.
vol. 47, pp. 215–224. ACM (2012)
34. Lindholm, L., Nickolls, J., Oberman, S., Montrym, J.: NVIDIA Tesla: A Uni-
fied Graphics and Computing Architecture. IEEE Micro 28(2), 39–55 (2008).
https://doi.org/10.1109/MM.2008.31
35. Liu, X., Tan, S., Wang, H.: Parallel Statistical Analysis of Analog Circuits by
GPU-Accelerated Graph-Based Approach. In: Proceedings of the 2012 Conference
and Exhibition on Design, Automation & Test in Europe (DATE). pp. 852–857.
IEEE Computer Society (2012). https://doi.org/10.1109/DATE.2012.6176615
36. de Moura, L.M., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.,
Rehof, J. (eds.) TACAS. LNCS, vol. 4963, pp. 337–340. Springer (2008)
37. Müller, P., Schwerhoff, M., Summers, A.: Viper - a verification infrastructure for
permission-based reasoning. In: VMCAI (2016)
38. Murthy, G.S., Ravishankar, M., Baskaran, M.M., Sadayappan, P.: Optimal loop
unrolling for GPGPU programs. In: 2010 IEEE International Symposium on Par-
allel & Distributed Processing (IPDPS). pp. 1–11. IEEE (2010)
39. Namjoshi, K.S., Pavlinovic, Z.: The impact of program transformations on static
program analysis. In: International Static Analysis Symposium. pp. 306–325.
Springer (2018)
40. Namjoshi, K.S., Singhania, N.: Loopy: Programmable and formally verified
loop transformations. In: International Static Analysis Symposium. pp. 383–402.
Springer (2016)
41. Namjoshi, K.S., Xue, A.: A Self-certifying Compilation Framework for WebAssem-
bly. In: International Conference on Verification, Model Checking, and Abstract
Interpretation. pp. 127–148. Springer (2021)
42. The OpenCL 1.2 specification (2011)
43. Osama, M., Wijs, A.: Parallel SAT Simplification on GPU Architectures. In:
TACAS, Part I. LNCS, vol. 11427, pp. 21–40. Springer (2019)
44. Osama, M., Wijs, A., Biere, A.: SAT Solving with GPU Accelerated Inprocess-
ing. In: Proceedings of the 27th International Conference on Tools and Algo-
rithms for the Construction and Analysis of Systems (TACAS), Part I. Lec-
ture Notes in Computer Science, vol. 12651, pp. 133–151. Springer (2021).
https://doi.org/10.1007/978-3-030-72016-2_8
45. de Putter, S., Wijs, A.: Verifying a verifier: on the formal correctness of an LTS
transformation verification technique. In: International Conference on Fundamen-
tal Approaches to Software Engineering. pp. 383–400. Springer (2016)
46. Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.:
Halide: a language and compiler for optimizing parallelism, locality, and recompu-
tation in image processing pipelines. ACM SIGPLAN Notices 48(6), 519–530 (2013)
47. Rocha, R.C., Pereira, A.D., Ramos, L., Góes, L.F.: Toast: Automatic tiling for
iterative stencil computations on GPUs. Concurrency and Computation: Practice
and Experience 29(8), e4053 (2017)
48. Safari, M., Huisman, M.: Formal verification of parallel stream compaction and
summed-area table algorithms. In: International Colloquium on Theoretical As-
pects of Computing. pp. 181–199. Springer (2020)
49. Safari, M., Huisman, M.: A generic approach to the verification of the permutation
property of sequential and parallel swap-based sorting algorithms. In: International
Conference on Integrated Formal Methods. pp. 257–275. Springer (2020)
50. Safari, M., Oortwijn, W., Huisman, M.: Automated verification of the parallel
Bellman–Ford algorithm. In: Dr˘agoi, C., Mukherjee, S., Namjoshi, K. (eds.) Static
Analysis. pp. 346–358. Springer International Publishing, Cham (2021)
51. Safari, M., Oortwijn, W., Joosten, S., Huisman, M.: Formal verification of parallel
prefix sum. In: NASA Formal Methods Symposium. pp. 170–186. Springer (2020)
52. Şakar, Ö.: Extending support for axiomatic data types in VerCors (April 2020),
http://essay.utwente.nl/80892/
53. Shimobaba, T., Ito, T., Masuda, N., Ichihashi, Y., Takada, N.: Fast calculation of
computer-generated-hologram on AMD HD5000 series GPU and OpenCL. Optics
express 18(10), 9955–9960 (2010)
54. Sundfeld, D., Havgaard, J.H., Gorodkin, J., De Melo, A.C.: CUDA-Sankoff: using
GPU to accelerate the pairwise structural RNA alignment. In: 2017 25th Euromicro
International Conference on Parallel, Distributed and Network-based Processing
(PDP). pp. 295–302. IEEE (2017)
55. The CUDA team: Documentation of the CUDA unroll pragma (Accessed Oct
6, 2021), https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#
pragma-unroll
56. The Halide team: Documentation of the Halide unroll function (Accessed Oct
6, 2021), https://halide-lang.org/docs/class_halide_1_1_func.html#
a05935caceb6efb8badd85f306dd33034
57. The verification of tictactoe program, https://github.com/utwente-fmt/vercors/
blob/0a2fdc24419466c2d3b7a853a2908c37e7a8daa7/examples/session-generate/
MatrixGrid.pvl
58. Unkule, S., Shaltz, C., Qasem, A.: Automatic restructuring of GPU kernels for ex-
ploiting inter-thread data locality. In: International Conference on Compiler Con-
struction. pp. 21–40. Springer (2012)
59. Van Werkhoven, B., Maassen, J., Bal, H.E., Seinstra, F.J.: Optimizing convolution
operations on GPUs using adaptive tiling. Future Generation Computer Systems
30, 14–26 (2014)
60. Viper project website: (2016), http://www.pm.inf.ethz.ch/research/viper
61. Wahib, M., Maruyama, N.: Scalable kernel fusion for memory-bound GPU applica-
tions. In: SC’14: Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis. pp. 191–202. IEEE (2014)
62. Wang, G., Lin, Y., Yi, W.: Kernel fusion: An effective method for better power
efficiency on multithreaded GPU. In: 2010 IEEE/ACM Int’l Conference on Green
Computing and Communications & Int’l Conference on Cyber, Physical and Social
Computing. pp. 344–350. IEEE (2010)
63. Werkhoven, B.v.: Kernel Tuner: A search-optimizing GPU code auto-tuner. Future
Generation Computer Systems 90, 347–358 (2019)
64. Wienke, S., Springer, P., Terboven, C., Mey, D.: OpenACC - First Experiences
with Real-World Applications. In: Proceedings of the 18th European Conference
on Parallel and Distributed Computing (EuroPar). Lecture Notes in Computer
Science, vol. 7484, pp. 859–870. Springer (2012). https://doi.org/10.1007/978-3-642-32820-6_85
65. Wijs, A.: BFS-Based Model Checking of Linear-Time Properties With An Appli-
cation on GPUs. In: CAV, Part II. LNCS, vol. 9780, pp. 472–493. Springer (2016)
66. Wijs, A., Engelen, L.: REFINER: Towards Formal Verification of Model Transfor-
mations. In: NFM. LNCS, vol. 8430, pp. 258–263. Springer (2014)
67. Wijs, A., Neele, T., Boˇsnaˇcki, D.: GPUexplore 2.0: Unleashing GPU Explicit-State
Model Checking. In: Proceedings of the 21st International Symposium on Formal
Methods. Lecture Notes in Computer Science, vol. 9995, pp. 694–701. Springer
(2016). https://doi.org/10.1007/978-3-319-48989-6_42
68. Wu, H., Diamos, G., Wang, J., Cadambi, S., Yalamanchili, S., Chakradhar, S.:
Optimizing data warehousing applications for GPUs using kernel fusion/fission.
In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Workshops & PhD Forum. pp. 2433–2442. IEEE (2012)
69. Xu, C., Kirk, S.R., Jenkins, S.: Tiling for performance tuning on different models
of GPUs. In: 2009 Second International Symposium on Information Science and
Engineering. pp. 500–504. IEEE (2009)
70. Yang, Y., Xiang, P., Kong, J., Zhou, H.: A GPGPU compiler for memory opti-
mization and parallelism management. ACM Sigplan Notices 45(6), 86–97 (2010)
Automatic Repair for Network Programs
Lei Shi1, Yuepeng Wang2, Rajeev Alur1, and Boon Thau Loo1
1University of Pennsylvania, Philadelphia, USA
2Simon Fraser University, Burnaby, Canada
{shilei,alur,boonloo}@seas.upenn.edu yuepeng@sfu.ca
Abstract. Debugging imperative network programs is a difficult task
for operators as it requires understanding various network modules and
complicated data structures. For this purpose, this paper presents an au-
tomated technique for repairing network programs with respect to unit
tests. Given as input a faulty network program and a set of unit tests,
our approach localizes the fault through symbolic reasoning, and synthe-
sizes a patch ensuring that the repaired program passes all unit tests. It
applies domain-specific abstraction to simplify network data structures
and exploits function summary reuse for modular symbolic analysis. We
have implemented the proposed techniques in a tool called NetRep and
evaluated it on 10 benchmarks adapted from real-world software-defined
network controllers. The evaluation results demonstrate the effectiveness
and efficiency of NetRep for repairing network programs.
1 Introduction
Emerging tools for program synthesis and repair facilitate automation of pro-
gramming tasks in various domains. For example, in the domain of end-user
programming, synthesis techniques allow users without any programming expe-
rience to generate scripts from examples for extracting, wrangling, and manip-
ulating data in spreadsheets [13,40]. In computer-aided education, repair tech-
niques are capable of providing feedback on programming assignments to novice
programmers and help them improve programming skills [49,14]. In software
development, synthesis and repair techniques aim to reduce the manual efforts
in various tasks, including code completion [43,10], application refactoring [42],
program parallelization [8], bug detection [11,41], and patch generation [11,32].
As an emerging domain, Software-Defined Networking (SDN) offers the in-
frastructure for monitoring network status and managing network resources
based on programmable software, replacing traditional specialized hardware in
communication devices. Since SDN provides an opportunity to dynamically mod-
ify the traffic handling policies on programmable routers, this technology has
witnessed growing industrial adoption. However, using SDNs involves many pro-
gramming tasks that are inevitably susceptible to programmer errors leading to
bugs [3,23]. For example, a device with incorrect routing policies could forward a
packet to undesired destinations, and a buggy firewall rule may make the entire
network system vulnerable to security threats.
In the SDN framework, a logically centralized control plane generates rules
that are installed into data planes, which in turn decide the routing of packets
throughout the network. While network verification is a well-studied field in which
operators can be alerted to incorrectly installed rules [3,4,22], little prior work
has explored the problem of automatically repairing the corresponding bugs in the
control plane, especially for control planes written in widely used general-purpose
languages such as Java or Python. Existing work mostly restricts the target to control
plane programs written in domain-specific languages such as Datalog [51,17].
Since networks cannot tolerate even small mistakes, and most network operators
are not trained programmers, debugging and repair tools in this domain should
prioritize accuracy and automation. This means that many existing techniques for
general program repair are not suitable for this domain, as they trade accuracy for
heuristics that scale with the size of the analyzed programs and the number of
discovered potential bugs.
Motivated by the demand for automated repair and the limitations of ex-
isting techniques, we develop a precise and scalable program repair technique
for network programs. Specifically, our repair technique takes as input a net-
work program and a set of unit tests, reveals the program location that causes
the test failure, and automatically generates a patch to fix the program. In the
setting of SDN, a unit test corresponds to an incorrectly installed routing rule
generated by the control plane from a reported packet. Such unit tests can be
discovered by a separate network verification procedure [3,4,22].
Our main idea is to perform accurate repair through symbolic reasoning over
constraints that capture the semantics of the program, and to use modular analysis
to improve efficiency. We extended the encoding techniques from prior work [21,12]
to support object-oriented features in Java. We also developed a new approach that
focuses the analysis on one function at a time and gradually narrows down the range
of faulty statements along with the specification of the expected behavior.
The proposed technique is implemented in an automatic network program
repair tool called NetRep. To evaluate NetRep, we adapt 10 benchmarks from
real-world faulty network programs in Floodlight that require changing up to 3
lines of code to fix, and apply NetRep to repair the benchmarks automatically.
The experimental results show that NetRep is able to find a repair that passes
all unit tests for faulty programs up to 738 lines of code for 8 benchmarks using
2 or 3 test cases, outperforming a state-of-the-art repair tool for general Java
programs. Furthermore, NetRep is efficient in terms of repair time, requiring
only an average running time of 744 seconds across all benchmarks.
Contributions. We make the following main contributions in this paper:
– We present an automated program repair technique that aims to help network
operators debug and fix network controller programs automatically.
– We describe a bug localization approach based on symbolic execution and
constraint solving for programs with imperative object-oriented features such
as virtual function calls.
– We propose novel modular analysis techniques to effectively scale up the
symbolic reasoning for automatic repair.
1  @network public class MacAddr {
2    private long value;
3    private MacAddr(long v) { value = v; }
4    public static MacAddr NONE = new MacAddr(0);
5    public static MacAddr of(long v) { return new MacAddr(v); }
6    ... }
7  public class FirewallRule {
8    public MacAddr dl_dst; public boolean any_dl_dst;
9    public FirewallRule() {
10     dl_dst = MacAddr.NONE; any_dl_dst = true; ... }
11   public boolean isSameAs(FirewallRule r) {
12     if (... || any_dl_dst != r.any_dl_dst
13         || (any_dl_dst == false &&
14             dl_dst != r.dl_dst)) {
15       return false; }
16     return true; }
17   ... }
Fig. 1: Code snippet about a bug in Floodlight.
1  public boolean test(long mac1, long mac2) {
2    FirewallRule r1 = new FirewallRule();
3    r1.dl_dst = MacAddr.of(mac1); r1.any_dl_dst = false;
4    FirewallRule r2 = new FirewallRule();
5    r2.dl_dst = MacAddr.of(mac2); r2.any_dl_dst = false;
6    return r1.isSameAs(r2); }
Fig. 2: Unit test that reveals the bug in FirewallRule.
We develop a tool called NetRep based on the proposed techniques and
evaluate it using 10 benchmarks adapted from real-world network programs.
The evaluation results demonstrate that NetRep is effective for bug local-
ization and able to generate correct patches for realistic network programs.
2 Overview
In this section, we give a high-level overview of our repair techniques and
walk through the NetRep tool using an example adapted from the Floodlight
SDN controller [9].
Figure 1 shows a simplified code snippet about firewall rules in Floodlight.
Specifically, the program consists of two classes FirewallRule and MacAddr.
The FirewallRule class describes rules enforced by the firewall, including
information about source and destination MAC addresses. The MacAddr class is an
auxiliary data structure that stores the raw value of MAC addresses.³
The network program shown in Figure 1 is problematic because the isSameAs
function compares two MAC addresses using the != operator rather than a negation
of the equals function. The != operator only compares two objects based on their
memory addresses, whereas the intent of the developer is to check whether two
MAC addresses have the same raw value. The bug is revealed by the unit test
in Figure 2, and was confirmed and fixed by the Floodlight developers.⁴ Next, let
³ A unique 48-bit number that identifies each network device.
⁴ https://github.com/floodlight/floodlight/commit/4d528e4bf5f02c59347bb9c0beb1b875ba2c821e
us illustrate how NetRep localizes this bug based on unit tests test(1, 2) =
false and test(1, 1) = true and automatically synthesizes a patch to fix it.
At a high level, NetRep enters a loop that iteratively attempts to find the
fault location and synthesize the patch. Since our repair technique works in a
modular fashion, NetRep first selects a function F in the program and tries
to repair one possible fault location at a time. If NetRep cannot synthesize a
patch consistent with the provided unit tests for any potential fault location in
F, it backtracks and selects the next function and repeats the same process until
all possible functions are checked. We now describe the experience of running
NetRep on our illustrative example.
Iteration 1. NetRep selects the constructor of FirewallRule as the target func-
tion. Fault localization determines that the fault is located at the dl_dst =
MacAddr.NONE part of Line 10, because it is related to the equality checking in
the unit test. However, it is not the actual fault location. NetRep tries to synthesize
a patch that passes all unit tests to replace this statement, but fails.
Iteration 2. NetRep selects the same function, the constructor of FirewallRule,
but the fault localization switches to a different statement, any_dl_dst = true
at Line 10. Similar to Iteration 1, the synthesizer cannot generate a correct patch
by replacing this statement.
Iteration 3. Since none of the statements in the constructor is the fault location,
NetRep now selects a different function: isSameAs. The fault localization
determines that any_dl_dst = false at Line 13 may be the fault location, as it
may affect the testing results. However, having tried to replace the statement
with many other candidate statements, e.g., r.any_dl_dst = false and any_dl_dst
= true, the synthesizer still fails to generate the correct patch.
Last iteration. Finally, after several attempts to localize the fault, NetRep
identifies that the fault lies in dl_dst != r.dl_dst at Line 14, which is indeed the
reported bug location. At this point, the synthesizer manages to generate a correct
patch: !dl_dst.equals(r.dl_dst). Replacing the original condition at Line 14
with this patch results in a program that can pass all the provided test cases, so
NetRep has successfully repaired the original faulty program.
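To make the fix concrete, the following is a sketch of the repaired isSameAs under
the simplifications of Figure 1 (our rendering, not from the paper); the leading
ellipsis of the original condition and the remaining rule fields are elided, and we
assume MacAddr provides a value-based equals:

  public boolean isSameAs(FirewallRule r) {
      // Patched condition: compare MAC addresses by value via equals,
      // instead of the reference comparison dl_dst != r.dl_dst.
      if (any_dl_dst != r.any_dl_dst
              || (any_dl_dst == false && !dl_dst.equals(r.dl_dst))) {
          return false;
      }
      return true;
  }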
3 Preliminaries
In this section, we present the language of network programs and describe a
program formalism that is used in the rest of the paper. We also define the program
repair problem that we want to solve.
3.1 Language of Network Programs
The language of network programs considered in this paper is summarized in
Figure 3. A network program consists of a set of classes, where each class has an
optional annotation @network to denote that the class can benefit from network
domain-specific abstraction.
Prog   P ::= C+
Class  C ::= @network? class C { a+ F+ }
Func   F ::= function f(x1, . . . , xn) (L : s)+
Stmt   s ::= l := e | jmp (e) L | ret v | x := new C
           | x := C.f(v1, . . . , vn) | x := y.f(v1, . . . , vn)
Expr   e ::= l | c | op(e1, . . . , en)
LValue l ::= x | x.a | x[v]
Imm    v ::= x | c
x, y ∈ Variable   c ∈ Constant   L ∈ LineID
C ∈ ClassName   f, f′ ∈ FuncName   a ∈ FieldName
Fig. 3: Syntax of network programs.
Each class in the program consists of a list of fields and functions. Each func-
tion has a name, a parameter list, and a function body. The function body is a
list of statements, where each statement is labeled with its line number. Various
kinds of statements are included in our language of network programs. Specifically,
the assign statement l := e assigns expression e to left value l. The conditional
jump statement jmp (e) L first evaluates predicate e. If the result is true, then
the control flow jumps to line L; otherwise, it performs no operation. Note that
our language does not have traditional if statements or loop statements, but
those statements can be expressed using conditional jumps.⁵
The return statement ret v exits the current function with return value v. The new
statement x := new C creates an object of class C and assigns the object address
to variable x. The static call x := C.f(v1, . . . , vn) invokes the static function f in class
C with arguments v1, . . . , vn and assigns the return value to variable x. Similarly, the
virtual call x := y.f(v1, . . . , vn) invokes the virtual function f on receiver object
y with arguments v1, . . . , vn and assigns the return value to variable x. Different
kinds of expressions are supported, including constants, variable accesses, field
accesses, array accesses, arithmetic operations, and logical operations. Since the
semantics of network programs is similar to that of traditional programs written
in object-oriented languages, we omit the formal description of the semantics.
In addition, we assume each statement in the program is labeled with a
globally unique line number, and line numbers are consecutive within a function.
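For illustration (our example, not from the paper), an if-then-else such as
if (x > y) r := x else r := y can be expressed with conditional jumps in this
language as follows:

  1: jmp (x > y) 4    // if the condition holds, jump to the then branch
  2: r := y           // else branch
  3: jmp (true) 5     // unconditionally skip over the then branch
  4: r := x           // then branch
  5: ret r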
3.2 Problem Statement
We assume a unit test t is written in the form of a pair (I, O), where I is the
input and O is the expected output. Given a network program P and a unit
test t = (I, O), we say P passes the test t if executing P on input I yields the
expected output O, denoted by ⟦P⟧(I) = O. Otherwise, if ⟦P⟧(I) ≠ O, we say P
fails the test t. In general, given a network program P and a set of unit tests E,
program P is faulty modulo E if there exists a test t ∈ E such that P fails on t.
Now let us turn the attention to the meaning of fault locations and patches.
Definition 1 (Fault location and patch). Let P be a program that is faulty
modulo tests E. Line L is called the fault location of P if there exists a statement
s such that replacing line L of P with s yields a new program that can pass all
tests in E. Here, the statement s is called a patch to P.
⁵ Our repair techniques only handle bounded loops. If there are unbounded loops in
the network program, we need to perform loop unrolling.
Algorithm 1 Modular Program Repair
1: procedure Repair(P, E)
   Input: Program P, examples E
   Output: Repaired program P′, or ⊥ to indicate failure
2:   P ← Abstraction(P);
3:   V ← {L ↦ false | L ∈ Lines(P)}; P′ ← ⊥;
4:   while P′ = ⊥ do
5:     F ← SelectFunction(P, V);
6:     if F = ⊥ then return ⊥;
7:     V, P′ ← RepairFunction(P, F, E, V);
8:   return P′;
9: procedure RepairFunction(P, F, E, V)
   Input: Program P, function F, examples E, visited map V
   Output: Updated visited map V, repaired program P′ or ⊥
10:   P′ ← ⊥;
11:   while P′ = ⊥ do
12:     L ← LocalizeFault(P, F, E, V);
13:     if L ≠ ⊥ then
14:       V ← V[L ↦ true];
15:     else
16:       V ← V[L′ ↦ true | L′ ∈ TransInFunc(P, F)];
17:     if L = ⊥ or IsCallStmt(P, L) then return V, ⊥;
18:     P′ ← SynthesizePatch(P, E, F, L);
19:   return V, P′;
Problem statement. Given a network program P that is faulty modulo tests
E, our goal is to find a fault location L in P and generate the corresponding
patch s, such that for any unit test t ∈ E, the patched program P′ can always
pass the test t.
4 Modular Program Repair
In this section, we present our algorithm for automatically repairing network
programs from a set of unit tests.
4.1 Algorithm Overview
The top-level repair algorithm is described in Algorithm 1. The Repair proce-
dure takes as input a faulty network program Pand unit tests Eand produces
as output a repaired program Por to indicate repair failure.
At a high level, the Repair procedure maintains a visited map Vfrom line
numbers to boolean values, representing whether each line of Pis checked or not.
The Repair procedure first applies the domain-specific abstraction to program
P (Line 2) and initializes the visited map V by setting every line in P as not
checked (Line 3). Next, it tries to iteratively repair P in a modular way until it
finds a program P′ that is not faulty modulo tests E (Lines 4–8). In particular,
the Repair procedure invokes SelectFunction to choose a function F as the target
of repair (Line 5). If none of the functions in P can be repaired, it returns ⊥
to indicate that the repair procedure failed (Line 6). Otherwise, it invokes the
RepairFunction procedure (Line 7) to enter the localization-synthesis loop
inside the target function F.
In addition to the program P and the tests E, the RepairFunction procedure
takes as input a target function F and the current visited map V. It produces as
output the updated version of the visited map V, as well as a repaired program
P′, or ⊥ to indicate that the function F cannot be repaired. As shown in Lines
11–18 of Algorithm 1, RepairFunction alternately invokes the sub-procedures
LocalizeFault and SynthesizePatch to repair the target function. In particular, the
goal of LocalizeFault is to identify a fault location in function F. If LocalizeFault
manages to find a fault location L in F, then line L is marked as visited (Line
14). Otherwise, if LocalizeFault returns ⊥, it means function F and all functions
transitively invoked in F are correct or not repairable. In this case, all lines in
F and its transitive callees are marked as checked (Line 16). Furthermore, if
the identified fault location L corresponds to a statement that invokes some
function F′, it means the fault location is inside F′. Thus, RepairFunction
directly returns ⊥ (Line 17) and SelectFunction will choose F′ as the target
function in the next iteration. On the other hand, the goal of the sub-procedure
SynthesizePatch is to generate a patch for function F given the fault location L.
If SynthesizePatch successfully synthesizes a patch and produces a non-faulty
program P′, then the entire procedure succeeds with the repaired program P′.
Otherwise, RepairFunction backtracks with a new program location and
repeats the same process.
In the rest of this section, we explain fault localization, modular analysis,
and patch synthesis in more detail.
4.2 Fault Localization
Next, we give a high-level description of our fault localization technique that
aims to find the fault location in a given program. This corresponds to the
LocalizeFault procedure in Algorithm 1. We will first show how to encode the
problem on an entire program, and then explain how the analysis can be made
modular to boost the performance.
At a high level, our fault localization technique uses a symbolic approach
by reducing the fault localization problem into a constraint solving problem. In
particular, we introduce a boolean variable for each line L, denoted by B[L], and
encode the fault localization problem as an SMT formula, such that the value
of the variable B[L] indicates whether line L is correct or not.
Checking faulty programs. To understand how to encode the fault localiza-
tion problem, let us first explain how to encode the consistency check given a
program P and a test case t = (I, O). Specifically, the encoded SMT formula
Φ(t) consists of three components:
1. Semantic constraints. For each line Li : si, we generate a formula Φi(S, S′) to
describe the semantics of the statement si. Specifically, given a state S that
holds before statement si, Φi(S, S′) is valid if S′ is the state after executing
si. There are two parts of the constraint: the memory contents that are
changed, and the memory contents that are preserved. For example, in the case
of an assignment statement, the constraint will state that 1) the evaluation
result of the right-hand side in state S equals the left value in state S′, and
2) all values except for the left value are the same in S and S′.
2. Control flow integrity constraints. In order to ensure all traces satisfying the
constraint faithfully follow the control flow structure of a given program P,
we generate another set of formulae Φf. Specifically, we require that any line
of code that is executed must have exactly one predecessor and one successor
that are executed, and the branch condition in the code must be respected
when picking the successor. This guarantees that there is exactly one valid
execution trace corresponding to each test case.
3. Consistency between program and test. For the provided test case t = (I, O),
we also generate formulas Φin(S0, I) and Φout(Sn, O) to ensure the program
behavior is consistent with the test. In particular, Φin(S0, I) binds input I to
the initial state S0, and Φout(Sn, O) describes the connection between output
O and the final state Sn.
The satisfiability of formula Φ(t) indicates the result of the consistency check.
If Φ(t) is satisfiable, the solver generates a feasible execution trace and an
assignment of all intermediate states along this trace. In this case, program P can
pass the test t because there exists a valid trace following the control flow, and
every pair of adjacent states in the trace is consistent with the semantics of the
corresponding statement. Otherwise, if Φ(t) is unsatisfiable, P fails the test t.
Now, to check P against a set of unit tests E, we can conjoin the formula Φ(tj)
for each unit test tj ∈ E and obtain the conjunction Φ = ⋀_{tj∈E} Φ(tj).
The satisfiability of formula Φ indicates whether P is faulty modulo tests E.⁶
Methodology of fault localization. Let P be a faulty program modulo E;
we know the corresponding formula Φ for the consistency check is unsatisfiable.
Suppose the fault location is line Li; one key insight is that replacing the semantic
constraint Φi(S, S′) with true yields a satisfiable formula. This is because true
does not enforce any constraint between the pre-state S and post-state S′, so a
previously invalid trace caused by the bug at Li becomes valid now.
Based on this insight, we develop a methodology to find the fault location
using symbolic reasoning. Specifically, given a consistency check formula Φ, we
can obtain a fault localization formula Φ′ by replacing the semantic constraint
Φi(S, S′) with B[Li] → Φi(S, S′) for every line Li, i ∈ [1, n]. Here, variable B[Li]
decides whether or not it turns the semantic constraint of Li into true. Thus,
B[Li] = false indicates Li is a fault location.
⁶ The encoding is described in more detail in the extended version [46].
One hiccup here is that formula Φ′ is always satisfiable, and a model of Φ′
can simply assign B[Li] = false for all Li. It means all lines in the program are
fault locations, which is not useful for fault localization. To address this issue,
we can add a cardinality constraint stating that there are exactly K variables in map
B that can be assigned to false, which forces the constraint solver to find exactly
K fault locations in program P.
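In the same simplified notation as before (ours, consistent with the description
above but not the paper's verbatim encoding), the fault localization query guards
each semantic constraint with its line variable and bounds the number of disabled
lines:

  Φ′ = ⋀_{t∈E} ( Φin ∧ Φf ∧ Φout ∧ ⋀_{i=1}^{n} (B[Li] → Φi(S_{i−1}, S_i)) )
       ∧ |{ i : B[Li] = false }| = K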
Modular analysis. The method above can precisely compute a potential fault
location, but an obvious shortcoming is that it is hard to scale. Encoding a long
program involves 1) a large number of semantic constraints, 2) many fault location
choices, as well as 3) many intermediate states to be assigned.
Notice that although a program can be arbitrarily long, developers usually
follow the design practice that every function is of limited size. Focusing the
analysis on one function at a time and recursively searching for the final fault
location can be far more efficient than solving one NP-hard problem at the scale
of the entire program.
To facilitate modular analysis of a function, we need to summarize the behavior
of its sub-modules (callee functions) and infer an external specification from
its higher-level module (caller function).
The encoding method introduced above treats one line of code as a constraint
on its pre-state and post-state. To summarize the behavior of a callee function,
we aim to turn it into a similar constraint on the pre-state and post-state of the
calling statement. The inner states of this callee function should be skipped in
the encoding. We can compute such summaries of the target function's callees
by symbolic execution. We start with a symbolic representation of the pre-state
and execute the callee function until it returns, and assert that the output state
equals the post-state. In this way, we can entirely eliminate all bug location
choices and inner state assignments in the callee function, as well as greatly
simplify the semantic constraint.
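As a concrete illustration (ours, reusing MacAddr.of from Figure 1), the summary
of a call x := MacAddr.of(v) relates the call's pre-state S and post-state S′ in a
single constraint, with no per-line choices or inner states for the callee:

  summary_of(S, S′) ≡ ∃a. S′ = S[x ↦ a][heap(a).value ↦ v],
  where a is a fresh heap address.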
There are two ways to infer the specification of the target function. The first way
is to encode only the calling stack of the target function up to the top-level
function, where we can use the test case as the specification. All function calls
made by the target's caller and transitive callers that are not in the stack can be
replaced by the automatically computed summary. We can also disable all fault
location choices except for lines in the target function. Another way is to infer a
possible pre-condition and post-condition of the target function. From the
perspective of the caller, the target function is a line of code that puts an incorrect
constraint on its pre-state and post-state. After the analysis, the constraint solver
will infer a feasible pre-state and post-state assuming this incorrect constraint is
removed. This assignment can be used as the pre-condition and post-condition,
which eliminates the need to encode any caller function. Since the second
approach may introduce incompleteness into the analysis, we use it only to
infer a specification for synthesizing the final patch, and use the first one for every
function's analysis.
Domain-specific abstraction. A domain-specific abstraction is essentially a
function summary as discussed above. But for those repeatedly used network
classes (identified by the @network annotation), we can pre-define some more
succinct abstractions based on domain knowledge to make the analysis easier.
The abstraction A[F] of a function F is an over-approximation of F that is
precise enough to characterize the behavior of F.
The abstraction is useful due to two observations. First, source code for
network programs may only be partially available due to the use of high-level
interfaces and native implementations. For example, when comparing the equality
of two network addresses, the getClass function is frequently used, but
its implementation depends on the runtime and is not available. To make the
analysis easier, we can instead use the following abstraction for such comparisons:
A[equals] : λx. λy. (x.dtype = y.dtype ∧ x.value = y.value),
where x.dtype denotes the dynamic type of the object x.
Second, network programs have complex operations that are challenging for
symbolic reasoning. For instance, bit manipulations are heavily used in network
data structures. While bit manipulations can improve the performance of net-
work programs, they present significant challenges for symbolic analysis due to
the encoding in the theory of bitvectors. We can give an abstraction that is
equivalent in correctness but simpler in behavior, e.g., using the identity function
instead of a hash code computation.
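A minimal Java rendering of the equals abstraction above (our sketch; AbsObj,
dtype, and value are hypothetical names for the abstract object representation
used during symbolic analysis):

  // Abstract stand-in for @network objects during symbolic analysis:
  // only the dynamic type and the raw value are tracked.
  class AbsObj {
      int dtype;   // dynamic type tag of the object
      long value;  // raw value, e.g., the 48-bit MAC address

      // A[equals]: two abstract objects are equal iff their dynamic
      // types and raw values coincide, mirroring the lambda above.
      boolean absEquals(AbsObj other) {
          return this.dtype == other.dtype && this.value == other.value;
      }
  }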
4.3 Patch Synthesis
The last step of our repair algorithm is to generate a patch that fixes the faulty
program. This corresponds to the SynthesizePatch procedure in Algorithm 1. It
can be reduced to a sketch-finishing problem in program synthesis, where we
replace the existing faulty line with a hole.
Our general idea is to use plain enumerative search with a depth bound in
the space of candidate patches, but with two significant optimizations.
First, we reduce the search space with heuristics. On one hand, we only replace
the core expression in the faulty statement with a hole, to focus on the most
expressive part. To be specific, we consider changing the right-hand-side
expressions of assignments, conditional expressions of jump statements, return values
of return statements, and the functions and arguments of function invocations. On
the other hand, we use a limited grammar to guide the search. We parameterize
all constants, variables, fields, functions, and operators over the sketch and only
instantiate constructs that are in scope. For example, given a particular sketch
with a hole, we only populate the variable set with the local and global variables
that are in scope of the hole. Also, if the hole corresponds to the conditional
expression of an if statement, we only add logical operators to the grammar.
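A minimal sketch of such a depth-bounded enumeration over expression strings
(our illustration; the names Enumerator, atoms, and ops are hypothetical, and
NetRep performs the actual search symbolically on top of Rosette):

  import java.util.ArrayList;
  import java.util.List;

  // Depth-bounded enumerative search over a restricted grammar.
  class Enumerator {
      private final List<String> atoms; // variables/constants in scope of the hole
      private final List<String> ops;   // operators allowed for this hole kind

      Enumerator(List<String> atoms, List<String> ops) {
          this.atoms = atoms;
          this.ops = ops;
      }

      // All candidate expressions whose nesting depth is at most 'depth'.
      List<String> enumerate(int depth) {
          List<String> result = new ArrayList<>(atoms);
          if (depth == 0) return result;
          List<String> sub = enumerate(depth - 1);
          for (String op : ops)
              for (String left : sub)
                  for (String right : sub)
                      result.add("(" + left + " " + op + " " + right + ")");
          return result;
      }
  }

For the hole at Line 14 of Figure 1, atoms would contain terms such as dl_dst,
r.dl_dst, and !dl_dst.equals(r.dl_dst), while ops would be restricted to
logical operators; each candidate is then checked against the local specification.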
Second, we use the local specification to guide the synthesis. Sketch completion
is different from synthesizing a complete program in that the specification
is defined for the entire program: we would have to repeatedly waste time
executing the correct parts of the program to verify a candidate patch. We use the
technique described in the modular analysis section to generate a pre-condition and
post-condition for only the faulty line. In this way, only the generated patch
needs to be executed and verified against the specification, which greatly saves time
as the program grows larger.
5 Implementation
We have implemented the proposed repair technique in a tool called NetRep.
NetRep leverages the Soot static analysis framework [26] to convert Java pro-
grams into Jimple code, which provides a succinct yet expressive set of instruc-
tions for analysis. In addition, NetRep utilizes the Rosette tool [48] to perform
symbolic reasoning for fault localization and patch synthesis. While our imple-
mentation closely follows the algorithm presented in Section 4, we also performed
several optimizations that are important for the performance of NetRep.
Memories for different types. Since the conversion between bitvectors and in-
tegers imposes significant overhead on running time, NetRep divides the mem-
ory into one part for integers and another for bitvectors. In this design, NetRep
automatically selects the memory chunk based on the variable types. The type
checking guarantees that no such conversion will occur.
Stack and heap. In order to reduce the number of memory operations, NetRep
also divides the memory into a stack and a heap. As is standard, the stack only
stores static data and its layout is deterministic. Therefore, stacks are implemented
using fixed-size vectors, and thus can be efficiently accessed for read and
write operations. On the other hand, the heap stores dynamic data that are usually
not known at compile time, such as allocated objects. Since the heap size cannot
be determined beforehand, NetRep uses an uninterpreted function f(x) to
represent heaps, where x is the address and f(x) is the value stored at x.
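This corresponds to the standard select/store treatment of arrays (our
formulation; the paper does not spell out the axioms): a heap write yields a new
function f′ constrained pointwise,

  f′ = write(f, a, v)  ⟺  f′(a) = v ∧ ∀x. x ≠ a → f′(x) = f(x).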
String values. Since reasoning over string values is a challenging task and not
always necessary for repairing network programs, we simplify the representation
of strings with integer values. Specifically, NetRep maps each string literal
to a unique integer and represents all string operations (e.g., concatenation) with
uninterpreted functions.
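A minimal Java sketch of such interning (ours; in the implementation this
happens inside the Rosette-based encoding):

  import java.util.HashMap;
  import java.util.Map;

  // Map each distinct string literal to a unique integer id, so the
  // solver reasons over integers instead of the theory of strings.
  class StringInterner {
      private final Map<String, Integer> ids = new HashMap<>();

      int intern(String literal) {
          // First occurrence gets the next free id (0, 1, 2, ...).
          return ids.computeIfAbsent(literal, k -> ids.size());
      }
  }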
Bounded program analysis. In order to improve the repair time, NetRep
only performs bounded program analysis for fault localization and patch syn-
thesis. Namely, we unroll loops and inline functions up to K times, where K is a
predefined hyper-parameter. In this way, function summaries can be easily and
efficiently computed using symbolic execution.
6 Evaluation
To evaluate the proposed techniques, we perform experiments that are designed
to answer the following research questions:
RQ1 Is NetRep effective to repair realistic network programs?
RQ2 How efficient are the fault localization and repair techniques in NetRep?
RQ3 How helpful are modular analysis and domain-specific abstraction for repair-
ing network programs?
ID  Module           LOC  # Funcs  # Tests  Succ  Exp  Loc Time (s)  Synth Time (s)  Total Time (s)
 1  DHCP             212    17       2      Yes   Yes       40            117             157
 2  Load Balancer    336    28       2      No    No         -              -               -
 3  Firewall         262    13       2      Yes   Yes      893            197            1090
 4  DHCP             431    32       2      Yes   Yes       95             39             134
 5  Utility          809    65       2      No    No         -              -               -
 6  Routing          605    44       3      Yes   Yes      271            179             450
 7  Utility          454    45       2      Yes   Yes       39             46              85
 8  Learning Switch  738    34       2      Yes   No       571            595            1166
 9  Database         442    17       2      Yes   No       310           2139            2449
10  Link Discovery   671    46       2      Yes   No       268            158             426
Table 1: Experimental results of NetRep.
RQ4 How does NetRep compare to other repair tools for Java programs?
Benchmark collection. To obtain realistic benchmarks, we crawl the commit
history of Floodlight [9], a representative open-source SDN controller in Java
that supports the OpenFlow protocol and a rich set of network functions. To
distinguish commits caused by bug repairs from those generated for non-repair
scenarios, we identify commits based on the following criteria: 1) The commit
message contains keywords about repairing bugs, e.g., “bug”, “error”, “fix”; 2)
The commit changes no more than three lines of code.
Following these criteria, we have collected 10 commits from the Floodlight
repository and adapted them into our benchmarks. Specifically, given a commit
in the repository, we take the code before the commit as the faulty network
program and the version after the commit as the ground-truth repaired program.
The code is post-processed and the parts irrelevant to the bug of interest are
removed. We also identify corresponding unit tests and modify them to directly
reveal the bug as appropriate. Each benchmark in our evaluation consists of a
faulty network program and its corresponding unit tests.
Experimental setup. All experiments are conducted on a computer with a 4-core
2.80 GHz CPU and 16 GB of physical memory, running the Arch Linux operating
system. We use Racket v7.7 as the compiler and runtime system of NetRep and
set a time limit of 1 hour for each benchmark.
6.1 Main Results
Our main experimental results are summarized in Table 1. The column labeled
“Module” describes the network module to which the benchmark belongs. The
next two columns labeled “LOC” and “# Funcs” show the number of lines
of source code (in Jimple) and the number of functions, respectively. The “#
Tests” column presents the number of unit tests used for fault localization and
patch synthesis. Next, the “Succ” and “Exp” columns show whether NetRep
can successfully repair the program and whether the generated patch is exactly the
same as the ground truth. Since NetRep returns the first fix that can pass all
provided test cases, the repaired programs are not necessarily the same as those
expected in the ground-truth. In this case, the table will show a “Yes” in the
“Succ” column and a “No” in the “Exp” column. Finally, the last three columns
in Table 1 denote the fault localization time, patch synthesis time and the total
running time of NetRep.
As shown in Table 1, the number of functions per benchmark ranges from 13 to 65,
with an average of 34 across all benchmarks. Each benchmark has 212–809 lines
of Jimple code, with the average being 496. NetRep
succeeds in repairing 8 out of 10 benchmarks. Furthermore, for 5 benchmarks
that can be successfully repaired, NetRep is able to generate exactly the same
fix as the ground truth. Given that our benchmarks cover programs from a variety
of modules of Floodlight, such as DHCP Server, Firewall, etc., we believe that
NetRep is effective at repairing realistic network programs (RQ1).
We inspected the reasons why NetRep fails to repair benchmarks 2 and 5.
NetRep is not able to localize the fault in benchmark 2 due to its incomplete
support for unbounded data structures with dynamic allocation, such as hash
maps. For benchmark 5, NetRep is able to localize the fault but not able to
synthesize the correct patch. This is because the expected function to be invoked
has side effects shared with another function, which requires some improvements
in the specification checking to verify.
Regarding the efficiency, NetRep can repair 8 benchmarks in an average of
744 seconds with only 2 to 3 test cases. The fault localization time ranges from
39 seconds to 893 seconds, with 50% of the benchmarks within five minutes. The
patch synthesis time ranges from 39 seconds to 2139 seconds, with 60% of the
benchmarks within five minutes. In summary, the evaluation results show that
NetRep only takes minutes to localize bugs in a faulty program and synthesize
a correct patch based on two to three unit tests (RQ2).
6.2 Ablation Study
To explore the impact of modular analysis and domain-specific abstraction on
the proposed repair technique, we develop three variants of NetRep:
– NetRep-NoMod is a variant of NetRep without modular analysis. Specifically,
NetRep-NoMod inlines the functions in a given program but still uses
abstractions for network data structures for fault localization and patch
synthesis.
– NetRep-NoAbs is a variant of NetRep without domain-specific abstraction.
In particular, NetRep-NoAbs uses the original concrete implementation of
network functions for symbolic reasoning. If the implementation is written in
a different language, we manually translate the implementation to Java.
– NetRep-NoModAbs is a variant of NetRep without modular analysis or
domain-specific abstraction. NetRep-NoModAbs simply inlines all functions
in the faulty program, including those in the network data structures,
and performs symbolic analysis for fault localization and patch synthesis.
To understand the impact of modular analysis and domain-specific abstrac-
tion, we run all variants on the 10 collected benchmarks. For each variant, we
Automatic Repair for Network Programs 365
[Figure 4 plots the running time in seconds (y-axis, 0 to 2,000) against the number
of solved benchmarks (x-axis, 0 to 10) for NetRep, NetRep-NoAbs, NetRep-NoMod,
and NetRep-NoModAbs.]
Fig. 4: Comparing NetRep against three variants.
measure the total running time (including time for fault localization and time
for patch synthesis) on each benchmark, and order the results by running time
in increasing order. The results for all variants are depicted in Figure 4. All lines
stop at the last benchmark that the corresponding variant can solve within the
1-hour time limit.
As shown in Figure 4, both NetRep-NoAbs and NetRep-NoMod can only
solve 4 out of 10 benchmarks in the evaluation, with average running times
of 569 seconds and 610 seconds, respectively. NetRep-NoModAbs solves
the least number of benchmarks: 3 out of 10. For the ones that it can solve,
the average running time is 1165 seconds. This experiment shows that modular
analysis and domain-specific abstraction each provide a great boost to NetRep's
efficiency in repairing network programs (RQ3).
6.3 Comparison with the Baseline
To understand how NetRep performs compared to other Java program repair
tools, we compare NetRep against a state-of-the-art tool called Jaid [5] on our
benchmarks. Specifically, Jaid takes as input a faulty Java program, a set of
unit tests, and a function signature for fault localization and patch synthesis,
a setting closest to NetRep among a variety of tools. Note that Jaid solves a
simpler repair problem than NetRep, because it requires the user to specify a
function that is potentially incorrect in the program, whereas NetRep does not
need input other than the faulty program and unit tests. In order to run Jaid
on our benchmarks, we adjust their formats to fit Jaid’s and provide the faulty
function (known from the ground truth) as input for Jaid.
Jaid indefinitely enumerates all possible patches, rather than recommending
a single most likely one. We consider it successful if the expected patch can be
found among the results. In practice, human assistance is needed to pick out this
patch from the thousands of candidates.
As a result, Jaid is able to finish on 8 out of 10 benchmarks. The expected
patches are found for 2 of them, whereas NetRep gives the expected
result for 5 benchmarks on the first recommendation. Jaid is unable to fix one of
the remaining benchmarks, and it runs out of memory on the other.
We argue that NetRep is better suited for automatically repairing network
programs than Jaid. First, it only requires network operators to provide
unit test cases. As discussed above, these can be automatically discovered by
another verification or testing procedure. In comparison, Jaid requires users to
be skilled in programming network controllers in order to identify the buggy
function and pick the correct patch from the results, which is beyond the ability
of most network operators and starts to require an expert team. Second, NetRep
has higher repair accuracy. As discussed above, networks are sensitive to small
mistakes, and high accuracy is crucial for a network to function correctly.
In summary, NetRep is more effective in automatically fixing bugs in network
programs than state-of-the-art repair tools for Java programs, especially with
respect to repair accuracy and automation (RQ4).
7 Related Work
Automated program repair. Automated program repair is an active re-
search area that aims to automatically fix the mistakes in programs based on
specifications of correctness criteria [11,28,39,18], with a variety of applications
such as aiding software development [34], finding security vulnerabilities [37],
and teaching novice programmers [49,14]. Different techniques have been pro-
posed to solve the automated program repair problem, including heuristics-based
techniques [16,31], semantics-based techniques [37,27], and learning-based tech-
niques [45,30,32,47]. NetRep is a semantics-based automated repair tool. Dif-
ferent from prior work, NetRep is specialized to repair network programs based
on modular analysis and network data structure abstractions.
Fault localization. Researchers have developed various approaches to fault lo-
calization, including spectrum-based, learning-based, and constraint-based tech-
niques. Specifically, spectrum-based techniques [27,1,2,7,44,6,19] perform
fault localization by identifying which parts of the program are active during a run
through execution profiles (called program spectra). Learning-based techniques
[29,53,54] typically train machine learning models to predict and rank possible
fault locations. By contrast, constraint-based techniques [21,20,12] encode the
semantics of problems as logical constraints and reduce the fault localization
problem into constraint satisfaction problem. In spirit, NetRep uses a similar
idea for fault localization. However, NetRep performs modular analysis and
enables debugging programs involving object-oriented features, whereas prior
work only analyzes the entire program in a C-like language. Besides, NetRep
reuses the fault localization result to speed up the patch synthesis, while prior
work mainly focuses on the fault localization step.
Patch synthesis. Many synthesis algorithms have been developed for generat-
ing patches, including enumerative search [27], constraint-based techniques [37],
statistical models [52], machine learning [15], hints from existing code [25], and
so on. In terms of patch synthesis, NetRep generates a context-free grammar
from the context of fault locations and performs enumerative search based on the
grammar to synthesize patches. It does not require a machine learning model or
statistical information for ranking all possible patches. However, it is conceivable
that NetRep will benefit from the guidance of such ranking techniques.
Automatic Repair for Network Programs 367
Verification and synthesis for SDN. In the networking domain, several ver-
ification tools [3,33,23,24] have been proposed based on either model checking
or theorem proving. For example, VeriCon [3] performs deductive verification
to verify the correctness of SDN programs specified by network-wide invari-
ants on all admissible topologies. In addition to verification, synthesis tech-
niques [36,35,38] have also been proposed to aid software-defined networking.
NetRep aims to repair network programs automatically, which is a different
problem than SDN verification or synthesis.
Repair for network programs. Our work is most related to automated re-
pair of network programs in the SDN domain [50,51,17]. Prior work on auto-repair
[50,51] relies on using Datalog to capture the operational semantics of the
target language to be repaired. The repair techniques work for domain-specific
languages (e.g. Datalog or Ruby on Rails) with simple structure. Similarly, Ho-
jjat et al. [17] propose a framework based on horn clause repair problem to
help network operators fix faulty configurations. However, NetRep targets Java
network programs with object-oriented features and more complex constructs,
which cannot be handled by existing techniques.
8 Limitations and Future Work
We discuss several limitations of NetRep that we plan to improve in future
work. First, NetRep repairs the faulty network program with the first patch
that passes all tests. If this patch is not the one intended by the user, user
interaction that resumes the synthesis, or a more formal specification, could be
introduced.
Second, patches that require complicated changes, e.g., those involving
control-flow structures, are beyond NetRep's ability. Such patches make up 44% of our
collection of bug-fixing commits. We envision that this challenge can be addressed
by introducing more sophisticated patch synthesis techniques, such as searching
over a domain-specific language for edits.
Third, in order to force symbolic execution to terminate in finite time, NetRep
currently unrolls all loops in the network program, which may result in
missing potential bugs. Loop invariant inference techniques could be leveraged to
overcome this challenge while still guaranteeing termination.
9 Conclusion
In this paper, we have proposed an automated repair technique for network
controller programs with unit tests as specifications. Our technique internally
performs symbolic reasoning for bug localization and patch synthesis, optimized
by network domain-specific abstractions and modular analysis to reduce encoding
size. We have implemented a tool called NetRep and evaluated it on 10
benchmarks adapted from the Floodlight framework. The experimental results
demonstrate that NetRep is effective for repairing realistic network programs
with moderate change sizes.
References
1. Abreu, R., Zoeteweij, P., van Gemund, A.J.C.: Spectrum-based multiple fault local-
ization. In: Proceedings of the IEEE/ACM International Conference on Automated
Software Engineering (ASE). pp. 88–99. IEEE Computer Society (2009)
2. Abreu, R., Zoeteweij, P., van Gemund, A.J.: On the accuracy of spectrum-based
fault localization. In: Testing: Academic and Industrial Conference Practice and
Research Techniques - MUTATION. pp. 89–98 (2007)
3. Ball, T., Bjørner, N., Gember, A., Itzhaky, S., Karbyshev, A., Sagiv, M., Schapira,
M., Valadarsky, A.: Vericon: towards verifying controller programs in software-
defined networks. In: Proceedings of the ACM SIGPLAN Conference on Program-
ming Language Design and Implementation (PLDI). pp. 282–293. ACM (2014)
4. Beckett, R., Gupta, A., Mahajan, R., Walker, D.: A general approach to network
configuration verification. In: Proceedings of the Conference of the ACM Special
Interest Group on Data Communication. pp. 155–168 (2017)
5. Chen, L., Pei, Y., Furia, C.A.: Contract-based program repair without the con-
tracts. In: Proceedings of the IEEE/ACM International Conference on Automated
Software Engineering (ASE). pp. 637–647. IEEE Computer Society (2017)
6. Chen, M.Y., Kiciman, E., Fratkin, E., Fox, A., Brewer, E.A.: Pinpoint: Problem
determination in large, dynamic internet services. In: Proceedings of the Inter-
national Conference on Dependable Systems and Networks (DSN). pp. 595–604.
IEEE Computer Society (2002)
7. Dallmeier, V., Lindig, C., Zeller, A.: Lightweight defect localization for Java.
In: Proceedings of the European Conference on Object-Oriented Programming
(ECOOP). Lecture Notes in Computer Science, vol. 3586, pp. 528–550. Springer
(2005)
8. Fedyukovich, G., Ahmad, M.B.S., Bodík, R.: Gradual synthesis for static paral-
lelization of single-pass array-processing programs. In: Proceedings of the 38th
ACM SIGPLAN Conference on Programming Language Design and Implementa-
tion, PLDI 2017, Barcelona, Spain, June 18-23, 2017. pp. 572–585. ACM (2017)
9. Floodlight: https://github.com/floodlight/floodlight (2021)
10. Galenson, J., Reames, P., Bodík, R., Hartmann, B., Sen, K.: Codehint: dynamic
and interactive synthesis of code snippets. In: Jalote, P., Briand, L.C., van der
Hoek, A. (eds.) Proceedings of the International Conference on Software Engineer-
ing (ICSE). pp. 653–663. ACM (2014)
11. Goues, C.L., Pradel, M., Roychoudhury, A.: Automated program repair. Commun.
ACM 62(12), 56–65 (2019)
12. Griesmayer, A., Bloem, R., Cook, B.: Repair of boolean programs with an ap-
plication to c. In: International Conference on Computer Aided Verification. pp.
358–371. Springer (2006)
13. Gulwani, S.: Automating string processing in spreadsheets using input-output ex-
amples. In: Ball, T., Sagiv, M. (eds.) Proceedings of the ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages (POPL). pp. 317–330. ACM
(2011)
14. Gulwani, S., Radicek, I., Zuleger, F.: Automated clustering and program repair for
introductory programming assignments. In: Proceedings of the ACM Conference on
Programming Language Design and Implementation (PLDI). pp. 465–480. ACM
(2018)
15. Gupta, R., Pal, S., Kanade, A., Shevade, S.K.: Deepfix: Fixing common C lan-
guage errors by deep learning. In: Singh, S.P., Markovitch, S. (eds.) Proceedings of
the Thirty-First AAAI Conference on Artificial Intelligence. pp. 1345–1351. AAAI
Press (2017)
16. Harman, M.: Automated patching techniques: the fix is in: technical perspective.
Commun. ACM 53(5), 108 (2010)
17. Hojjat, H., Rümmer, P., McClurg, J., Černý, P., Foster, N.: Optimizing Horn solvers
for network repair. In: Piskac, R., Talupur, M. (eds.) Proceedings of the Formal
Methods in Computer-Aided Design (FMCAD). pp. 73–80. IEEE (2016)
18. Hong, S., Lee, J., Lee, J., Oh, H.: SAVER: scalable, precise, and safe memory-error
repair. In: Proceedings of the International Conference on Software Engineering
(ICSE). pp. 271–283. ACM (2020)
19. Jones, J.A., Harrold, M.J., Stasko, J.T.: Visualization of test information to assist
fault localization. In: Proceedings of the 24th International Conference on Software
Engineering, ICSE 2002, 19-25 May 2002, Orlando, Florida, USA. pp. 467–477.
ACM (2002)
20. Jose, M., Majumdar, R.: Bug-assist: Assisting fault localization in ANSI-C pro-
grams. In: Proceedings of International Conference on Computer Aided Verification
(CAV). LNCS, vol. 6806, pp. 504–509. Springer (2011)
21. Jose, M., Majumdar, R.: Cause clue clauses: error localization using maximum
satisfiability. In: Proceedings of the ACM Conference on Programming Language
Design and Implementation (PLDI). pp. 437–446. ACM (2011)
22. Kazemian, P., Varghese, G., McKeown, N.: Header space analysis: Static check-
ing for networks. In: 9th USENIX Symposium on Networked Systems Design and
Implementation (NSDI 12). pp. 113–126 (2012)
23. Khurshid, A., Zou, X., Zhou, W., Caesar, M., Godfrey, P.B.: Veriflow: Verifying
network-wide invariants in real time. In: Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI). pp. 15–27. USENIX
Association (2013)
24. Kim, H., Reich, J., Gupta, A., Shahbaz, M., Feamster, N., Clark, R.J.: Kinetic:
Verifiable dynamic network control. In: Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI). pp. 59–72. USENIX
Association (2015)
25. Kneuss, E., Koukoutos, M., Kuncak, V.: Deductive program repair. In: Interna-
tional Conference on Computer Aided Verification. pp. 217–233. Springer (2015)
26. Lam, P., Bodden, E., Lhoták, O., Hendren, L.: The Soot framework for Java pro-
gram analysis: a retrospective. In: Cetus Users and Compiler Infrastructure Work-
shop. vol. 15 (2011)
27. Le, X.D., Chu, D., Lo, D., Goues, C.L., Visser, W.: S3: syntax- and semantic-guided
repair synthesis via programming by examples. In: Bodden, E., Schäfer, W., van
Deursen, A., Zisman, A. (eds.) Proceedings of the Joint Meeting on Foundations
of Software Engineering, (ESEC/FSE). pp. 593–604. ACM (2017)
28. Li, G., Liu, H., Chen, X., Gunawi, H.S., Lu, S.: Dfix: automatically fixing timing
bugs in distributed systems. In: Proceedings of the ACM Conference on Program-
ming Language Design and Implementation (PLDI). pp. 994–1009. ACM (2019)
29. Li, X., Li, W., Zhang, Y., Zhang, L.: Deepfl: integrating multiple fault diagnosis
dimensions for deep fault localization. In: Zhang, D., Møller, A. (eds.) Proceed-
ings of the SIGSOFT International Symposium on Software Testing and Analysis
(ISSTA). pp. 169–180. ACM (2019)
30. Li, Y., Wang, S., Nguyen, T.N.: Dlfix: context-based code transformation learn-
ing for automated program repair. In: Proceedings of International Conference on
Software Engineering (ICSE). pp. 602–614. ACM (2020)
31. Long, F., Rinard, M.: Staged program repair with condition synthesis. In: Proceed-
ings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE).
pp. 166–178. ACM (2015)
32. Long, F., Rinard, M.: Automatic patch generation by learning correct code. In:
Proceedings of the Symposium on Principles of Programming Languages (POPL).
pp. 298–312. ACM (2016)
33. Lopes, N.P., Bjørner, N., Godefroid, P., Jayaraman, K., Varghese, G.: Checking
beliefs in dynamic networks. In: Proceedings of the USENIX Symposium on Net-
worked Systems Design and Implementation (NSDI). pp. 499–512. USENIX Asso-
ciation (2015)
34. Marginean, A., Bader, J., Chandra, S., Harman, M., Jia, Y., Mao, K., Mols, A.,
Scott, A.: Sapfix: automated end-to-end repair at scale. In: Proceedings of the In-
ternational Conference on Software Engineering: Software Engineering in Practice,
ICSE (SEIP). pp. 269–278. IEEE / ACM (2019)
35. McClurg, J., Hojjat, H., Černý, P.: Synchronization synthesis for network pro-
grams. In: Majumdar, R., Kuncak, V. (eds.) Proceedings of the International
conference on Computer Aided Verification (CAV). Lecture Notes in Computer
Science, vol. 10427, pp. 301–321. Springer (2017)
36. McClurg, J., Hojjat, H., Černý, P., Foster, N.: Efficient synthesis of network up-
dates. In: Grove, D., Blackburn, S.M. (eds.) Proceedings of the ACM SIGPLAN
Conference on Programming Language Design and Implementation (PLDI). pp.
196–207. ACM (2015)
37. Mechtaev, S., Yi, J., Roychoudhury, A.: Angelix: scalable multiline program patch
synthesis via symbolic analysis. In: Proceedings of the International Conference on
Software Engineering (ICSE). pp. 691–701. ACM (2016)
38. Padon, O., Immerman, N., Karbyshev, A., Lahav, O., Sagiv, M., Shoham, S.: De-
centralizing SDN policies. In: Proceedings of the ACM SIGPLAN-SIGACT Sympo-
sium on Principles of Programming Languages (POPL). pp. 663–676. ACM (2015)
39. Perry, D.M., Kim, D., Samanta, R., Zhang, X.: Semcluster: clustering of imperative
programming assignments based on quantitative semantic features. In: Proceedings
of the ACM Conference on Programming Language Design and Implementation
(PLDI). pp. 860–873. ACM (2019)
40. Polozov, O., Gulwani, S.: Flashmeta: a framework for inductive program synthesis.
In: Aldrich, J., Eugster, P. (eds.) Proceedings of the ACM SIGPLAN International
Conference on Object-Oriented Programming, Systems, Languages, and Applica-
tions, (OOPSLA). pp. 107–126. ACM (2015)
41. Pradel, M., Sen, K.: Deepbugs: a learning approach to name-based bug detection.
Proc. ACM Program. Lang. 2(OOPSLA), 147:1–147:25 (2018)
42. Raychev, V., Schäfer, M., Sridharan, M., Vechev, M.T.: Refactoring with synthe-
sis. In: Hosking, A.L., Eugster, P.T., Lopes, C.V. (eds.) Proceedings of the ACM
SIGPLAN International Conference on Object Oriented Programming Systems
Languages & Applications, (OOPSLA). pp. 339–354. ACM (2013)
43. Raychev, V., Vechev, M.T., Yahav, E.: Code completion with statistical language
models. In: O’Boyle, M.F.P., Pingali, K. (eds.) Proceedings of the ACM SIGPLAN
Conference on Programming Language Design and Implementation (PLDI). pp.
419–428. ACM (2014)
44. Renieris, M., Reiss, S.P.: Fault localization with nearest neighbor queries. In: Pro-
ceedings of the IEEE International Conference on Automated Software Engineering
(ASE). pp. 30–39. IEEE Computer Society (2003)
45. Sakkas, G., Endres, M., Cosman, B., Weimer, W., Jhala, R.: Type error feed-
back via analytic program repair. In: Proceedings of the International Conference
on Programming Language Design and Implementation (PLDI). pp. 16–30. ACM
(2020)
46. Shi, L., Wang, Y., Alur, R., Loo, B.T.: NetRep: Automatic repair for network
programs. https://arxiv.org/abs/2110.06303 (2021)
47. Sidiroglou-Douskos, S., Lahtinen, E., Long, F., Rinard, M.: Automatic error elim-
ination by horizontal code transfer across multiple applications. In: Proceedings
of the ACM Conference on Programming Language Design and Implementation
(PLDI). pp. 43–54. ACM (2015)
48. Torlak, E., Bodík, R.: A lightweight symbolic virtual machine for solver-aided host
languages. In: Proceedings of the ACM Conference on Programming Language
Design and Implementation (PLDI). pp. 530–541. ACM (2014)
49. Wang, K., Singh, R., Su, Z.: Search, align, and repair: data-driven feedback gener-
ation for introductory programming exercises. In: Proceedings of the ACM Confer-
ence on Programming Language Design and Implementation (PLDI). pp. 481–495.
ACM (2018)
50. Wu, Y., Chen, A., Haeberlen, A., Zhou, W., Loo, B.T.: Automated network repair
with meta provenance. In: Proceedings of the ACM Workshop on Hot Topics in
Networks (HotNets). pp. 26:1–26:7. ACM (2015)
51. Wu, Y., Chen, A., Haeberlen, A., Zhou, W., Loo, B.T.: Automated bug removal
for software-defined networks. In: Proceedings of the USENIX Symposium on Net-
worked Systems Design and Implementation (NSDI). pp. 719–733. USENIX Asso-
ciation (2017)
52. Xiong, Y., Wang, J., Yan, R., Zhang, J., Han, S., Huang, G., Zhang, L.: Precise
condition synthesis for program repair. In: Proceedings of the International Con-
ference on Software Engineering (ICSE). pp. 416–426. IEEE / ACM (2017)
53. Xuan, J., Monperrus, M.: Learning to combine multiple ranking metrics for fault
localization. In: Proceedings of the IEEE International Conference on Software
Maintenance and Evolution (ICSME). pp. 191–200. IEEE Computer Society (2014)
54. Zhang, Z., Lei, Y., Tan, Q., Mao, X., Zeng, P., Chang, X.: Deep learning-based
fault localization with contextual information. IEICE Trans. Inf. Syst. 100-D(12),
3027–3031 (2017)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Progress on Software Verification: SV-COMP 2022
Dirk Beyer
LMU Munich, Munich, Germany
Abstract. The 11th edition of the Competition on Software Verification
(SV-COMP 2022) provides the largest ever overview of tools for software
verification. The competition is an annual comparative evaluation of fully
automatic software verifiers for C and Java programs. The objective is
to provide an overview of the state of the art in terms of effectiveness
and efficiency of software verification, establish standards, provide a
platform for exchange to developers of such tools, educate PhD students
on reproducibility approaches and benchmarking, and provide computing
resources to developers that do not have access to compute clusters. The
competition consisted of 15 648 verification tasks for C programs and
586 verification tasks for Java programs. Each verification task consisted
of a program and a property (reachability, memory safety, overflows,
termination). The new category on data-race detection was introduced as
a demonstration category. SV-COMP 2022 had 47 participating verification
systems from 33 teams from 11 countries.
Keywords: Formal Verification · Program Analysis · Competition ·
Software Verification · Verification Tasks · Benchmark · C Language ·
Java Language · SV-Benchmarks · BenchExec · CoVeriTeam
1 Introduction
This report is the 2022 edition of the series of competition reports that
accompanies the competition; it explains the process and rules, gives insights
into some aspects of the competition (this time the focus is on trouble shooting
and reproducing results on a small scale), and, most importantly, reports the
results of the comparative evaluation. This report extends previous reports on
SV-COMP [10,11,12,13,14,15,16,17,18]; reproduction packages are available on
Zenodo (see Table 4). The 11th Competition on Software Verification (SV-COMP,
https://sv-comp.sosy-lab.org/2022) is the largest comparative evaluation ever
in this area. The objectives of the competition were discussed earlier
(objectives 1–4 [16]) and extended over the years (objectives 5–6 [17]):
1. provide an overview of the state of the art in software-verification technology
and increase visibility of the most recent software verifiers,
2. establish a repository of software-verification tasks that is publicly available
for free use as standard benchmark suite for evaluating verification software,
3. establish standards that make it possible to compare different verification
tools, including a property language and formats for the results,
4. accelerate the transfer of new verification technology to industrial practice
by identifying the strengths of the various verifiers on a diverse set of tasks,
5. educate PhD students and others on performing reproducible benchmarking,
packaging tools, and running robust and accurate research experiments, and
6. provide research teams that do not have sufficient computing resources with
the opportunity to obtain experimental results on large benchmark sets.
The SV-COMP 2020 report [17] discusses the achievements of the SV-COMP
competition so far with respect to these objectives.
Related Competitions. There are many competitions in the area of formal
methods [9], because it is well-understood that competitions are a fair and
accurate means to execute a comparative evaluation with involvement of the
developing teams. We refer to a previous report [17] for a more detailed discussion
and give here only the references to the most related competitions [20,53,67].
Quick Summary of Changes. While we try to keep the setup of the compe-
tition stable, there are always improvements and developments. For the 2022
edition, the following changes were made:
– A demonstration category on data-race detection was added. Due to several
participating verification tools, this category will become a normal main
category in SV-COMP 2023. The results are outlined in Sect. 5.
– New verification tasks were added, with an increase in C from 15 201 in 2021
to 15 648 in 2022 and in Java from 473 in 2021 to 586 in 2022, combined
with ongoing efforts on quality improvement.
2 Organization, Definitions, Formats, and Rules
Procedure. The overall organization of the competition did not change in
comparison to the earlier editions [10,11,12,13,14,15,16,17,18]. SV-COMP is
an open competition (also known as comparative evaluation), where all verification
tasks are known before the submission of the participating verifiers, which is
necessary due to the complexity of the C language. The procedure is partitioned
into the benchmark submission phase, the training phase, and the evaluation
phase. The participants received the results of their verifier continuously via
e-mail (for preruns and the final competition run), and the results were publicly
announced on the competition web site after the teams inspected them.
Competition Jury. Traditionally, the competition jury consists of the chair and
one member of each participating team; the team-representing members circulate
every year after the candidate-submission deadline. This committee reviews
the competition contribution papers and helps the organizer with resolving any
disputes that might occur (from competition report of SV-COMP 2013 [11]).
In more detail, the tasks of the jury consist of the following:
– The jury oversees the process and ensures transparency, fairness, and
community involvement.
– Each jury member who participates in the competition is assigned a number
of (3 or 4) submissions (papers and systems) to review.
– Participating systems are reviewed to determine whether they fulfill the
requirements for verifier archives, based on the archives submitted to the
repository.
– Teams and paper submissions are reviewed to verify the requirements for
qualification, based on the submission data and paper in EasyChair and the
results of the qualification runs.
– Some qualified competition candidates are selected to publish (in the LNCS
proceedings of TACAS) a contribution paper that gives an overview of the
participating system.
– The jury helps the organizer with discussing and resolving any disputes that
might occur.
– Jury members adhere to the deadlines with all the duties.
The team representatives of the competition jury are listed in Table 5.
License Requirements. Starting 2018, SV-COMP required that the verifier
must be publicly available for download and has a license that
(i) allows reproduction and evaluation by anybody (incl. results publication),
(ii) does not restrict the usage of the verifier output (log files, witnesses), and
(iii) allows any kind of (re-)distribution of the unmodified verifier archive.
Two exceptions were made to allow minor incompatibilities for commercial
participants. The jury felt that the rule “allows any kind of (re-)distribution of the
unmodified verifier archive” is too broad.
the possibilities for reproduction. Starting with SV-COMP 2023, this license
requirement shall be changed to “allows (re-)distribution of the unmodified
verifier archive via SV-COMP repositories and archives”.
Validation of Results. The validation of the verification results was done
by eleven validation tools, which are listed in Table 1, including references to
literature. Four new validators support the competition:
– There are two new validators for the C language: Dartagnan supports
result validation for violation witnesses in category ConcurrencySafety-Main.
Symbiotic-Witch supports result validation for violation witnesses in
categories ReachSafety, MemSafety, and NoOverflows.
– For the first time, there are validators for the Java language: GWit and
Wit4Java support result validation for violation witnesses in category
ReachSafety-Java.
Table 1: Tools for witness-based result validation (validators) and witness linter
Validator Reference Representative Affiliation
CPAchecker [25,26,28] Thomas Bunk LMU Munich, Germany
CPA-w2t [27] Thomas Lemberger LMU Munich, Germany
Dartagnan new [89] Hernán Ponce de León Bundeswehr U., Germany
CProver-w2t [27] Michael Tautschnig Queen Mary U. London, UK
GWit new [68] Falk Howar TU Dortmund U., Germany
MetaVal [33] Martin Spiessl LMU Munich, Germany
NitWit [105] Jana (Philipp) Berger RWTH Aachen, Germany
Symbiotic-Witch new [6] Paulína Ayaziová Masaryk U., Brno, Czechia
UAutomizer [25,26] Daniel Dietsch U. of Freiburg, Germany
Wit4Java new [108] Tong Wu U. of Manchester, UK
WitnessLint Sven Umbricht LMU Munich, Germany
Table 2: Scoring schema for SV-COMP 2022 (unchanged from 2021 [18])
Reported result Points Description
Unknown 0 Failure to compute verification result
False correct +1 Violation of property in program was correctly found
and a validator confirmed the result based on a witness
False incorrect −16 Violation reported but property holds (false alarm)
True correct +2 Program correctly reported to satisfy property
and a validator confirmed the result based on a witness
True incorrect −32 Incorrect program reported as correct (wrong proof)
Task-Definition Format 2.0. SV-COMP 2022 used the task-definition format
in version 2.0. More details can be found in the report for Test-Comp 2021 [19].
Properties. Please see the 2015 competition report [13] for the definition of the
properties and the property format. All specifications used in SV-COMP 2022
are available in the directory c/properties/ of the benchmark repository.
Categories. The (updated) category structure of SV-COMP 2022 is illustrated by
Fig. 1. The categories are also listed in Tables 8, 9, and 10, and described in detail
on the competition web site (https://sv-comp.sosy-lab.org/2022/benchmarks.php).
Compared to the category structure for SV-COMP 2021, we added the sub-category
Termination-BitVectors to category Termination and the sub-category
SoftwareSystems-BusyBox-ReachSafety to category SoftwareSystems.
Scoring Schema and Ranking. The scoring schema of SV-COMP 2022 was the
same as for SV-COMP 2021. Table 2 provides an overview and Fig. 2 visually
illustrates the score assignment for the reachability property as an example. As
before, the rank of a verifier was decided based on the sum of points (normalized
for meta categories). In case of a tie, the rank was decided based on success run
time, which is the total CPU time over all verification tasks for which the verifier
reported a correct verification result.
[Figure: category-structure tree. Sub-categories of ReachSafety: Arrays, BitVectors,
ControlFlow, ECA, Floats, Heap, Loops, ProductLines, Recursive, Sequentialized,
XCSP, Combinations; of MemSafety: Arrays, Heap, LinkedList, Other, MemCleanup,
Juliet; of ConcurrencySafety: Main; of NoOverflows: BitVectors, Other; of
Termination: BitVectors, MainControlFlow, MainHeap, Other; of SoftwareSystems:
AWS-C-Common ReachSafety, BusyBox ReachSafety, BusyBox MemSafety, BusyBox
NoOverflows, DeviceDriversLinux64 ReachSafety, DeviceDriversLinux64Large
ReachSafety, OpenBSD MemSafety, uthash MemSafety, uthash NoOverflows, uthash
ReachSafety; meta categories: C-Overall, C-FalsificationOverall, Java-Overall.]
Fig. 1: Category structure for SV-COMP 2022; category C-FalsificationOverall
contains all verification tasks of C-Overall without Termination; Java-Overall
contains all Java verification tasks; compared to SV-COMP 2021, there is one
new sub-category in Termination and one new sub-category in SoftwareSystems
[Figure: decision tree combining the verifier answer (true, false, unknown) with the
witness-validator outcome for tasks with expected result true-unreach or
false-unreach: unknown yields 0; a false alarm yields −16; a wrong proof yields −32;
a correct true confirmed by a validator yields 2; a correct false confirmed by a
validator yields 1; correct but unconfirmed results yield 0.]
Fig. 2: Visualization of the scoring schema for the reachability property
(unchanged from 2021 [18])
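For illustration, the following sketch (ours, not part of the competition infrastructure) shows how the schema of Table 2 and Fig. 2 combines a verifier's answer with the witness-validation outcome; the function name and the string encodings are assumptions of this example.

```python
def score_run(property_holds: bool, answer: str, confirmed: bool) -> int:
    """Score one verification run per the SV-COMP 2022 schema (Table 2).

    property_holds: whether the property actually holds in the program
    answer:         the verifier's result: "true", "false", or "unknown"
    confirmed:      whether a witness validator confirmed the result
    """
    if answer == "unknown":
        return 0  # failure to compute a verification result
    correct = (answer == "true") == property_holds
    if not correct:
        # Wrong results are penalized: a wrong proof is worse than a false alarm.
        return -32 if answer == "true" else -16
    # Correct results only earn points if a validator confirmed the witness.
    if not confirmed:
        return 0
    return 2 if answer == "true" else 1
```

For example, a confirmed correct True answer earns 2 points, while a false alarm on a correct program costs 16 points regardless of the witness.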
[Figure: benchmarking components and execution flow: (a) verification task,
(b) benchmark definition, (c) tool-info module, (d) verifier archive, (e) verification
run with outcomes FALSE / UNKNOWN / TRUE, (f) violation witness or
correctness witness.]
Fig. 3: Benchmarking components of SV-COMP and competition’s execution flow
(same as for SV-COMP 2020)
Opt-out from Categories and Score Normalization
for Meta Categories was done as described previously [11] (page 597).
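As a rough sketch of our reading of that normalization (each subcategory contributes its score divided by its task count, scaled by the average number of tasks per subcategory; see [11] for the normative definition):

```python
# Sketch of score normalization for meta categories; this follows our
# reading of the definition in [11] and is illustrative only.
def meta_category_score(subcategories):
    """subcategories: list of (score, number_of_tasks) pairs."""
    avg_tasks = sum(n for _, n in subcategories) / len(subcategories)
    # Score per task makes small and large subcategories count equally.
    return sum(score / n for score, n in subcategories) * avg_tasks
```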
Reproducibility. SV-COMP results must be reproducible, and consequently,
all major components are maintained in public version-control repositories. The
overview of the components is provided in Fig. 3, and the details are given
in Table 3. We refer to the SV-COMP 2016 report [14] for a description of all
components of the SV-COMP organization. There are competition artifacts at
Zenodo (see Table 4) to guarantee their long-term availability and immutability.
Competition Workflow. The workflow of the competition is described in the re-
port for Test-Comp 2021 [19] (SV-COMP and Test-Comp use a similar workflow).
Table 3: Publicly available components for reproducing SV-COMP 2022
Component Fig. 3 Repository Version
Verification Tasks (a) gitlab.com/sosy-lab/benchmarking/sv-benchmarks svcomp22
Benchmark Definitions (b) gitlab.com/sosy-lab/sv-comp/bench-defs svcomp22
Tool-Info Modules (c) github.com/sosy-lab/benchexec 3.10
Verifier Archives (d) gitlab.com/sosy-lab/sv-comp/archives-2022 svcomp22
Benchmarking (e) github.com/sosy-lab/benchexec 3.10
Witness Format (f) github.com/sosy-lab/sv-witnesses svcomp22
Table 4: Artifacts published for SV-COMP 2022
Content DOI Reference
Verification Tasks 10.5281/zenodo.5831003 [22]
Competition Results 10.5281/zenodo.5831008 [21]
Verifiers and Validators 10.5281/zenodo.5959149 [24]
Verification Witnesses 10.5281/zenodo.5838498 [23]
BenchExec 10.5281/zenodo.5720267 [106]
3 Reproducing a Verification Run and Trouble-Shooting Guide
In the following, we explain a few steps that are useful for reproducing individual
results and for trouble shooting. This section is written from the perspective of a participant.
Step 1: Make Verifier Archive Available. The first action item for a partici-
pant is to submit a merge request to the repository that contains all the verifier
archives (see list of merge requests at GitLab). Typical problems include:
– The fork is not public. This means that the continuous integration (CI)
pipeline results are not visible and the merge request cannot be merged.
– The shared runners are not enabled. This means that the CI pipeline cannot
run and no results will be available.
– The verifier does not provide a version string (which should not include the
verifier name itself). This means that it is not possible to later determine
which version of the verifier was used for the experiments. Therefore, version
strings are mandatory and are checked by the CI.
The interface between the execution (with BenchExec) and the verification
tool can be checked using the procedure described in the BenchExec
documentation.1
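For orientation, a tool-info module is a small Python class; the sketch below follows the BaseTool2 interface of BenchExec 3.10 as described in the linked documentation, for a hypothetical verifier binary myverifier (the tool name and output patterns are placeholders, not a real participant).

```python
# Sketch of a tool-info module for a hypothetical verifier "myverifier";
# it follows the BaseTool2 interface of BenchExec 3.10.
import benchexec.result as result
import benchexec.tools.template


class Tool(benchexec.tools.template.BaseTool2):
    def executable(self, tool_locator):
        return tool_locator.find_executable("myverifier")

    def name(self):
        return "MyVerifier"

    def version(self, executable):
        # Mandatory for SV-COMP; must not contain the verifier name itself.
        return self._version_from_tool(executable)

    def cmdline(self, executable, options, task, rlimits):
        return [executable] + options + [task.single_input_file]

    def determine_result(self, run):
        # Map the (hypothetical) tool output to BenchExec result constants.
        for line in run.output:
            if "VERIFICATION TRUE" in line:
                return result.RESULT_TRUE_PROP
            if "VERIFICATION FALSE" in line:
                return result.RESULT_FALSE_PROP
        return result.RESULT_UNKNOWN
```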
Step 2: Ensure That Verifier Works on Competition Machines. Once the
CI checks passed and the archive is merged into the official competition repository,
the verifier can be executed on the competition machines on a few verification tasks.
1 https://github.com/sosy-lab/benchexec/blob/3.10/doc/tool-integration.md
The competition uses the infrastructure VerifierCloud, and remote
execution in this compute cloud is possible using CoVeriTeam [29]. CoVeriTeam
is a tool for constructing cooperative verification tools from existing components
and has supported the competition since SV-COMP 2021. Among
its many capabilities, it enables remote execution of verification runs directly
on the competition machines, which was found to be a valuable service for
trouble shooting. A description and example invocation for each participating
verifier is available in the CoVeriTeam documentation (see file doc/competition-
help.md in the CoVeriTeam repository). Competition participants are asked
to execute their tool locally using CoVeriTeam and then remotely on the
competition machines. Typical problems include:
– Verifiers sometimes have insufficient log output, such that it is not possible
to observe what the verifier was executing. The first step towards trouble
shooting is always to ensure some minimal log output.
– The verifier assumes software that is not installed yet. Each verifier states its
dependencies in its documentation. For example, the verifier Cbmc specifies
under required-ubuntu-packages that it relies on the Ubuntu packages gcc
and libc6-dev-i386 in file benchmark-defs/category-structure.yml in
the repository with the benchmark definitions. This is easy to fix by adding
the dependency in the definition file and getting it installed.
– The verifier makes assumptions about the hardware of the machine, e.g.,
selecting a specific processing unit. This can be investigated by running the
verifier in the Docker container and remotely on the competition machines.
For the above-mentioned purpose, the competition offers a Docker image
that can be used to try out if all required dependencies are available.2
The competition also provides a list of installed packages, which is important
for ensuring reproducibility.
Step 3: Check Prerun Results. So far, we considered executing individual
verification runs in the Docker container or remotely on the competition
machines. As a service to the participating teams, the competition offers training
runs and provides the results to the teams. Typical checks that teams
perform on the prerun results include:
– Inspect the verification results (solution to the verification task, like True,
False, Unknown, etc.) and log files.
– Inspect the validation results (was the verification result confirmed by a
validator) and the produced verification witnesses.
– Inspect the result of the witness linter. All witnesses should be syntactically
correct according to the witness specification.
– In case the verification result does not match the expected result, investigate
the verifier and the verification task; in case of problems with the verification
task, report to the jury by creating a merge request with a fix or an issue for
discussion in the SV-Benchmarks repository.
2 https://gitlab.com/sosy-lab/benchmarking/competition-scripts/-/tree/svcomp22
Table 5: Competition candidates with tool references and representing jury members;
new for first-time participants, (hors concours) for hors-concours participation
Participant Ref. Jury member Affiliation
2ls [36,81] Viktor Malík BUT, Brno, Czechia
AProVE [65,100] Jera Hensel RWTH Aachen, Germany
Brick [37] Lei Bu Nanjing U., China
Cbmc [75] Michael Tautschnig Queen Mary U. of London, UK
Coastal [102] (hors concours)
CVT-AlgoSel new [29,30] (hors concours)
CVT-ParPort new [29,30] (hors concours)
CPA-BAM-BnB [3,104] (hors concours)
CPA-BAM-SMG new Anton Vasilyev ISP RAS, Russia
CPAchecker [31,49] Thomas Bunk LMU Munich, Germany
CPALockator [4,5] (hors concours)
Crux new [52,96] Ryan Scott Galois, USA
CSeq [47,71] Emerson Sales Gran Sasso Science Institute, Italy
Dartagnan [58,88] Hernán Ponce de León U. Bundeswehr Munich, Germany
Deagle new [62] Fei He Tsinghua U., China
Divine [8,76] (hors concours)
Ebf new Fatimah Aljaafari U. of Manchester, UK
Esbmc-incr [43,46] (hors concours)
Esbmc-kind [56,57] Rafael Menezes U. of Manchester, UK
Frama-C-SV [34,48] Martin Spiessl LMU Munich, Germany
Gazer-Theta [1,60] (hors concours)
GDart new [84] Falk Howar TU Dortmund, Germany
Goblint [95,103] Simmo Saan U. of Tartu, Estonia
Graves-CPA new [79] Will Leeson U. of Virginia, USA
Infer new [38,73] (hors concours)
Java-Ranger [98,99] Soha Hussein U. of Minnesota, USA
JayHorn [72,97] Ali Shamakhi Tehran Inst. Adv. Studies, Iran
Jbmc [44,45] Peter Schrammel U. of Sussex / Diffblue, UK
JDart [80,83] Falk Howar TU Dortmund, Germany
Korn [55] Gidon Ernst LMU Munich, Germany
Lart new [77,78] Henrich Lauko Masaryk U., Brno, Czechia
Lazy-CSeq [69,70] (hors concours)
Locksmith new [90] Vesal Vojdani U. of Tartu, Estonia
PeSCo [93,94] Cedric Richter U. of Oldenburg, Germany
Pinaka [41] (hors concours)
PredatorHP [66,87] (hors concours)
Sesl new Xie Li Academy of Sciences, China
Smack [61,92] (hors concours)
Spf [85,91] (hors concours)
Symbiotic [39,40] Marek Chalupa Masaryk U., Brno, Czechia
Theta new [101,109] Vince Molnár BME Budapest, Hungary
UAutomizer [63,64] Matthias Heizmann U. of Freiburg, Germany
UGemCutter new [74] Dominik Klumpp U. of Freiburg, Germany
UKojak [54,86] Frank Schüssele U. of Freiburg, Germany
UTaipan [51,59] Daniel Dietsch U. of Freiburg, Germany
VeriAbs [2,50] Priyanka Darke Tata Consultancy Services, India
VeriFuzz [42,82] Raveendra Kumar M. Tata Consultancy Services, India
Table 6: Algorithms and techniques that the participating verification systems used;
new for first-time participants, (hors concours) for hors-concours participation
Verifier
CEGAR
Predicate Abstraction
Symbolic Execution
Bounded Model Checking
k-Induction
Property-Directed Reach.
Explicit-Value Analysis
Numeric. Interval Analysis
Shape Analysis
Separation Logic
Bit-Precise Analysis
ARG-Based Analysis
Lazy Abstraction
Interpolation
Automata-Based Analysis
Concurrency Support
Ranking Functions
Evolutionary Algorithms
Algorithm Selection
Portfolio
2ls 3 3 3 3 3 3
AProVE 3 3 3 3 3 3
Brick 3 3 3 3 3
Cbmc 3 3 3
Coastal3
CVT-AlgoSel new3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
CVT-ParPortnew 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
CPA-BAM-BnB3 3 3 3 3 3 3
CPA-BAM-SMG new
CPALockator3 3 3 3 3 3 3 3
CPAchecker 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Crux new 3
CSeq 3 3 3
Dartagnan 3 3 3
Deagle new
Divine3 3 3 3 3 3
Ebf new 3
Esbmc-incr3 3 3 3
Esbmc-kind 3 3 3 3 3
Frama-C-SV 3
Gazer-Theta3 3 3 3 3 3 3 3 3
GDart new 3 3 3
Goblint 3 3
Graves-CPA new 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Infer new3 3 3 3
Java-Ranger 3 3
JayHorn 3 3 3 3 3 3
Jbmc 3 3 3
JDart 3 3 3
Korn 3 3 3 3
Lart new 3 3 3 3
Lazy-CSeq3 3 3
(continues on next page)
Verifier
CEGAR
Predicate Abstraction
Symbolic Execution
Bounded Model Checking
k-Induction
Property-Directed Reach.
Explicit-Value Analysis
Numeric. Interval Analysis
Shape Analysis
Separation Logic
Bit-Precise Analysis
ARG-Based Analysis
Lazy Abstraction
Interpolation
Automata-Based Analysis
Concurrency Support
Ranking Functions
Evolutionary Algorithms
Algorithm Selection
Portfolio
Locksmith new 3
PeSCo 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Pinaka3 3 3
PredatorHP3
Sesl new 3 3
Smack3 3 3 3
Spf3 3 3
Symbiotic 3 3 3 3 3 3 3
Theta new 3 3 3 3 3 3 3 3 3
UAutomizer 3 3 3 3 3 3 3 3 3 3
UGemCutter new 3 3 3 3 3 3 3 3 3
UKojak 3 3 3 3 3
UTaipan 3 3 3 3 3 3 3 3 3 3 3
VeriAbs 3 3 3 3 3 3 3 3
VeriFuzz 3 3 3
4 Participating Verifiers
The participating verification systems are listed in Table 5. The table contains
the verifier name (with hyperlink), references to papers that describe the systems,
the representing jury member, and the affiliation. The listing is also available on
the competition web site at https://sv-comp.sosy-lab.org/2022/systems.php. Table 6
lists the algorithms and techniques that are used by the verification tools, and
Table 7 gives an overview of commonly used solver libraries and frameworks.
Hors-Concours Participation. There are verification tools that participated
in the comparative evaluation but did not participate in the rankings. We call
this kind of participation hors concours, as these participants cannot participate
in rankings and cannot “win” the competition. Those are either passive or active
participants. Passive participation means that the tools are taken from previous
years of the competition, in order to show progress and compare new tools against
them (Coastal, CPA-BAM-BnB, CPALockator, Divine, Esbmc-incr,
Gazer-Theta, Lazy-CSeq, Pinaka, PredatorHP, Smack, Spf). Active
participation means that there are teams actively developing the tools, but there
are reasons why those tools should not occur in the rankings. For example, a
tool might use other tools that participate in the competition on their own,
and comparing such a tool in the ranking could be considered unfair
(CVT-AlgoSel, CVT-ParPort). Also, a tool might produce uncertain results
and the team was not sure if the full potential of the tool can be shown in the
SV-COMP experiments (Infer). Those participations are marked as ‘hors
concours’ in Table 5 and the other tables.
Table 7: Solver libraries and frameworks that are used as components in the
participating verification systems (component is mentioned if used more than three
times; new for first-time participants, (hors concours) for hors-concours participation)
Verifier
CPAchecker
CProver
Esbmc
Jpf
Ultimate
JavaSMT
MathSAT
Cvc4
SMTinterpol
z3
MiniSAT
2ls 3 3
AProVE 3 3
Brick 3 3
Cbmc 3 3
Coastal3
CVT-AlgoSel new3 3 3 3 3 3 3
CVT-ParPortnew 3 3 3 3 3 3 3
CPA-BAM-BnB3 3 3
CPA-BAM-SMG new 3 3 3
CPALockator3 3 3
CPAchecker 3 3 3
Crux new 3
CSeq 3 3
Dartagnan 3
Deagle new
Divine
Ebf new 3 3
Esbmc-incr3 3
Esbmc-kind 3 3
Frama-C-SV
Gazer-Theta
GDart new 3
Goblint
Graves-CPA new 3 3 3
Infer new
Java-Ranger 3
JayHorn
Jbmc 3 3
JDart 3
Korn
Lart new 3
Lazy-CSeq3 3
Locksmith new
PeSCo 3 3 3
(continues on next page)
Verifier
CPAchecker
CProver
Esbmc
Jpf
Ultimate
JavaSMT
MathSAT
Cvc4
SMTinterpol
z3
MiniSAT
Pinaka
PredatorHP
Sesl new
Smack
Spf3
Symbiotic 3
Theta new
UAutomizer 3 3 3 3 3
UGemCutter new 3 3 3 3 3
UKojak 3 3 3
UTaipan 3 3 3 3 3
VeriAbs 3 3 3 3
VeriFuzz 3
5 Results and Discussion
The results of the competition represent the state of the art of what can
be achieved with fully automatic software-verification tools on the given
benchmark set. We report the effectiveness (number of verification tasks that can
be solved and correctness of the results, as accumulated in the score) and
the efficiency (resource consumption in terms of CPU time and CPU energy).
The results are presented in the same way as in previous years, such that the
improvements compared to last year are easy to identify, except that, due to the
number of tools, we have to split the table and put the hors-concours verifiers
into a second results table. The results presented in this report were inspected
and approved by the participating teams.
Computing Resources. The resource limits were the same as in the previous
competitions [14]: Each verification run was limited to 8 processing units (cores),
15 GB of memory, and 15 min of CPU time. Witness validation was limited
to 2 processing units, 7 GB of memory, and 1.5 min of CPU time for violation
witnesses and 15 min of CPU time for correctness witnesses.
Table 8: Quantitative overview over all regular results; empty cells are used for opt-outs,
new for first-time participants
Columns: Verifier, then ReachSafety (8631 points, 5400 tasks), MemSafety (5003
points, 3321 tasks), ConcurrencySafety (1160 points, 763 tasks), NoOverflows (685
points, 454 tasks), Termination (4144 points, 2293 tasks), SoftwareSystems (5898
points, 3417 tasks), FalsificationOverall (5718 points, 13355 tasks), Overall (25209
points, 15648 tasks), JavaOverall (828 points, 586 tasks)
2ls 3585 810 0 428 2178 83 1462 7366
AProVE 2305
Brick
Cbmc 3808 -262 460 284 1800 -198 2024 6733
CPA-BAM-SMG new 3101 776
CPAchecker 5572 3057 498 531 1270 809 3835 11904
Crux new 1408 290
CSeq 655
Dartagnan 481
Deagle new 757
Ebf new 496
Esbmc-kind 4959 2162 -74 318 1389 633 1841 7727
Frama-C-SV 213
Goblint 858 106 159 340 1951
Graves-CPA new 4520 802 2400 9218
Korn
Lart new 3034 573
Locksmith new
PeSCo 5080 -273 3683 10515
Sesl new 345
Symbiotic 4571 4051 105 370 2030 2704 3274 12249
Theta new 1132 -14
UAutomizer 3969 2350 493 506 2552 712 3089 11802
UGemCutter new 612
UKojak 2058 1573 0 445 0 382 2056 5078
UTaipan 3634 2336 535 501 0 486 3049 8666
VeriAbs 6923
VeriFuzz 1518 -777 -32 136 -129 0 817
GDart new 640
Java-Ranger 670
JayHorn 376
Jbmc 700
JDart 714
Table 9: Quantitative overview over all hors-concours results; empty cells represent
opt-outs, new for first-time participants; all entries are hors-concours participations
Columns: Verifier, then ReachSafety (8631 points, 5400 tasks), MemSafety (5003
points, 3321 tasks), ConcurrencySafety (1160 points, 763 tasks), NoOverflows (685
points, 454 tasks), Termination (4144 points, 2293 tasks), SoftwareSystems (5898
points, 3417 tasks), FalsificationOverall (5718 points, 13355 tasks), Overall (25209
points, 15648 tasks), JavaOverall (828 points, 586 tasks)
CVT-AlgoSel new 5438 314
CVT-ParPort new 5904 3700 -551 553 2351 1282 1087 10704
CPA-BAM-BnB 504
CPALockator -1154
Divine 110 99 -136 0 0 112 -1253 -248
Esbmc-incr -74
Gazer-Theta
Infer new -50415 -5890 -5982 -29566
Lazy-CSeq 571
Pinaka 3710 -200 1259
PredatorHP 2205
Smack 1181
Coastal -2541
Spf 430
The machines for running the experiments are part of a compute cluster that consists of
167 machines; each verification run was executed on an otherwise completely
unloaded, dedicated machine, in order to achieve precise measurements. Each
machine had one Intel Xeon E3-1230 v5 CPU, with 8 processing units each,
a frequency of 3.4 GHz, 33 GB of RAM, and a GNU/Linux operating system
(x86_64-linux, Ubuntu 20.04 with Linux kernel 5.4). We used BenchExec [32]
to measure and control computing resources (CPU time, memory, CPU energy)
and VerifierCloud to distribute, install, run, and clean-up verification runs,
and to collect the results. The values for time and energy are accumulated
over all cores of the CPU. To measure the CPU energy, we used CPU Energy
Meter [35] (integrated in BenchExec [32]).
One complete verification execution of the competition consisted of 309 081
verification runs (each verifier on each verification task of the selected categories
according to the opt-outs), consuming 937 days of CPU time and 249 kWh
of CPU energy (without validation). Witness-based result validation required
1.43 million validation runs (each validator on each verification task for categories
with witness validation, and for each verifier), consuming 708 days of CPU time.
Each tool was executed several times, in order to make sure no installation issues
occur during the execution. Including preruns, the infrastructure managed a
total of 2.85 million verification runs consuming 19 years of CPU time, and
16.3 million validation runs consuming 11 years of CPU time.
Table 10: Overview of the top-three verifiers for each category; new for first-time
participants, measurements for CPU time and energy rounded to two significant digits
(‘–’ indicates a missing energy value due to a configuration bug)
Rank Verifier Score CPU Time (h) CPU Energy (kWh) Solved Tasks Unconf. Tasks False Alarms Wrong Proofs
ReachSafety
1 VeriAbs 6923 170 1.8 4117 359
2 CPAchecker 5572 130 1.5 3245 228 4
3 PeSCo 5080 63 0.57 3033 314 7
MemSafety
1 Symbiotic 4051 2.6 0.034 2167 1097
2 CPA-BAM-SMG new 3101 7.3 0.064 2975 17
3 CPAchecker 3057 7.8 0.069 3119 0
ConcurrencySafety
1 Deagle new 757 0.50 0.0059 517 42
2 CSeq 655 5.1 0.059 454 50
3 UGemCutter new 612 4.9 – 445 21
NoOverflows
1 CPAchecker 531 1.2 0.012 366 3
2 UAutomizer 506 2.0 0.019 356 2
3 UTaipan 501 2.2 0.023 355 1
Termination
1 UAutomizer 2552 13 0.12 1581 8
2 AProVE 2305 38 0.43 1114 37
3 2ls 2178 2.9 0.025 1163 203
SoftwareSystems
1 Symbiotic 2704 1.2 0.016 1188 73
2 CPAchecker 809 52 0.60 1660 169 1
3 Graves-CPA new 802 19 0.17 1582 95 2 3
FalsificationOverall
1 CPAchecker 3835 81 0.90 3626 95 5
2 PeSCo 3683 45 0.41 3552 110 9
3 Symbiotic 3274 14 0.18 2295 1191 3
Overall
1 Symbiotic 12249 34 0.44 7430 1529 3
2 CPAchecker 11904 210 2.3 9773 408 14
3 UAutomizer 11802 170 1.7 7948 311 2 2
JavaOverall
1 JDart 714 1.2 0.012 522 0
2 Jbmc 700 0.42 0.0039 506 0
3 Java-Ranger 670 4.4 0.052 466 0
[Figure: quantile functions of the 13 verifiers in category C-Overall (2LS, CBMC,
CVT-ParPort, CPAchecker, DIVINE, ESBMC-kind, Goblint, Graves-CPA, PeSCo,
Symbiotic, UAutomizer, UKojak, UTaipan); x-axis: cumulative score, y-axis: min.
time in s.]
Fig. 4: Quantile functions for category C-Overall. Each quantile function illustrates
the quantile (x-coordinate) of the scores obtained by correct verification runs
below a certain run time (y-coordinate). More details were given previously [11].
A logarithmic scale is used for the time range from 1 s to 1000 s, and a linear
scale is used for the time range between 0 s and 1 s.
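To sketch how such a plot can be derived from raw results (our own illustration, not the competition's plotting code): sort the correctly solved tasks by CPU time and accumulate their scores, starting from the offset contributed by wrong results, which is why some graphs begin at a negative cumulative score.

```python
# Sketch: data points of a score-based quantile function, computed from
# a list of (points, cpu_time_s) pairs for one verifier.
def quantile_points(results):
    offset = sum(p for p, _ in results if p < 0)  # penalty from wrong results
    correct = sorted((t, p) for p, t in results if p > 0)  # sort by CPU time
    points, x = [], offset
    for cpu_time_s, p in correct:
        x += p  # cumulative score of all correct runs at most this slow
        points.append((x, cpu_time_s))
    return points
```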
Quantitative Results. Tables 8 and 9 present the quantitative overview of
all tools and all categories. Due to the large number of tools, we need to split
the presentation into two tables, one for the verifiers that participate in the
rankings (Table 8), and one for the hors-concours verifiers (Table 9). The head
row mentions the category, the maximal score for the category, and the number
of verification tasks. The tools are listed in alphabetical order; every table
row lists the scores of one verifier. We indicate the top three candidates by
formatting their scores in bold face and in larger font size. An empty table cell
means that the verifier opted-out from the respective main category (perhaps
participating in subcategories only, restricting the evaluation to a specific topic).
More information (including interactive tables, quantile plots for every category,
and also the raw data in XML format) is available on the competition web site
(https://sv-comp.sosy-lab.org/2022/results) and in the results artifact (see Table 4).
Table 10 reports the top three verifiers for each category. The run time (column
‘CPU Time’) and energy (column ‘CPU Energy’) refer to successfully solved
verification tasks (column ‘Solved Tasks’). We also report the number of tasks for
which no witness validator was able to confirm the result (column ‘Unconf. Tasks’).
The columns ‘False Alarms’ and ‘Wrong Proofs’ report the number of verification
tasks for which the verifier reported wrong results, i.e., reporting a counterexample
when the property holds (incorrect False) and claiming that the program fulfills
the property although it actually contains a bug (incorrect True), respectively.
Table 11: Results of verifiers in demonstration category NoDataRace
Verifier Score Correct true Correct false Incorrect true Incorrect false
CSeq 39 37 61 0 6
Dartagnan 299 47 23 13 0
Goblint 124 62 0 0 0
Locksmith new 34 17 0 0 0
UAutomizer 120 49 54 1 0
UGemCutter new 151 57 69 1 0
UKojak 0 0 0 0 0
UTaipan 139 56 59 1 0
Score-Based Quantile Functions for Quality Assessment. We use score-based
quantile functions [11,32] because these visualizations make it easier
to understand the results of the comparative evaluation. The results archive
(see Table 4) and the web site (https://sv-comp.sosy-lab.org/2022/results) include
such a plot for each (sub-)category. As an example, we show the plot for category
C-Overall (all verification tasks) in Fig. 4. A total of 13 verifiers participated in
category C-Overall, for which the quantile plot shows the overall performance over
all categories (scores for meta categories are normalized [11]). A more detailed
discussion of score-based quantile plots, including examples of what insights one
can obtain from the plots, is provided in previous competition reports [11,14].
The winner of the competition, Symbiotic, not only achieves the best
cumulative score (the graph for Symbiotic has the longest width from x = 0 to its right
end), but is also extremely efficient (the area below the graph is very small). Verifiers
whose graphs start with a negative cumulative score produced wrong results.
Several verifiers whose graphs start with a minimal CPU time larger than 3 s
are based on Java, and the time is consumed by starting the JVM.
Demo Category NoDataRace. SV-COMP 2022 had a new category on
data-race detection, and we report the results in Table 11. The benchmark
set contained a total of 162 verification tasks. The category was defined as
a demonstration category because it was not clear how many verifiers would
participate. Eight verifiers specified the execution for this sub-category in their
benchmark definition (see footnote 3) and participated in this demonstration. A
detailed table was generated by BenchExec’s table-generator together with all
other results and is available on the competition web site and in the artifact
(see Table 4). The results are presented as a means to show that such a category
is useful; the results do not represent the full potential of the verifiers, as they
were not fully tuned by their developers but were handed in for demonstrating
abilities only.
Alternative Rankings. The community suggested reporting a couple of alternative
rankings that honor different aspects of the verification process as a complement
to the official SV-COMP ranking.
3 https://gitlab.com/sosy-lab/sv-comp/bench-defs/-/tree/svcomp22/benchmark-defs
Table 12: Alternative rankings for category Overall; quality is given in score
points (sp), CPU time in hours (h), CPU energy in kilowatt-hours (kWh), wrong
results in errors (E), rank measures in errors per score point (E/sp), joule per score
point (J/sp), and score points (sp)
Rank Verifier Quality (sp) CPU Time (h) CPU Energy (kWh) Solved Tasks Wrong Results (E) Rank Measure
Correct Verifiers (E/sp)
1 Goblint 1951 4.9 0.070 1574 0 0
2 UKojak 5078 66 0.71 3988 1 0.00020
3 Symbiotic 12249 34 0.44 7430 3 0.00024
worst (with pos. score) 282 0.042
Green Verifiers (J/sp)
1 Goblint 1951 4.9 0.070 1574 0 120
2 Symbiotic 12249 34 0.44 7430 3 130
3 Cbmc 6733 25 0.27 6479 282 140
worst (with pos. score) 690
Table 12 is similar to Table 10, but contains the alternative ranking categories
Correct and Green Verifiers. Column ‘Quality’
gives the score in score points, column ‘CPU Time’ the CPU usage of successful
runs in hours, column ‘CPU Energy’ the CPU usage of successful runs in kWh,
column ‘Solved Tasks’ the number of correct results, column ‘Wrong Results’
the sum of false alarms and wrong proofs in number of errors, and column
‘Rank Measure’ gives the measure to determine the alternative rank.
Correct Verifiers: Low Failure Rate. The right-most columns of Table 10
report that the verifiers achieve a high degree of correctness (all top three
verifiers in C-Overall have fewer than 2 wrong results). The winners of
category Java-Overall produced not a single wrong answer. The first category in
Table 12 uses the failure rate as rank measure: number of incorrect results /
max(total score, 1), i.e., the number of errors per score point (E/sp). We use E
as the unit for the number of incorrect results and sp as the unit for the total
score. The worst result was 0.023 E/sp in SV-COMP 2021 and is now at
0.042 E/sp. Goblint is the best verifier regarding this measure.
Green Verifiers: Low Energy Consumption. Since a large part of the cost of
verification is given by the energy consumption, it might be important to also
consider the energy efficiency. The second category in Table 12 uses the energy
consumption per score point as rank measure: total CPU energy /
max(total score, 1), with the unit J/sp. The worst result from SV-COMP 2021 was
630 J/sp and is now at 690 J/sp. Also here, Goblint is the best verifier regarding
this measure.
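Both rank measures reduce to one-line computations; the following sketch (ours) makes the units explicit.

```python
# Sketch of the two alternative rank measures of Table 12.
def failure_rate(wrong_results: int, total_score: float) -> float:
    """Errors per score point (E/sp); lower is better."""
    return wrong_results / max(total_score, 1)

def energy_per_score(cpu_energy_kwh: float, total_score: float) -> float:
    """Joule per score point (J/sp); lower is better."""
    return cpu_energy_kwh * 3.6e6 / max(total_score, 1)  # 1 kWh = 3.6 MJ
```

Recomputing Goblint's Overall row from the rounded inputs in Table 12 gives failure_rate(0, 1951) = 0 and energy_per_score(0.070, 1951) of roughly 129 J/sp; the table reports 120 J/sp, the difference being due to rounding of the displayed inputs.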
New Verifiers. To acknowledge the verification systems that participate for
the first or second time in SV-COMP, Table 13 lists the new verifiers (in
SV-COMP 2021 or SV-COMP 2022).
Table 13: New verifiers in SV-COMP 2021 and SV-COMP 2022; column ‘Sub-categories’
gives the number of executed categories (including demo category NoDataRace);
new for first-time participants, (hors concours) for hors-concours participation
Verifier Language First Year Sub-categories
CVT-AlgoSel new C 2022 18
CVT-ParPort new C 2022 35
CPA-BAM-SMG new C 2022 16
Crux new C 2022 20
Deagle new C 2022 1
Ebf new C 2022 1
Graves-CPA new C 2022 35
Infer new C 2022 25
Lart new C 2022 22
Locksmith new C 2022 1
Sesl new C 2022 6
Theta new C 2022 13
UGemCutter new C 2022 2
Frama-C-SV C 2021 4
Gazer-Theta C 2021 9
Goblint C 2021 25
Korn C 2021 4
GDart new Java 2022 1
Table 14: Confirmation rate of verification witnesses during the evaluation in
SV-COMP 2022; new for first-time participants, (hors concours) for hors-concours
participation
Verifier | True: Total / Confirmed / Unconf. | False: Total / Confirmed / Unconf.
2ls 2394 / 2388 (99.7 %) / 6 | 1648 / 1363 (82.7 %) / 285
Cbmc 3837 / 3493 (91.0 %) / 344 | 3536 / 2986 (84.4 %) / 550
CVT-ParPort new 7440 / 7083 (95.2 %) / 357 | 4754 / 4332 (91.1 %) / 422
CPAchecker 6006 / 5701 (94.9 %) / 305 | 4175 / 4072 (97.5 %) / 103
Divine 1692 / 1672 (98.8 %) / 20 | 1040 / 870 (83.7 %) / 170
Esbmc-kind 5542 / 5483 (98.9 %) / 59 | 3034 / 2556 (84.2 %) / 478
Goblint 1657 / 1574 (95.0 %) / 83 | 0 / 0 / 0
Graves-CPA new 5651 / 5458 (96.6 %) / 193 | 3723 / 3576 (96.1 %) / 147
PeSCo 6155 / 5734 (93.2 %) / 421 | 4116 / 3934 (95.6 %) / 182
Symbiotic 4878 / 4798 (98.4 %) / 80 | 4081 / 2632 (64.5 %) / 1449
UAutomizer 5751 / 5591 (97.2 %) / 160 | 2508 / 2357 (94.0 %) / 151
UKojak 2875 / 2863 (99.6 %) / 12 | 1144 / 1125 (98.3 %) / 19
UTaipan 4567 / 4513 (98.8 %) / 54 | 1719 / 1576 (91.7 %) / 143
Verifiable Witnesses. Results validation is of primary importance in the
competition. All SV-COMP verifiers are required to justify the result (True
or False) by producing a verification witness (except for those categories for
which no result validator is available). We used ten independently developed
witness-based result validators and one witness linter (see Table 1).
[Figure: bar chart over the years 2012–2022.]
Fig. 5: Number of evaluated verifiers for each year (first-time participants on top)
Table 14 shows the confirmed versus unconfirmed results: the first column
lists the verifiers of category C-Overall; the three columns for result True
report the total, confirmed, and unconfirmed number of verification tasks for
which the verifier answered True, respectively; and the three columns for result
False report the same for answer False. More information (for all verifiers) is
given in the detailed tables on the competition web site and in the results
artifact; all verification witnesses are also contained in the witnesses
artifact (see Table 4). The verifiers 2ls and UKojak are the winners in terms of
confirmed results for expected results True and False, respectively. The overall
interpretation is similar to SV-COMP 2020 and 2021 [17, 18].
6 Conclusion
The 11th edition of the Competition on Software Verification (SV-COMP 2022)
was the largest ever, with 47 participating verification systems (incl. 14
hors-concours and 14 new verifiers); see Fig. 5 for the participation numbers and
Table 5 for the details. The number of result validators was increased from 6 in
2021 to 11 in 2022, to validate the results (Table 1). The number of verification
tasks was increased to 15 648 in the C category and to 586 in the Java category,
and a new category on data-race detection was demonstrated. A new section
in this report (Sect. 3) explains steps to reproduce verification results and to
investigate problems during execution, and a new table attempts to give an overview
of the usage of common solver libraries and frameworks. The high quality standards
of the TACAS conference, in particular with respect to the important principles
of fairness, community support, and transparency, are ensured by a competition
jury in which each participating team had a member. We hope that this broad
overview of verification tools stimulates their further application by an
ever-growing user community of formal methods.
Data-Availability Statement. The verification tasks and results of the competition
are published at Zenodo, as described in Table 4. All components and data
that are necessary for reproducing the competition are available in public version
repositories, as specified in Table 3. For easy access, the results are also
presented online on the competition web site: https://sv-comp.sosy-lab.org/2022/results.
Funding Statement. This project was funded in part by the Deutsche
Forschungsgemeinschaft (DFG) 418257054 (Coop).
References
1. Ádám, Zs., Sallai, Gy., Hajdu, Á.: Gazer-Theta: LLVM-based verifier portfolio with BMC/CEGAR (competition contribution). In: Proc. TACAS (2). pp. 433–437. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_27
2. Afzal, M., Asia, A., Chauhan, A., Chimdyalwar, B., Darke, P., Datar, A., Kumar, S., Venkatesh, R.: VeriAbs: Verification by abstraction and test generation. In: Proc. ASE. pp. 1138–1141 (2019). https://doi.org/10.1109/ASE.2019.00121
3. Andrianov, P., Friedberger, K., Mandrykin, M.U., Mutilin, V.S., Volkov, A.: CPA-BAM-BnB: Block-abstraction memoization and region-based memory models for predicate abstractions (competition contribution). In: Proc. TACAS. pp. 355–359. LNCS 10206, Springer (2017). https://doi.org/10.1007/978-3-662-54580-5_22
4. Andrianov, P., Mutilin, V., Khoroshilov, A.: CPALockator: Thread-modular approach with projections (competition contribution). In: Proc. TACAS (2). pp. 423–427. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_25
5. Andrianov, P.S.: Analysis of correct synchronization of operating system components. Program. Comput. Softw. 46, 712–730 (2020). https://doi.org/10.1134/S0361768820080022
6. Ayaziová, P., Chalupa, M., Strejček, J.: Symbiotic-Witch: A Klee-based violation witness checker (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
7. Balyo, T., Heule, M.J.H., Järvisalo, M.: SAT Competition 2016: Recent developments. In: Proc. AAAI. pp. 5061–5063. AAAI Press (2017)
8. Baranová, Z., Barnat, J., Kejstová, K., Kučera, T., Lauko, H., Mrázek, J., Ročkai, P., Štill, V.: Model checking of C and C++ with Divine 4. In: Proc. ATVA. pp. 201–207. LNCS 10482, Springer (2017). https://doi.org/10.1007/978-3-319-68167-2_14
9. Bartocci, E., Beyer, D., Black, P.E., Fedyukovich, G., Garavel, H., Hartmanns, A., Huisman, M., Kordon, F., Nagele, J., Sighireanu, M., Steffen, B., Suda, M., Sutcliffe, G., Weber, T., Yamada, A.: TOOLympics 2019: An overview of competitions in formal methods. In: Proc. TACAS (3). pp. 3–24. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_1
10. Beyer, D.: Competition on software verification (SV-COMP). In: Proc. TACAS. pp. 504–524. LNCS 7214, Springer (2012). https://doi.org/10.1007/978-3-642-28756-5_38
11. Beyer, D.: Second competition on software verification (Summary of SV-COMP 2013). In: Proc. TACAS. pp. 594–609. LNCS 7795, Springer (2013). https://doi.org/10.1007/978-3-642-36742-7_43
12. Beyer, D.: Status report on software verification (Competition summary SV-COMP 2014). In: Proc. TACAS. pp. 373–388. LNCS 8413, Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_25
13. Beyer, D.: Software verification and verifiable witnesses (Report on SV-COMP 2015). In: Proc. TACAS. pp. 401–416. LNCS 9035, Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_31
14. Beyer, D.: Reliable and reproducible competition results with BenchExec and witnesses (Report on SV-COMP 2016). In: Proc. TACAS. pp. 887–904. LNCS 9636, Springer (2016). https://doi.org/10.1007/978-3-662-49674-9_55
15. Beyer, D.: Software verification with validation of results (Report on SV-COMP 2017). In: Proc. TACAS. pp. 331–349. LNCS 10206, Springer (2017). https://doi.org/10.1007/978-3-662-54580-5_20
16. Beyer, D.: Automatic verification of C and Java programs: SV-COMP 2019. In: Proc. TACAS (3). pp. 133–155. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_9
17. Beyer, D.: Advances in automatic software verification: SV-COMP 2020. In: Proc. TACAS (2). pp. 347–367. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_21
18. Beyer, D.: Software verification: 10th comparative evaluation (SV-COMP 2021). In: Proc. TACAS (2). pp. 401–422. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_24
19. Beyer, D.: Status report on software testing: Test-Comp 2021. In: Proc. FASE. pp. 341–357. LNCS 12649, Springer (2021). https://doi.org/10.1007/978-3-030-71500-7_17
20. Beyer, D.: Advances in automatic software testing: Test-Comp 2022. In: Proc. FASE. LNCS 13241, Springer (2022)
21. Beyer, D.: Results of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5831008
22. Beyer, D.: SV-Benchmarks: Benchmark set for software verification and testing (SV-COMP 2022 and Test-Comp 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5831003
23. Beyer, D.: Verification witnesses from verification tools (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5838498
24. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5959149
25. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M.: Correctness witnesses: Exchanging verification results between verifiers. In: Proc. FSE. pp. 326–337. ACM (2016). https://doi.org/10.1145/2950290.2950351
26. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M., Stahlbauer, A.: Witness validation and stepwise testification across software verifiers. In: Proc. FSE. pp. 721–733. ACM (2015). https://doi.org/10.1145/2786805.2786867
27. Beyer, D., Dangl, M., Lemberger, T., Tautschnig, M.: Tests from witnesses: Execution-based validation of verification results. In: Proc. TAP. pp. 3–23. LNCS 10889, Springer (2018). https://doi.org/10.1007/978-3-319-92994-1_1
28. Beyer, D., Friedberger, K.: Violation witnesses and result validation for multi-threaded programs. In: Proc. ISoLA (1). pp. 449–470. LNCS 12476, Springer (2020). https://doi.org/10.1007/978-3-030-61362-4_26
29. Beyer, D., Kanav, S.: CoVeriTeam: On-demand composition of cooperative verification systems. In: Proc. TACAS. Springer (2022)
30. Beyer, D., Kanav, S., Richter, C.: Construction of verifier combinations based on off-the-shelf verifiers. In: Proc. FASE. Springer (2022)
31. Beyer, D., Keremoglu, M.E.: CPAchecker: A tool for configurable software verification. In: Proc. CAV. pp. 184–190. LNCS 6806, Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_16
32. Beyer, D., Löwe, S., Wendler, P.: Reliable benchmarking: Requirements and solutions. Int. J. Softw. Tools Technol. Transfer 21(1), 1–29 (2019). https://doi.org/10.1007/s10009-017-0469-y
33. Beyer, D., Spiessl, M.: MetaVal: Witness validation via verification. In: Proc. CAV. pp. 165–177. LNCS 12225, Springer (2020). https://doi.org/10.1007/978-3-030-53291-8_10
34. Beyer, D., Spiessl, M.: The static analyzer Frama-C in SV-COMP (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
35. Beyer, D., Wendler, P.: CPU Energy Meter: A tool for energy-aware algorithms engineering. In: Proc. TACAS (2). pp. 126–133. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_8
36. Brain, M., Joshi, S., Kröning, D., Schrammel, P.: Safety verification and refutation by k-invariants and k-induction. In: Proc. SAS. pp. 145–161. LNCS 9291, Springer (2015). https://doi.org/10.1007/978-3-662-48288-9_9
37. Bu, L., Xie, Z., Lyu, L., Li, Y., Guo, X., Zhao, J., Li, X.: Brick: Path enumeration-based bounded reachability checking of C programs (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
38. Calcagno, C., Distefano, D., O'Hearn, P.W., Yang, H.: Compositional shape analysis by means of bi-abduction. J. ACM 58(6), 26:1–26:66 (2011). https://doi.org/10.1145/2049697.2049700
39. Chalupa, M., Strejček, J., Vitovská, M.: Joint forces for memory safety checking. In: Proc. SPIN. pp. 115–132. Springer (2018). https://doi.org/10.1007/978-3-319-94111-0_7
40. Chalupa, M., Řechtáčková, A., Mihalkovič, V., Zaoral, L., Strejček, J.: Symbiotic 9: String analysis and backward symbolic execution with loop folding (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
41. Chaudhary, E., Joshi, S.: Pinaka: Symbolic execution meets incremental solving (competition contribution). In: Proc. TACAS (3). pp. 234–238. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_20
42. Chowdhury, A.B., Medicherla, R.K., Venkatesh, R.: VeriFuzz: Program-aware fuzzing (competition contribution). In: Proc. TACAS (3). pp. 244–249. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_22
43. Cordeiro, L.C., Fischer, B.: Verifying multi-threaded software using SMT-based context-bounded model checking. In: Proc. ICSE. pp. 331–340. ACM (2011). https://doi.org/10.1145/1985793.1985839
44. Cordeiro, L.C., Kesseli, P., Kröning, D., Schrammel, P., Trtík, M.: JBmc: A bounded model checking tool for verifying Java bytecode. In: Proc. CAV. pp. 183–190. LNCS 10981, Springer (2018). https://doi.org/10.1007/978-3-319-96145-3_10
45. Cordeiro, L.C., Kröning, D., Schrammel, P.: Jbmc: Bounded model checking for Java bytecode (competition contribution). In: Proc. TACAS (3). pp. 219–223. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_17
46. Cordeiro, L.C., Morse, J., Nicole, D., Fischer, B.: Context-bounded model checking with Esbmc 1.17 (competition contribution). In: Proc. TACAS. pp. 534–537. LNCS 7214, Springer (2012). https://doi.org/10.1007/978-3-642-28756-5_42
47. Coto, A., Inverso, O., Sales, E., Tuosto, E.: A prototype for data race detection in CSeq 3 (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
48. Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-C. In: Proc. SEFM. pp. 233–247. Springer (2012). https://doi.org/10.1007/978-3-642-33826-7_16
49. Dangl, M., Löwe, S., Wendler, P.: CPAchecker with support for recursive programs and floating-point arithmetic (competition contribution). In: Proc. TACAS. pp. 423–425. LNCS 9035, Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_34
50. Darke, P., Agrawal, S., Venkatesh, R.: VeriAbs: A tool for scalable verification by abstraction (competition contribution). In: Proc. TACAS (2). pp. 458–462. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_32
51. Dietsch, D., Heizmann, M., Nutz, A., Schätzle, C., Schüssele, F.: Ultimate Taipan with symbolic interpretation and fluid abstractions (competition contribution). In: Proc. TACAS (2). pp. 418–422. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_32
52. Dockins, R., Foltzer, A., Hendrix, J., Huffman, B., McNamee, D., Tomb, A.: Constructing semantic models of programs with the software analysis workbench. In: Proc. VSTTE. pp. 56–72. LNCS 9971, Springer (2016). https://doi.org/10.1007/978-3-319-48869-1_5
53. Dross, C., Furia, C.A., Huisman, M., Monahan, R., Müller, P.: VerifyThis 2019: A program-verification competition. Int. J. Softw. Tools Technol. Transf. 23(6), 883–893 (2021). https://doi.org/10.1007/s10009-021-00619-x
54. Ermis, E., Hoenicke, J., Podelski, A.: Splitting via interpolants. In: Proc. VMCAI. pp. 186–201. LNCS 7148, Springer (2012). https://doi.org/10.1007/978-3-642-27940-9_13
55. Ernst, G.: A complete approach to loop verification with invariants and summaries. Tech. Rep. arXiv:2010.05812v2, arXiv (January 2020)
56. Gadelha, M.Y.R., Monteiro, F.R., Cordeiro, L.C., Nicole, D.A.: Esbmc v6.0: Verifying C programs using k-induction and invariant inference (competition contribution). In: Proc. TACAS (3). pp. 209–213. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_15
57. Gadelha, M.Y., Ismail, H.I., Cordeiro, L.C.: Handling loops in bounded model checking of C programs via k-induction. Int. J. Softw. Tools Technol. Transf. 19(1), 97–114 (February 2017). https://doi.org/10.1007/s10009-015-0407-9
58. Gavrilenko, N., Ponce de León, H., Furbach, F., Heljanko, K., Meyer, R.: BMC for weak memory models: Relation analysis for compact SMT encodings. In: Proc. CAV. pp. 355–365. LNCS 11561, Springer (2019). https://doi.org/10.1007/978-3-030-25540-4_19
59. Greitschus, M., Dietsch, D., Podelski, A.: Loop invariants from counterexamples. In: Proc. SAS. pp. 128–147. LNCS 10422, Springer (2017). https://doi.org/10.1007/978-3-319-66706-5_7
60. Hajdu, Á., Micskei, Z.: Efficient strategies for CEGAR-based model checking. J. Autom. Reasoning 64(6), 1051–1091 (2020). https://doi.org/10.1007/s10817-019-09535-x
61. Haran, A., Carter, M., Emmi, M., Lal, A., Qadeer, S., Rakamarić, Z.: Smack+Corral: A modular verifier (competition contribution). In: Proc. TACAS. pp. 451–454. LNCS 9035, Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_42
62. He, F., Sun, Z., Fan, H.: Deagle: An SMT-based verifier for multi-threaded programs (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
63. Heizmann, M., Chen, Y.F., Dietsch, D., Greitschus, M., Hoenicke, J., Li, Y., Nutz, A., Musa, B., Schilling, C., Schindler, T., Podelski, A.: Ultimate Automizer and the search for perfect interpolants (competition contribution). In: Proc. TACAS (2). pp. 447–451. LNCS 10806, Springer (2018). https://doi.org/10.1007/978-3-319-89963-3_30
64. Heizmann, M., Hoenicke, J., Podelski, A.: Software model checking for people who love automata. In: Proc. CAV. pp. 36–52. LNCS 8044, Springer (2013). https://doi.org/10.1007/978-3-642-39799-8_2
65. Hensel, J., Mensendiek, C., Giesl, J.: AProVE: Non-termination witnesses for C programs (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
66. Holík, L., Kotoun, M., Peringer, P., Šoková, V., Trtík, M., Vojnar, T.: Predator shape analysis tool suite. In: Hardware and Software: Verification and Testing. pp. 202–209. LNCS 10028, Springer (2016). https://doi.org/10.1007/978-3-319-49052-6
67. Howar, F., Jasper, M., Mues, M., Schmidt, D.A., Steffen, B.: The RERS challenge: Towards controllable and scalable benchmark synthesis. Int. J. Softw. Tools Technol. Transf. 23(6), 917–930 (2021). https://doi.org/10.1007/s10009-021-00617-z
68. Howar, F., Mues, M.: GWit (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
69. Inverso, O., Tomasco, E., Fischer, B., La Torre, S., Parlato, G.: Lazy-CSeq: A lazy sequentialization tool for C (competition contribution). In: Proc. TACAS. pp. 398–401. LNCS 8413, Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_29
70. Inverso, O., Tomasco, E., Fischer, B., Torre, S.L., Parlato, G.: Bounded verification of multi-threaded programs via lazy sequentialization. ACM Trans. Program. Lang. Syst. 44(1), 1:1–1:50 (2022). https://doi.org/10.1145/3478536
71. Inverso, O., Trubiani, C.: Parallel and distributed bounded model checking of multi-threaded programs. In: Proc. PPoPP. pp. 202–216. ACM (2020). https://doi.org/10.1145/3332466.3374529
72. Kahsai, T., Rümmer, P., Sanchez, H., Schäf, M.: JayHorn: A framework for verifying Java programs. In: Proc. CAV. pp. 352–358. LNCS 9779, Springer (2016). https://doi.org/10.1007/978-3-319-41528-4_19
73. Kettl, M., Lemberger, T.: The static analyzer Infer in SV-COMP (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
74. Klumpp, D., Dietsch, D., Heizmann, M., Schüssele, F., Ebbinghaus, M., Farzan, A., Podelski, A.: Ultimate GemCutter and the axes of generalization (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
75. Kröning, D., Tautschnig, M.: Cbmc: C bounded model checker (competition contribution). In: Proc. TACAS. pp. 389–391. LNCS 8413, Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_26
76. Lauko, H., Ročkai, P., Barnat, J.: Symbolic computation via program transformation. In: Proc. ICTAC. pp. 313–332. Springer (2018). https://doi.org/10.1007/978-3-030-02508-3_17
77. Lauko, H., Ročkai, P.: Lart: Compiled abstract execution (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
78. Lauko, H., Ročkai, P., Barnat, J.: Symbolic computation via program transformation. In: Proc. ICTAC. pp. 313–332. LNCS 11187, Springer (2018). https://doi.org/10.1007/978-3-030-02508-3_17
79. Leeson, W., Dwyer, M.: Graves-CPA: A graph-attention verifier selector (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
80. Luckow, K.S., Dimjasevic, M., Giannakopoulou, D., Howar, F., Isberner, M., Kahsai, T., Rakamaric, Z., Raman, V.: JDart: A dynamic symbolic analysis framework. In: Proc. TACAS. pp. 442–459. LNCS 9636, Springer (2016). https://doi.org/10.1007/978-3-662-49674-9_26
81. Malík, V., Schrammel, P., Vojnar, T.: 2ls: Heap analysis and memory safety (competition contribution). In: Proc. TACAS (2). pp. 368–372. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_22
82. Metta, R., Medicherla, R.K., Chakraborty, S.: BMC+Fuzz: Efficient and effective test generation. In: Proc. DATE. IEEE (2022)
83. Mues, M., Howar, F.: JDart: Portfolio solving, breadth-first search and SMT-Lib strings (competition contribution). In: Proc. TACAS (2). pp. 448–452. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_30
84. Mues, M., Howar, F.: GDart (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
85. Noller, Y., Păsăreanu, C.S., Le, X.B.D., Visser, W., Fromherz, A.: Symbolic Pathfinder for SV-COMP (competition contribution). In: Proc. TACAS (3). pp. 239–243. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_21
86. Nutz, A., Dietsch, D., Mohamed, M.M., Podelski, A.: Ultimate Kojak with memory safety checks (competition contribution). In: Proc. TACAS. pp. 458–460. LNCS 9035, Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_44
87. Peringer, P., Šoková, V., Vojnar, T.: PredatorHP revamped (not only) for interval-sized memory regions and memory reallocation (competition contribution). In: Proc. TACAS (2). pp. 408–412. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_30
88. Ponce-De-Leon, H., Haas, T., Meyer, R.: Dartagnan: Leveraging compiler optimizations and the price of precision (competition contribution). In: Proc. TACAS (2). pp. 428–432. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_26
89. Ponce-De-Leon, H., Haas, T., Meyer, R.: Dartagnan: Smt-based violation witness validation (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
90. Pratikakis, P., Foster, J.S., Hicks, M.: Locksmith: Practical static race detection for C. ACM Trans. Program. Lang. Syst. 33(1) (January 2011). https://doi.org/10.1145/1889997.1890000
91. Păsăreanu, C.S., Visser, W., Bushnell, D.H., Geldenhuys, J., Mehlitz, P.C., Rungta, N.: Symbolic PathFinder: Integrating symbolic execution with model checking for Java bytecode analysis. Autom. Software Eng. 20(3), 391–425 (2013). https://doi.org/10.1007/s10515-013-0122-2
92. Rakamarić, Z., Emmi, M.: SMACK: Decoupling source language details from verifier implementations. In: Proc. CAV. pp. 106–113. LNCS 8559, Springer (2014). https://doi.org/10.1007/978-3-319-08867-9_7
93. Richter, C., Hüllermeier, E., Jakobs, M.C., Wehrheim, H.: Algorithm selection for software validation based on graph kernels. Autom. Softw. Eng. 27(1), 153–186 (2020). https://doi.org/10.1007/s10515-020-00270-x
94. Richter, C., Wehrheim, H.: PeSCo: Predicting sequential combinations of verifiers (competition contribution). In: Proc. TACAS (3). pp. 229–233. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_19
95. Saan, S., Schwarz, M., Apinis, K., Erhard, J., Seidl, H., Vogler, R., Vojdani, V.: Goblint: Thread-modular abstract interpretation using side-effecting constraints (competition contribution). In: Proc. TACAS (2). pp. 438–442. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_28
96. Scott, R., Dockins, R., Ravitch, T., Tomb, A.: Crux: Symbolic execution meets SMT-based verification (competition contribution). Zenodo (February 2022). https://doi.org/10.5281/zenodo.6147218
97. Shamakhi, A., Hojjat, H., Rümmer, P.: Towards string support in JayHorn (competition contribution). In: Proc. TACAS (2). pp. 443–447. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_29
98. Sharma, V., Hussein, S., Whalen, M.W., McCamant, S.A., Visser, W.: Java Ranger at SV-COMP 2020 (competition contribution). In: Proc. TACAS (2). pp. 393–397. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_27
99. Sharma, V., Hussein, S., Whalen, M.W., McCamant, S.A., Visser, W.: Java Ranger: Statically summarizing regions for efficient symbolic execution of Java. In: Proc. ESEC/FSE. pp. 123–134. ACM (2020). https://doi.org/10.1145/3368089.3409734
100. Ströder, T., Giesl, J., Brockschmidt, M., Frohn, F., Fuhs, C., Hensel, J., Schneider-Kamp, P., Aschermann, C.: Automatically proving termination and memory safety for programs with pointer arithmetic. J. Autom. Reasoning 58(1), 33–65 (2017). https://doi.org/10.1007/s10817-016-9389-x
101. Tóth, T., Hajdu, A., Vörös, A., Micskei, Z., Majzik, I.: Theta: A framework for abstraction refinement-based model checking. In: Proc. FMCAD. pp. 176–179 (2017). https://doi.org/10.23919/FMCAD.2017.8102257
102. Visser, W., Geldenhuys, J.: Coastal: Combining concolic and fuzzing for Java (competition contribution). In: Proc. TACAS (2). pp. 373–377. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_23
103. Vojdani, V., Apinis, K., Rõtov, V., Seidl, H., Vene, V., Vogler, R.: Static race detection for device drivers: The Goblint approach. In: Proc. ASE. pp. 391–402. ACM (2016). https://doi.org/10.1145/2970276.2970337
104. Volkov, A.R., Mandrykin, M.U.: Predicate abstractions memory modeling method with separation into disjoint regions. Proceedings of the Institute for System Programming (ISPRAS) 29, 203–216 (2017). https://doi.org/10.15514/ISPRAS-2017-29(4)-13
105. Švejda, J., Berger, P., Katoen, J.P.: Interpretation-based violation witness validation for C: NitWit. In: Proc. TACAS. pp. 40–57. LNCS 12078, Springer (2020). https://doi.org/10.1007/978-3-030-45190-5_3
106. Wendler, P., Beyer, D.: sosy-lab/benchexec: Release 3.10. Zenodo (2022). https://doi.org/10.5281/zenodo.5720267
107. Wetzler, N., Heule, M.J.H., Hunt, W.A., Jr.: Drat-trim: Efficient checking and trimming using expressive clausal proofs. In: Proc. SAT. pp. 422–429. LNCS 8561, Springer (2014). https://doi.org/10.1007/978-3-319-09284-3_31
108. Wu, T., Schrammel, P., Cordeiro, L.: Wit4Java: A violation-witness validator for Java verifiers (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
109. Ádám, Z., Bajczi, L., Dobos-Kovács, M., Hajdu, A., Molnár, V.: Theta: Portfolio of CEGAR-based analyses with dynamic algorithm selection (competition contribution). In: Proc. TACAS. LNCS 13244, Springer (2022)
Open Access. This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution, and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
AProVE: Non-Termination Witnesses for C Programs
(Competition Contribution)
Jera Hensel, Constantin Mensendiek, and Jürgen Giesl
LuFG Informatik 2, RWTH Aachen University, Germany
Abstract. To (dis)prove termination of C programs, AProVE uses symbolic
execution to transform the program's LLVM code into an integer transition
system, which is then analyzed by several backends. The transformation steps
in AProVE and the tools in the backend only produce sub-proofs in their
respective domains. Hence, we have now developed new techniques to
automatically combine the essence of these proofs. If non-termination is
proved, then they yield an overall witness, which identifies a
non-terminating path in the original C program.
1 Verification Approach and Software Architecture
To prove (non-)termination of a C program, AProVE uses the Clang compiler [7]
to translate it to the intermediate representation of the LLVM framework [15].
Then AProVE symbolically executes the LLVM program and uses abstraction to
obtain a finite symbolic execution graph (SEG) containing all possible program
runs. We refer to [14, 17] for further details on our approach to prove termination.

To prove non-termination, AProVE runs three approaches in parallel, see Fig. 1.
The first two approaches transform the lassos of the SEG to integer transition
systems (ITSs), which are then passed to the tools T2 [6] and LoAT [11]. If one
of the tools returns a proof of non-termination, AProVE uses it to construct a
non-terminating path through the C program. The path of the first succeeding
approach is returned to the user while all other computations are stopped.
T2's proof consists of a recurrent set characterizing those variable assignments
that lead to a non-terminating ITS run. Here, AProVE uses an SMT solver to
identify a corresponding concrete assignment of the variables in the ITS (which
correspond to the variables in the (abstract) program states of the SEG). The
third approach transforms the lassos of the SEG directly to SMT formulas which
are only satisfiable if there is a non-terminating path; in this case, we can
deduce a variable assignment from the model of the formulas returned by the
solver. While the first and the third approach were already available in AProVE
before [13], we have now extended them by the generation of non-termination
witnesses. To this end, the variable assignment obtained from these approaches
is used by AProVE to step through the corresponding lasso of the SEG in order
to obtain a concrete execution path that witnesses non-termination.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2)
© The Author(s) 2022. D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 403–407, 2022. https://doi.org/10.1007/978-3-030-99527-0_21
Fig. 1: AProVE's Workflow for Non-Termination Analysis (C program → LLVM
program → symbolic execution graph → lassos → ITSs and SMT formulas; the
backends LoAT, T2, and the SMT solvers return non-termination proofs, recurrent
sets, or variable assignments, from which a concrete execution path is derived
first in the SEG, then in the LLVM program, and finally in the C program)
To ensure that the generation of the path terminates, AProVE stops as soon as a
program state of the SEG is visited twice. Thus, this approach only succeeds if
the first loop on the path whose body is executed several times is already the
non-terminating loop. However, it does not find non-termination witnesses for
programs with several loops, where the non-terminating path first leads through
several iterations of other loops before it ends in a non-terminating loop.
    void f(x, y) {
      y = 0;
      while (x > 0) {
        x = x - 1;
        y = y + 1;
      }
      while (y > 1)
        y = y;
    }

Fig. 2: Example C Function
To handle such programs as well, we have now developed a novel second approach
for proving non-termination which uses our tool LoAT in the backend. To
understand how LoAT finds non-termination proofs, consider the function f in
Fig. 2. The first loop decrements x as long as x is positive and increments y
by the same amount. Afterwards, the second loop does not terminate if y is
greater than 1. Hence, the function f does not terminate if the initial value
of the parameter x is greater than 1. LoAT can detect such interdependencies in
the corresponding ITS (Fig. 3a) generated by AProVE. To this end, LoAT uses
different forms of loop acceleration:
r0: f(x, y) → ℓ1(x, 0)
r1: ℓ1(x, y) → ℓ1(x − 1, y + 1) [x > 0]
r2: ℓ1(x, y) → ℓ2(x, y) [x ≤ 0]
r3: ℓ2(x, y) → ℓ2(x, y) [y > 1]

Fig. 3a: Corresponding ITS

r4: ℓ1(x, y) → ℓ1(0, y + x) [x > 0]
r5: ℓ2(x, y) → ⊥ [y > 1]
r6: f(x, y) → ℓ1(0, x) [x > 0]
r7: f(x, y) → ℓ2(0, x) [x > 0]
r8: f(x, y) → ⊥ [x > 1]

Fig. 3b: Simplified Rules (⊥ marks direct non-termination)
Finite acceleration combines several iterations of a looping rule into a new
rule. LoAT applies this simplification to the rule r1 representing the first
loop, resulting in the new rule r4 in Fig. 3b. In the second looping rule r3,
the guard is invariant w.r.t. the update of the variables in this rule. In such
a case, LoAT applies non-terminating acceleration, transforming r3 to r5.
Finally, chaining represents the successive execution of two rules as a single
rule. For example, the rule r6 is the result of chaining r0 and r4. The exact
simplification steps performed by LoAT in this example are shown in Fig. 3c.
Note that the final rule r8 starts from the initial function symbol and directly
goes to non-termination. Every variable assignment satisfying the respective
final guard x > 1 results in a non-terminating run.
Fig. 3c: Simplification Tree (r4 results from r1 by finite acceleration, r5
from r3 by non-terminating acceleration, r6 from chaining r0 and r4, r7 from
chaining r6 and r2, and r8 from chaining r7 and r5)
The simplification tree in Fig. 3c is also the starting point for our new
technique to generate non-termination witnesses. AProVE constructs this tree
from LoAT's proof output. Then, by processing the leaves of the simplification
tree from left to right, a path through the SEG can be derived. To determine
how often one has to traverse earlier loops on the path to the non-terminating
loop, AProVE uses an SMT solver to find a concrete variable assignment that
satisfies the final guard. In our example, the final guard x > 1 would be
satisfied by {x = 2, y = 0}, for example. Consequently, the corresponding
concrete execution path includes two iterations of the first loop before
reaching the non-terminating second loop.
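To make the witness concrete, the following self-contained C version of the function from Fig. 2 (with explicit int types and a driver added here purely for illustration; not part of the original benchmark) shows the non-terminating run for the witness value x = 2:

    /* Fig. 2 with explicit types, started with the witness value x = 2 */
    void f(int x, int y) {
      y = 0;
      while (x > 0) {   /* two iterations: (x,y) = (2,0) -> (1,1) -> (0,2) */
        x = x - 1;
        y = y + 1;
      }
      while (y > 1)     /* y == 2 and the guard never changes: runs forever */
        y = y;
    }

    int main(void) {
      f(2, 0);          /* witnesses non-termination: this call never returns */
      return 0;
    }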
Once the path is constructed, AProVE extracts the LLVM program positions from
the states, obtaining a non-terminating path through the LLVM program in the
form of a lasso. Using Clang's debug information output, AProVE then matches
the LLVM lines to the lines in the C program. The resulting C witness can be
validated by the tools CPAchecker [5] and Ultimate Automizer [12].
2 Discussion of Strengths and Weaknesses
In general, AProVE is especially powerful on programs where a precise modeling
of the values of program variables and memory contents is needed to (dis)prove
termination. However, on large programs containing many variables which are
not relevant for termination, tools with CEGAR-based approaches are often
faster. The reason is that AProVE does not implement any techniques to decide
which variables are relevant for (non-)termination.
Furthermore, one of AProVE's most crucial weaknesses when proving
non-termination in past editions of SV-COMP was producing a meaningful witness.
Therefore, in the two approaches for proving non-termination in AProVE that
are based on T2 or on the direct analysis of lassos of the SEG, we added the
novel techniques presented in the current paper to generate non-termination
witnesses from the obtained variable assignments. Here, the problem is that
when computing a concrete execution path, we cannot be sure when to stop the
computation: whenever we visit a program position repeatedly, we do not know
if this position is part of the non-terminating loop of the lasso, or if it is
still part of the finite path to the non-terminating loop.
In contrast, in our new approach based on LoAT, the simplification tree allows
us to infer the order in which the loops of the program are traversed, and this
tree also contains the information which loop is the non-terminating one.
Thus, this approach extends AProVE's power substantially, since it can find
non-termination witnesses for programs where all non-terminating paths lead
through several iterations of more than one loop. On the other hand, there are
also examples where the other two approaches outperform the approach based on
LoAT, e.g., if T2 finds a non-termination proof and LoAT does not. Our
observation is that especially for small programs containing only a single
loop, the other approaches are often faster. This is also confirmed by our
results in the Termination category of SV-COMP 2022: while in the
sub-categories MainControlFlow and MainHeap, 83% of the non-termination proofs
are found using T2 or the direct SMT approach, in Termination-Other, 95% of the
non-termination proofs result from the LoAT approach. This set consists of
especially large programs, which often contain more than one loop.
More information about SV-COMP 2022 including the competition results
can be found in the competition report [3].
3 Setup and Configuration
AProVE is developed in the Programming Languages and Verification group
headed by J. Giesl at RWTH Aachen University. On the web site [2], AProVE
can be downloaded or accessed via a web interface. Moreover, [2] also contains a
list of external tools used by AProVE and a list of present and past contributors.
In SV-COMP 2022, AProVE only participates in the category "Termination".
All files from the submitted archive must be extracted into one folder. AProVE
is implemented in Java and needs a Java 11 Runtime Environment. Moreover,
AProVE requires the Clang compiler [7] to translate C to LLVM. To analyze the
resulting ITSs in the backend, AProVE uses LoAT [11] and T2 [6]. Furthermore,
it applies the satisfiability checkers Z3 [8], Yices [9], and MiniSAT [10] in parallel
(our archive contains all these tools). As a dependency of T2, Mono [16] (version
4.0) needs to be installed. Extending the path environment is necessary so that
AProVE can find these programs. Using the wrapper script aprove.py in the
BenchExec repository, AProVE can be invoked, e.g., on the benchmarks defined
in aprove.xml in the SV-COMP repository. The most recent version of AProVE
with the improved witness generation can be downloaded at [1].
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [3] and available on the competition web site. This includes
the verification tasks, results, witnesses, scripts, and instructions for reproduction.
The version of our verifier as used in the competition is archived together with other
participating tools [4].
References
1. AProVE: https://github.com/aprove-developers/aprove-releases/releases
2. AProVE Website: https://aprove.informatik.rwth-aachen.de/
3. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS ’22. LNCS (2022)
4. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo (2022), https://doi.org/10.5281/zenodo.5959149
5. Beyer, D., Keremoglu, M.E.: CPAchecker: A tool for configurable software verification. In: Proc. CAV ’11. pp. 184–190. LNCS 6806 (2011), https://doi.org/10.1007/978-3-642-22110-1_16
6. Brockschmidt, M., Cook, B., Ishtiaq, S., Khlaaf, H., Piterman, N.: T2: Temporal property verification. In: Proc. TACAS ’16. pp. 387–393. LNCS 9636 (2016), https://doi.org/10.1007/978-3-662-49674-9_22
7. Clang: https://clang.llvm.org
8. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Proc. TACAS ’08. pp. 337–340. LNCS 4963 (2008), https://doi.org/10.1007/978-3-540-78800-3_24
9. Dutertre, B., de Moura, L.: System description: Yices 1.0 (2006), https://yices.csl.sri.com/papers/yices-smtcomp06.pdf
10. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Proc. SAT ’03. pp. 502–518. LNCS 2919 (2003), https://doi.org/10.1007/978-3-540-24605-3_37
11. Frohn, F., Giesl, J.: Proving non-termination via loop acceleration. In: Proc. FMCAD ’19. pp. 221–230 (2019), https://doi.org/10.23919/FMCAD.2019.8894271
12. Heizmann, M., Dietsch, D., Leike, J., Musa, B., Podelski, A.: Ultimate Automizer with array interpolation. In: Proc. TACAS ’15. pp. 455–457. LNCS 9035 (2015), https://doi.org/10.1007/978-3-662-46681-0_43
13. Hensel, J., Emrich, F., Frohn, F., Ströder, T., Giesl, J.: AProVE: Proving and disproving termination of memory-manipulating C programs (competition contribution). In: Proc. TACAS ’17. pp. 350–354. LNCS 10206 (2017), https://doi.org/10.1007/978-3-662-54580-5_21
14. Hensel, J., Giesl, J., Frohn, F., Ströder, T.: Termination and complexity analysis for programs with bitvector arithmetic by symbolic execution. Journal of Logical and Algebraic Methods in Programming 97, 105–130 (2018), https://doi.org/10.1016/j.jlamp.2018.02.004
15. Lattner, C., Adve, V.S.: LLVM: A compilation framework for lifelong program analysis & transformation. In: Proc. CGO ’04. pp. 75–88 (2004), https://doi.org/10.1109/CGO.2004.1281665
16. Mono: https://www.mono-project.com/
17. Ströder, T., Giesl, J., Brockschmidt, M., Frohn, F., Fuhs, C., Hensel, J., Schneider-Kamp, P., Aschermann, C.: Automatically proving termination and memory safety for programs with pointer arithmetic. J. of Aut. Reasoning 58(1), 33–65 (2017), https://doi.org/10.1007/s10817-016-9389-x
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
BRICK: Path Enumeration Based Bounded Reachability Checking of C Programs
(Competition Contribution)
Lei Bu, Zhunyi Xie, Lecheng Lyu, Yichao Li, Xiao Guo, Jianhua Zhao, and Xuandong Li
State Key Laboratory for Novel Software Technology, Nanjing University, China
bulei@nju.edu.cn
Abstract. BRICK is a bounded reachability checker for embedded C programs.
BRICK conducts a path-oriented checking of the bounded state space of the
program: it enumerates and checks all the possible paths of the program within
the threshold one by one. To alleviate the path explosion problem, BRICK
locates and records unsatisfiable-core path segments during the checking of
each path and uses them to prune the search space. Furthermore, derivative-free
optimization based falsification and loop induction are introduced to handle
complex program features like nonlinear path conditions and loops efficiently.
1 Verification Approach
Existing bounded software checkers usually encode the bounded state space of
the program into one constraint-solving problem directly. However, in this
manner, when the size of the program or the bound of the checking increases,
the corresponding constraint-solving problem explodes quickly and becomes
difficult for existing SAT/SMT solvers to solve.

To address this problem, BRICK conducts a path-oriented checking of the
bounded state space of the program: it enumerates and checks all the possible
paths within the threshold one by one [1, 2]. The main merit of this approach
is that the size of the problem that needs to be solved by the constraint
solver is well controlled and can be easily handled. The main features of
BRICK's solving are reported below:
1.1 Flexible Path Enumeration
BRICK enumerates potential paths from the control flow graph (CFG) of the
given program up to the user-defined step bound. Two path enumeration
strategies are applied in BRICK, each with its own advantages.
This work is supported in part by the National Key Research and Development Plan (No. 2017YFA0700604), the Leading-Edge Technology Program of Jiangsu Natural Science Foundation (No. BK20202001), and the National Natural Science Foundation of China (No. 62172200, No. 61632015).
© The Author(s) 2022. D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 408–412, 2022. https://doi.org/10.1007/978-3-030-99527-0_22
First, we can simply conduct a classical depth-first search (DFS) to enumerate
program paths. The benefit of this approach is that, if the DFS stops without
touching the given bound, we can conclude that the target state is not
reachable in general, not only in the bounded state space.

We have also implemented a special method to encode the jump-to relation
between different code blocks into a SAT formula and obtain the potential
path by SAT solving. The benefit is that if the potential path is confirmed to
be infeasible by the subsequent path-condition solving, the infeasible path
segment in the path can be located and encoded back into the SAT formula to
prune all future paths containing this infeasible segment.
1.2 Infeasible Path Segment Pool Guided State Space Pruning
BRICK conducts lazy solving of each path by encoding the path condition of the
potential path into a feasibility problem. BRICK asks a constraint solver,
i.e., an SMT solver (Z3 [6]), interval analysis (dReal [4]), or derivative-free
optimization-based solving (Section 1.3), to solve the problem. If the path is
decided to be infeasible by the solver, BRICK tries to extract the
unsatisfiable core (UC) of the feasibility problem of this path, and maps the
UC constraints to an infeasible path segment in the path, which is added to the
infeasible path pool. After that, all paths that contain any infeasible segment
from the pool are reported as unreachable directly during the following path
enumeration.
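As a hypothetical illustration of this pruning (nondet_int stands for a nondeterministic input, similar to the SV-COMP convention), consider:

    extern int nondet_int(void);

    int main(void) {
      int x = nondet_int();
      if (x > 0)        /* segment A: path condition contains x > 0    */
        x = x + 1;      /* after A, x > 1 holds                        */
      if (x < 0) {      /* segment B: path condition also needs x < 0  */
        if (x > 10)
          return 1;     /* target */
      }
      return 0;
    }

Any path that takes both branch A and branch B has an unsatisfiable path condition, and the UC maps to the segment "A followed by B". Once this segment is in the pool, every remaining path through A and B (taking either branch of x > 10) is reported as unreachable without invoking the solver again.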
1.3 Derivative-free Optimization Based Constraint Falsification
We can see that constraint solving plays an important role in BRICK. However,
complex path conditions, like nonlinear constraints, which widely appear in
programs, are hard for the existing solvers to handle efficiently. In BRICK, a
classification-model-based derivative-free optimization (DFO) approach is used
to alleviate this difficulty by conducting a sample-feedback-learn style of DFO
solving [8].

More specifically, the underlying solver guesses a sample solution for the
feasibility problem. Then, we evaluate whether the sampled solution satisfies
the path constraint or not, and calculate the distance between the sampled
solution and a correct one if it does not. This distance is used as the
feedback metric in the classification-based DFO learning, to guide the solver
to converge to values that fit the path constraint, as sketched below. In
practice, this approach works very well in nonlinear problem solving. However,
this DFO-based approach cannot tell that the target is not reachable if it
fails to find a solution.
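As a minimal sketch (not BRICK's actual metric), a distance feedback for the nonlinear path constraint x*x + y*y == 25 could be:

    #include <math.h>

    /* The distance is zero exactly when a sample satisfies the
       constraint; its magnitude tells the DFO learner how far a
       rejected sample is from the feasible region, steering the
       next round of sampling toward it. */
    double sample_distance(double x, double y) {
      return fabs(x * x + y * y - 25.0);
    }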
1.4 Induction-based Loop Handling
If the target program contains loops, the number of potential paths may explode.
To alleviate this problem, we conduct an induction-based proof to try to handle
the loop before starting the BMC.

First of all, we collect the constraints from the assertions and generate the
corresponding weakest preconditions. Then, we conduct a standard
induction-based proof to see whether these constraints are satisfied in every
iteration; a hypothetical illustration follows below. If no counterexamples are
returned, we know that the assertions will not be violated in the loop.
Furthermore, we are also working on the integration of loop invariant
generation to further refine the CFG under checking.
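As a sketch of what such an induction-based proof discharges, consider the following loop; the assertion holds for every iteration count without any unrolling:

    #include <assert.h>

    void count_up(int n) {
      int i = 0;
      while (i < n) {
        /* Base case: i = 0 satisfies i >= 0 on loop entry.
           Induction step: if i >= 0 holds at the start of an iteration,
           then i + 1 >= 0 holds after the update, so the assertion
           cannot be violated in any iteration. */
        assert(i >= 0);
        i = i + 1;
      }
    }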
2 Software Architecture
The architecture of BRICK is shown in Fig. 1. It consists of a loop processing
module, a path enumerating module, and a constraint solving module, all
implemented in C++.

In the loop processing module, if the program contains an assertion-related
loop, BRICK first conducts loop-induction-based verification. If the induction
works, BRICK reports unreachable; otherwise, it builds the program CFG and
performs the subsequent path-enumeration-based checking.

In the path enumerating module, BRICK employs SAT-based and DFS-based path
enumerating methods to extract the program path and its corresponding path
condition. The constraint solving module accepts the path condition and
performs constraint solving accordingly. All these techniques were described in
Section 1. The solvers used in BRICK include the SAT solver MiniSAT [3], the
SMT solver Z3 [6], the interval analysis solver dReal [4], and our
implementation of the DFO method RACOS [9].
Fig. 1. Architecture of BRICK
3 Strengths and Weaknesses
Most bounded reachability checkers, e.g., CBMC [5], encode the bounded state
space into a huge SMT formula consisting of both conjunctions and disjunctions
of different kinds of formulas, which is difficult for existing solvers to
handle and can easily cause memory explosion. Instead, BRICK conducts the
verification in a path-oriented way:
- BRICK enumerates and checks all the potential paths one by one. In this
  manner, the computational complexity is well controlled.
- Meanwhile, as only the ongoing path is kept in memory and the corresponding
  path constraints are disjunction-free, the solving problem is much easier to
  handle.
- For the sake of processing capability, UC-guided backtracking and path
  pruning is used to prune the search space substantially, and DFO-based
  solving is conducted to handle complex nonlinear constraints efficiently.

BRICK participated in the ReachSafety/Floats category of SV-COMP 2022 [10].
BRICK successfully verified 439 of all the 469 tasks, ranking 1st in this
sub-category. Furthermore, for these 439 solved cases, BRICK only used 1 000
seconds in total. In comparison, CoveriTeam and VeriAbs [7], which won the 2nd
and 3rd places in this category, spent 9 300 and 18 000 seconds, respectively,
i.e., 9 and 18 times more than BRICK.
As for weaknesses, like all other bounded checkers, BRICK may not be able to
give a proof of correctness of a program if it cannot finish the search within
the given step bound. In this case, BRICK can only report bounded true. For
example, on the tasks of SV-COMP 2022, besides the 439 cases that are proved by
BRICK, there are also several programs for which BRICK can only give a bounded
result or simply times out. Therefore, as future work, we are implementing
techniques including loop summaries and k-induction to try to abstract the
loops and give a proof of correctness in more cases.
4 Tool Setup and Configuration
The binary file of BRICK for Ubuntu 20.04 is available at
https://github.com/brick-tool-dev/BRICK-2.0. To install the tool, please clone
this repository and follow the instructions in README.md. A tailored version of
BRICK took part in the ReachSafety/Floats category in SV-COMP 2022 [10]. This
version [11] supports checking the reachability of the error function. The
BenchExec wrapper script for the tool is brick.py, and brick.xml is the
benchmark description file.
5 Software Project and Contributors
BRICK is available under the MIT License. The BRICK team is from the Software
Engineering Group, Nanjing University. We would like to thank Sicun Gao for his
kind help with the usage of dReal.
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [10] and available on the competition web site. This includes
the verification tasks, results, witnesses, scripts, and instructions for reproduction.
The version of our verifier as used in the competition is archived together with other
participating tools [11].
References
1. L. Bu, et al.: BACH: Bounded Reachability Checker for Linear Hybrid Automata. In FMCAD'08, pp. 65-68.
2. D. Xie, et al.: SAT-LP-IIS Joint-Directed Path-Oriented Bounded Reachability Analysis of Linear Hybrid Automata. In FMSD, 45(1): 42-62, 2014.
3. N. Eén and N. Sörensson: An extensible SAT-solver. In SAT'03, 502-518.
4. S. Gao, et al.: dReal: An SMT solver for nonlinear theories over the reals. In CADE'13, 208-214.
5. D. Kroening, et al.: CBMC - C Bounded Model Checker. In TACAS'14: 389-391.
6. L. De Moura and N. Bjørner: Z3: An Efficient SMT Solver. In TACAS'08: 337-340.
7. P. Darke, et al.: VeriAbs: Verification by Abstraction and Test Generation (Competition Contribution). In TACAS'18: 457-462.
8. L. Bu, et al.: Machine learning steered symbolic execution framework for complex software code. In Formal Aspects of Computing, 33:3, 301-323, 2021.
9. Y. Yu, et al.: Derivative-free optimization via classification. In AAAI'16, 2286-2292.
10. D. Beyer: Progress on Software Verification: SV-COMP 2022. In TACAS'22.
11. D. Beyer: Verifiers and Validators of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo, DOI: 10.5281/zenodo.5959149, 2022.
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
A Prototype for Data Race Detection in CSeq 3
(Competition Contribution)
Alex Coto, Omar Inverso, Emerson Sales, and Emilio Tuosto
Gran Sasso Science Institute, L’Aquila, Italy
{alex.coto,omar.inverso,emerson.sales,emilio.tuosto}@gssi.it
Abstract. We sketch a sequentialization-based technique for bounded
detection of data races under sequential consistency, and summarise the
major improvements to our verification framework over the last years.
Keywords: Bounded model checking · Context-bounded analysis · Sequentialization · Data races · Reachability · Concurrency · Threads
1 Verification Approach
Our approach is based on lazy sequentialization [7]. The idea is to convert the
concurrent program P of interest into a non-deterministic sequential program
Qu,k that preserves all feasible executions of P up to unwinding bound u and
k rounds (or execution contexts [8]). Among different techniques [6], we choose
bounded model checking [3] to analyse Qu,k. In this section, we briefly overview
lazy sequentialisation, and sketch a novel extension to detect data races. Further
elements of novelty w.r.t. the engineering of our tool are discussed in the next
section.

Lazy Sequentialization. We unwind all loops and inline all functions in P,
except the main function and those from which a thread is spawned, obtaining a
bounded program Pu that preserves all feasible executions of P up to the
unwinding bound u. We then transform each function of Pu into a thread
simulation function where each visible statement is assigned a numerical label
and a guard, and each call to a concurrency-specific function is replaced by a
call to a function that models the same intended semantics; for each simulation
function, we add a global variable to represent the program counter, initially
set to zero.

A thread's execution context of Pu is simulated by invoking the corresponding
thread simulation function of Qu,k, which executes from the first statement to a
non-deterministically selected label, updates the program counter, and returns.
Further execution contexts are simulated by re-invoking the simulation function,
where the guards ensure that the control is repositioned to the correct numerical
label via a sequence of jumps, and so on. To retain consistency of the local
state of the thread across different invocations of the simulation functions,
static storage is enforced for all local variables. We drive the overall
simulation of Pu from the main function of Qu,k, by invoking the thread
simulation functions appropriately.

This work has been partially funded by MIUR project PRIN 2017FTXR7S IT-MATTERS and MUR project FISR2020IP 05310 MVM-Adapt.
© The Author(s) 2022. D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 413–417, 2022. https://doi.org/10.1007/978-3-030-99527-0_23
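The following C sketch shows the shape of such a thread simulation function; the names (pc1, next, thread1_sim) and the two-statement thread body are illustrative assumptions, not CSeq's actual output. A statement with label k executes in the current context exactly when pc1 <= k < next:

    int shared = 0;
    int pc1 = 0;                   /* program counter of thread 1 (persists) */

    /* Simulates one execution context of thread 1, running the visible
       statements with labels in [pc1, next) and then context-switching. */
    void thread1_sim(int next) {
      static int tmp;              /* static storage: local survives calls */
      if (pc1 <= 0 && 0 < next)    /* label 0 */
        tmp = shared;
      if (pc1 <= 1 && 1 < next)    /* label 1 */
        shared = tmp + 1;
      pc1 = next;                  /* remember where this context stopped */
    }

A main driver (not shown) would repeatedly invoke the simulation functions of all threads with nondeterministically chosen values of next, simulating up to k rounds.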
Data Race Detection. A program contains a data race if it can execute two
conflicting actions (i.e., one thread modifies a memory location and another
one reads or modifies the same location), at least one of which is not atomic,
and neither happens before the other [9]. Consider two threads performing the
operation v = v + 1 on a shared variable initialised to zero. Both threads try
to modify the data at the memory location reserved for v, but the necessary
sequences of memory accesses are not synchronised, and thus may interleave. If
a context-switch happens between the memory read and write operations in the
thread that runs first, both threads will read 0, and at the end of the execution
the value of v will be 1. To detect such a situation, we alter the encoding from P_u
to Q_{u,k} by (i) adding a shared array w_addrs that stores, for each thread, a
pointer to the memory location targeted by a write operation, (ii) injecting
additional control code at each visible statement, and (iii) splitting the modified
sequentialised encoding of the visible statement into two separate sequentialised
statements to allow in-between context switching. The following code fragment
shows the modified sequentialised encoding (guards omitted for simplicity; every
line except v = v + 1 is injected code) for the statement v = v + 1 of the first
thread of the program described above:

    k:   void *w_addr = &v;
         assert(w_addrs[1] != w_addr);
         w_addrs[0] = w_addr;
         v = v + 1;
    k+1: w_addrs[0] = 0;

We store in w_addr the address of the variable being written, and then assert
that the other thread is not writing to the same location; in the same (simulated)
execution context, we store w_addr in w_addrs, so that the assertion can be
checked within the other thread too. We reset w_addrs right after the statement
under consideration. Note the label k+1, which allows thread pre-emption. Now,
one of the threads can execute the simulated statement at label k and context-switch
at label k+1 while w_addrs still points to v; this makes it possible to
schedule the other thread and fail the assertion there.
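For reference, the two-thread program discussed above can be written with POSIX threads as follows; this is a plain illustration of the race, independent of the sequentialised encoding:

    #include <pthread.h>

    int v = 0;   // shared variable, initialised to zero

    // Both threads perform a non-atomic read-modify-write on v: the load
    // and the store may interleave, so the final value can be 1 instead of 2.
    void *incr(void *arg) {
        v = v + 1;
        return 0;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, 0, incr, 0);
        pthread_create(&t2, 0, incr, 0);
        pthread_join(t1, 0);
        pthread_join(t2, 0);
        return 0;
    }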
In the general case, handling multiple memory write accesses for a single
statement requires a slightly different tracking mechanism for write addresses,
or decomposition into simpler statements. Statements with read-only shared
memory access are handled without updating w_addrs. Programs with more
than two threads require multiple assertions.
2 Software Architecture
CSeq is a framework for quick development of static analysis and program trans-
formation prototypes. For parsing the input program CSeq relies on pycparserext
(pypi.org/project/pycparserext), an extension of pycparser (github.com/
eliben/pycparser), which in turn is built on top of PLY (www.dabeaz.com/
ply), a Python implementation of Lex and Yacc. All the mentioned components
as well as CSeq are entirely written in Python.
We combined several groups of modules in CSeq, namely (i) program sim-
plification, (ii) program unfolding, (iii) sequentialization, (iv) instrumentation,
and (v) backend invocation and counterexample generation. For the analysis of
the sequentialised program we rely on CBMC (www.cprover.org/cbmc), which in
turn embeds the DPLL-style MiniSat SAT solver (minisat.se).
CSeq 3.0 incorporates a significant number of enhancements. At an architec-
tural level, the main element of novelty is in the modularity between the general-
purpose functionalities of the framework and the specific lazy sequentialization,
which opens up the possibility of prototyping different static analysers for
other applications (e.g., [11,10]) as well as improving older sequentialization-
based prototypes (e.g., [4,12,13] and variations thereof). The enhancements to
the framework include: Python 3 support, support for GNU C compiler exten-
sions, a fully re-implemented symbol table, revised general-purpose modules such
as constant propagation, function inlining, and loop unrolling, and a custom-
built version of CBMC (not used in the competition) for SAT-solving under
assumptions. For the competition we include (experimental) enhanced constant
propagation, and simplified function inlining. Besides the data race checking
extension, the sequentialization modules include improvements from earlier
implementations [5,8,6] and from different editions of SV-COMP to date, in
particular: extended pthread API support (conditional waiting, barriers, and
thread-specific data management), context-bounded analysis, and a major code
overhaul.
3 Strengths and Weaknesses
The table below summarises the performance of our tool on the 764 cases of the
Concurrency category and the 162 cases of the data race demo category.
                         Concurrency   Data race demo
Overall instances             764            162
Correct    safe               202             37
           unsafe             320             61
Unknown    reject               9             19
           internal error      18             17
           out of time        159             20
           out of memory       56              2
Incorrect  safe                 0              0
           unsafe               0              6
Our technique excels at hunting bugs, as shown by the number of correct unsafe
results (incl. 17 malformed witnesses and 50 unconfirmed witnesses), but it
quickly becomes expensive with larger bounds, hitting the resource limits. The
additional context-switch points and the use of pointers for data race detection
introduce further overhead. The other failures are due to limiting assumptions
or glitches in the implementation. All the false positives are due to corner cases
in the encoding.
4 Setup and Configuration
We competed in the ConcurrencySafety category and in the data race detection
demo category. CSeq 3.0 is available at https://github.com/omainv/cseq/
releases.
Installation instructions are in the README file within the package. A wrapper
script (lazy-cseq.py) invokes CSeq up to three times, with the options
-l lazy for lazy sequentialisation, --sv-comp to enable the required violation-witness
format, --atomic-parameters to assume atomic passing of function
arguments, --nondet-condvar-wakeups for non-deterministic spurious condition-variable
wake-ups, --deep-propagation for experimental constant folding and
propagation, --32 for 32-bit architectures, --threads 100 to limit the overall
number of threads, --data-race-check when required, and --backend cbmc to
use CBMC 5.4 for sequential analysis.
For reachability checking, on different invocations the script adds different
parameters: -r2 -w2 -f2, -r4 -w3 -f5, and -r20 -w1 -f11, where r is the number
of rounds, and f and w are the unwind bounds for for loops (i.e., potentially
bounded) and while loops (i.e., potentially unbounded), respectively; on the last
invocation, --softunwindbound and --unwind-for-max 10000 are also added to
fully unfold for loops if a static bound can be found, up to the given hard bound.
For data race detection, the above parameters are replaced with -c4 -u2,
-c10 -u10, and -c50 -w20 -f20 with --unwind-for-max 100. Note that in this
case the bound is on the number of execution contexts rather than rounds
(-c vs. -r), and -u is used as a shorthand for -f and -w.
We let each analysis run to completion. When the result is TRUE, the script
restarts the analysis with the next set of parameters. As soon as the script gets
FALSE, it returns FALSE. Only if the analysis with the last set of parameters
finishes and the result is TRUE does the script return TRUE.
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [1] and available on the competition web site. This includes
the verification tasks, results, witnesses, scripts, and instructions for reproduction.
The version of our verifier as used in the competition is archived together with other
participating tools [2].
References
1. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS. Springer (2022)
2. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5959149
3. Clarke, E.M., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In: TACAS. Lecture Notes in Computer Science, vol. 2988, pp. 168–176. Springer (2004). https://doi.org/10.1007/978-3-540-24730-2_15
4. Fischer, B., Inverso, O., Parlato, G.: CSeq: A concurrency pre-processor for sequential C verification tools. In: ASE. pp. 710–713. IEEE (2013). https://doi.org/10.1109/ASE.2013.6693139
5. Inverso, O., Nguyen, T.L., Fischer, B., Torre, S.L., Parlato, G.: Lazy-CSeq: A context-bounded model checking tool for multi-threaded C programs. In: ASE. pp. 807–812. IEEE Computer Society (2015). https://doi.org/10.1109/ASE.2015.108
6. Inverso, O., Tomasco, E., Fischer, B., La Torre, S., Parlato, G.: Bounded verification of multi-threaded programs via lazy sequentialization. ACM Trans. Program. Lang. Syst. 44(1) (Dec 2021). https://doi.org/10.1145/3478536
7. Inverso, O., Tomasco, E., Fischer, B., Torre, S.L., Parlato, G.: Bounded model checking of multi-threaded C programs via lazy sequentialization. In: CAV. Lecture Notes in Computer Science, vol. 8559, pp. 585–602. Springer (2014). https://doi.org/10.1007/978-3-319-08867-9_39
8. Inverso, O., Trubiani, C.: Parallel and distributed bounded model checking of multi-threaded programs. In: PPoPP. pp. 202–216. ACM (2020). https://doi.org/10.1145/3332466.3374529
9. ISO/IEC: ISO/IEC 9899:2018: Information technology – Programming languages – C (Jun 2018)
10. Simic, S., Bemporad, A., Inverso, O., Tribastone, M.: Tight error analysis in fixed-point arithmetic. In: IFM. Lecture Notes in Computer Science, vol. 12546, pp. 318–336. Springer (2020). https://doi.org/10.1007/978-3-030-63461-2_17
11. Simic, S., Inverso, O., Tribastone, M.: Bit-precise verification of discontinuity errors under fixed-point arithmetic. In: SEFM. Lecture Notes in Computer Science, vol. 13085, pp. 443–460. Springer (2021). https://doi.org/10.1007/978-3-030-92124-8_25
12. Tomasco, E., Inverso, O., Fischer, B., Torre, S.L., Parlato, G.: Verifying concurrent programs by memory unwinding. In: TACAS. Lecture Notes in Computer Science, vol. 9035, pp. 551–565. Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_52
13. Tomasco, E., Nguyen, T.L., Inverso, O., Fischer, B., Torre, S.L., Parlato, G.: Lazy sequentialization for TSO and PSO via shared memory abstractions. In: FMCAD. pp. 193–200. IEEE (2016). https://doi.org/10.1109/FMCAD.2016.7886679
Dartagnan: SMT-based Violation Witness
Validation (Competition Contribution)
Hernán Ponce-de-León1⋆, Thomas Haas2, and Roland Meyer2
1 Bundeswehr University Munich, Munich, Germany
2 TU Braunschweig, Braunschweig, Germany
hernan.ponce@unibw.de, t.haas@tu-braunschweig.de, roland.meyer@tu-bs.de
Abstract. The validation of violation witnesses is an important step
during software verification. It hides false alarms raised by verifiers from
engineers, which in turn helps them concentrate on critical issues and
improves the verification experience. Until the 2021 edition of the Com-
petition on Software Verification (SV-COMP), CPAchecker was the
only witness validator for the ConcurrencySafety category. This article
describes how we extended the Dartagnan verifier to support the valida-
tion of violation witnesses. The results of the 2022 edition of the competi-
tion show that, for witnesses generated by different verifiers, Dartagnan
succeeds in the validation of witnesses where CPAchecker does not.
Our extension thus improves the validation possibilities for the overall
competition. We discuss Dartagnan’s strengths and weaknesses as a
validation tool and describe possible ways to improve it in the future.
1 Introduction
Most software verification tools report witnesses to property violations. Since
SV-COMP 2015, there is a common format in which witnesses are represented
by automata [4]. Each edge of such an automaton is annotated with data that
can be used to match program executions. A data annotation can be, e.g.,
"assumption", specifying constraints on values of variables in a given state;
"control", specifying the outcome of a branch condition; or "startline", specifying
a concrete line in the source code. More details about data annotations and their semantics
can be found in the exchange format documentation [1].
A witness validator checks that a violation can be reproduced using the
information provided by the witness. Automata-based verifiers can easily be
converted into validators by analyzing the synchronized product of the program
with the witness automaton. In this setting, the witness automaton guides the
verifier. If none of the outgoing edges on the program state match the next
edge of the witness automaton, then the verifier cannot explore the current path
further. If the edge on the program state matches, then the witness automaton
and the program proceed to the next state, eventually leading to a violation.
⋆ Jury member.
While this idea allows one to easily convert any automata-based verifier into a
validator, not all verifiers are automata-based.
Dartagnan is an SMT-based verifier. In the next section, we explain how to
convert it into a validator. The idea is to extract information from the witness
and use it to reduce the search space explored by the backend SMT solver.
2 Validation Approach
Given a concurrent program and a specification in the form of assertions, Dartagnan
generates an SMT formula ϕ_Ver = ϕ_Cf ∧ ϕ_Df ∧ ϕ_Sc ∧ ϕ_¬P which is satisfiable
if and only if some assertion fails [17,16]. The formulas ϕ_Cf and ϕ_Df encode
(respectively) the control flow and the data flow of the program. Formula ϕ_Sc
encodes scheduling constraints. Finally, ϕ_¬P expresses that at least one assertion
must fail. If the formula is satisfiable, then a violation exists. The goal of
Dartagnan (as a verifier) is to find such a violation. This amounts to finding
an appropriate scheduling among the threads. Such a scheduling is encoded as
a happens-before relation between the instructions. Dartagnan thus searches
the space of all viable happens-before relations to find a violation or prove that
none exists.
We now explain how to extend Dartagnan into a violation witness validator.
The idea is to extract from the violation witness a formula ϕ_Y that we conjoin
to the rest of Dartagnan's encoding, resulting in ϕ_Val = ϕ_Ver ∧ ϕ_Y. The
extra constraints in ϕ_Y reduce the search space for the SMT solver. For the
verification of concurrent programs taking inputs from the environment, there
are two sources of non-determinism: the data coming from the input (which
might influence the control flow) and the scheduling. The purpose of ϕ_Y is to
reduce this non-determinism. Extending the SMT encoding as described in ϕ_Val
is conceptually easy. The interesting question is: what information from the
witness shall we use? The less information we use, the more we move from
pure validation to full verification.
pure validation to full verification.
While automata-based validators can use some information in a straight-
forward manner, this is not the case for Dartagnan.
1. A violation witness can contain cycles to represent infinitely many execu-
tions. However, SMT-based tools unroll cycles and perform bounded verifi-
cation, thus only part of this information is helpful.
2. Since Dartagnan (as many other BMC tools) does not keep an explicit
notion of state, using state information is not trivial.
The exchange format for violation witnesses allows for expressing information
about state assumptions, the control flow, and the scheduling. We abstract
away from the former two and only use scheduling information. We assume that
witness automata represent a single path and that the edges contain "startline"
data corresponding to read or write instructions¹. Those are the only instructions
that can affect our happens-before relation. While we do not explicitly encode
the outcome of control-flow instructions, certain control-flow information is
implicitly encoded based on which instructions are executed. We explain the reasons
behind these design decisions and assumptions, discuss their limitations, and describe
how we plan to improve this in the future in Section 3. Despite these
limitations, and as we show in Section 4, our validator performs well in practice.
¹ Our validator accepts witnesses that do not satisfy the second assumption, but it
filters out the corresponding edges.
Let (S, E) be a witness automaton with states S and edges E. For each
e ∈ E, the function e2i(e) returns the set of read or write instructions coming from
the "startline" in the C file that corresponds to the given edge. Since witnesses
represent single paths, they can be seen as a word over S. Let w ∈ S* be a
witness; we define the witness-to-formula function, which constructs ϕ_Y, as

\[
w2f(w) =
\begin{cases}
\mathit{true} & \text{if } w = \varepsilon \\
w2f(w') \wedge \bigvee_{i_1 \in e2i((\cdot,s)),\; i_2 \in e2i((s,\cdot))} \text{happens-before}(i_1, i_2) & \text{if } w = s \cdot w'
\end{cases}
\]
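As a hypothetical illustration (the witness, startlines, and instruction names are invented here): if w = s_1 · s_2, the edge entering s_1 maps to a write instruction i_w of thread 1, and the edge leaving s_1 maps to a read instruction i_r of thread 2, then

\[
w2f(s_1 \cdot s_2) = w2f(s_2) \wedge \text{happens-before}(i_w, i_r),
\]

so the SMT solver only needs to consider schedules in which thread 1's write is ordered before thread 2's read.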
3 Strengths and Weaknesses
The main strengths of our validation approach are simplicity and modularity.
The approach only requires adding a new sub-formula to the SMT encoding
used for verification. The validator is modular in the sense that using more or
different information from the witness does not change the validation approach.
For example, adding information from the witness about the control flow just
requires adding more constraints to ϕ_Y.
Our validation approach assumes that witness automata represent single
paths. This is a limitation not imposed by the exchange format. However, veri-
fiers tend to stop as soon as they find one violation and thus generate witnesses
representing a single violation path. A second limitation is that we do not ex-
plicitly consider control-flow information. This might impact the performance
of the validation since not all non-determinism is removed and the search space
might still be large. Converting such control-flow information into SMT is simple
in principle. However, since Dartagnan internally converts the C program into
Boogie [15], matching conditionals with the corresponding assembly-like jumps
requires some work. A second consequence of not extracting control-flow infor-
mation from the witness is that we might validate witnesses that do not lead
to a violation. This is because we over-approximate the paths of the program
represented by the witness and thus our approximation might include the path
leading to the violation even if the witness did not.
4 Validation Results
We inspected the results of SV-COMP 2022 [5] to answer the following questions
RQ1: What percentage of the witnesses can Dartagnan validate?
RQ2: What percentage can Dartagnan not validate and why?
RQ3: Can Dartagnan validate witnesses that CPAchecker cannot?
RQ4: Can CPAchecker validate witnesses that Dartagnan cannot?
From the 20 verifiers in ConcurrencySafety, we selected five tools implementing
different verification approaches. We consider them good representatives
of the whole category: (i) CBMC [13] (used as a backend by Deagle [9]
and Lazy-CSeq [11]), (ii) CPAchecker [7] (used as a backend by CPALockator [3]
and GraVeS [14]), (iii) EBF [2] (combines BMC with fuzzing, a
very effective technique to find bugs), (iv) Dartagnan [17] (the only tool where the
memory model, here sequential consistency, is taken as an input), and (v) GemCutter [12]
(shares the codebase with UTaipan [8] and UAutomizer [10]).
Table 1 presents the results of the validation in SV-COMP 2022. We report
the number of witnesses generated by each verifier ("Witnesses"). For each
of the validators (columns "Dartagnan" and "CPAchecker"), we report the
number of cases where the validation conclusively finished (i.e., it returned True
or False), whether the violation was confirmed (left of "/") or not (right of "/"),
and the number of correct validations by one tool where the other did not report
a result (columns "Dart \ CPA" and "CPA \ Dart", respectively).
Tool         Witnesses   Dartagnan   CPAchecker   Dart \ CPA   CPA \ Dart
CBMC            305        193/0        95/0          117           19
CPAchecker      256          0/0       256/0            0          256
Dartagnan       273        245/1        35/6          204            0
EBF             290        219/0        57/0          177           15
GemCutter       299       18/237       262/1           15           28

Table 1. Results of the validation in SV-COMP 2022.
For the SMT-based verifiers CBMC and EBF, Dartagnan has a 63.28%
resp. 75.52% success rate in the validation (against a 31.15% resp. 19.66% success
rate for CPAchecker). Unfortunately, it did not validate any of the witnesses
generated by CPAchecker. This was due to a bug in the witness parser that
was identified and fixed after the competition. CPAchecker validated all
the witnesses that it generated as a verifier. Dartagnan validated 89.74% of its
own witnesses, while CPAchecker only validated 12.82%. For GemCutter, the
validation success of Dartagnan is only 6.02%. This is because, due to another
bug, it wrongly marked 237 witnesses as not validated. The fixed version of
Dartagnan is able to validate all such cases. Despite this, of the 18 witnesses
that Dartagnan validated, 15 were not validated by CPAchecker,
thus improving the validation possibilities for the overall competition.
5 Software Project and Configuration
The project home page is https://github.com/hernanponcedeleon/Dat3M. To
run Dartagnan as a validator, use the following command:
$ Dartagnan-SVCOMP.sh -witness <witness> <property> <program>
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [5] and available on the competition web site. This includes
the verification tasks, results, witnesses, scripts, and instructions for reproduction.
The version of our verifier as used in the competition is archived together with other
participating tools [6].
References
1. Exchange Format for Violation Witnesses and Correctness Witnesses. https://github.com/sosy-lab/sv-witnesses.
2. Fatimah Aljaafari, Lucas C. Cordeiro, Mustafa A. Mustafa, and Rafael Menezes. EBF: A hybrid verification tool for finding software vulnerabilities in IoT cryptographic protocols. CoRR, abs/2103.11363, 2021.
3. Pavel S. Andrianov, Vadim S. Mutilin, and Alexey V. Khoroshilov. CPALockator: Thread-modular analysis with projections (Competition Contribution). In TACAS (2), volume 12652 of Lecture Notes in Computer Science, pages 423–427. Springer, 2021. doi:10.1007/978-3-030-72013-1_25.
4. Dirk Beyer. Software verification and verifiable witnesses (report on SV-COMP 2015). In TACAS, volume 9035 of Lecture Notes in Computer Science, pages 401–416. Springer, 2015. doi:10.1007/978-3-662-46681-0_31.
5. Dirk Beyer. Progress on software verification: SV-COMP 2022. In TACAS (2). Springer, 2022.
6. Dirk Beyer. Verifiers and validators of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo, 2022. doi:10.5281/zenodo.5959149.
7. Dirk Beyer and M. Erkan Keremoglu. CPAchecker: A tool for configurable software verification. In CAV, volume 6806 of Lecture Notes in Computer Science, pages 184–190. Springer, 2011. doi:10.1007/978-3-642-22110-1_16.
8. Daniel Dietsch, Matthias Heizmann, Alexander Nutz, Claus Schätzle, and Frank Schüssele. Ultimate Taipan with symbolic interpretation and fluid abstractions (Competition Contribution). In TACAS (2), volume 12079 of Lecture Notes in Computer Science, pages 418–422. Springer, 2020. doi:10.1007/978-3-030-45237-7_32.
9. Fei He, Zhihang Sun, and Hongyu Fan. Deagle: An SMT-based verifier for multi-threaded programs (Competition Contribution). In TACAS (2). Springer, 2022.
10. Matthias Heizmann, Yu-Fang Chen, Daniel Dietsch, Marius Greitschus, Jochen Hoenicke, Yong Li, Alexander Nutz, Betim Musa, Christian Schilling, Tanja Schindler, and Andreas Podelski. Ultimate Automizer and the search for perfect interpolants (Competition Contribution). In TACAS (2), volume 10806 of Lecture Notes in Computer Science, pages 447–451. Springer, 2018. doi:10.1007/978-3-319-89963-3_30.
11. Omar Inverso, Ermenegildo Tomasco, Bernd Fischer, Salvatore La Torre, and Gennaro Parlato. Lazy-CSeq: A lazy sequentialization tool for C (Competition Contribution). In TACAS, volume 8413 of Lecture Notes in Computer Science, pages 398–401. Springer, 2014. doi:10.1007/978-3-642-36742-7_46.
12. Dominik Klumpp, Daniel Dietsch, Matthias Heizmann, Frank Schüssele, Marcel Ebbinghaus, Azadeh Farzan, and Andreas Podelski. Ultimate GemCutter and the axes of generalization (Competition Contribution). In TACAS (2). Springer, 2022.
13. Daniel Kroening and Michael Tautschnig. CBMC – C bounded model checker (Competition Contribution). In TACAS, volume 8413 of Lecture Notes in Computer Science, pages 389–391. Springer, 2014. doi:10.1007/978-3-642-54862-8_26.
14. William Leeson and Matthew Dwyer. GraVeS: Graph-based verifier selector (Competition Contribution). In TACAS (2). Springer, 2022.
15. K. Rustan M. Leino. This is Boogie 2. 2008. URL: https://www.microsoft.com/en-us/research/publication/this-is-boogie-2-2/.
16. Hernán Ponce de León, Florian Furbach, Keijo Heljanko, and Roland Meyer. Portability analysis for weak memory models. PORTHOS: One tool for all models. In SAS, volume 10422 of LNCS, pages 299–320. Springer, 2017. doi:10.1007/978-3-319-66706-5_15.
17. Hernán Ponce de León, Florian Furbach, Keijo Heljanko, and Roland Meyer. Dartagnan: Bounded model checking for weak memory models (Competition Contribution). In TACAS (2), volume 12079 of LNCS, pages 378–382. Springer, 2020. doi:10.1007/978-3-030-45237-7_24.
Deagle: An SMT-based Verifier for
Multi-threaded Programs
(Competition Contribution)⋆
Fei He1,2,3, Zhihang Sun1,2,3, and Hongyu Fan1,2,3
1School of Software, Tsinghua University, Beijing, China
2Key Laboratory for Information System Security, MoE, Beijing, China
3Beijing National Research Center for Information Science and Technology,
Beijing, China
Abstract. Deagle is an SMT-based multi-threaded program verification
tool. It is built on top of CBMC (front-end) and MiniSAT (back-end). The
basic idea of Deagle is to integrate into the SMT solver an ordering con-
sistency theory that handles ordering relations over the shared variable
accesses in the program. The front-end encodes the input program into
an extended propositional formula that contains ordering constraints.
The back-end is reinforced with a solver for the ordering consistency
theory. This paper presents the basic idea, architecture, installation, and
usage of Deagle.
Keywords: Program verification ·Satisfiability modulo theories ·Con-
currency.
1 Verification Approach
Given a multi-threaded program, the thread communication behaviors can be
modeled using the happens-before relations over memory access (read/write)
events [1]. There are various kinds of happens-before relations: program order
(PO), read-from order (RF), write serialization order (WS), and from-read order
(FR). A happens-before ordering formula (abbreviated as ordering formula) is
a logical formula that involves only memory access events and happens-before
relations.
Deagle is an SMT-based multi-threaded program verifier, which consists of
– a front-end that encodes the intra-threaded behaviors (e.g., the control and
data flow per thread) into propositional formulas, and the inter-threaded
behaviors (i.e., the communication between threads) into ordering formulas;
– a back-end that extends MiniSAT with an ordering consistency theory solver
[8] by following the DPLL(T) framework [7], and is able to solve propositional
formulas and ordering formulas mixed together.
⋆ This work was supported in part by the National Key Research and Development
Program of China (No. 2018YFB1308601) and the National Natural Science
Foundation of China (No. 62072267 and No. 62021002).
Compared with [8]: the theory solver in [8] uses a from-read axiom to derive
FR orders. Besides the from-read axiom, Deagle also implements a write-serialization
axiom [11], with which WS orders can also be derived. As a result,
the front-end of Deagle need not encode FR and WS orders explicitly.
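For intuition, the from-read axiom can be rendered schematically as follows (our notation, not Deagle's exact formulation): if a read r takes its value from a write w, and another write w' to the same location is serialized after w, then r is from-read-ordered before w':

\[
\mathit{rf}(w, r) \wedge \mathit{ws}(w, w') \Rightarrow \mathit{fr}(r, w')
\]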
2 Software Architecture
Deagle is developed on top of CBMC [9] and MiniSAT [6] using C++. Addition-
ally, for ease of usage and debugging, Deagle reuses some modules developed
in Yogar-CBMC [10,11]. Deagle is not a strategy selection-based verifier. Deagle
runs the following procedures successively to verify a given C program:
Preprocessing (from Yogar-CBMC). For each global structure variable in the
C program, the preprocessing procedure unfolds it by creating a fresh variable
for each member. Note that arrays need no preprocessing; CBMC is able to handle
each array as an entity.
Parsing and Goto-Program Generation (originally in CBMC). CBMC employs
Flex and Bison to transform the preprocessed C program into an abstract
syntax tree (AST). Then CBMC builds a goto program, where all branching
statements and loop statements are represented with (conditional) goto statements.
Library Function Modeling (extended from CBMC). CBMC models each
multithreading-related library function (e.g., pthread_cond_wait). For example,
mutex m contains a Boolean variable m_locked indicating whether m is locked;
pthread_mutex_lock(&m) assumes m_locked to be originally false and sets
m_locked to true. Based on CBMC, we extend Deagle to support the modeling of
more library functions.
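A minimal sketch of this style of modeling is shown below; the names model_mutex_t and model_mutex_lock are illustrative stand-ins, not the actual CBMC/Deagle internals, and __VERIFIER_assume follows the usual SV-COMP convention:

    extern void __VERIFIER_assume(int cond);  // prunes executions where cond is false

    typedef struct { _Bool locked; } model_mutex_t;

    // Model of pthread_mutex_lock(&m): only executions in which m is
    // currently free continue; the lock is then taken in one atomic step.
    void model_mutex_lock(model_mutex_t *m) {
        __VERIFIER_assume(!m->locked);
        m->locked = 1;
    }

    void model_mutex_unlock(model_mutex_t *m) {
        m->locked = 0;
    }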
Unwinding. We employ bounded model checking (BMC) [3,4,5] to handle loops.
If the program contains loops, we determine an unwinding limit and unwind the
program to a loop-free bounded program (see the sketch after this list):
– If the maximal loop time of the program can be determined through static
analysis, e.g.,
    for (i = 0; i < 10; i++)
we set the unwinding limit to this maximal loop time;
– If the maximal loop time depends on non-determinism, e.g.,
    for (i = 0; i < n; i++)
where n is obtained from the function __VERIFIER_nondet_int, we report
UNKNOWN, since such loops cannot be fully unwound;
– Otherwise, we set the unwinding limit to 2.
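The following is a generic BMC-style unwinding sketch, not Deagle's actual output; body() stands for an arbitrary loop body:

    extern void body(void);

    // Unwinding `for (i = 0; i < 3; i++) body();` with limit 3 yields a
    // loop-free program; each `if` simulates one potential iteration.
    void unwound(void) {
        int i = 0;
        if (i < 3) { body(); i++; }   // iteration 1
        if (i < 3) { body(); i++; }   // iteration 2
        if (i < 3) { body(); i++; }   // iteration 3
        // with --no-unwinding-assertions (Sect. 4.1), no assert(!(i < 3))
        // is emitted here to check that the limit sufficed
    }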
Formula Generation (extended from CBMC). After unwinding, the loop-free
program is represented in static single assignment (SSA) form, where each
thread is a chain of assignments. These assignments can be directly modeled
as first-order logic formulas (for ease of solving, we further convert them into
propositional logic formulas). Additionally, an assignment may contain global
memory access events; we model program orders and read-from orders of these
events into the formulas (please refer to [8] for more information).
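As a rough illustration of the SSA renaming (the _0/_1 naming scheme is illustrative, not CBMC's exact convention):

    int x_0, x_1;   // SSA versions of shared variable x
    int y_1;        // SSA version of local variable y

    void thread_ssa(void) {
        // original thread code:  x = x + 1;  y = x;
        x_1 = x_0 + 1;   // read event on x (value x_0), write event (value x_1)
        y_1 = x_1;       // under a different read-from choice, this read could
                         // instead observe another thread's write to x
    }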
Constraint Solving (extended from MiniSAT). We develop an ordering
consistency theory solver and integrate it into the DPLL(T) framework [8]. For
efficiency, we extend MiniSAT, a SAT solver, to run our theory solver exclusively.
Please refer to [8] for the detailed algorithms of our decision procedure.
Witness Generation (adapted from Yogar-CBMC). If the back-end solver
returns satisfiable (i.e., finds a counterexample violating the property), our
ordering consistency theory solver reports a sequence (total order) of these events,
which can be used for generating the witness of the counterexample.
3 Strengths and Weaknesses
Compared to the traditional method [1] which explicitly converts ordering for-
mulas into propositional formulas, Deagle employs a dedicated theory solver to
handle ordering formulas, which improves both time and space efficiency. Ignoring
some tasks in goblint-regression that require unwinding 10000 times, Deagle
reports TIMEOUT in only 9 tasks and OUT OF MEMORY in only 7 tasks,
fewer than most ConcurrencySafety competitors.
In most weaver tasks (117 out of 169), the number of loop iterations is non-deterministic.
As mentioned in the previous section, Deagle reports UNKNOWN for these.
Since such tasks are common in real-world programs, we are exploring approaches
to dealing with such programs in future work.
4 Tool Setup and Configuration
The source code of Deagle 1.3 (the version submitted to SV-COMP 2022 [2]) is
publicly accessible⁴. Please refer to the README for installation instructions.
In SV-COMP 2022, Deagle participates in the ConcurrencySafety category and only
checks the property Unreach-Call⁵. By setting the parameters
    --32 --no-unwinding-assertions --closure
one can reproduce Deagle's results of SV-COMP 2022.
⁴ Deagle repository: https://github.com/thufv/Deagle
⁵ The benchmark definition of Deagle: https://gitlab.com/sosy-lab/sv-comp/bench-defs/-/blob/main/benchmark-defs/deagle.xml
4.1 Parameter Definition
Deagle inherits many parameters from CBMC. Due to the page limit, we only
describe parameters related to the competition or newly added in Deagle:
* --32 / --64: sets the width of integers to 32/64 bits.
* --no-unwinding-assertions: does not generate unwinding assertions into
the formula. Assuming a loop is unwound n times, its unwinding assertion
asserts the loop condition to be false after n iterations. Since unwinding
assertions can lead to false counterexamples, we disable their generation.
* --closure / --icd (new in Deagle): uses our proposed approach. If
the parameter --closure is enabled, Deagle employs a transitive closure-based
theory solver (recommended); if --icd is enabled, Deagle employs
an incremental cycle detection-based solver. In SV-COMP 2022 [2], Deagle
solves all tasks with the parameter --closure.
5 Software Project
Deagle is developed by Fei He, Zhihang Sun, and Hongyu Fan from the Formal
Verification Lab (https://thufv.github.io/team) at Tsinghua University. Deagle is
licensed under GPLv3. Since Deagle is developed on top of CBMC and MiniSAT,
and reuses some modules from Yogar-CBMC, it also contains code covered by the
copyrights of those tools.
6 Acknowledgement
We thank the SV-COMP hosts for holding the competition and giving advice on
participating. We are also grateful to the developers, maintainers, and contributors
of CBMC, MiniSAT, and Yogar-CBMC, on which Deagle is based.
References
1. Alglave, J., Kroening, D., Tautschnig, M.: Partial orders for efficient bounded model checking of concurrent software. In: Sharygina, N., Veith, H. (eds.) Computer Aided Verification. pp. 141–157. Springer, Berlin, Heidelberg (2013). https://doi.org/10.5555/2958031.2958083
2. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS. Springer (2022)
3. Biere, A., Cimatti, A., Clarke, E.M., Fujita, M., Zhu, Y.: Symbolic model checking using SAT procedures instead of BDDs. In: Proceedings of the 36th Annual ACM/IEEE Design Automation Conference. pp. 317–320. DAC '99, Association for Computing Machinery, New York, NY, USA (1999). https://doi.org/10.1145/309847.309942
4. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: Cleaveland, W.R. (ed.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 193–207. Springer, Berlin, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0_14
5. Clarke, E., Biere, A., Raimi, R., Zhu, Y.: Bounded model checking using satisfiability solving. Form. Methods Syst. Des. 19(1), 7–34 (Jul 2001). https://doi.org/10.1023/A:1011276507260
6. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) Theory and Applications of Satisfiability Testing. pp. 502–518. Springer, Berlin, Heidelberg (2004)
7. Ganzinger, H., Hagen, G., Nieuwenhuis, R., Oliveras, A., Tinelli, C.: DPLL(T): Fast decision procedures. In: CAV (2004). https://doi.org/10.1007/978-3-540-27813-9_14
8. He, F., Sun, Z., Fan, H.: Satisfiability modulo ordering consistency theory for multi-threaded program verification. In: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. pp. 1264–1279. PLDI 2021, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3453483.3454108
9. Kroening, D., Tautschnig, M.: CBMC – C bounded model checker. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 389–391. Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_26
10. Yin, L., Dong, W., Liu, W., Li, Y., Wang, J.: Yogar-CBMC: CBMC with scheduling constraint based abstraction refinement. In: Beyer, D., Huisman, M. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 422–426. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-89963-3_25
11. Yin, L., Dong, W., Liu, W., Wang, J.: Scheduling constraint based abstraction refinement for multi-threaded program verification. IEEE Transactions on Software Engineering PP (08 2017). https://doi.org/10.1109/TSE.2018.2864122
The Static Analyzer Frama-C in SV-COMP
(Competition Contribution)
Dirk Beyer and Martin Spiessl
LMU Munich, Munich, Germany
Abstract. Frama-C is a well-known platform for source-code analysis of
programs written in C. It can be extended via its plug-in architecture by
various analysis backends and features an extensive annotation language
called ACSL. So far it was hard to compare Frama-C to other software
verifiers. Our competition participation contributes an adapter named
Frama-C-SV, which makes it possible to evaluate Frama-C against other
software verifiers. The adapter transforms standard verification tasks
(from the well-known SV-Benchmarks collection) in a way that can be
understood by Frama-C and produces a verification witness as output.
While Frama-C provides many different analyses, we focus on the Evolved
Value Analysis (EVA), which uses a combination of different domains to
over-approximate the behavior of the analyzed program.
Keywords: Software verification ·Program analysis ·Formal methods ·Compe-
tition on Software Verification ·Comparative Evaluation ·SV-COMP ·Frama-C
1 Approach
This competition contribution is based on Frama-C [12], a program-analysis
platform for C programs. The purpose of the participation in the comparative
evaluation SV-COMP is to show the strengths of Frama-C when applied to
the problem of verifying C programs from the SV-Benchmarks [4] collection of
verification tasks.
2 Architecture
Although Frama-C has a large configuration space, it does not support standard
specifications as used in SV-COMP, and it does not produce verification witnesses
by default. In order to overcome this obstacle we implemented an adapter for
Frama-C using input and output transformers; the adapter architecture
is illustrated in Fig. 1. In the following, we describe the artifacts and actors of
the participating verifier: in Sect. 2.1 we describe all the components that are
developed as part of the adapter, while in Sect. 2.2 we describe in more detail
how the used EVA analysis of Frama-C works.
Fig. 1: Architecture of Frama-C-SV: the inputs and outputs of Frama-C are
translated to interface with the established standards as used by SV-COMP; the
components that are necessary to adapt Frama-C for comparison with other
verifiers amount to 678 lines of code mostly written in Python. (Inputs: Program
and Specification; adapter components: Input Transformer, Frama-C with
Configuration Options and Harness, Output Transformer; outputs: verdict
TRUE/FALSE/UNKNOWN and Witness.)
2.1 Frama-C-SV
Input Transformer. The input transformer takes the program p and
specification s and creates a new program p' in which the specification s has been
expressed as Frama-C-specific annotations. Frama-C uses ACSL [1] as the language
to specify annotations. The input transformer also selects configuration
parameters for Frama-C that are best suited for the verification task. Currently we
encode reachability tasks into signed integer overflows by adding an artificial
overflow to the body of the function reach_error. This works well in practice
and is also sound, since if there were any other overflows, the task would contain
undefined behavior and would not be a valid reachability task in the first place.
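A minimal sketch of this encoding idea (the exact code emitted by Frama-C-SV may differ):

    #include <limits.h>

    // Reaching reach_error() now triggers a signed-overflow alarm, which
    // EVA reports; thus "reach_error is called" reduces to "an overflow occurs".
    void reach_error(void) {
        int x = INT_MAX;
        x = x + 1;   // artificial signed integer overflow (undefined behavior)
    }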
Configuration Options. Depending on the input program and specification, we
can choose different options that are passed to Frama-C. In essence, this acts like
algorithm selection [14] and, e.g., allows us to choose a different configuration
of Frama-C depending on the specified property.
Harness. Some programs in the SV-Benchmarks collection use specific functions
to model non-determinism. We provide implementations for those functions
(__VERIFIER_*) in a separate C program such that the semantics of those
functions can be understood by Frama-C. This separate C program is passed to
Frama-C together with the transformed program p'.
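As an illustration, a harness implementation for one such function might look as follows; this is a guess at the general shape, not the actual Frama-C-SV harness, and it assumes Frama-C's built-in Frama_C_interval:

    #include <limits.h>
    #include "__fc_builtin.h"   // declares Frama_C_interval (Frama-C builtins)

    // Models an SV-COMP nondeterministic int: EVA treats the result as
    // any value in [INT_MIN, INT_MAX].
    int __VERIFIER_nondet_int(void) {
        return Frama_C_interval(INT_MIN, INT_MAX);
    }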
Output Transformer. The output of Frama-C needs to be interpreted with
respect to the original specification, and depending on the outcome, a verification
witness needs to be generated. Thus, we need an output transformer for (a) providing a
verdict for the verification task and (b) providing a verification witness. Regarding (a),
the output transformer interprets the CSV report that can be generated
by Frama-C to determine whether the program was proven to be safe (verdict
TRUE), whether a specification violation occurred (verdict FALSE), or whether
no such statement can be made (verdict UNKNOWN). We also generate a minimal
correctness or violation witness for the verdicts TRUE and FALSE, respectively.
The witness automata consist of only one node, which for violation witnesses is
marked as a violation node. In the future we plan to augment these witnesses with
information such as invariants that have been found by Frama-C.
2.2 Frama-C
One of the strengths of Frama-C is its modular architecture [10], which allows
a configuration of the best possible analysis backends for a certain verification
problem. We choose the plug-in EVA [9], which is well suited for an automatic
analysis. Other plug-ins, such as the Weakest-Preconditions (WP) plug-in, require
hints from the user in order to be effective. In the following we briefly describe
the most important aspects of the EVA analysis configuration that we use. For a
more detailed description, we refer the reader to the relevant literature [7,8,9].
Frama-C provides a meta-option called -eva-precision for the EVA plug-in
with possible values ranging from 0 to 11. With higher values for this option, more
precise domains and thresholds are used, at the cost of increased computation
time. We currently use the maximum value of 11 in order to make the best use
of the 900 s CPU time limit. In the future we might want to iteratively increase
this value, starting at lower precisions.
Domains. The EVA analysis always uses the domain cvalue, which tracks values
of variables either as constant values, sets, or intervals of possible values (including
modular congruence constraints). Pointer addresses are tracked either as
addresses with offsets or as so-called garbled mix, which overapproximates the
set of possible memory locations. In addition, depending on the precision level,
various other domains are used, which we describe in the following. The domain
symbolic-locations tracks a map of symbolic locations to values, which is, e.g.,
helpful for analyzing expressions containing array accesses such as a[i]<a[j].
The equality domain tracks equalities of C expressions found in the code, whereas
the gauges domain tracks relations between variables in a loop with the goal
of discovering linear inequality invariants [16]. Lastly, the octagon domain tracks
certain linear constraints between pairs of variables [13]. As we use the highest
precision level, all of these domains are used in our contribution.
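Illustrative snippets (invented for this description, not from the benchmark set) of relations the described domains can capture:

    extern int a[10];   // an array with unknown contents

    void domains_example(int i, int j) {
        if (0 <= i && i < 10 && 0 <= j && j < 10 && a[i] < a[j]) {
            // symbolic-locations can relate the values of a[i] and a[j]
            // even though i and j are only known by their intervals
        }
        if (i <= j) {
            // the octagon domain can track constraints such as i - j <= 0;
            // the gauges domain similarly relates variables across loop iterations
        }
    }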
Precision of the State-Space Exploration. Apart from the domains, the
precision of state-space exploration in Frama-C is affected by various options. We
describe some of these in the following; a complete list of affected settings and
values is always printed by Frama-C when the option -eva-precision is specified
by the user. Option slevel (set to 5000) determines how many separate states
are kept before new states are joined into existing ones. Option ilevel (set
to 256) determines how many different values are tracked per variable before
overapproximating the value range. Option plevel (set to 2000) affects the size
up to which arrays are tracked. The option auto-loop-unroll (set to 1024)
determines up to which bound a loop is considered for unrolling.
3 Strengths and Weaknesses
The competition contribution shows the strengths of Frama-C in checking C
programs for overflows and also, in the currently supported sub-categories¹, for
reachability. Here we are able to show that our results are comparable to and often
surpass those of other tools based on abstract interpretation [11], such as
Goblint [15]. While the EVA analysis of Frama-C that we use is based on abstract
interpretation, the precision options described in Sect. 2.2 allow for a more precise
state-space exploration, which behaves more like model checking. More details
about the results can be found in the competition report [2] and artifact [3].
The approach that we describe in this paper creates a compatibility
layer between the abilities used by Frama-C and the standards used in the
SV-Benchmarks collection. While still a work in progress, we have shown that
it is possible to bridge this gap while preserving overall soundness. It is also
interesting to consider the results on verification tasks from the SV-Benchmarks
collection for a tool that did not participate before.
Although our approach is sound in general, we are likely not showcasing the full
potential of Frama-C. One aspect to consider here is the large configuration space,
which means there might be ways to verify more tasks with a better heuristic
for selecting the configuration options. The other aspect is that Frama-C also
provides different plug-ins such as the WP plug-in, which requires more (manual)
annotations, but can also potentially solve more tasks than the more automatic
EVA plug-in.
4 Software Project and Contributors
The software project Frama-C is developed at https://git.frama-c.com/pub/frama-c/
and our adapter Frama-C-SV is developed at https://gitlab.com/sosy-lab/software/frama-c-sv,
both being released under open-source licenses. The exact version of the adapter
that participated in SV-COMP 2022 is also archived in the competition's
tool-archive repository² [6]. Frama-C was funded by the European Commission in
program Horizon 2020. The adapter Frama-C-SV was funded by the DFG. We thank
the Frama-C authors³ for their contribution to the software-verification community.
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [2] and available on the competition web site. This includes
the verification tasks [4], competition results [3], verification witnesses [5], scripts, and
instructions for reproduction. The version of Frama-C-SV as used in the competition is
archived together with other participating tools [6].
Funding Statement. This work was funded in part by the Deutsche
Forschungsgemeinschaft (DFG) – 378803395 (ConVeY).
¹ We opted out of subcategories with unsound results caused by Frama-C making
assumptions that are different from the conventions of SV-COMP.
² https://gitlab.com/sosy-lab/sv-comp/archives-2022/blob/svcomp22/2022/frama-c-sv.zip
³ https://frama-c.com/html/authors.html
References
1. Baudin, P., Cuoq, P., Filliâtre, J.C., Marché, C., Monate, B., Moy, Y., Prevosto, V.: ACSL: ANSI/ISO C specification language version 1.17 (2021), available at https://frama-c.com/download/acsl-1.17.pdf
2. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS (2). Springer (2022)
3. Beyer, D.: Results of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5831008
4. Beyer, D.: SV-Benchmarks: Benchmark set for software verification and testing (SV-COMP 2022 and Test-Comp 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5831003
5. Beyer, D.: Verification witnesses from verification tools (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5838498
6. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5959149
7. Blazy, S., Bühler, D., Yakobowski, B.: Structuring abstract interpreters through state and value abstractions. In: Proc. VMCAI. pp. 112–130. LNCS 10145, Springer (2017). https://doi.org/10.1007/978-3-319-52234-0_7
8. Bühler, D.: Structuring an Abstract Interpreter through Value and State Abstractions: EVA, an Evolved Value Analysis for Frama-C. Ph.D. thesis, University of Rennes 1, France (2017), available at https://tel.archives-ouvertes.fr/tel-01664726
9. Bühler, D., Cuoq, P., Yakobowski, B., Lemerre, M., Maroneze, A., Perelle, V., Prevosto, V.: Eva: The Evolved Value Analysis plug-in (2020), available at https://frama-c.com/download/frama-c-eva-manual.pdf
10. Correnson, L., Cuoq, P., Kirchner, F., Maroneze, A., Prevosto, V., Puccetti, A., Signoles, J., Yakobowski, B.: Frama-C user manual (2020), available at https://frama-c.com/download/frama-c-user-manual.pdf
11. Cousot, P., Cousot, R.: Abstract interpretation: A unified lattice model for the static analysis of programs by construction or approximation of fixpoints. In: Proc. POPL. pp. 238–252. ACM (1977)
12. Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-C. In: Proc. SEFM. pp. 233–247. Springer (2012). https://doi.org/10.1007/978-3-642-33826-7_16
13. Miné, A.: The octagon abstract domain. Higher-Order and Symbolic Computation 19(1), 31–100 (2006). https://doi.org/10.1007/s10990-006-8609-1
14. Rice, J.R.: The algorithm selection problem. Advances in Computers 15, 65–118 (1976). https://doi.org/10.1016/S0065-2458(08)60520-3
15. Saan, S., Schwarz, M., Apinis, K., Erhard, J., Seidl, H., Vogler, R., Vojdani, V.: Goblint: Thread-modular abstract interpretation using side-effecting constraints (Competition Contribution). In: Proc. TACAS (2). pp. 438–442. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_28
16. Venet, A.: The gauge domain: Scalable analysis of linear inequality invariants. In: Proc. CAV. pp. 139–154. LNCS 7358, Springer (2012). https://doi.org/10.1007/978-3-642-31424-7_15
GDart: An Ensemble of Tools for Dynamic
Symbolic Execution on the Java Virtual
Machine (Competition Contribution)⋆
Malte Mues1 and Falk Howar1,2
1 TU Dortmund University, Dortmund, Germany
{malte.mues, falk.howar}@tu-dortmund.de
2 Fraunhofer ISST, Dortmund, Germany
Abstract. GDart is an ensemble of tools allowing dynamic symbolic
execution of JVM programs. The dynamic symbolic execution engine is
decomposed into three different components: a symbolic decision engine
(DSE), a concolic executor (SPouT), and an SMT solver backend allowing
meta-strategy solving of SMT problems (JConstraints). The symbolic
decision component is loosely coupled with the executor by a newly
introduced communication protocol. At SV-COMP 2022, GDart solved
471 of 586 tasks, finding more correct false results (302) than correct true
results (169). It scored fourth place.
1 Verification Approach
This paper presents the GDart ensemble tool, a dynamic symbolic execution
engine for the JVM. Dynamic symbolic execution is a well-established technique
for software testing (cf. DART [6]), and there were already two contestants
at SV-COMP 2021 using this technique (cf. JDart [7,9] and COASTAL³).
It is a search algorithm for systematic exploration of a program's state space
in search of a property violation, which stops either after exhausting the resource
limits, after exploring the complete symbolic state space, or upon encountering an
error. The end of the search is fully configurable in GDart.
In SV-COMP 2022 [3], a dynamic symbolic execution tool (JDart, 714
points) won the Java track for the first time, beating JBMC (700 points) [4],
a bounded model checker for Java, and Java Ranger (670 points) [11], a
symbolic execution engine extended by veritesting [1] for Java. JDart's result
underlines the potential of dynamic symbolic execution for the verification of
Java programs in general. The concrete implementation of JDart is closely
coupled to the Java PathFinder VM (JPF-VM) [12], running the complete
analysis within one virtual machine. The advantage of the JPF-VM is that it runs
⋆ This work has been partially funded by an Amazon Research Award
³ https://github.com/DeepseaPlatform/coastal
as a guest JVM on top of a host JVM. The analysis might mock parts of the
guest JVM and use the host JVM for running side computations required to
compute results used in the mock. The downside of the JPF-VM is its research-tool
status and that it is costly to maintain, given Java's fast pace in releasing
new features.
COASTAL demonstrated for the first time what a loosely coupled architecture
between the symbolic exploration engine and a concolic execution engine
might look like. It instruments the bytecode with ASM⁴, a Java bytecode
manipulation framework, to obtain symbolic traces. This makes the analysis
independent of the JPF-VM. The downside is that bytecode manipulation offers less
flexibility than hooking directly into the JVM.
2 Software Architecture
Fig. 1: GDart's ensemble architecture and the interplay between the components:
Symbolic Exploration (DSE/JConstraints) sends SMT problems to
Constraint Solving (CVC4, Z3, ...) and receives models; it sends concrete
values to Concolic Execution (SPouT) and receives symbolic traces.
GDart takes the strength of JDart's mocking flexibility and combines it
with COASTAL's modular design. Figure 1 shows the architecture of the
GDart ensemble tool. The main analysis component is the symbolic explorer.
It orchestrates the concolic executor and requests solutions for SMT problems
from the constraint solvers powering the symbolic exploration.
Symbolic Exploration. We name the symbolic explorer the DSE component, as it performs the two main tasks of a dynamic symbolic execution engine: it manages the constraint tree and guides its exploration, and it starts the concolic executor. To explore a path, it computes a set of concrete values that drives the concolic executor down the path of interest and seeds the executor with these values. After the executor terminates, it parses the obtained symbolic trace and integrates it into the symbolic tree. Next, it constructs from the symbolic tree an SMT problem that describes the next path to explore and starts a constraint solver to get either a model suitable to drive the execution down this path or an unsatisfiable verdict implying that the path is unreachable. The
4https://asm.ow2.io
search behavior of GDart is configured in the DSE. Once the search terminates,
DSE generates a verification witness from the constraint tree.
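The interplay of these steps can be condensed into a driver loop. The following is a minimal Python sketch, not GDart's actual code: run_executor stands in for SPouT, solver for the JConstraints backend, and tree for the constraint tree managed by DSE.

    def explore(program, solver, tree):
        # Minimal dynamic-symbolic-execution driver loop (illustrative sketch).
        values = {}                                    # first run: arbitrary seed
        while values is not None:
            trace = run_executor(program, values)      # concolic run (SPouT's role)
            tree.integrate(trace)                      # extend the constraint tree
            values = None
            while (target := tree.next_unexplored_path()) is not None:
                model = solver.solve(target.path_condition)
                if model is not None:                  # sat: seeds for the next run
                    values = model
                    break
                target.mark_unreachable()              # unsat: prune and try the next path
        return tree                                    # symbolic state space exhausted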
Concolic Executor. One of the core contributions of GDart is the new concolic executor SPouT, implemented as part of the Espresso guest language running on top of the GraalVM [13]5. The GraalVM is an industrial-grade JVM maintained by Oracle, allowing the use of most of the architectural benefits the JPF-VM offered, apart from state tracking. Concolic execution, however, does not require the JPF-VM's state-tracking feature. SPouT can be seeded with concrete values that drive the execution along a concrete path. In addition, it can introduce new symbolic variables for previously unknown inputs. During execution, it records manipulations of and constraint checks on symbolic variables, and on termination of the path exploration it reports a symbolic execution trace together with the concrete execution result. Decisions on the symbolic variables are encoded in the SMT-Lib format. As SPouT maintains the two VM layers, it can mock behavior in the Espresso VM running the analysis with a substitute executed on the host GraalVM during concolic execution, the same way JDart mocks the environment if needed. This feature is also used for intercepting invocations of the Java string library and encoding them symbolically.
Constraint Solving. The third component is constraint solving. DSE uses the JConstraints library to model SMT-Lib constraints internally and to interact with the solvers. GDart is backed by CVC4 [2] and Z3 [5]. We combine these two SMT solvers in a portfolio approach according to the CvcSeqEval strategy presented in our previous work [8].
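As a rough illustration, a sequential portfolio in the spirit of this strategy can be sketched as follows; run_solver is a hypothetical helper, not the JConstraints API, and the solver order and timeout are illustrative.

    def portfolio_solve(problem, solvers=("cvc4", "z3"), timeout_s=60):
        # Try solvers in a fixed order and accept the first conclusive verdict
        # (a sequential portfolio in the spirit of the CvcSeqEval strategy).
        for name in solvers:
            verdict, model = run_solver(name, problem, timeout_s)  # hypothetical helper
            if verdict in ("sat", "unsat"):
                return verdict, model
        return "unknown", None      # all solvers timed out or gave up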
3 Strengths and Weaknesses
GDart placed fourth with 640 points, behind JDart (714 points), JBMC (700 points), and Java Ranger (670 points). Dynamic symbolic execution tools tend to be stronger at finding property violations than at confirming the absence of property violations on the SV-COMP benchmark. This is partially by design, as some of the problems (e.g., those in the jayhorn-recursive subgroup) aim at testing the handling of tremendously large and hard-to-explore state spaces. GDart disproves the property in 302 cases and confirms it in 169 cases. In total, GDart answered 471 of 586 tasks correctly and none incorrectly. That is 40 more correct false verdicts than Java Ranger produced (262 correct false tasks out of 466 solved tasks). In total, GDart solved five more tasks than Java Ranger and 35 fewer than JBMC.
In direct comparison with GDart, JDart solved 192 (+23) correct true tasks and 330 (+28) correct false tasks. Three factors contribute to the gap between GDart and JDart: the performance overhead of spinning up one JVM per executor run (we do not have exact numbers, but spinning up a JVM costs at least 500 ms, affecting especially tasks with huge exploration trees), the technical maturity of the implementation, since JDart has been around longer, and a value-tracing heuristic built into JDart, but not into GDart, for tracking the origin of numerical values parsed from serialized string representations. The performance overhead of spinning up multiple JVMs is the only drawback caused by the modular design of GDart, and it will not go away in the future. In the score-based quantile plots for CPU time, JDart's time per task after achieving 600 points is close to five seconds of CPU time, while GDart's time per task reaches close to 50 seconds of CPU time for the same score.
5https://www.graalvm.org
The weakness of dynamic symbolic execution is state-space explosion, which also affects GDart. Slowing down each executor run by spinning up new VMs is a disadvantage given the resource constraints of SV-COMP. On the bright side, with more relaxed resource limits it is possible to run the concolic executions in parallel with the symbolic exploration of the constraints tree; enabling such parallel breadth-first search on multi-core machines is future work for the DSE component. At the moment, all paths are explored sequentially.
4 Tool Setup
GDart is run with various configuration options hard-coded into the SV-COMP
run scripts. More precisely, we enabled witness generation, used the described
solver strategy in the constraint backend, chose a breadth-first search on the
constraint tree, and used the same bounded solving as JDart. The search is configured to terminate on the first assertion error encountered.
5 Software Project
The components are currently all developed at TU Dortmund by the group led
by Falk Howar. DSE6 is available under the Apache 2.0 license, JConstraints7
as well, and SPouT8is available under the GPL v2 license. We also provide the
run scripts for SV-COMP on GitHub9.
6 Data Availability Statement
The GDart archive used for SV-COMP 2022 is available at Zenodo [10].
References
1. Avgerinos, T., Rebert, A., Cha, S.K., Brumley, D.: Enhancing symbolic ex-
ecution with veritesting. In: Proc. ICSE. pp. 1083–1094 (2014). https://doi.org/10.1145/2568225.2568293
6https://github.com/tudo-aqua/dse
7https://github.com/tudo-aqua/jconstraints
8https://github.com/tudo-aqua/spout
9https://github.com/tudo-aqua/gdart-svcomp
2. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T.,
Reynolds, A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) Proc.
CAV. pp. 171–177. Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_14
3. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS (2).
Springer (2022)
4. Cordeiro, L., Kroening, D., Schrammel, P.: JBMC: Bounded model checking for
Java bytecode. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Proc.
TACAS. pp. 219–223. Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_17
5. De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Proc. TACAS. pp.
337–340. Springer (2008). https://doi.org/10.1007/978-3-540-78800-3_24
6. Godefroid, P., Klarlund, N., Sen, K.: Dart: Directed automated random testing.
In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Lan-
guage Design and Implementation. pp. 213–223. PLDI '05, ACM (2005). https://doi.org/10.1145/1065010.1065036
7. Luckow, K., Dimjašević, M., Giannakopoulou, D., Howar, F., Isberner, M., Kahsai, T., Rakamarić, Z., Raman, V.: JDart: A dynamic symbolic analysis framework. In:
TACAS 2016 (2016). https://doi.org/10.1007/978-3-662-49674-9_26
8. Mues, M., Howar, F.: Data-driven design and evaluation of SMT meta-solving
strategies: Balancing performance, accuracy, and cost. In: Proc. ASE. pp. 179–190
(2021). https://doi.org/10.1109/ASE51524.2021.9678881
9. Mues, M., Howar, F.: JDart: Portfolio solving, breadth-first search and SMT-Lib strings. In: Proc. TACAS (2021). https://doi.org/10.1007/978-3-030-72013-1_30
10. Mues, M., Howar, F.: GDart artifact for SV-COMP 2022 (Feb 2022). https://doi.org/10.5281/zenodo.5957294
11. Sharma, V., Hussein, S., Whalen, M.W., McCamant, S., Visser, W.: Java Ranger:
Statically summarizing regions for efficient symbolic execution of Java. In: Proc.
ESEC/FSE 2020. pp. 123–134 (2020). https://doi.org/10.1145/3368089.3409734
12. Visser, W., Havelund, K., Brat, G., Park, S., Lerda, F.: Model checking pro-
grams. Automated Software Engineering 10(2), 203–232 (Apr 2003). https://doi.org/10.1023/A:1022920129859
13. Würthinger, T., Wimmer, C., Wöß, A., Stadler, L., Duboscq, G., Humer, C.,
Richards, G., Simon, D., Wolczko, M.: One VM to rule them all. In: Proc. SPLASH.
pp. 187–204 (2013)
Graves-CPA: A Graph-Attention Verifier
Selector (Competition Contribution)
Will Leeson and Matthew B. Dwyer
University of Virginia, Charlottesville VA 22903, USA
{will-leeson,matthewbdwyer}@virginia.edu
Abstract. Graves-CPA is a verification tool which uses algorithm se-
lection to decide an ordering of underlying verifiers to most effectively
verify a given program. Graves-CPA represents programs using an
amalgam of traditional program graph representations and uses state-
of-the-art graph neural network techniques to dynamically decide how
to run a set of verification techniques. The Graves technique is implementation-agnostic, but its competition submission, Graves-CPA, is built using several CPAchecker configurations as its underlying verifiers.
Keywords: Software Verification · Graph Attention Networks · Graph Neural Networks · Algorithm Selection
1 Verification Approach
Graves-CPA is an algorithm selector for software verification based on graph
neural network techniques. As the tool PeSCo [14] has shown, dynamic ordering of verification techniques can result in faster and more accurate verification. Computing an ordering on techniques dynamically incurs some runtime, but an effective ordering often makes this overhead insignificant in comparison to the time saved by using a more appropriate technique. Like most algorithm
selectors, Graves-CPA uses machine learning to make its selections. However,
it uses graph neural networks (GNNs) so it can represent programs using tra-
ditional program abstractions, such as abstract syntax trees (ASTs). Graves-
CPA uses a variant of GNNs called Graph Attention Networks (GATs) [16].
GATs use a learned attention mechanism which is trained to learn the impor-
tance of edges in a given graph.
GNNs are an emerging area of machine learning. Traditional neural networks accept input vectors, which have a fixed size and a natural ordering on elements, but graphs, in general, have neither. GNNs avoid these issues by operating on individual nodes in the graph instead of the graph as a whole [15]. Typically, the input to a GNN is the current representation of a node v and a collation of the representations of its neighboring nodes. The output is then a new representation for v. This process is repeated independently for all nodes in the graph. Thus, the number of nodes in the graph and the order in which they are processed are irrelevant.
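In pseudocode, one such round could look as follows; this is an illustrative sketch of generic message passing, not of a particular GNN variant, and update is a stand-in for the learned update function.

    def gnn_round(node_repr, neighbors, update):
        # One message-passing round: each node v gets a new representation from
        # its own vector and a collation (here: element-wise sum) of its neighbors'.
        new_repr = {}
        for v, h_v in node_repr.items():
            collated = [0.0] * len(h_v)
            for u in neighbors[v]:                     # collate neighbor vectors
                collated = [c + x for c, x in zip(collated, node_repr[u])]
            new_repr[v] = update(h_v, collated)        # learned update function
        return new_repr                                # independent of node order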
The Graves technique is tool agnostic [11], meaning it can be trained to
select from any set of verifiers. Our competition contribution selects an ordering
from the techniques utilized by CPAchecker [3], similar to PeSCo in previous
competitions.
To form its selection, Graves-CPA produces a graph representation G of a given program, which is based on its AST with control-flow, data-flow, and function call and return edges added between the tree's nodes. The AST's nodes and edges ensure the semantics of the statements in the program are maintained. Control-flow edges maintain the branching and order of execution between these statements. Data-flow edges explicitly relate the definitions, uses, and interactions of values in the program. G is passed to a GNN, consisting of a series of GATs, which outputs a graph feature vector. This feature vector is finally passed to a fully connected neural network which decides the sequence in which Graves-CPA's suite of verification techniques are run.
2 System Architecture
2.1 Graph Generation
To generate a graph from a program, Graves-CPA relies on the AST produced by the C compiler Clang [10]. Using a visitor pattern [9], Graves-CPA walks the AST to generate data-flow edges and the edges of the program's Interprocedural Control Flow Graph (ICFG). Function call and return edges in the ICFG are those which can be determined purely syntactically. Using the ICFG and data-flow edges, Graves-CPA produces additional data-flow edges using the worklist reaching-definitions algorithm [1]. We limit the number of iterations of the reaching-definitions algorithm, making our data edges an under-approximation of the possible data-flow edges. Once this graph is generated, it is parsed into a list of nodes and several edge sets. Each node represents its corresponding AST token using a one-hot encoding. These nodes and edges are used as input to the GNN.
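A bounded worklist computation of this kind might look as follows; this is a sketch under the usual dataflow conventions (per-node gen, kill, and preds sets), with the iteration cap producing the under-approximation described above, not Graves-CPA's actual implementation.

    def bounded_reaching_definitions(nodes, preds, gen, kill, max_iters=100):
        # Worklist reaching definitions, stopped after a fixed number of steps,
        # yielding an under-approximation of the full fixpoint.
        reach_in = {n: set() for n in nodes}
        reach_out = {n: set(gen[n]) for n in nodes}
        worklist, steps = list(nodes), 0
        while worklist and steps < max_iters:
            n, steps = worklist.pop(), steps + 1
            reach_in[n] = set().union(*(reach_out[p] for p in preds[n])) if preds[n] else set()
            out = gen[n] | (reach_in[n] - kill[n])
            if out != reach_out[n]:                    # transfer function changed the facts
                reach_out[n] = out
                worklist.extend(s for s in nodes if n in preds[s])  # revisit successors
        return reach_in                                # definitions reaching each node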
2.2 Prediction
To form a prediction, Graves-CPA uses a GNN, visualized in Figure 1, which consists of two GAT layers, a jumping-knowledge layer [17], and an attention-based pooling layer [12]. The GAT layers are crucial to our technique. When propagating data through the graph, the attention mechanism in each layer weights edges so that information important to predictions is more prominent than superfluous data.
The jumping knowledge layer concatenates intermediate graph representa-
tions, denoted by A, B, and C, allowing the model to learn from each represen-
tation. The attention-based pooling layer calculates an attention value for each
node in the graph. All nodes are weighted by their respective attention values
and then summed together to form a graph feature vector. The combination of
Fig. 1. Graves uses a GNN comprising two GAT layers, a jumping-knowledge layer, and an attention-pooling layer. These layers produce a graph feature vector which a three-layer prediction network uses to order verifiers for sequential execution. An in-depth description of this architecture can be found in Leeson et al. [11].
GAT layers and the attention-based pooling allows the network to weigh the importance of both edges and nodes when forming the graph feature vector. This feature vector is fed to a three-layer neural network which decides the sequence of tool execution.
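A minimal PyTorch-Geometric sketch of this architecture is shown below; the class name, layer sizes, and ranking head are illustrative assumptions, not Graves-CPA's actual code.

    import torch
    from torch import nn
    from torch_geometric.nn import GATConv, GlobalAttention, JumpingKnowledge

    class VerifierRanker(nn.Module):
        # Two GAT layers, jumping knowledge over A, B, C, attention pooling,
        # and a three-layer prediction network (cf. Fig. 1).
        def __init__(self, num_tokens, hidden, num_verifiers):
            super().__init__()
            self.gat1 = GATConv(num_tokens, hidden)
            self.gat2 = GATConv(hidden, hidden)
            self.jump = JumpingKnowledge('cat')            # concatenate A, B, C
            feats = num_tokens + 2 * hidden
            self.pool = GlobalAttention(gate_nn=nn.Linear(feats, 1))
            self.rank = nn.Sequential(
                nn.Linear(feats, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_verifiers))          # one score per verifier

        def forward(self, x, edge_index, batch):
            a = x                                          # one-hot token encodings (A)
            b = self.gat1(a, edge_index).relu()            # intermediate representation B
            c = self.gat2(b, edge_index).relu()            # intermediate representation C
            g = self.pool(self.jump([a, b, c]), batch)     # graph feature vector
            return self.rank(g)

Sorting the returned scores in descending order would yield the order in which the configurations are executed.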
Graves-CPA was trained using data collected from running 5 configurations
of the CPAchecker framework on the verification tasks from SV-COMP 2021.
Labels for each configuration come from the SV-COMP score the configuration
would receive for a given program minus a time penalty. Similar to CPAchecker’s
competition contribution, these configurations are symbolic execution [6], value
analysis [7], value analysis with CEGAR [7], predicate analysis [5], and bounded
model checking with k-induction [4]. To prevent Graves-CPA from overfitting
to the SV-COMP benchmarks, we train on a subset of the dataset, only utilizing
20% of it. Like previous iterations of PeSCo, the network is trained to rank the
configurations in the order in which they should be executed.
Graves-CPA uses the machine learning libraries PyTorch [13] and PyTorch-
Geometric [8], an extension of PyTorch for graphs and other irregularly shaped
data, to implement its machine learning components. Graves-CPA is imple-
mented using a combination of Python, C++, and Java.
2.3 Execution
Using the ordering produced by the previous step, CPAchecker is run sequentially with each verification configuration. If a technique exceeds a given time limit or fails to produce a result, the next technique is executed.
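In outline, this fallback chain might be scripted as follows; a sketch only, which assumes a per-configuration flag for cpa.sh and a verdict printed on standard output.

    import subprocess

    def run_ordered(configs, program, prop, per_config_timeout):
        # Run CPAchecker configurations in the predicted order; fall through to
        # the next one on timeout or an inconclusive result (illustrative sketch).
        for config in configs:
            try:
                r = subprocess.run(
                    ["scripts/cpa.sh", config, "-spec", prop, program],
                    capture_output=True, text=True, timeout=per_config_timeout)
            except subprocess.TimeoutExpired:
                continue                           # next technique in the ordering
            if "TRUE" in r.stdout or "FALSE" in r.stdout:
                return r.stdout                    # conclusive verdict
        return "UNKNOWN"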
3 Strengths and Weaknesses
Graves-CPA operates on program graphs which are an abstraction of the pro-
gram. Its underlying model uses this abstraction to learn what software patterns
a particular verification technique excels at handling. This allows Graves-CPA to produce a dynamic ordering which runs techniques better equipped for the given problem first, reducing run time. In [11], the authors perform a qualitative study which suggests the network learns to rank verification techniques using program features an expert would use to decide between techniques.
In SV-COMP 2022 [2], there were 4,548 problems for which both Graves-CPA and CPAchecker reported the correct result. Graves-CPA's dynamic ordering, compared to CPAchecker's static configuration ordering, allowed it to solve these problems 37 hours faster. Further, Graves-CPA was able to solve 142 problems that CPAchecker could not, due to resource constraints or other issues.
Machine learning relies on the training data being representative of the real world. If this is not the case, the model can easily make poor predictions. These poor decisions can be seen in the competition in the 559 instances where Graves-CPA chooses an ordering that does not produce the correct result, but CPAchecker does. In most of these instances, Graves-CPA runs out of resources or incorrectly predicts that the remaining techniques will not produce a correct result.
4 Tool Setup and Configuration
Graves-CPA is built on the PeSCo codebase, which in turn is built on the
CPAchecker codebase, and participates in the ReachSafety and Overall cate-
gories. It can be downloaded as a fork: https://github.com/will-leeson/cpachecker.
Graves-CPA requires CMake, LLVM, either make or ninja, and Ant (a CPAchecker dependency) to be built, and the Python libraries PyTorch and PyTorch-Geometric to be executed. To build the project, simply run the shell script setup.sh and add our graph generation tool, graph-builder, to your path. Now, you may
verify a program with Graves-CPA using the command:
scripts/cpa.sh -svcomp22-graves -spec [prop.prp] [file.c]
5 Software Project and Contributions
Graves-CPA is an open source project developed by the authors at the Uni-
versity of Virginia. We would like to thank the team behind the PeSCo and CPAchecker tools for allowing us to build on their work.
Acknowledgements
We would like to thank Hongning Wang for his advice on graph neural networks
and prediction systems. This material is based in part upon work supported by
the U.S. Army Research Office under grant number W911NF-19-1-0054 and by
the DARPA ARCOS program under contract FA8750-20-C-0507.
References
1. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley (1986)
2. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS (2).
Springer (2022)
3. Beyer, D., Dangl, M.: Strategy selection for software verification based on boolean
features. In: Margaria, T., Steffen, B. (eds.) Leveraging Applications of Formal
Methods, Verification and Validation. Verification. pp. 144–159. Springer Interna-
tional Publishing, Cham (2018)
4. Beyer, D., Dangl, M., Wendler, P.: Boosting k-induction with continuously-refined
invariants. In: International Conference on Computer Aided Verification. pp. 622–
640. Springer (2015)
5. Beyer, D., Keremoglu, M.E., Wendler, P.: Predicate abstraction with adjustable-
block encoding. In: Formal Methods in Computer Aided Design. pp. 189–197. IEEE
(2010)
6. Beyer, D., Lemberger, T.: Cpa-symexec: efficient symbolic execution in cpachecker.
In: Proceedings of the 33rd ACM/IEEE International Conference on Automated
Software Engineering. pp. 900–903 (2018)
7. Beyer, D., Löwe, S.: Explicit-state software model checking based on CEGAR and in-
terpolation. In: International Conference on Fundamental Approaches to Software
Engineering. pp. 146–162. Springer (2013)
8. Fey, M., Lenssen, J.E.: Fast graph representation learning with PyTorch Geometric.
In: ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)
9. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading (1995)
10. Lattner, C.: Clang: a c language family frontend for llvm, https://clang.llvm.org/
11. Leeson, W., Dwyer, M.B.: Algorithm selection for software verification using graph
attention networks (2022), https://arxiv.org/abs/2201.11711
12. Li, Y., Tarlow, D., Brockschmidt, M., Zemel, R.: Gated graph sequence neural
networks (2017)
13. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T.,
Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito,
Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chin-
tala, S.: Pytorch: An imperative style, high-performance deep learning library. In:
Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett,
R. (eds.) Advances in Neural Information Processing Systems 32, pp. 8024–8035.
Curran Associates, Inc. (2019), https://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
14. Richter, C., Wehrheim, H.: Pesco: Predicting sequential combinations of verifiers.
In: International Conference on Tools and Algorithms for the Construction and
Analysis of Systems. pp. 229–233. Springer (2019)
15. Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph
neural network model. IEEE transactions on neural networks 20(1), 61–80 (2008)
16. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph
attention networks. arXiv preprint arXiv:1710.10903 (2017)
17. Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., Jegelka, S.: Repre-
sentation learning on graphs with jumping knowledge networks (2018)
GWIT: A Witness Validator for Java based on
GraalVM (Competition Contribution)?
Falk Howar(B)1,2 and Malte Mues1
1TU Dortmund University, Dortmund, Germany
{falk.howar, malte.mues}@tu-dortmund.de
2Fraunhofer ISST, Dortmund, Germany
Abstract. GWIT is a validator for violation witnesses produced by
Java verifiers in the SV-COMP software verification competition. GWIT
weaves assumptions documented in a witness into the source code of a
program, effectively restricting the part of the program that is explored
by a program analysis. It then uses the GDart tool (dynamic symbolic
execution) to search for reachable errors in the modified program.
1 Introduction
Software verification tools, like any other software, can contain bugs. Given their intended use, i.e., proving the absence of errors in programs, bugs in verification tools are particularly problematic. On the other hand, verification tools can generate certificates for computed verdicts (e.g., counterexamples) that can be used to validate verification results. In the SV-COMP competition on software verification, violation witnesses and correctness witnesses, based on annotated abstract control-flow automata, have been established as a standardized representation of such certificates [1,2]. Participating verifiers are expected to produce witnesses for verdicts, and witness validators are used for confirming verdicts based on these witnesses.
In this paper, we present GWIT (as in "Guess What I'm Thinking" or as in GDart-based witness validator), a validator of violation witnesses for Java programs, based on the GDart tool ensemble [6]. GWIT validates violation witnesses by weaving the assumptions documented in a witness into the original program under analysis and checking the restricted program with dynamic symbolic execution.
2 Witness Validation in GWIT
We illustrate the operation of GWIT on the small example shown in Figure 1: In the program, a String value is created nondeterministically before asserting that this value is not "whoopsy". The program contains a reachable error: in case the value "whoopsy" is returned by the call to Verifier.nondetString(), an assertion violation will be triggered.
?This work has been partially funded by an Amazon Research Award
1 public static void main(String[] args) {
2     String s = Verifier.nondetString();
3     assert !s.equals("whoopsy");
4 }
Fig. 1: Small program with reachable error.
Java verifiers will generate a violation witness in such a case. In SV-COMP, witnesses are produced in a standardized format, conceptually based on control-flow automata and technically realized as models in the GraphML format [2]. Figure 2 shows an excerpt of such a witness for the above example. The witness makes an assumption on the state of the program when executing line 2 of the example program, namely that variable s has the value "whoopsy". As discussed, execution paths on which this assumption holds will lead to an error.
GWIT weaves the assumptions from the witness into the original program, restricting the number of program paths that have to be explored for finding the error. Figure 3 shows the result for our example: a call to Witness.assume(...) is generated from the assumption in the witness in Figure 2. The assume method wraps potentially many calls to the Verifier.assume(...) method, enabling multiple assumptions on the same line of code (e.g., due to execution of that line in a loop). The counters array keeps statistics on assumptions per line. The Verifier.assume(...) method is used by GDart to stop the analysis on paths that violate the corresponding assumption.
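A simplified sketch of this weaving step is shown below; it covers only the GraphML parsing and the textual insertion, and its structure is an assumption made for illustration, not GWIT's actual implementation.

    import xml.etree.ElementTree as ET

    GRAPHML = 'http://graphml.graphdrawing.org/xmlns'

    def collect_assumptions(witness_path):
        # Map source line -> assumption expressions found in the violation witness.
        per_line = {}
        for edge in ET.parse(witness_path).getroot().iter('{%s}edge' % GRAPHML):
            data = {d.get('key'): d.text for d in edge.findall('{%s}data' % GRAPHML)}
            if 'assumption' in data and 'startline' in data:
                per_line.setdefault(int(data['startline']), []).append(data['assumption'])
        return per_line

    def weave(source_lines, per_line):
        # Insert a Witness.assume(site_id, ...) call after each constrained line
        # (cf. Fig. 3; one counter slot per weaving site).
        out, site = [], 0
        for no, line in enumerate(source_lines, start=1):
            out.append(line)
            if no in per_line:
                indent = line[:len(line) - len(line.lstrip())]
                out.append('%sWitness.assume(%d, %s);\n'
                           % (indent, site, ', '.join(per_line[no])))
                site += 1
        return out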
Figure 4, finally, shows the effect of weaving the witness into the code on the
obtained constraints-trees. In the left of the figure, the tree computed by GDart
for the original program is shown. The tree has two satisfiable paths, branching
on the condition of the assert statement. The right of the figure shows the tree
for the modified program. This tree contains a node for the assumption, one path
that is not executed after the violation of the assumption, one path that is not
feasible after the assumption for the assert statement, and one path leading to an
error (i.e., assertion violation). In this small example, the tree for the modified
program is more complex than the tree for the original program, but it has fewer
complete execution paths. In more complex programs, assumptions will typically
remove multiple execution paths, making the validation task significantly easier
than the original verification task.
<edge source="n0" target="n1">
<data key="originfile">Main.java</data>
<data key="startline">2</data>
<data key="threadId">0</data>
<data key="assumption">s.equals("whoopsy")</data>
<data key="assumption.scope">...</data>
</edge>
Fig. 2: Excerpt of violation witness produced by GDart or JBMC.
1  static int[] counters = new int[] {0};
2  public static void assume(int id, boolean... assumptions) {
3      int idx = counters[id];
4      counters[id]++;
5      Verifier.assume(assumptions[idx]);
6  }
7
8  public static void main(String[] args) {
9      String s = Verifier.nondetString();
10     Witness.assume(0, s.equals("whoopsy"));
11     assert !s.equals("whoopsy");
12 }
Fig. 3: Program with the assumption from the witness woven into the code.
Fig. 4: Constraints-tree for the original program (left) and the modified program (right).
3 Performance and Limitations
While the approach of GWIT is sound for violation witnesses, the current imple-
mentation still has limitations, validating roughly half of the witnesses provided
by verifiers.
Soundness. GWIT is sound: weaving a witness into the code adds additional decision nodes to the constraints-tree. In the sub-tree rooted at such a new node, some paths become unsatisfiable and will not be explored. Every complete path ψ in the modified tree has an equivalent path φ in the original constraints-tree such that ψ = φ. If an error is reached in the modified tree, it is also reachable in the original program.
Performance. For programs with few decisions, the modified program may actually be more complex than the original program, but GDart only explores more paths than in the original program in cases where the initial value along some path does not satisfy an assumption. Comparing the CPU times of GDart used as a verifier and used through GWIT, with almost identical configuration options (only difference: GWIT does not produce witnesses), complexity is reduced for most benchmark instances that do not fail due to syntactic errors during weaving (see below).
Two extreme examples are BellmanFord-FunSat02, for which weaving a witness with 13 assumptions more than doubles the CPU time, leading to a timeout during validation, and the nanoxml_eqchk/prop2 instance, for which the CPU time required for validation is less than 14% of the CPU time needed for the original verification task.
Overall, GWIT successfully validates 301 of 614 witnesses provided by GDart and JBMC [3] (the only Java verifiers that currently produce witnesses). In 286 cases, validation failed with inconclusive verdicts due to currently unsupported witness features. In 15 cases, incorrect weaving (see below) prevented validation of the witnesses. For 12 witnesses, validation attempts exhaust the resource limits.
Limitations. First, GWIT currently only supports violation witnesses. In principle, it should be possible to validate correctness witnesses by weaving assertions into the program code, but it is not obvious that such an approach makes the validation of witnesses a simpler problem than the original verification task. Second, since weaving witnesses is done on the source code, it only works correctly on proper blocks, delimited with braces, and with one statement per line. While this does not affect soundness, it makes the validation of witnesses impossible in some cases.
4 Tool Setup
GWIT is shipped as a git repository with sub-projects delivering all required components. Checking out the repository and initializing all sub-projects pulls in all required source code. For building the SPouT component, the mx build system3 maintained by the GraalVM [7] team is required. Other components are built with Maven. Once all build systems are available, the ./compile-all.sh script builds GWIT. The ./run-gwit.sh script is used to validate witnesses, taking the witness file and source folders of a benchmark instance as parameters. GWIT currently does not expose any other configuration parameters.
5 Software Project
The GWIT tool is available on GitHub4. GWIT's scripts are licensed under the Apache 2.0 license. The sub-projects bring their own licenses as follows: DSE5 is available under the Apache 2.0 license, JConstraints6 [4] as well, and SPouT7 is available under the GPL v2 license. The components of GWIT and GWIT itself are currently developed at TU Dortmund by the group led by Falk Howar.
3https://github.com/graalvm/mx
4https://github.com/tudo-aqua/gwit
5https://github.com/tudo-aqua/dse
6https://github.com/tudo-aqua/jconstraints
7https://github.com/tudo-aqua/spout
6 Data Availability Statement
The GWIT archive used for SV-COMP 2022 is available at Zenodo [5].
References
1. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M.: Correctness witnesses: Exchang-
ing verification results between verifiers. In: Proc. FSE. pp. 326–337. FSE 2016, Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2950290.2950351
2. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M., Stahlbauer, A.: Witness validation
and stepwise testification across software verifiers. In: Proc. FSE. pp. 721–733. ESEC/FSE 2015, Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2786805.2786867
3. Cordeiro, L., Kroening, D., Schrammel, P.: JBMC: Bounded model checking for
Java bytecode. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Proc.
TACAS. pp. 219–223. Springer International Publishing, Cham (2019). https://doi.org/10.1007/978-3-030-17502-3_17
4. Howar, F., Jabbour, F., Mues, M.: JConstraints: A library for working with logic
expressions in Java. In: Models, Mindsets, Meta: The What, the How, and the Why
Not?, pp. 310–325. Springer (2019). https://doi.org/10.1007/978-3-030-22348-9_19
5. Howar, F., Mues, M.: GWIT artifact for SV-COMP 2022 (Feb 2022). https://doi.org/10.5281/zenodo.5956885
6. Mues, M., Howar, F.: GDart: An ensemble of tools for dynamic symbolic execu-
tion on the java virtual machine (competition contribution). In: Proc. TACAS (2).
Springer (2022)
7. Würthinger, T., Wimmer, C., Wöß, A., Stadler, L., Duboscq, G., Humer, C.,
Richards, G., Simon, D., Wolczko, M.: One VM to rule them all. In: Proc. SPLASH.
pp. 187–204 (2013)
The Static Analyzer Infer in SV-COMP
(Competition Contribution)
Matthias Kettl and Thomas Lemberger
LMU Munich, Germany
Abstract. We present Infer-sv, a wrapper that adapts Infer for SV-COMP. Infer is a static-analysis tool for C and other languages, developed by Facebook and used by multiple large companies. It is strongly aimed at industry and the internal use at Facebook. Despite its popularity, there are no reported numbers on its precision and efficiency. With Infer-sv, we take a first step towards an objective comparison of Infer with other SV-COMP participants from academia and industry.
1 Facebook Infer
Infer [6] is a compositional and incremental static-analysis tool developed at
Facebook. Infer supports a wide array of analyses, including memory safety, buffer overruns, performance constraints, and different reachability analyses for C, C++, Objective-C, Java, C#, and .NET. For memory analysis, Infer uses
bi-abduction [7] with separation logic [14]. Infer supports the integration of
new abstract domains through the abstract-interpretation framework Infer:AI.
Infer analyzes programs compositionally (building method summaries) and
incrementally (only analyzing changed program parts). In contrast to most other
tools that participate in SV-COMP, Infer is not an academic verifier. Instead, it is aimed at practical use during software development. This has direct implications for the development focus: When Infer is told to incrementally analyze software,
it outputs only newly discovered bugs and does not re-report bugs found in
previous analyses. This allows developers to ignore warnings not deemed relevant
and reduces the cognitive burden on developers due to false alarms. Multiple
large companies use Infer, among others: Amazon Web Services, Facebook, Microsoft, Mozilla, and Spotify. At the time of this writing, Infer has more than 12,000 stars on GitHub and was forked over 1,500 times. Despite its popularity,
there are no reported numbers on Infer’s precision and soundness. With the
participation of Infer in the C language track of SV-COMP ’22, we hope to take
a first step towards an objective comparison of Infer with other verifiers.
The following other commercial verifiers participate in SV-COMP '22: 2ls [16], Cbmc [10], Crux1, Frama-C [5], VeriAbs [12], and VeriFuzz [9].
1https://crux.galois.com/
2 Infer in SV-COMP
2.1 Infer-SV
Verification. We provide the wrapper Infer-sv to adapt Infer to the SV-COMP specification format for program properties. Infer-sv parses the property to analyze, adjusts the program under analysis for Infer, runs Infer with fitting analyses, and reports a verification verdict based on the feedback produced by Infer. Infer-sv supports the following SV-COMP program properties:
no-overflow. The aim is to check for arithmetic overflows on signed-integer types. Infer-sv runs Infer's buffer-overrun analysis2 to detect these.
unreach-call. The aim is to check for reachable calls to the function reach_error. Infer provides a function-call reachability analysis3, but this analysis proved very imprecise. To mitigate this, Infer-sv performs a program transformation4: It replaces each call to the function reach_error with the overflow-provoking statement int __reach_error_x = 0x7fffffff + 1. No task with property unreach-call contains a signed-integer overflow, so a call to reach_error is reachable in the original program if and only if one of the introduced overflows is reachable. Infer-sv runs Infer's buffer-overrun analysis on the transformed program to check this.
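As a sketch, the textual transformation could be as simple as the following; this is illustrative only, and Infer-sv's actual rewriting may differ.

    import re

    OVERFLOW = 'int __reach_error_x = 0x7fffffff + 1;'  # provokes a signed overflow

    def transform(c_source):
        # Replace every call to reach_error() with the overflow-provoking
        # statement, so the buffer-overrun analysis flags exactly those calls.
        return re.sub(r'\breach_error\s*\(\s*\)\s*;', OVERFLOW, c_source)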
valid-memsafety. The aim is to check for invalid pointer dereferences, invalid frees of memory, and memory leaks. To analyze memory safety, Infer-sv uses two analyses: bi-abduction5 and Infer:Pulse6. SV-COMP requires verifiers to report the concrete type of violation detected: valid-deref, valid-memtrack, or valid-free. Infer-sv analyzes the error codes reported by Infer to determine the exact violation found. If Infer reports multiple fitting warnings, we take the first.
Witnesses. SV-COMP requires participants to report GraphML verification-result witnesses [3,4] in tandem with each result, and these witnesses must be successfully validated by at least one participating witness validator. Natively, Infer does not support the generation of GraphML witnesses. To mitigate this, Infer-sv creates generic witnesses: When reporting a violation, it generates a violation witness [4] that represents all possible program paths. When reporting a program safe, it generates a correctness witness [3] that only contains the trivial invariant 'true'. These witnesses do not helpfully guide towards a violation or proof, but are valid according to the SV-COMP rules.
Participation. Infer-sv participates hors concours in the categories ReachSafety, ConcurrencySafety, NoOverflows, and SoftwareSystems. Because of missing support, we exclude Infer-sv from categories aimed at float handling, as well as from the category MemSafety-MemCleanup.
2https://fbinfer.com/docs/checker-bufferoverrun
3https://fbinfer.com/docs/checker-annotation-reachability
4https://github.com/facebook/infer/issues/763
5https://fbinfer.com/docs/checker-biabduction
6https://fbinfer.com/docs/checker-pulse
Fig. 1: Comparison of the run time (in CPU-time seconds) of three SV-COMP '22 medalists and Infer, across all tasks correctly solved by the respective pair. Three scatter plots compare the CPU time of Infer (y-axis, 0.1 s to 900 s) against CPAchecker, Symbiotic, and VeriAbs (x-axes, same scale).
1 int main() {
2     if (0) {
3         int x = 0x7fffffff + 1;
4     }
5 }
(a) Infer correctly reports safety

1 void reach_error() {
2     int x = 0x7fffffff + 1;
3 }
4 int main() {
5     if (0) {
6         reach_error();
7     }
8 }
(b) Infer incorrectly reports an alarm

1 int main() {
2     int x = 0x7fffffff;
3     int y = -1;
4     while (x > 0) {
5         x = x - 2*y;
6     }
7 }
(c) Infer correctly reports an alarm

1 int main() {
2     int x = 0x7fffffff;
3     int y = -1;
4     while (x > 0) {
5         x = x - 2*y;
6         y = y + 2;
7     }
8 }
(d) Infer incorrectly reports safety

Fig. 2: Examples of Infer's inconsistent results
2.2 Strengths of Infer
Infer scales well [6]. This shows in the SV-COMP results: For 6,000 out of 8,000 tasks with a verification verdict, Infer finishes the analysis in less than one second of CPU time. The remaining 2,000 tasks each take less than 100 s of CPU time. This means that Infer stays significantly below the time limit of 900 s per task. Figure 1 compares the run time of Infer (in CPU-time seconds) to the best SV-COMP '22 tools in the categories that Infer participated in: CPAchecker [11], Symbiotic [8], and VeriAbs [12]. Each plot shows the run time for all tasks that are correctly solved by both Infer and the respective other verifier (independent of result validation). It is visible that Infer (y-axis) is significantly faster than the other tools (x-axis) for almost all tasks. This speed makes Infer integrate well into continuous-integration development systems [13,15].
2.3 Weaknesses of Infer
Infer demonstrates low analysis precision. Figures 2a and 2b illustrate a low precision across function calls (intraprocedural analysis): Both programs contain an unreachable signed-integer overflow. The only difference is the indirection in Fig. 2b due to the additional function call. Infer correctly reports Fig. 2a safe, but incorrectly reports an alarm for Fig. 2b. We assume that the intraprocedural analysis of Infer does not check whether reach_error is reachable from the program entry. Infer-sv mitigates this issue for the property unreach-call through the mentioned program transformation, but this imprecision still leads Infer to report wrong alarms across all program properties.
Infer can also show imprecision within a single function. Consider Figs. 2c and 2d: The only change from Fig. 2c to Fig. 2d is the additional statement y = y + 2 in line 6. This has no influence on the integer overflow in line 5, so both programs contain an overflow. Infer correctly reports the overflow for Fig. 2c, but wrongly reports Fig. 2d safe.
These imprecisions strongly reflect in the SV-COMP results of Infer, leading to many incorrect proofs and alarms.
3 Usage
Infer-sv requires Python 3.6 or later. The script setup.sh downloads and extracts version 1.1.0 of Infer. From the tool's directory, Infer-sv can be run with the following command:

    ./infer-wrapper.py \
        --data-model {ILP32 or LP64} \
        --property path/to/property.prp \
        --program path/to/program.c

Setting the data model is optional. Infer-sv will print the recognized property and the command line it uses to call Infer. Infer-sv prints the full output of Infer, including all warnings, and the final verification verdict on the last line. The verification verdict can be true, false, unknown, or error.
4 Conclusion
The participation of Infer in SV-COMP allows an objective comparison with
other verifiers for C. This shows that the selected analyses of Infer are very
efficient, but suffer from strong imprecision on the considered benchmark tasks.
Contributors. Infer7 is developed by Facebook and the open-source community under the MIT license, and Infer-sv8 is developed under the Apache 2.0 license at the Software and Computational Systems Lab at LMU Munich, led by Dirk Beyer.
7https://github.com/facebook/infer
8https://gitlab.com/sosy-lab/software/infer-sv
Funding Statement. This work was funded in part by the Deutsche Forschungsgemeinschaft (DFG) 418257054 (Coop).
Data Availability Statement. All data of SV-COMP 2022 are archived as described in the competition report [1] and available on the competition web site. This includes the verification tasks, results, witnesses, scripts, and instructions for reproduction. The version of our verifier as used in the competition is archived together with other participating tools [2].
References
1. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS. Springer (2022)
2. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verification (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5959149
3. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M.: Correctness witnesses: Exchanging verification results between verifiers. In: Proc. FSE. pp. 326–337. ACM (2016). https://doi.org/10.1145/2950290.2950351
4. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M., Stahlbauer, A.: Witness validation and stepwise testification across software verifiers. In: Proc. FSE. pp. 721–733. ACM (2015). https://doi.org/10.1145/2786805.2786867
5. Beyer, D., Spiessl, M.: The static analyzer Frama-C in SV-COMP (competition contribution). In: Proc. TACAS (2). Springer (2022)
6. Calcagno, C., Distefano, D., Dubreil, J., Gabi, D., Hooimeijer, P., Luca, M., O'Hearn, P.W., Papakonstantinou, I., Purbrick, J., Rodriguez, D.: Moving fast with software verification. In: Proc. NFM. pp. 3–11. LNCS 9058, Springer (2015). https://doi.org/10.1007/978-3-319-17524-9_1
7. Calcagno, C., Distefano, D., O'Hearn, P.W., Yang, H.: Compositional shape analysis by means of bi-abduction. J. ACM 58(6), 26:1–26:66 (2011). https://doi.org/10.1145/2049697.2049700
8. Chalupa, M., Řechtáčková, A., Mihalkovič, V., Zaoral, L., Strejček, J.: Symbiotic 9: Parallelism and invariants (competition contribution). In: Proc. TACAS (2). Springer (2022)
9. Chowdhury, A.B., Medicherla, R.K., Venkatesh, R.: VeriFuzz: Program aware fuzzing (competition contribution). In: Proc. TACAS, part 3. pp. 244–249. LNCS 11429, Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_22
10. Clarke, E.M., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In: Proc. TACAS. pp. 168–176. LNCS 2988, Springer (2004). https://doi.org/10.1007/978-3-540-24730-2_15
11. Dangl, M., Löwe, S., Wendler, P.: CPAchecker with support for recursive programs and floating-point arithmetic (competition contribution). In: Proc. TACAS. pp. 423–425. LNCS 9035, Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_34
12. Darke, P., Agrawal, S., Venkatesh, R.: VeriAbs: A tool for scalable verification by abstraction (competition contribution). In: Proc. TACAS (2). pp. 458–462. LNCS 12652, Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_32
13. Distefano, D., Fähndrich, M., Logozzo, F., O'Hearn, P.W.: Scaling static analyses at Facebook. Commun. ACM 62(8), 62–70 (2019). https://doi.org/10.1145/3338112
14. Distefano, D., O'Hearn, P.W., Yang, H.: A local shape analysis based on separation logic. In: Proc. TACAS. LNCS, vol. 3920, pp. 287–302. Springer (2006). https://doi.org/10.1007/11691372_19
15. Harman, M., O'Hearn, P.W.: From start-ups to scale-ups: Opportunities and open problems for static and dynamic program analysis. In: Proc. SCAM. pp. 1–23. IEEE (2018). https://doi.org/10.1109/SCAM.2018.00009
16. Malík, V., Schrammel, P., Vojnar, T.: 2ls: Heap analysis and memory safety (competition contribution). In: Proc. TACAS (2). pp. 368–372. LNCS 12079, Springer (2020). https://doi.org/10.1007/978-3-030-45237-7_22
LART: Compiled Abstract Execution
(Competition Contribution)
Henrich Lauko⋆⋆ and Petr Ročkai
Faculty of Informatics, Masaryk University, Brno, Czech Republic
xlauko@mail.muni.cz
Abstract. lart (llvm abstraction and refinement tool) originates from the divine model checker [5,7], in which it was employed as an abstraction toolchain for the llvm interpreter. In this contribution, we present a stand-alone tool that does not need a verification backend but performs the verification natively. The core idea is to instrument abstract semantics directly into the program and compile it into a native binary that performs the program analysis. This approach provides the performance gain of native execution over interpreted analysis and allows compiler optimizations to be employed on the abstracted code, further improving the analysis efficiency. Compilation-based abstraction introduces new challenges solved by lart, such as the domain interaction of concrete and abstract values, the simulation of a nondeterministic runtime, and constraint propagation.
Keywords: Abstract interpretation · Compilation-based abstraction · llvm · lart · divine · Formal verification · Symbolic execution.
1 Verification Approach and Software Architecture
As with many tasks in computer science, verification can be approached in multiple ways. In general, tools approach program analysis using interpretation, which gives them complete control over the program state and program execution but pays a cost in performance. Our tool lart tackles the task with the toolset from the opposite side of the spectrum, compilation, using a technique of so-called compilation-based abstraction. The main idea of this approach is to compile nondeterministic execution directly into the executable and perform reachability analysis by its native execution. This approach is most similar to the one presented in symcc [6], which compiles symbolic execution into the native binary. In contrast, we present a more general approach that allows arbitrary abstractions. The Spin model checker [4] also provides a mode where the model is compiled together with a verifier into a single executable.
During compilation, lart performs an llvm-to-llvm transformation to augment instructions that can manipulate nondeterministic values. This is
This work has been partially supported by Red Hat, Inc.
⋆⋆ Jury member representing lart at sv-comp 2022.
a purely syntactic abstraction of a program: e.g., an add instruction is replaced by a call to lart add. Additionally, lart provides a set of semantic libraries (abstract domains) to give meaning to the abstract instructions. Each abstract domain defines the native representation of abstract values and implements the abstract instructions and the transformations to and from concrete values and other domains. The tool provides multiple domains that allow analyses with various precisions, e.g., interval analysis, nullity analysis, or symbolic analysis. Finally, to allow native execution, the domains are provided as static libraries linked to the instrumented programs under test.
In comparison to concrete programs, abstracted programs also exhibit nondeterministic control flow. To explore all possible execution paths, lart provides a configurable runtime library. The overall architecture of compilation-based abstraction is depicted in Figure 1.
The configuration used in the competition contribution employs an iterative deepening search of program paths. At each branching point of a program, the execution forks to explore all possibilities. Finally, the main process of the analysis gathers the results from the explored paths and notifies the user if an error is reachable. This approach potentially suffers from infinite loops and the path-explosion problem. However, it is sufficient for bug hunting, or even for verification when an overapproximative abstraction is employed, which widens the effect of infinite loops. Also, in many simple cases, a compiler can summarize the effects of program loops, minimizing the impact of path explosion.
Fig. 1. lart architecture overview: the C program is compiled (with alias and dataflow analysis guiding the instrumentation) into an abstract binary, which is linked against the domain, Z3, and runtime libraries (fault handler, shadow memory) into a native binary; executing this binary performs the analysis and reports correct/error.
In order to obtain a performant result, we strive to minimize the amount of syntactic abstraction. The instrumentation achieves this by combining forward dataflow analysis with Andersen's alias analysis [1], tainting only those instructions that might encounter nondeterministic values and abstracting only the tainted instructions. This analysis is entirely overapproximative and quickly detects all possible candidates for abstraction. The actual abstract computation is resolved later during execution.
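This forward taint propagation can be sketched as follows; the operands/result attributes and the may_alias predicate are stand-ins for the llvm-level analysis, not lart's actual code.

    def select_abstract_instructions(instructions, nondet_sources, may_alias):
        # Forward dataflow tainting: mark every instruction that may see a
        # nondeterministic value, directly or through a may-alias (sketch).
        tainted_vals = set(nondet_sources)
        tainted_instrs, changed = set(), True
        while changed:                                 # iterate to a fixpoint
            changed = False
            for instr in instructions:
                hit = any(op in tainted_vals or
                          any(may_alias(op, t) for t in tainted_vals)
                          for op in instr.operands)
                if hit and instr not in tainted_instrs:
                    tainted_instrs.add(instr)
                    tainted_vals.add(instr.result)
                    changed = True
        return tainted_instrs                          # only these are abstracted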
However, we do not want to perform expensive abstract computation when tainted instructions do not receive nondeterministic operands. This might occur when a C function at one point receives concrete arguments and at another call
site some abstract arguments. In the former case, we would like to execute it fully concretely, while in the latter, we want to execute only the necessary amount of tainted instructions abstractly. Therefore, lart synthesizes simple dispatch routines that pick a concrete or abstract instruction depending on the operands. The dispatch routine also handles the possibility of mixing concrete and abstract operands, lifting concrete values to an abstract domain if necessary. We require that all operands of abstract instructions are in the same domain. See an example of a dispatch routine in Figure 2.
__lart_value __lart_dispatch_add(__lart_value a, __lart_value b) {
    if (is_abstract(a) || is_abstract(b)) {
        if (!is_abstract(a))
            a.abstract = lift(a.concrete);
        else if (!is_abstract(b))
            b.abstract = lift(b.concrete);
        return domain::add(a.abstract, b.abstract);
    }
    return a.concrete + b.concrete;
}
Fig. 2. Syntactically abstracted values in lart are represented as a union of an abstract or a concrete type (__lart_value). The dispatch routine lifts operands to an abstract domain and resolves in which domain the instruction should be executed. Since the abstraction dispatch is purely syntactic, it can be inlined into the abstracted source code and further optimized. This gives the compiler the possibility to optimize repeated checks in dispatch routines.
The runtime for native execution takes care of multiple responsibilities. First of all, it implements an execution fork whenever a branch is conditioned on an abstract value for which both outcomes are possible, e.g., when a branch is conditioned on the symbolic term x < 5. Furthermore, the runtime takes care of the memory management of the abstraction. To not disrupt the original program's memory layout, lart keeps all abstract data in a shadow memory. Therefore, the union values presented in Figure 2 are split into two separately addressed regions: concrete program memory and abstract shadow memory. The information on whether a variable holds an abstract value is also kept in the shadow memory.
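The fork-based exploration can be sketched as follows; a minimal Python illustration of the idea, where negate and the solver methods are hypothetical stand-ins for the runtime's SMT interface.

    import os, sys

    def abstract_branch(condition, solver):
        # Fork the process to explore both outcomes of a branch whose condition
        # is abstract, in the spirit of lart's runtime (illustrative sketch).
        for outcome, constraint in ((True, condition), (False, negate(condition))):
            if not solver.check(constraint):           # prune infeasible outcomes
                continue
            if os.fork() == 0:                         # child continues this outcome
                solver.add(constraint)
                return outcome
            os.wait()                                  # parent waits for the child
        sys.exit(0)                                    # parent delegated all feasible paths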
2 Strengths and Weaknesses
The main strength of compilation-based abstraction is the utilization of the native runtime and of compiler optimizations on the abstracted code. In theory, native execution should consistently outperform the same interpreted analysis. However, it comes at the cost of a more complex source transformation that is harder to relate to its origin. Furthermore, the overapproximative nature of the syntactic analysis produces unnecessary executions of dispatch functions when they are not needed. In contrast, an interpreter can compute in a specific domain without additional dispatches. Another advantage of the approach is the reusable result of the syntactic abstraction, which can be linked with various domains to perform analyses concurrently without repeated llvm instrumentation.
lart is best compared with the divine model checker, which uses lart's transformation and domain libraries internally but, instead of compiling to a native executable, interprets the abstracted llvm ir. The competition results support the hypothesis that the compilation-based approach of lart outperforms divine: it is faster in all reachability subcategories except one, where the longer times are caused by a different state-space exploration order.
Given the simplistic runtime, abstracted binaries produced by lart lack further analysis optimizations and verification capabilities. Presently, the exploration algorithm supports only reachability analysis of single-threaded programs. However, we plan to support memory-safety and overflow checking using a sanitizer-like approach.
Another goal of lart's compilation-based approach is to provide a reusable abstraction component for verification tools. This concept has been demonstrated with divine and now with the native mode, whose binaries can be analyzed by the standard programmer's toolset, such as debuggers or sanitizers.
3 Tool Setup and Configuration
The verifier archive can be found on the sv-comp 2022 [2] page under the name lart. In case the binary distribution does not work on your system, we also provide a source distribution and build instructions at https://github.com/xlauko/lart/tree/svcomp-2022. It is sufficient to run lart using the compiler wrapper script as follows: lartcc <domain> testcase.c -o abstract, and then to execute the abstract binary to perform the analysis.
For the sv-comp contribution, the lart wrapper handles additional settings and the setup of the workflow presented in Figure 1. The wrapper sets lart options based on the property file and the benchmark. In particular, lart enables the symbolic mode if any nondeterminism is found, and it sets which errors should be reported based on the property file. It also generates witness files. More details can be found on the aforementioned distribution page. Due to support limitations, lart participates only in the ReachSafety and DeviceDrivers categories.
4 Software Project and Contributors
The project home page is https://github.com/xlauko/lart. lart is open-source software distributed under the MIT license. Active contributors to the tool are listed as authors of this paper.
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [2] and available on the competition web site. This includes
the verification tasks, results, witnesses, scripts, and instructions for reproduction.
The version of our verifier as used in the competition is archived together with other
participating tools [3].
References
1. Andersen, L.O.: Program analysis and specialization for the C programming lan-
guage. Ph.D. thesis, Citeseer (1994)
2. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS.
Springer (2022)
3. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verifi-
cation (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5959149
4. Holzmann, G., Najm, E., Serhrouchni, A.: Spin model checking: An introduction. STTT 2, 321–327 (2000). https://doi.org/10.1007/s100090050039
5. Lauko, H., Ročkai, P., Barnat, J.: Symbolic computation via program transformation. In: Theoretical Aspects of Computing – ICTAC 2018 (2018). https://doi.org/10.1007/978-3-030-02508-3_17
6. Poeplau, S., Francillon, A.: Symbolic execution with SymCC: Don't interpret, compile! In: 29th USENIX Security Symposium (USENIX Security 20). pp. 181–198. USENIX Association (2020), https://www.usenix.org/conference/usenixsecurity20/presentation/poeplau
7. Ročkai, P., Štill, V., Černá, I., Barnat, J.: DiVM: Model checking with LLVM and graph memory. Journal of Systems and Software 143, 1–13 (2018). https://doi.org/10.1016/j.jss.2018.04.026
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Symbiotic 9: String Analysis and Backward
Symbolic Execution with Loop Folding
(Competition Contribution)
Marek Chalupa (✉), Vincent Mihalkovič, Anna Řechtáčková,
Lukáš Zaoral, and Jan Strejček
Masaryk University, Brno, Czech Republic
Abstract. The development of Symbiotic 9 focused mainly on two components. One is the symbolic executor Slowbeast, which newly supports backward symbolic execution, including its extension called loop folding. This technique can infer inductive invariants from backward symbolic execution states. Thanks to these invariants, Symbiotic 9 is able to produce non-trivial correctness witnesses, a feature missing in previous versions of Symbiotic. We have also extended forward symbolic execution in Slowbeast with basic support for parallel programs. The second component with significant improvements is the instrumentation module. In particular, we have extended the static analysis of accesses to arrays with features designed for programs that manipulate C strings.
Symbiotic 9 is the Overall winner of SV-COMP 2022. Moreover, it also won the categories MemSafety and SoftwareSystems, and placed third in FalsificationOverall.
1 Verification Approach
Symbiotic 9 combines fast static analyses with code instrumentation and program slicing [13] to speed up code verification. In the SV-COMP configuration of Symbiotic 9, the code verification is performed by symbolic executors, namely by Slowbeast [8] and our fork of Klee [4].
As Symbiotic works internally with llvm [10], it first compiles the given C program into llvm bitcode. The following steps depend on the verified property.
Verification of the Property unreach-call For this property, Symbiotic 9 directly slices the llvm bitcode to remove instructions that have no influence on the reachability of error calls and then runs Klee with a time limit of 333 seconds. Klee is very efficient and often decides the task within this time limit. If Klee fails to decide, we parse its output and proceed according to the cause of the failure. If Klee failed because the program contains threads, we
This work has been supported by the Czech Science Foundation grant GA19-24397S.
✉ Jury member and the corresponding author: chalupa@fi.muni.cz
© The Author(s) 2022
D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 462–467, 2022.
https://doi.org/10.1007/978-3-030-99527-0_32
Table 1. The comparison of supported features of Klee (our fork and the upstream) and Slowbeast (SV-COMP 2022 and SV-COMP 2021 versions). The marks ✓/(✓)/✗ mean supported/partially supported/unsupported.

                               Klee       Klee      Slowbeast     Slowbeast
                               upstream   our fork  SV-COMP 2021  SV-COMP 2022
  Backward SE                     ✗          ✗           ✗             ✓
  Loop folding                    ✗          ✗           ✓             ✓
  Invariant generation            ✗          ✗           ✓             ✓
  Symbolic floats                 ✗          ✗           ✓             ✓
  Symbolic pointers               ✓          ✓           ✗             ✓
  Symbolic-sized allocations      ✗          ✓           ✗             ✓
  Symbolic addresses              ✗          ✓           ✗             ✓
  Parallel programs               ✗          ✗           ✗             ✓
  Incremental solving             ✗          ✗           ✓             ✓
  Caching solver calls            ✓          ✓           ✗             ✗
  Lazy memory                     ✗          ✗           ✓             ✓
run Slowbeast with forward symbolic execution (SE) and the thread support turned on. If Klee failed for any other reason, we run Slowbeast with backward symbolic execution with loop folding (BSELF) [8], described later. If BSELF also fails (the current implementation supports only selected program features), we run Slowbeast with forward symbolic execution.
Note that running forward symbolic execution first with Klee and then with Slowbeast if Klee fails makes good sense, as Klee and Slowbeast support different sets of features. The main differences between these tools (and the upstream Klee and the version of Slowbeast used in Symbiotic 8) are summarized in Table 1. Row symbolic addresses indicates whether the tools model the non-determinism in the placement of allocated objects (this is useful, e.g., when comparing addresses of such objects). Row incremental solving indicates whether the tools can associate the state of an SMT solver with every symbolic execution state and incrementally add constraints instead of always solving formulas from scratch. Row caching solver calls indicates whether the tools can remember the results of solver calls and use them later to quickly decide some other solver calls. Finally, row lazy memory indicates whether the tool can create memory objects on demand when they are first accessed, without their previous allocation (it assumes that the accesses to memory are valid). This feature is crucial when we want to execute a program by parts, without starting from the entry point. The meaning of the remaining rows should be clear or is explained later.
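For instance, the caching of solver calls amounts to a mechanism of roughly the following shape (a minimal sketch with a hypothetical canonical-formula key; Klee's actual caches are considerably more elaborate and can, e.g., exploit subsumption between queries):

#include <functional>
#include <map>
#include <string>

enum class Verdict { Sat, Unsat, Unknown };

// Memoize solver verdicts keyed by a canonical rendering of the query, so a
// repeated query is answered without calling the underlying solver again.
Verdict cachedSolve(const std::string& canonicalQuery,
                    const std::function<Verdict(const std::string&)>& solver,
                    std::map<std::string, Verdict>& cache) {
    auto it = cache.find(canonicalQuery);
    if (it != cache.end()) return it->second;  // cache hit
    Verdict v = solver(canonicalQuery);
    cache.emplace(canonicalQuery, v);
    return v;
}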
If an error is found by either tool, it is replayed on the unsliced code. If the
replay succeeds, we generate a violation witness. If no error is found and the anal-
ysis was complete, we generate a correctness witness. If the program correctness
was proved by Slowbeast with BSELF, we generate a witness containing the
computed invariants, otherwise we generate a trivial correctness witness as we
have no invariants at hand. In all other cases, Symbiotic 9 answers unknown.
Verification of Other Properties For verification of properties other than unreach-call, Symbiotic 9 uses the same workflow as Symbiotic 8 [7]. In
brief, the instrumentation module marks program instructions that can po-
tentially violate the considered property. The module employs suitable fast
static analyses to identify these instructions (e.g., when checking the property
no-overflow, it uses a range analysis to discover the instructions that may per-
form a signed integer overflow). The bitcode with marked instructions is sliced
such that the arguments and the reachability of these instructions are preserved.
The sliced bitcode is passed to Klee. If it discovers a property violation and
then replays it on the unsliced code, we produce a violation witness. If Klee
completes its analysis without any property violation found, we produce a trivial
correctness witness. In all other cases, Symbiotic 9 returns unknown.
Backward Symbolic Execution with Loop Folding (BSELF) [8] Slowbeast newly implements backward symbolic execution (BSE) [9], which explores
the program backward from target locations towards the initial location and
incrementally computes weakest preconditions for the explored program paths.
BSE is a valuable technique on its own as it precisely corresponds to k-induction
on control-flow paths [8]. Loop folding is a technique that aims to infer induc-
tive invariants during BSE. Roughly speaking, when BSE starts from an error
location and reaches a loop header, loop folding creates an initial invariant can-
didate that is disjoint with the current weakest precondition (i.e., the states that
can reach the error location). If the invariant candidate is actually an invariant,
we know that the error location is not reachable via the explored path. Oth-
erwise, a pre-image of the invariant candidate along a loop path is computed,
over-approximated, and added to the candidate. This process is repeated until an
invariant is found or until it fails for some reason, e.g., when it discovers that the
error location is actually reachable. Loop folding can infer complex disjunctive
invariants and since it uses the error states, it is also property-driven.
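Schematically, and greatly simplified (hypothetical predicate type and callbacks; the real algorithm [8] works per loop path and interacts with BSE), the refinement loop just described has the following shape:

#include <functional>

enum class FoldResult { Invariant, ErrorReachable, GaveUp };

// Loop folding, schematically: extend a candidate until it is inductive,
// the error location is found reachable, or the budget is exhausted.
template <typename Pred>
FoldResult foldLoop(Pred candidate,
                    const std::function<bool(const Pred&)>& isInductive,
                    const std::function<bool(const Pred&)>& errorReachable,
                    const std::function<Pred(const Pred&)>& overapproxPreImage,
                    const std::function<Pred(const Pred&, const Pred&)>& join,
                    int budget) {
    for (int i = 0; i < budget; ++i) {
        if (isInductive(candidate)) return FoldResult::Invariant;
        if (errorReachable(candidate)) return FoldResult::ErrorReachable;
        // Add an over-approximated pre-image of the candidate along a loop path.
        candidate = join(candidate, overapproxPreImage(candidate));
    }
    return FoldResult::GaveUp;
}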
String Analysis and Other Improvements The second major improvement in Symbiotic 9 is in the instrumentation for the property valid-memsafety. We have improved the analysis that identifies out-of-bounds array accesses. In Symbiotic 8, this analysis only determined whether an array access done via an index variable is in bounds [14]. The analysis in Symbiotic 9 also handles more general patterns where the array contains a concrete value (0 in the case of C strings) and the index pointer is incremented by one until it points to this concrete value, as well as patterns where the pointer is incremented a fixed number of times; see the examples below.
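For illustration, the following C fragments (hypothetical examples in the spirit of the benchmarks, not taken from them) show the two newly handled patterns:

#include <stddef.h>

/* Pattern 1: the index pointer is incremented by one until it points to a
 * concrete value contained in the array (0 in the case of C strings). */
size_t scan_to_terminator(const char *s) {
    const char *p = s;
    while (*p != 0)
        ++p;
    return (size_t)(p - s);
}

/* Pattern 2: the pointer is incremented a fixed number of times. */
const char *skip3(const char *s) {
    for (int k = 0; k < 3; ++k)
        ++s;
    return s;
}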
Further, we have extended the forward symbolic execution in Slowbeast to handle parallel programs. For now, this symbolic execution is highly inefficient, as it examines every interleaving of globally visible events; we plan to implement some reductions in the future. Slowbeast has also been extended to generate witnesses, as this functionality was missing. Notably, it can generate non-trivial correctness witnesses using the invariants computed by BSELF, whereas previous versions of Symbiotic generate only trivial correctness witnesses.
Slicing has also been improved. It now applies a fast and coarse slicing before the main slicing. The coarse slicing detects all basic blocks from which no slicing criterion (i.e., an instruction whose reachability and arguments should be preserved) is syntactically reachable and replaces them by calls to abort, as illustrated below.
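Schematically (a hypothetical fragment; reach_error stands for an instruction chosen as a slicing criterion):

#include <stdlib.h>

extern int  compute(int);
extern void log_statistics(void);
extern void reach_error(void);      /* the slicing criterion */

/* before coarse slicing */
void f_before(int c, int x) {
    if (c) {
        x = compute(x);
        if (x < 0) reach_error();   /* criterion syntactically reachable */
    } else {
        log_statistics();           /* cannot reach the criterion */
    }
}

/* after coarse slicing: the irrelevant block is replaced by abort() */
void f_after(int c, int x) {
    if (c) {
        x = compute(x);
        if (x < 0) reach_error();
    } else {
        abort();
    }
}

The symbolic executor therefore never explores the replaced block.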
2 Strengths and Weaknesses
Forward symbolic execution is unable to fully analyze unbounded loops or in-
finite execution paths. Hence, unless program slicing removes the unbounded
computation from the program, forward symbolic execution cannot verify it.
However, backward symbolic execution and BSELF can fully analyze at least
some unbounded programs [8]. Still, both these methods are computationally
complex as the number of paths they must search may be enormous and their
exploration may involve many non-trivial calls to the SMT solver. Therefore,
these methods do not scale to real-world programs.
A strong aspect of Symbiotic is the very interplay of fast static analyses
in the instrumentation, program slicing, and forward and backward symbolic
execution. Fast static analyses are able to deem correct many parts of the code
(with respect to the verified property). These parts of the code are then usu-
ally removed by slicing and only the possibly unsafe parts of the program (and
their dependencies) get into a symbolic executor. In this sense, Symbiotic does
incremental or conditional [3] verification.
Results of Symbiotic 9 in SV-COMP 2022 In SV-COMP 2022 [1], Symbiotic 9 won the categories MemSafety, SoftwareSystems, and Overall, and got the 3rd place in FalsificationOverall. Moreover, it produced 1529 correct answers that were not confirmed, which is the highest number in SV-COMP 2022; 1073 of these unconfirmed answers are in MemSafety-Juliet, where we produced some incorrect witnesses due to a bug, and another 258 are in Termination. Symbiotic 9 produced only 3 incorrect answers, caused by a bug in the replay mode of Slowbeast.
3 Software Project and Contributors
All components of Symbiotic 9 use llvm 10 [10]. The slicer and the instrumentation module are written in C++ and extensively use the library DG [5]. Klee is implemented in C++ and Slowbeast [12] is written in Python. Both symbolic executors use Z3 [11] as the SMT solver. Control scripts are written in Python.
Symbiotic 9 and all its components and external libraries are available under open-source licenses that comply with SV-COMP's policy for the reproduction of results. Symbiotic 9 participated in all categories of SV-COMP 2022 except the categories with Java programs.
Symbiotic 9 has been developed by Marek Chalupa, Vincent Mihalkovič, Anna Řechtáčková, and Lukáš Zaoral under the supervision of Jan Strejček.
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [1] and available on the competition web site. This includes
the verification tasks, results, witnesses, scripts, and instructions for reproduction.
The version of Symbiotic used in the competition is archived together with other
participating tools [2] and also in its own artifact [6] at Zenodo.
References
1. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS.
Springer (2022)
2. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verifi-
cation (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5959149
3. Beyer, D., Jakobs, M.: FRed: Conditional model checking via reducers and folders. In: SEFM 2020. LNCS, vol. 12310, pp. 113–132. Springer (2020). https://doi.org/10.1007/978-3-030-58768-0_7
4. Cadar, C., Dunbar, D., Engler, D.R.: KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: OSDI. pp. 209–224. USENIX Association (2008), http://www.usenix.org/events/osdi08/tech/full_papers/cadar/cadar.pdf
5. Chalupa, M.: DG: Analysis and slicing of LLVM bitcode. In: ATVA 2020. LNCS, vol. 12302, pp. 557–563. Springer (2020), https://doi.org/10.1007/978-3-030-59152-6_33
6. Chalupa, M.: Symbiotic 9: String analysis and backward symbolic execution with
loop folding (artifact). Zenodo (2022). https://doi.org/10.5281/zenodo.5947909
7. Chalupa, M., Jašek, T., Novák, J., Řechtáčková, A., Šoková, V., Strejček, J.: Symbiotic 8: Beyond symbolic execution (competition contribution). In: TACAS 2021. LNCS, vol. 12652, pp. 453–457. Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_31
8. Chalupa, M., Strejček, J.: Backward symbolic execution with loop folding. In: SAS 2021. LNCS, vol. 12913, pp. 49–76. Springer (2021). https://doi.org/10.1007/978-3-030-88806-0_3
9. Chandra, S., Fink, S.J., Sridharan, M.: Snugglebug: a powerful approach
to weakest preconditions. In: PLDI 2009. pp. 363–374. ACM (2009).
https://doi.org/10.1145/1542476.1542517
10. Lattner, C., Adve, V.S.: LLVM: A compilation framework for lifelong program
analysis & transformation. In: CGO 2004. pp. 75–88. IEEE Computer Society
(2004), https://doi.org/10.1109/CGO.2004.1281665
11. de Moura, L.M., Bjørner, N.: Z3: An efficient SMT solver. In: TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer (2008), https://doi.org/10.1007/978-3-540-78800-3_24
12. Slowbeast repository. https://gitlab.com/mchalupa/slowbeast (2021)
13. Weiser, M.: Program slicing. In: Proceedings of ICSE. pp. 439–449. IEEE (1981)
14. Řechtáčková, A.: Improving out-of-bound access checking in Symbiotic. Bachelor's thesis (2020), https://is.muni.cz/th/tmq7m/, accessed 2022-02-02
Symbiotic-Witch: A Klee-Based
Violation Witness Checker⋆
(Competition Contribution)
Paulína Ayaziová, Marek Chalupa, and Jan Strejček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
{xayaziov,chalupa,strejcek}@fi.muni.cz
Abstract. Symbiotic-Witch is a new tool for checking violation wit-
nesses in the GraphML-based format used at SV-COMP since 2015.
Roughly speaking, Symbiotic-Witch symbolically executes a given pro-
gram with Klee and simultaneously tracks the set of nodes the witness
automaton can be in. Moreover, it reads the return values of nondeter-
ministic functions specified in the witness and uses them to prune the
symbolic execution. The violation witness is confirmed if the symbolic
execution reaches an error and the current set of witness nodes contains
a matching violation node.
Symbiotic-Witch currently supports violation witnesses of reachability safety, memory safety, memory cleanup, and overflow properties.
1 Verification Approach
We present a new checker of violation witnesses called Symbiotic-Witch. The
checker first loads a given violation witness in the GraphML format [5] and a
given program. Then it performs symbolic execution [11] of the program and
simultaneously tracks the progress of the execution in the witness automaton.
More precisely, every state of symbolic execution is accompanied by the set of
witness automaton nodes that can be reached under the executed program path.
If the symbolic execution detects a violation of the considered property and the
tracked set of witness automata nodes contains a violation node, the witness is
confirmed.
Note that the original description of the witness format [5] does not provide
any formal semantics of the format. We interpret it in the way that if an edge
in a witness automaton matches an executed program instruction, then we can
follow the edge but we can also stay in its starting node. Hence, if we have the
set of witness automaton nodes reached under a certain program path, then
prolongation of this path can add some nodes to this set, but it never removes
any node from the set. A brief reading of an upcoming detailed description of
the format [4] reveals that it can be the case that an edge matching an executed
program instruction has to be taken. If this is indeed the case, we will adjust
⋆ This work has been supported by the Czech Science Foundation grant GA19-24397S.
© The Author(s) 2022
D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 468–473, 2022.
https://doi.org/10.1007/978-3-030-99527-0_33
our tool, but the current implementation and the following text consider the former semantics.
Before Symbiotic-Witch starts the symbolic execution, we remove from
the witness automaton all nodes that are not on any path from the entry node
to a violation node. In general, witness automata are related to program exe-
cutions using node and edge attributes. Symbiotic-Witch currently supports
only some attributes of witness edges to map a program execution to a given
witness automaton. Namely, it uses the line number of executed instructions, the
information whether true or false branch is taken, and the information about
entering a function or returning from a function. Additionally, if the witness au-
tomaton contains a single path from the entry node to a violation node and there
is some information about return values of the __VERIFIER_nondet_* functions
on this path, then we use these values in the symbolic execution of the program.
Return values not provided in the witness are treated as symbolic values.
A more precise description of the approach can be found in the bachelor’s
thesis of P. Ayaziová [1].
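A minimal sketch of the node-set tracking described above (with a hypothetical automaton representation that matches edges on source lines only; the real matcher also uses branching and function entry/exit information):

#include <map>
#include <set>
#include <vector>

struct Edge {
    int target;
    int line;  // the edge matches instructions on this source line
};
using WitnessAutomaton = std::map<int, std::vector<Edge>>;  // node -> edges

// Extend the tracked node set on an executed instruction: matching edges may
// be followed, but each node may also be kept ("stay in the starting node"),
// so the set never shrinks along a program path.
void step(const WitnessAutomaton& aut, std::set<int>& nodes, int execLine) {
    std::set<int> reached;
    for (int n : nodes) {
        auto it = aut.find(n);
        if (it == aut.end()) continue;
        for (const Edge& e : it->second)
            if (e.line == execLine) reached.insert(e.target);
    }
    nodes.insert(reached.begin(), reached.end());
}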
2 Software Architecture
The approach has been implemented in a tool called Symbiotic-Witch, which
is basically a modification of the symbolic executor Klee [8]. More precisely,
it is derived from the clone of Klee used in Symbiotic, which employs the
SMT solver Z3 [13] and supports symbolic pointers, memory blocks of symbolic
sizes etc. For parsing of witnesses in the GraphML format, we use the library
RapidXML.
As Klee executes programs in llvm [12], a given C program has to be
translated to llvm first. We use Clang for this translation as explained in
Section 4.
The current version of Symbiotic-Witch runs on llvm version 10.0.0.
3 Strengths and Weaknesses
Existing violation witness checkers (excluding Dartagnan [10] designed for con-
current programs) can be roughly divided into two categories.
CPA-witness2test [6], FShell-witness2test [6], and Nitwit [14] per-
form one program execution based on the information in the witness. If this
execution violates the specification, the witness is confirmed. This approach
is very efficient for witnesses fully describing one program execution that vi-
olates the property. However, if a witness describes more program executions
and only some of them violate the property, these tools can easily miss the
violating executions. In particular, if a witness does not specify some return
value of a __VERIFIER_nondet_* function, FShell-witness2test uses the
default value 0, Nitwit picks a random value, and CPA-witness2test
fails the witness confirmation.
CPAchecker [5], UltimateAutomizer [5], and MetaVal [7] create a
product of a given witness automaton and the original program and analyze
it. As a result, some execution paths of the original program can be ana-
lyzed repeatedly for different paths in the witness automaton. To suppress
this effect, these checkers usually ignore the possibility of staying in a witness automaton node whenever there is a matching transition leaving the node. Unfortunately, a valid witness can remain unconfirmed due to this strategy.
We believe that our approach to checking violation witnesses removes all
mentioned disadvantages. Symbolic execution allows us to efficiently examine
many program executions corresponding to a given witness automaton, and pro-
gram executions are not analyzed repeatedly. The approach can easily handle
witnesses based on return values from the __VERIFIER_nondet_* functions as
well as those based on descriptions of branching.
There is only one principal case in which a valid witness is not confirmed by Symbiotic-Witch (ignoring the cases when Symbiotic-Witch simply runs out of resources). This case can arise when Symbiotic-Witch uses the information about return values of __VERIFIER_nondet_* functions stored in the witness. Symbiotic-Witch uses this information immediately when the symbolic execution calls such a function and there is a matching edge in the witness with a return value that has not been used yet (i.e., the starting node of the edge is in the set of tracked witness nodes and the target node is not). This “eager approach” usually works very well, especially for witnesses containing return values for all calls of __VERIFIER_nondet_* functions. However, there can be witnesses where some return values are missing and a particular provided return value should not be used for the first matching call of the __VERIFIER_nondet_* function. Such witnesses can be valid, but Symbiotic-Witch can fail to confirm them. As far as we know, such witnesses do not appear in SV-COMP, and other witness checkers would probably fail to confirm them as well.
On the negative side, our approach inherits the disadvantages and limitations of symbolic execution and Klee. In particular, it can suffer from the path explosion problem on witnesses that do not provide return values of __VERIFIER_nondet_* functions. Further, Symbiotic-Witch does not support parallel programs, as Klee does not support them.
Our current approach is suitable for cases when a witness can be checked based on a finite program execution. That is why our tool supports violation witnesses of safety properties. Table 1 shows the numbers of violation witnesses confirmed in SV-COMP 2022 [2] by individual witness checkers in the categories supported by Symbiotic-Witch.
We believe that symbolic execution can also be used for checking termination violation witnesses and for checking correctness witnesses. We plan to extend
Symbiotic-Witch in these directions. We also plan to add a witness refinement
mode [5] already provided by CPAchecker and UltimateAutomizer. In this
mode, when a witness is confirmed, Symbiotic-Witch would produce another
witness describing a single program execution (by specifying return values for all
calls of __VERIFIER_nondet_* functions) that exhibits the property violation.
Table 1. The numbers of confirmed witnesses in relevant SV-COMP 2022 categories

                       ReachSafety  MemSafety  NoOverflows  SoftwareSystems
  number of witnesses       26 797     16 984        2 808            2 102
  CPAchecker                14 908     12 594        2 334              621
  CPA-witness2test           8 628        231          887                6
  FShell-witness2test       14 168        954        1 436               33
  MetaVal                        0        116        1 982                0
  Nitwit                    15 507          –            –                0
  Symbiotic-Witch           11 176      8 394        2 609              179
  UltimateAutomizer          8 592      4 197        2 468               26
4 Tool Setup and Configuration
For the use in SV-COMP 2022, we have integrated our witness checker (originally called Witch-Klee) with Symbiotic [9], which takes care of the translation of a given C program into llvm using Clang and then slightly modifies the llvm program to improve the efficiency of witness checking.
The archive with Symbiotic-Witch can be downloaded from the SV-COMP archives. The witness checking process is invoked by
./symbiotic [--prp <prop>] [--32] --witness-check <wit.graphml> <prog.c>
where <wit.graphml> is the violation witness to be checked and <prog.c> is the corresponding program. By default, the tool considers the reachability safety property and a 64-bit architecture. The considered property can be changed by the --prp option, with <prop> instantiated to memsafety, memcleanup, or no-overflow. The 32-bit architecture is set by --32.
Our witness checker can also be downloaded directly from its repository mentioned below. The version used in SV-COMP 2022 is marked with the tag SV-COMP22. It can be executed without Symbiotic via a shell script as
./witch.sh <prog.c> <wit.graphml>
which calls Clang to translate <prog.c> to llvm and then passes the llvm program and the witness <wit.graphml> to the witness checker.
5 Software Project and Contributors
Symbiotic-Witch has been developed at the Faculty of Informatics, Masaryk University, by Paulína Ayaziová under the guidance of Marek Chalupa and Jan Strejček. The tool is available under the MIT license, and all used tools and libraries (llvm, Klee, Z3, RapidXML, Symbiotic) are also available under
open-source licenses that comply with SV-COMP’s policy for the reproduction
of results. The source code of our witness checker can be found at:
https://github.com/ayazip/witch-klee
Data Availability Statement. All data of SV-COMP 2022 are archived as described
in the competition report [2] and available on the competition web site. This includes
the verification tasks, results, witnesses, scripts, and instructions for reproduction. The
version of Symbiotic-Witch used in the competition is archived together with other
participating tools [3].
References
1. Ayaziová, P.: Klee-based error witness checker. Bachelor’s thesis, Masaryk Univer-
sity (2021), https://is.muni.cz/th/rnv19/?lang=en
2. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS.
Springer (2022)
3. Beyer, D.: Verifiers and validators of the 11th Intl. Competition on Software Verifi-
cation (SV-COMP 2022). Zenodo (2022). https://doi.org/10.5281/zenodo.5959149
4. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M., Lemberger, T., Tautschnig, M.:
Verification witnesses. ACM Trans. Softw. Eng. Methodol. (2022), to appear.
5. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M., Stahlbauer, A.: Witness valida-
tion and stepwise testification across software verifiers. In: Nitto, E.D., Harman,
M., Heymans, P. (eds.) Proceedings of the 2015 10th Joint Meeting on Foundations
of Software Engineering, ESEC/FSE 2015, Bergamo, Italy, August 30 - September
4, 2015. pp. 721–733. ACM (2015), https://doi.org/10.1145/2786805.2786867
6. Beyer, D., Dangl, M., Lemberger, T., Tautschnig, M.: Tests from witnesses -
execution-based validation of verification results. In: Dubois, C., Wolff, B. (eds.)
Tests and Proofs - 12th International Conference, TAP@STAF 2018, Toulouse,
France, June 27-29, 2018, Proceedings. Lecture Notes in Computer Science, vol.
10889, pp. 3–23. Springer (2018), https://doi.org/10.1007/978-3-319-92994-1_1
7. Beyer, D., Spiessl, M.: Metaval: Witness validation via verification. In: Lahiri, S.K.,
Wang, C. (eds.) Computer Aided Verification - 32nd International Conference,
CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part II. Lecture
Notes in Computer Science, vol. 12225, pp. 165–177. Springer (2020), https://doi.
org/10.1007/978-3-030-53291-8_10
8. Cadar, C., Dunbar, D., Engler, D.R.: KLEE: Unassisted and automatic gener-
ation of high-coverage tests for complex systems programs. In: OSDI. pp. 209–
224. USENIX Association (2008), http://www.usenix.org/events/osdi08/tech/
full_papers/cadar/cadar.pdf
9. Chalupa, M., Řechtáčková, A., Mihalkovič, V., Zaoral, L., Strejček, J.: Symbiotic
9: Parallelism and invariants (competition contribution). In: Proc. TACAS (2).
Springer (2022)
10. Haas, T., Meyer, R., de León, H.P.: Dartagnan: SMT-based violation witness
validation (competition contribution). In: Proc. TACAS (2). Springer (2022)
11. King, J.C.: Symbolic execution and program testing. Communications of ACM
19(7), 385–394 (1976), https://doi.org/10.1145/360248.360252
12. Lattner, C., Adve, V.S.: LLVM: A compilation framework for lifelong program
analysis & transformation. In: CGO 2004. pp. 75–88. IEEE Computer Society
(2004), https://doi.org/10.1109/CGO.2004.1281665
13. de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: TACAS
2008. LNCS, vol. 4963, pp. 337–340. Springer (2008), https://doi.org/10.1007/
978-3-540-78800-3_24
14. Švejda, J., Berger, P., Katoen, J.: Interpretation-based violation witness validation
for C: NITWIT. In: Biere, A., Parker, D. (eds.) Tools and Algorithms for the
Construction and Analysis of Systems - 26th International Conference, TACAS
2020, Held as Part of the European Joint Conferences on Theory and Practice
of Software, ETAPS 2020, Dublin, Ireland, April 25-30, 2020, Proceedings, Part
I. Lecture Notes in Computer Science, vol. 12078, pp. 40–57. Springer (2020),
https://doi.org/10.1007/978-3-030-45190-5_3
Theta: portfolio of CEGAR-based analyses with
dynamic algorithm selection (Competition
Contribution)
Zsófia Ádám¹, Levente Bajczi¹, Mihály Dobos-Kovács¹, Ákos Hajdu²,
and Vince Molnár¹ (✉)
¹ Department of Measurement and Information Systems,
Budapest University of Technology and Economics, Budapest, Hungary
molnarv@mit.bme.hu
² Meta Platforms Inc., London, United Kingdom
Abstract. Theta is a model checking framework based on abstraction
refinement algorithms. In SV-COMP 2022, we introduce: 1) reasoning at the source level via a direct translation from C programs; 2) support for
concurrent programs with interleaving semantics; 3) mitigation for non-
progressing refinement loops; 4) support for SMT-LIB-compliant solvers.
We combine all of the aforementioned techniques into a portfolio with
dynamic algorithm selection.
1 Verification Approach and Software Architecture
Theta [10] is a generic and configurable model checking framework written in
Java 11. A simplified version of the architecture (focusing on software verification
aspects) can be seen in Figure 1.
Fig. 1. Architecture of Theta. [Figure: C code is translated by an ANTLR parser into an XCFA, to which simplification passes are applied; the CEGAR analysis runs on top of an SMT interface (MathSAT, CVC4, Z3); result processing uses metadata from the translation to produce the verdict (safe/unsafe/unknown) and a witness.]
The input is a C program that is first translated to extended control-flow
automata (XCFA). Previously, Theta used LLVM [3], which had various advan-
tages, but its static single assignment (SSA) form proved overall disadvantageous
for abstraction-based algorithms. This year we use a new, direct translation (no
Jury member representing Theta at SV-COMP 2022.
© The Author(s) 2022
D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 474–478, 2022.
https://doi.org/10.1007/978-3-030-99527-0_34
intermediate language, no SSA form) via an ANTLR parser. Furthermore, the CFA is “extended” in the sense that, as of this year, we support concurrent programs via an analysis with interleaving semantics. After parsing, we apply various passes to the XCFA (e.g., large-block encoding or partial order reduction). The core of Theta is a CEGAR-based analysis framework targeting reachability properties via predicate and explicit analyses [8], along with interpolation- and Newton-based refinements [7]. This year, Theta added generic support for SMT solvers (including interpolation) via the SMT-LIB interface. At SV-COMP'22 we use CVC4 [4], MathSAT [6], and Z3 [9], where the latter is still used via its Java API as before. Finally, a verdict (safe, unsafe, or unknown) and a witness corresponding to the C program are produced (using metadata from the translation).
Fig. 2. Overview of the dynamic portfolio of Theta. [Figure: a decision tree. Decision points: “has floats”, “has bitvectors”, “has loops and cyclomatic complexity over 30”, “havocs over 5 and variables over 10”. Configurations (solver/domain/refinement) with internal timeouts: for floats, Mf/E/N (300 s) and Mf/PC/N, with C/E/N (300 s) and C/PC/N on solver issues; for bitvectors, M/E/S (300 s) and M/PC/B, with Z/E/N (300 s) and Z/PC/N on solver issues; for integers, Z/PC/B (30 s), Z/EA/S (400 s), Z/E/S (500 s), Z/PC/B, and Z/PB/B. Arrows marked “?” denote inconclusive results leading to the next configuration.]
Verification portfolio. Based on preliminary experiments and domain knowledge, we manually constructed a dynamic algorithm-selection portfolio [1] for SV-COMP'22, illustrated by Figure 2. Rounded white boxes correspond to decision points. We start by branching on the arithmetic (floats, bitvectors, integers). Under integers, there are further decision points based on the cyclomatic complexity and the number of havocs and variables. Grey boxes represent configurations, defining the solver / domain / refinement, in this order; lighter and darker grey represent explicit and predicate domains, respectively. Internal timeouts are written below the boxes; an unspecified timeout means that the configuration can use all the remaining time. The solver can be CVC4 (C) [4], MathSAT (M), MathSAT with floats (Mf) [6], or Z3 (Z) [9]. Abstract domains are explicit values (E), explicit values with all variables tracked (EA), Cartesian predicate abstraction (PC), or Boolean predicate abstraction (PB) [8]. Finally, the refinement can be Newton with weakest preconditions (N) [7], sequence interpolation (S), or backward binary interpolation (B) [8]. Arrows marked with a question mark (?) indicate an inconclusive result, which can happen due to timeouts or unknown results. Furthermore, this year's portfolio also includes a novel dynamic (run-time) check for refinement progress between iterations that can shut down potential infinite loops (by treating them as an unknown result) [1]. Note also that for solver issues (e.g., exceptions from the solver) we have different paths in some cases.
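Schematically, the selection amounts to a small decision procedure of the following shape (hypothetical types, in C++ for brevity although Theta itself is written in Java; the branching and timeouts only approximate Figure 2):

#include <string>
#include <utility>
#include <vector>

struct TaskFeatures {
    bool hasFloats = false;
    bool hasBitvectors = false;
    bool hasLoops = false;
    int cyclomaticComplexity = 0;
    int havocs = 0;
    int variables = 0;
};

// Each entry: configuration (solver/domain/refinement) and its internal
// timeout in seconds (0 = use all remaining time); later entries are
// fallbacks tried after inconclusive results.
using Portfolio = std::vector<std::pair<std::string, int>>;

Portfolio select(const TaskFeatures& f) {
    if (f.hasFloats)
        return {{"Mf/E/N", 300}, {"Mf/PC/N", 0}, {"C/E/N", 300}, {"C/PC/N", 0}};
    if (f.hasBitvectors)
        return {{"M/E/S", 300}, {"M/PC/B", 0}, {"Z/E/N", 300}, {"Z/PC/N", 0}};
    if (f.hasLoops && f.cyclomaticComplexity > 30)
        return {{"Z/PC/B", 30}, {"Z/E/S", 500}, {"Z/PB/B", 0}};
    if (f.havocs > 5 && f.variables > 10)
        return {{"Z/EA/S", 400}, {"Z/PC/B", 0}};
    return {{"Z/E/S", 500}, {"Z/PC/B", 0}};
}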
2 Strengths and Weaknesses
Theta currently targets ReachSafety and ConcurrencySafety with limited support for structs, arrays, and pointers, and with no support for dynamic memory allocation, mutexes, and recursion. Due to this, Theta fails for most tasks in ProductLines, Recursive, Heap, and Arrays. Out of the 6163 tasks, roughly two thirds can be translated, and there are 888 confirmed correct (541 safe, 347 unsafe), 116 unconfirmed correct, and only 15 incorrect (11 false positive, 4 false negative) results [5]. Note that almost all unsupported cases are detected and reported as an error, and we only have a few incorrect results due to subtle issues.
The main strength of the tool is the combination of algorithm selection (picking an algorithm based on the input) and portfolios (trying multiple algorithms until one succeeds). Out of the 1004 correct results, 315 could not be solved by the first configuration that the portfolio tried: before the eventual success, dynamic checks intervened for 181 internal timeouts, 72 solver issues (e.g., wrong models), 19 non-progressing refinements, and 74 other (unknown) faults.
Having a diverse portfolio also paid off. Bitvector and float arithmetic tasks were either solved by explicit analyses (with a mixture of interpolation- and Newton-based refinements) before predicate configurations were even tried, or, if the explicit analyses failed, the predicate configurations were unsuccessful too. The integer arithmetic required a more diverse configuration set: predicate abstraction solved roughly 48% of the tasks (45% Cartesian, 3% Boolean) and explicit analysis solved 52% (33% with empty precision, 19% with all variables tracked).
The SMT-LIB support provided a great improvement: previously we only had Z3, which still dominates the integer cases. However, all of the bitvector tasks were solved by MathSAT, making Z3 an unused backup. With floats, roughly half of the tasks were solved by MathSAT, while the other half needed CVC4 as a backup. Since floats are reduced to bitvectors, we did not rely on Z3, owing to its poor performance in our preliminary experiments.
The most successful subcategories are BitVectors, ControlFlow, Loops, and XCSP (38–45% correct), mostly because they use features of C that our frontend supports well. We plan to mitigate the high number of timeouts in the future with approximations (e.g., mixing integers and bitvectors) and further analyses (e.g., inferring loop invariants). We also have a significant number of unconfirmed results; we believe this can be improved by generating more compact witnesses.
This year Theta added support for sequential concurrency via a preprocessing step: it yields an encoding in which exploring all interleavings preserves inter-thread behaviors. The analyses treat consecutive non-global memory accesses as one atomic block, reducing the exploration of unnecessary total orders. A drawback of using preprocessing for partial order reduction instead of an on-line algorithm is the superfluous exploration of certain total orders; e.g., all interleavings of independent global memory accesses will also be explored. This is because such accesses might overlap with non-independent memory accesses at other times, and the preprocessing step is not aware of such details.
Using a wrapper, Theta integrates concurrency seamlessly with the existing framework (abstract domains, refinements), except for the error-location-based search [8] (used for non-concurrent cases), because the required distance metric is not well defined for concurrent programs. Instead, we opted to use a breadth-first search, which had outperformed depth-first strategies in preliminary tests. We theorize that this is because bugs are reachable within the first few instructions most of the time, but only via a specific total order. The performance for concurrent programs is still limited, though, and we plan to integrate a declarative approach in the future, which could be used for weakly ordered programs as well.
3 Tool Setup and Configuration
The competition contribution is based on Theta 3.0.0-svcomp22-v1³. Additionally, Theta uses CVC4 v1.9, MathSAT v5.6.6, and Z3 v4.5.0. The project's repository contains build instructions, but an archive with pre-built binaries for Ubuntu 20.04 (LTS) can be found at the SV-COMP repository⁴ and at Zenodo [2]. The toolchain requires the packages openjdk-11-jre-headless, libgomp1, and libmpfr-dev to be installed. The entry point of the toolchain is the script theta/theta-start.sh, which takes the verification task (a C program) as its only mandatory input and runs the portfolio. As additional arguments we use --portfolio COMPLEX --witness-only --loglevel RESULT. Further arguments are described in the readme included with the binaries.
4 Software Project
Theta is maintained by the Critical Systems Research Group⁵ of the Budapest University of Technology and Economics, with various contributors. The project is available open source on GitHub³ under an Apache 2.0 license.
Data Availability. The version of Theta used in this paper is available at [2].
Acknowledgment and Funding. The authors would like to thank Tamás Tóth, Milán Mondok, István Majzik, Zoltán Micskei, and András Vörös for their contributions to the project; and the competition organizers, especially Dirk Beyer, for their help during the preparation for SV-COMP. The research contributions of the authors from the Budapest Univ. of Tech. and Econ. were funded by the EC and NKFIH through the Arrowhead Tools project (EU grant No. 826452, NKFIH grant 2019-2.1.3-NEMZ ECSEL-2019-00003), and by the ÚNKP-21-2 New National Excellence Program of ITM from the NRDI Fund.
³ https://github.com/ftsrg/theta/releases/tag/svcomp22-v1
⁴ https://gitlab.com/sosy-lab/sv-comp/archives-2022/-/blob/main/2022/theta.zip
⁵ https://ftsrg.mit.bme.hu
References
1. Ádám, Zs.: Efficient techniques for formal verification of C programs. Bachelor's thesis, Budapest University of Technology and Economics (2021)
2. Ádám, Zs., Bajczi, L., Dobos-Kovács, M., Hajdu, Á., Molnár, V.: Theta: portfolio of CEGAR-based analyses with dynamic algorithm selection (competition contribution): Tool archive (data set) (2022). https://doi.org/10.5281/zenodo.5956737
3. Ádám, Zs., Sallai, Gy., Hajdu, Á.: Gazer-Theta: LLVM-based verifier portfolio with BMC/CEGAR (competition contribution). In: TACAS 2021, LNCS, vol. 12652, pp. 435–439. Springer (2021). https://doi.org/10.1007/978-3-030-72013-1_27
4. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: CAV 2011, LNCS, vol. 6806, pp. 171–177. Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_14
5. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS. Springer (2022)
6. Cimatti, A., Griggio, A., Schaafsma, B., Sebastiani, R.: The MathSAT5 SMT solver. In: TACAS 2013, LNCS, vol. 7795, pp. 93–107. Springer (2013). https://doi.org/10.1007/978-3-642-36742-7_7
7. Dobos-Kovács, M., Hajdu, Á., Vörös, A.: Bitvector support in the Theta formal verification framework. In: Proceedings of the 2nd Workshop on Validation and Verification of Future Cyber-Physical Systems (2021), in press.
8. Hajdu, Á., Micskei, Z.: Efficient strategies for CEGAR-based model checking. Journal of Automated Reasoning 64(6), 1051–1091 (2020). https://doi.org/10.1007/s10817-019-09535-x
9. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: TACAS 2008, LNCS, vol. 4963, pp. 337–340. Springer (2008). https://doi.org/10.1007/978-3-540-78800-3_24
10. Tóth, T., Hajdu, Á., Vörös, A., Micskei, Z., Majzik, I.: Theta: a framework for abstraction refinement-based model checking. In: FMCAD 2017. pp. 176–179 (2017). https://doi.org/10.23919/FMCAD.2017.8102257
Ultimate GemCutter
and the Axes of Generalization
(Competition Contribution)
Dominik Klumpp⋆¹, Daniel Dietsch¹, Matthias Heizmann¹,
Frank Schüssele¹, Marcel Ebbinghaus¹, Azadeh Farzan², and
Andreas Podelski¹
¹ University of Freiburg, Freiburg im Breisgau, Germany
klumpp@informatik.uni-freiburg.de
² University of Toronto, Toronto, Canada
Abstract. Ultimate GemCutter verifies concurrent programs using
the CEGAR paradigm, by generalizing from spurious counterexample
traces to larger sets of correct traces. We integrate classical CEGAR gen-
eralization with orthogonal generalization across interleavings. Thereby,
we are able to prove correctness of programs otherwise out-of-reach for
interpolation-based verification. The competition results show significant
advantages over other concurrency approaches in the Ultimate family.
1 Verification Approach
Ultimate GemCutter is a verification tool for concurrent programs based on
the CEGAR paradigm: It (1) picks a trace from the set of all program inter-
leavings (a possible “counterexample”), (2) proves correctness of this trace (the
counterexample is “spurious”), and (3) generalizes the proof to conclude that a
larger (usually infinite) set of traces is correct. Classically, CEGAR focuses on
generalization across traces with varying numbers of loop iterations, by finding
inductive loop invariants. GemCutter proposes additional generalization along
an orthogonal axis: across interleavings.
[Figure: the two axes of generalization. Along the iteration axis, the trace τ = a₁a₂b generalizes to the language L = (a₁a₂)*b, covering (a₁a₂)²b, (a₁a₂)³b, and so on; along the interleaving axis lies the equivalence class [τ], containing, e.g., a₁ba₂ and ba₁a₂; the closure cl(L) spans both.]
Concurrent programs contain many redundant interleavings of actions from different threads, i.e., interleavings with the same (input/output) behaviour. A naïve application of CEGAR requires explicit proofs of correctness for all these interleavings: intermediate states during the execution of redundant interleavings differ, and different interleavings hence often require different correctness proofs. GemCutter addresses this as illustrated in the figure above: We prove correctness of a trace τ, here τ = a₁a₂b, where a₁, a₂ are actions of the first thread,
⋆ Jury Member: Dominik Klumpp
© The Author(s) 2022
D. Fisman and G. Rosu (Eds.): TACAS 2022, LNCS 13244, pp. 479–483, 2022.
https://doi.org/10.1007/978-3-030-99527-0_35
and b is an action of the second thread. The proof of correctness is generated using Craig interpolation or similar techniques. We generalize this proof into a Floyd-Hoare automaton [8] to show that a regular language L (green area in the figure) of traces is correct. The new contribution is the subsequent generalization step: If a trace τ₁ differs from a (correct) trace τ₂ in L only by the ordering of independent statements, these traces are (Mazurkiewicz-)equivalent [3]. We conclude that τ₁ is also correct. Hence, the set of all such traces, denoted cl(L) (pink area), contains only correct traces. If the set of all program interleavings P is a subset of cl(L), we conclude that the program is correct.
To soundly make this conclusion, we need a suitable notion of independence between statements, which guarantees that the order of execution of two independent statements does not matter for program correctness. An intuitive sufficient condition is that neither statement writes to a memory location read or written by the other statement. If we cannot establish this condition syntactically, we use an SMT solver to check if executing the statements in either order is guaranteed to give the same result. We use information from the Floyd-Hoare automaton to refine this check in the style of conditional independence [5]. Such information can, for instance, express (but is not limited to) non-aliasing of pointers.
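The syntactic part of this check can be sketched as follows (a hypothetical statement representation; the SMT-based commutativity check and the conditional refinement are omitted):

#include <set>
#include <string>

// Hypothetical statement footprint: the memory locations it reads and writes.
struct Stmt {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

static bool disjoint(const std::set<std::string>& a,
                     const std::set<std::string>& b) {
    for (const auto& x : a)
        if (b.count(x)) return false;
    return true;
}

// Sufficient syntactic condition: neither statement writes a location that
// the other statement reads or writes, so their execution order is irrelevant.
bool syntacticallyIndependent(const Stmt& s, const Stmt& t) {
    return disjoint(s.writes, t.writes) && disjoint(s.writes, t.reads) &&
           disjoint(t.writes, s.reads);
}

Only when this syntactic check fails does the more expensive SMT-based commutativity check come into play.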
However, the inclusion P ⊆ cl(L) is in general undecidable [3], as cl(L) may not be regular. We reverse our viewpoint to obtain a sufficient condition that can be effectively checked: Rather than adding all equivalent traces to L – thus obtaining cl(L) – we instead remove all but one trace of each equivalence class from P – yielding a reduction P′ of P (formally, cl(P′) = P). We use the sleep set technique [5] to remove transitions from an automaton for P to get an automaton that recognizes one such reduction P′. We then check whether the (regular) reduction P′ is included in the (regular) language L. If this inclusion P′ ⊆ L holds, it implies that P ⊆ cl(L) also holds, and the program is correct. If the inclusion does not (yet) hold, GemCutter picks another program trace and repeats the process, iteratively building up the language L of correct traces by taking the union of the Floyd-Hoare automata computed in all iterations.
A key feature of the reduction-based approach is that the generalization along the iteration and interleaving axes is combined not just additively but multiplicatively: In the geometrical intuition of the figure above, we do not just take the union of L (green area) with the equivalence class [τ] of τ (blue area), but consider all traces in cl(L) (the pink area spanned by both). Further, we heuristically try to pick a set of representatives in a way that harmonizes
with CEGAR generalization, i.e., a reduction P′ with simple loop invariants. To this end, we prefer representatives with context switches at all loop boundaries. Ideally, each thread performs one complete loop iteration and then hands control over to the next thread (the last thread hands back control to the first thread). Consider the following example program, with the postcondition x = y:

// Thread 1:
int x = 0;
for (int i = 0; i < N; ++i) {
    x += A[i];
}

// Thread 2:
int y = 0;
for (int j = 0; j < N; ++j) {
    y += A[j];
}

Here, a proof for the
set of all interleavings P, or for some inopportunely chosen reduction, needs invariants that capture the fact that x = Σ_{k=0}^{i} A[k], and similarly for y. Such invariants are usually not found by Craig interpolation. However, the loop invariant i = j ∧ x = y suffices for the reduction that places context switches at all loop boundaries. The general idea is that, for this kind of reduction, the proof often needs to summarize only the effect of a single loop iteration rather than of unboundedly many iterations (which may require quantifiers or non-linear arithmetic). Similar observations were first made by Farzan and Vandikas [4].
GemCutter furthermore aims to improve the efficiency of the proof check, i.e., the check whether a reduction P′ is a subset of the set of proven traces L. The state explosion problem of concurrent programs makes the computation of an automaton recognizing a reduction P′, as well as the subsequent inclusion check, prohibitively expensive. To address this, we implemented a form of persistent set reduction [5], which allows us to compute a more compact automaton recognizing P′. This results in a more time- and memory-efficient inclusion check.
Reductions that interact harmoniously with CEGAR generalization do not always allow for an efficient proof check, nor vice versa. In the ConcurrencySafety category, where correctness proofs may become complicated, we prioritize generalization by computing reductions that typically allow for simpler proofs (described above), even though proof checking for such reductions is often more expensive. By contrast, in the NoDataRace category we found proof assertions to be usually quite simple (often only expressing non-aliasing of pointers), so we prioritize faster proof checks (and postpone context switches as far as possible).
Implementation. GemCutter uses the libraries and the front-end of the Ultimate framework, and extends Ultimate with a new CEGAR loop implementation and new algorithms operating on finite automata. We represent programs P, reductions P′, and sets of proven traces L as finite automata. Ultimate constructs Floyd-Hoare automata (for L) only on-demand [7]. Due to the state explosion problem, GemCutter extends this approach to the program and the reduction. The necessary parts of the automata are constructed just-in-time during traversal by automata algorithms. Various techniques are implemented as instances of a few generic interfaces (on-demand automata, and visitors that monitor and guide automaton traversal) for flexibility: radically different algorithms can be created by configuring, exchanging and stacking interface implementations. The following techniques and optimizations (all used in SV-COMP) can be combined with each other independently: (i) sleep set reduction (see the sketch below); (ii) persistent set reduction; (iii) discovery and pruning of states that cannot reach accepting states; (iv) guidance towards representatives of a specific form, e.g. with context-switches at loop boundaries; and (v) inclusion check between automata.
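As an illustration of item (i), a textbook-style sleep set exploration (after Godefroid [5]) can be sketched as follows; enabled, succ, and independent are assumptions standing in for the program model, and, unlike GemCutter's on-demand automata, this simplistic recursive version assumes an acyclic (e.g., unrolled) state space:

def explore(state, sleep=frozenset()):
    done = []                             # actions already explored at state
    for a in enabled(state):
        if a in sleep:
            continue                      # a is provably redundant from here
        # inherited and previously explored actions stay asleep in the
        # successor only if they are independent of the chosen action a
        child_sleep = frozenset(b for b in set(sleep) | set(done)
                                if independent(a, b))
        explore(succ(state, a), child_sleep)
        done.append(a)

Every equivalence class of interleavings is still represented, but commuting reorderings of independent actions are pruned from the search.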
2 Strengths and Weaknesses
The main advantage over other concurrency approaches in Ultimate (in Automizer and Taipan) lies in the generalization across interleavings: Automizer
and Taipan typically require more complex proofs possibly out-of-reach for Craig interpolation and similar techniques. GemCutter performs significantly better, winning 3rd place in the ConcurrencySafety category (behind the bounded model checkers Deagle [6] and CSeq [10]) and 1st place in the NoDataRace demo category. For details, refer to the competition report [1].
Since our proof check decides a stronger condition (P′ ⊆ L), it might miss some cases in which the proof is actually sufficient, i.e., P ⊆ cl(L) holds. This is because P′ and L might contain different representatives for the same equivalence class of interleavings. This weakness cannot be resolved completely due to the undecidability of the inclusion P ⊆ cl(L). It can however be attenuated by considering other choices of representatives (other than preferring context-switches at loop boundaries) and exploring the effect. This choice is currently given as an input parameter; an approach that heuristically chooses a reduction based on the program structure might perform better. Our notion of independence between statements is currently ignorant of the specification being verified. We hope to extend our approach to take this into account. Finally, our approach (and implementation) can be easily extended with other reduction methods that correspond to more aggressive generalization along the interleaving axis.
Our approach only verifies programs with a bounded number of threads. GemCutter runs out of time or memory if it is unable to establish such an upper bound, e.g. for many benchmarks in pthread-ext/ or goblint-regression/.
3 Architecture, Setup, Configuration, and Project
GemCutter is part of the program analysis framework Ultimate³, written in Java and licensed under LGPLv3⁴. GemCutter version 0.2.2-839c364b requires Java 11 and Python 3.6. Its Linux version, binaries of the required SMT solvers⁵, and a Python wrapper script were submitted as a .zip archive. GemCutter is invoked with

./Ultimate.py --spec <p> --file <f> --architecture <a> --full-output

where <p> is an SV-COMP property file, <f> is an input C file, <a> is the architecture (32bit or 64bit), and --full-output enables verbose output to stdout. A violation witness may be written to the file witness.graphml. The benchmarking tool BenchExec [2] supports GemCutter through the tool-info module ultimategemcutter.py⁶. GemCutter participates in the ConcurrencySafety and NoDataRace categories, as declared in its SV-COMP benchmark definition file ugemcutter.xml⁷.
Data Availability. Our .zip archive is available online⁸ and on Zenodo [9].
³ ultimate.informatik.uni-freiburg.de and github.com/ultimate-pa/ultimate
⁴ www.gnu.org/licenses/lgpl-3.0.en.html
⁵ Z3 (github.com/Z3Prover/z3), CVC4 (cvc4.github.io) and MathSAT (mathsat.fbk.eu)
⁶ github.com/sosy-lab/benchexec/blob/main/benchexec/tools/ultimategemcutter.py
⁷ gitlab.com/sosy-lab/sv-comp/bench-defs/-/blob/main/benchmark-defs/ugemcutter.xml
⁸ gitlab.com/sosy-lab/sv-comp/archives-2022/-/blob/main/2022/ugemcutter.zip and git.io/JM69B
References
1. Beyer, D.: Progress on software verification: SV-COMP 2022. In: Proc. TACAS (2).
Springer (2022)
2. Beyer, D., Löwe, S., Wendler, P.: Reliable benchmarking: requirements and solutions. Int. J. Softw. Tools Technol. Transf. 21(1), 1–29 (2019). https://doi.org/10.1007/s10009-017-0469-y
3. Diekert, V., Rozenberg, G. (eds.): The Book of Traces. World Scientific (1995).
https://doi.org/10.1142/2563
4. Farzan, A., Vandikas, A.: Automated hypersafety verification. In: CAV (1). Lecture Notes in Computer Science, vol. 11561, pp. 200–218. Springer (2019). https://doi.org/10.1007/978-3-030-25540-4_11
5. Godefroid, P.: Partial-Order Methods for the Verification of Concurrent Systems -
An Approach to the State-Explosion Problem, Lecture Notes in Computer Science,
vol. 1032. Springer (1996). https://doi.org/10.1007/3-540-60761-7
6. He, F., Sun, Z., Fan, H.: Deagle: An SMT-based verifier for multi-threaded pro-
grams (competition contribution). In: Proc. TACAS (2). Springer (2022)
7. Heizmann, M., Chen, Y., Dietsch, D., Greitschus, M., Nutz, A., Musa, B., Schätzle, C., Schilling, C., Schüssele, F., Podelski, A.: Ultimate Automizer with an on-demand construction of Floyd-Hoare automata (competition contribution). In: TACAS (2). Lecture Notes in Computer Science, vol. 10206, pp. 394–398 (2017). https://doi.org/10.1007/978-3-662-54580-5_30
8. Heizmann, M., Hoenicke, J., Podelski, A.: Refinement of trace abstraction. In: SAS. Lecture Notes in Computer Science, vol. 5673, pp. 69–85. Springer (2009). https://doi.org/10.1007/978-3-642-03237-0_7
9. Klumpp, D., Dietsch, D., Heizmann, M., Schüssele, F., Ebbinghaus, M., Farzan, A., Podelski, A.: Ultimate GemCutter SV-COMP 2022 Competition Contribution (Nov 2021). https://doi.org/10.5281/zenodo.5956945
10. Sales, E., Coto, A., Inverso, O., Tuosto, E.: A prototype for data race detection in
CSeq 3 (competition contribution). In: Proc. TACAS (2). Springer (2022)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Wit4Java: A Violation-Witness Validator for Java
Verifiers (Competition Contribution)
Tong Wu1, Peter Schrammel2, and Lucas C. Cordeiro1
1University of Manchester, Manchester, United Kingdom
2University of Sussex, Brighton, and Diffblue Ltd, Oxford, United Kingdom
lucas.cordeiro@manchester.ac.uk
Abstract. We describe and evaluate a violation-witness validator for Java verifiers called Wit4Java. It takes a Java program with a safety property and the respective violation-witness output by a Java verifier to generate a new Java program whose execution deterministically violates the property. We extract the values of the program variables from the counterexample represented by the violation-witness and feed this information back into the original program. In addition, we provide two implementations for instantiating source programs by injecting counterexamples. Experimental results show that Wit4Java can correctly validate the violation-witnesses produced by JBMC and GDart in a few seconds.
Keywords: Witness Validation · Software Verification · Java Bytecode.
1 Overview
Witness validation is the process of checking whether the same results can be
reproduced independently according to the given program, specification, verifi-
cation result, and the generated witness, improving the trust level of the software
verifiers [2].
Here, we describe and evaluate a new violation-witness validator for Java programs called Wit4Java. We take an approach similar to Rocha et al. [5] and Beyer et al. [1] for C programs and apply it to Java programs. As a result, we implement Wit4Java as a Python script that creates a new Java program or a unit test case using Mockito with the program variable values extracted from the counterexample. As input, Wit4Java uses the violation-witness in the GraphML format to extract the values of the non-deterministic variables in Java programs. Lastly, Wit4Java runs the newly created program on the Java Virtual Machine (JVM) to check the assert statements.
There are some validators for C programs in the literature [6,12]. For example, NitWit is an interpretation-based witness validator that can execute each statement step-by-step without compiling the entire program [12]. The concept of MetaVal is to generate a new program based on the input and then use any checker to check for specifications [6]. CPA-witness2test and FShell-witness2test are execution-based validators for C programs that can process the witness in
GraphML format and generate a test harness that drives the program to the specification violation [1]. Rocha et al. focus on the counterexample produced by ESBMC [4], while CPA-witness2test and FShell-witness2test can process GraphML files. However, witness validation for SV-COMP's Java track [7] is still at an early stage. GWIT is another validator that uses assumptions to prune the search space for dynamic symbolic execution, limiting the analysis to paths where a given assumption holds [10,11].
Fig. 1. Wit4Java Architecture. The grey boxes represent the inputs and outputs, and
the white boxes represent the validation process.
2 Validation Approach
The architecture of Wit4Java is illustrated in Fig. 1. First, Wit4Java takes the Java program and the witness as input. Then, it uses the Python package NetworkX to read the graph content of the witness, extracts the counterexample values of the variables corresponding to the source program from the violation-witness, and saves them. After that, it generates new programs that contain the witness's assumptions. Finally, the validation process is performed by the JVM (using the -ea option) to check whether the execution of the generated program exhibits the detected assertion failure.
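The extraction step can be pictured with the following short Python sketch (our illustration rather than the exact wit4java.py code; the GraphML data keys assumption and startline follow the witness format shown in Listing 1.3 below):

import networkx as nx

def extract_assumptions(witness_path):
    g = nx.read_graphml(witness_path)
    assumptions = []                      # (startline, assumption) pairs
    for _, _, data in g.edges(data=True):
        if "assumption" in data:
            line = int(data.get("startline", -1))
            assumptions.append((line, data["assumption"]))
    return sorted(assumptions)  # e.g. [(13, 'v1 = 1;'), (14, 'v2 = 0;')]

These (line, assumption) pairs are exactly what the two implementations described next consume.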
There are two implementations (Wit4Java 1.0 and Wit4Java 2.0) to extract and use counterexamples. The first version saves them as tuples (linenum, counterexample). It then reads the source program and replaces the variables of the program statements with counterexamples whenever the line number and variable in the program match a tuple, thus generating a newly created Java program. In comparison, the second version records the data types and values of the counterexamples and saves them sequentially into two lists. Moreover, only the assumptions made in the witness for the non-deterministic variables (as determined by Verifier.nondet) are recorded. Then, it builds a unit test case and employs the Mockito framework to mock the Verifier.nondet calls in the source program, making them return deterministic counterexample values from the lists. This makes the execution of the source program follow the path described in the witness and eventually reach the violated property.
Listing 1.1. Analyzed program
int v1 = Verifier.nondetInt();
int v2 = Verifier.nondetInt();
assert v1 == v2;

Listing 1.2. Output of Wit4Java 1.0
int v1 = 1;
int v2 = 0;
assert v1 == v2;
We show examples for both implementations in Listings 1.1 to 1.4. Wit4Java 1.0 (the naive version) saves the counterexamples in the witness in line-number order. It directly replaces the variable values in the source program, thus generating a new program (cf. Listing 1.2). Wit4Java 2.0 (the Mockito version) generates a test case that returns the counterexample value when the mocked function is called (cf. Listing 1.4).
Listing 1.3. Violation witness
<edge source="203.167" target="207.186">
  <data key="originfile">Main.java</data>
  <data key="startline">13</data>
  <data key="assumption">v1 = 1;</data>
</edge>
<edge source="207.186" target="252.201">
  <data key="originfile">Main.java</data>
  <data key="startline">14</data>
  <data key="assumption">v2 = 0;</data>
</edge>
Listing 1.4. Output of Wit4Java 2.0
String[] List_type = {"int", "int"};
String[] List_value = {"1", "0"};
Mockito.mockStatic(Verifier.class);
int n = List_type.length;
OngoingStubbing<Integer> stubbing_int =
    Mockito.when(Verifier.nondetInt());
for (int i = 0; i < n; i++) {
  if ("int".equals(List_type[i])) {
    stubbing_int = stubbing_int
        .thenReturn(Integer.parseInt(List_value[i]));
  }
}
Main.main(new String[0]);
3 Discussion of Strengths and Weaknesses
Fig. 2 on the left compares the validation results of the two validation tools Wit4Java and GWIT. The former is based on version 1.0 (the naive version); the latter is based on violation-witnesses produced by GDart. The results indicate that Wit4Java successfully validated 140 out of 302 witnesses, while GWIT correctly validates 150 results. Version 2.0 handles counterexamples with different values for each iteration within a loop better than version 1.0: version 1.0 skips the counterexamples before the last iteration, whereas version 2.0 can fully use the counterexamples generated by each iteration. Fig. 2 on the right compares the validation results of the two versions of Wit4Java, which shows that version 2.0 (the Mockito version) has a better validation ability (168 out of 302), thereby outperforming both version 1.0 and GWIT. However, the tool can only handle witnesses with concrete counterexamples. There are two main reasons why Wit4Java reports unknown: JBMC [3,8] produces an empty witness, or the witness does not contain a counterexample for a non-deterministic value. Besides, validation for strings is not supported yet; strings occur in almost half of the witnesses, but JBMC does not yet output counterexample values for strings, so we were not able to test this. Generally, there are not enough
witnesses of high quality for testing the witness validator yet because JBMC
sometimes correctly terminates without producing a witness in SV-COMP. The
witness support in the Java verifiers requires further development work so that
they are able to produce complete violation witnesses whenever they terminate
with verdict false.
Fig. 2. Validation results based on 302 witnesses. The x-axis represents the names
of the two tools and the y-axis represents the number of witnesses. A green “false”
indicates a confirmed correct result.
4 Tool Setup and Configuration
The competition submission is based on Wit4Java version 1.0 (the naive version).³ For the competition [9], Wit4Java is called by executing the script wit4java.py. It reads .java source files and corresponding witnesses in the given benchmark directories. The answer is false if an assertion failure is found. As an example, we can validate a witness by executing the following command:

./wit4java.py -witness <path-to-sv-witnesses>/witness.graphml <path-to-sv-benchmarks>/java/jbmc-regression/return2

where witness.graphml indicates the witness to be validated, and return2 indicates the benchmark name. The BenchExec tool-info module is called wit4java.py, and the benchmark definition file is wit4java-validate-violation-witnesses.xml. NetworkX must be installed separately on the SV-COMP machines. If a validation task does not find a property violation, it returns unknown.
5 Software Project and Contributors
Tong Wu maintains Wit4Java. It is publicly available under a BSD-style license.
The source code is available at https://github.com/Anthonysdu/wit4java, and
instructions for running the tool are given in the README file.
³ https://github.com/Anthonysdu/wit4java
Acknowledgment
The work in this paper is partially funded by the EPSRC grants EP/T026995/1 and EP/V000497/1, the EU H2020 ELEGANT project (957286), and the Soteria project awarded by UK Research and Innovation under the Digital Security by Design (DSbD) Programme.
References
1. Beyer et al. “Tests from Witnesses - Execution-Based Validation of Verification Results”. In: Tests and Proofs - 12th International Conference, TAP@STAF. Vol. 10889. Lecture Notes in Computer Science. 2018, pp. 3–23. https://doi.org/10.1007/978-3-319-92994-1_1.
2. Beyer et al. “Witness validation and stepwise testification across software verifiers”. In: ESEC/FSE. 2015, pp. 721–733. https://doi.org/10.1145/2786805.2786867.
3. Cordeiro et al. “JBMC: A Bounded Model Checking Tool for Verifying Java Bytecode”. In: CAV. Vol. 10981. LNCS. 2018, pp. 183–190. https://doi.org/10.1007/978-3-319-96145-3_10.
4. Gadelha et al. “ESBMC v6.0: Verifying C Programs Using k-Induction and Invariant Inference - (Competition Contribution)”. In: TACAS. Vol. 11429. LNCS. 2019, pp. 209–213. https://doi.org/10.1007/978-3-030-17502-3_15.
5. Rocha et al. “Understanding Programming Bugs in ANSI-C Software Using Bounded Model Checking Counter-Examples”. In: IFM. Vol. 7321. LNCS. 2012, pp. 128–142. https://doi.org/10.1007/978-3-642-30729-4_10.
6. Dirk Beyer and Martin Spiessl. “MetaVal: Witness Validation via Verification”. In: CAV Part II. Vol. 12225. LNCS. 2020, pp. 165–177. https://doi.org/10.1007/978-3-030-53291-8_10.
7. Lucas C. Cordeiro, Daniel Kroening, and Peter Schrammel. “Benchmarking of Java Verification Tools at the Software Verification Competition (SV-COMP)”. In: ACM SIGSOFT Softw. Eng. Notes 43.4 (2018), p. 56. https://doi.org/10.1145/3282517.3282529.
8. Lucas C. Cordeiro, Daniel Kroening, and Peter Schrammel. “JBMC: Bounded Model Checking for Java Bytecode - (Competition Contribution)”. In: TACAS. Vol. 11429. LNCS. 2019, pp. 219–223. https://doi.org/10.1007/978-3-030-17502-3_17.
9. D. Beyer. “Progress on Software Verification: SV-COMP 2022”. In: Proc. TACAS. Springer, 2022.
10. Falk Howar and Malte Mues. “GWIT (Competition Contribution)”. In: Proc. TACAS (2). Springer, 2022.
11. Falk Howar and Malte Mues. Tudo-Aqua/gwit. https://github.com/tudo-aqua/gwit.
12. Jan Svejda, Philipp Berger, and Joost-Pieter Katoen. “Interpretation-Based Violation Witness Validation for C: NITWIT”. In: TACAS. Vol. 12078. LNCS. 2020, pp. 40–57. https://doi.org/10.1007/978-3-030-45190-5_3.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Author Index
Ádám, Zsófia II-474
Aiken, Alex I-338
Aizawa, Akiko I-87
Albert, Elvira I-201
Alur, Rajeev II-353
Amat, Nicolas I-505
Amendola, Arturo I-125
Asgaonkar, Aditya I-167
Ayaziová, Paulína II-468
Bainczyk, Alexander II-314
Bajczi, Levente II-474
Banerjee, Tamajit II-81
Barbosa, Haniel I-415
Barrett, Clark I-143,I-415
Becchi, Anna I-125
Beyer, Dirk I-561,II-375,II-429
Biere, Armin I-443
Birkmann, Fabian II-159
Blatter, Lionel I-303
Blicha, Martin I-524
Bork, Alexander II-22
Bortolussi, Luca I-281
Bozzano, Marco I-543,II-273
Brain, Martin I-415
Bromberger, Martin I-480
Bruyère, Véronique I-244
Bryant, Randal E. I-443,I-462
Bu, Lei II-408
Casares, Antonio II-99
Cassez, Franck I-167
Castro, Pablo F. I-396
Cavada, Roberto I-125
Chakarov, Aleksandar I-404
Chalupa, Marek II-462,II-468
Cimatti, Alessandro I-125,I-543,II-273
Cohl, Howard S. I-87
Cordeiro, Lucas C. II-484
Coto, Alex II-413
D’Argenio, Pedro R. I-396
Darulova, Eva I-303
de Pol, Jaco van II-295
Deifel, Hans-Peter II-159
Demasi, Ramiro I-396
Dey, Rajen I-87
Dietsch, Daniel II-479
Dill, David I-183
Dobos-Kovács, Mihály II-474
Dragoste, Irina I-480
Duret-Lutz, Alexandre II-99
Dwyer, Matthew B. II-440
Ebbinghaus, Marcel II-479
Fan, Hongyu II-424
Faqeh, Rasha I-480
Farzan, Azadeh II-479
Fedchin, Aleksandr I-404
Fedyukovich, Grigory I-524,II-254
Ferrando, Andrea I-125
Fetzer, Christof I-480
Fijalkow, Nathanaël I-263
Fuller, Joanne I-167
Gallo, Giuseppe Maria I-281
Garhewal, Bharat I-223
Giannakopoulou, Dimitra I-387
Giesl, Jürgen II-403
Gipp, Bela I-87
González, Larry I-480
Goodloe, Alwyn I-387
Gordillo, Pablo I-201
Greiner-Petter, André I-87
Grieskamp, Wolfgang I-183
Griggio, Alberto II-273
Guan, Ji II-3
Guilloud, Simon II-196
Guo, Xiao II-408
Haas, Thomas II-418
Hajdu, Ákos II-474
Hartmanns, Arnd II-41
Havlena, Vojtěch II-118
He, Fei II-424
Heizmann, Matthias II-479
Hensel, Jera II-403
Hernández-Cerezo, Alejandro I-201
Heule, Marijn J. H. I-443,I-462
Hovland, Paul D. I-106
Howar, Falk II-435,II-446
Hückelheim, Jan I-106
Huisman, Marieke II-332
Hujsa, Thomas I-505
Hyvärinen, Antti E. J. I-524
Imai, Keigo I-379
Inverso, Omar II-413
Jakobsen, Anna Blume II-295
Jonáš, Martin II-273
Kanav, Sudeep I-561
Karri, Ramesh I-3
Katoen, Joost-Pieter II-22
Katz, Guy I-143
Kettl, Matthias II-451
Klumpp, Dominik II-479
Koenig, Jason R. I-338
Koutavas, Vasileios II-178
Krämer, Jonas I-303
Kremer, Gereon I-415
Křetínský, Jan I-281
Krötzsch, Markus I-480
Krstić, Srđan II-236
Kunčak, Viktor II-196
Kupferman, Orna I-25
Kwiatkowska, Marta II-60
Lachnitt, Hanna I-415
Lam, Wing II-217
Lange, Julien I-379
Lauko, Henrich II-457
Laveaux, Maurice II-137
Leeson, Will II-440
Lemberger, Thomas II-451
Lengál, Ondřej II-118
Li, Xuandong II-408
Li, Yichao II-408
Lin, Yi I-64
Lin, Yu-Yang II-178
Loo, Boon Thau II-353
Lyu, Lecheng II-408
Majumdar, Rupak II-81
Mallik, Kaushik II-81
Mann, Makai I-415
Marinov, Darko II-217
Marx, Maximilian I-480
Mavridou, Anastasia I-387
Mensendiek, Constantin II-403
Meyer, Klara J. II-99
Meyer, Roland II-418
Mihalkovič, Vincent II-462
Milius, Stefan II-159
Mitra, Sayan I-322
Mohamed, Abdalrhman I-415
Mohamed, Mudathir I-415
Molnár, Vince II-474
Mues, Malte II-435,II-446
Murali, Harish K I-480
Murtovi, Alnis II-314
Namjoshi, Kedar S. I-46
Neider, Daniel I-263
Nenzi, Laura I-281
Neykova, Rumyana I-379
Niemetz, Aina I-415
Norman, Gethin II-60
Nötzli, Andres I-415
Ozdemir, Alex I-415
Padon, Oded I-338
Park, Junkil I-183
Parker, David II-60
Patel, Nisarg I-46
Paulsen, Brandon I-357
Pérez, Guillermo A. I-244
Perez, Ivan I-387
Pilati, Lorenzo I-125
Pilato, Christian I-3
Podelski, Andreas II-479
Ponce-de-León, Hernán II-418
Preiner, Mathias I-415
Pressburger, Tom I-387
Putruele, Luciano I-396
Qadeer, Shaz I-183
Quatmann, Tim II-22
Raha, Ritam I-263
Rakamarić, Zvonimir I-404
Raszyk, Martin II-236
Řechtáčková, Anna II-462
Reeves, Joseph E. I-462
Renkin, Florian II-99
Reynolds, Andrew I-415
Ročkai, Petr II-457
Rot, Jurriaan I-223
Roy, Rajarshi I-263
Roy, Subhajit I-3
Rubio, Albert I-201
Rungta, Neha I-404
Safari, Mohsen II-332
Şakar, Ömer II-332
Sales, Emerson II-413
Santos, Gabriel II-60
Scaglione, Giuseppe I-125
Schmuck, Anne-Kathrin II-81
Schneider, Joshua II-236
Schrammel, Peter II-484
Schubotz, Moritz I-87
Schüssele, Frank II-479
Sharygina, Natasha I-524
Sheng, Ying I-415
Shenwald, Noam I-25
Shi, Lei II-353
Shoham, Sharon I-338
Sickert, Salomon II-99
Siegel, Stephen F. I-106
Šmahlíková, Barbora II-118
Sølvsten, Steffan Christ II-295
Soudjani, Sadegh II-81
Spiessl, Martin II-429
Staquet, Gaëtan I-244
Steffen, Bernhard II-314
Strejček, Jan II-462, II-468
Sun, Dawei I-322
Sun, Zhihang II-424
Tabajara, Lucas M. I-64
Tacchella, Alberto I-125
Takhar, Gourav I-3
Thomasen, Mathias Weller Berg II-295
Tinelli, Cesare I-415
Tonetta, Stefano I-543
Traytel, Dmitriy II-236
Trost, Avi I-87
Tuosto, Emilio II-413
Tzevelekos, Nikos II-178
Ulbrich, Mattias I-303
Vaandrager, Frits I-223
Vardi, Moshe Y. I-64
Vozarova, Viktoria I-543
Wang, Chao I-357
Wang, Hao II-217
Wang, Yuepeng II-353
Weidenbach, Christoph I-480
Wesselink, Wieger II-137
Wijs, Anton II-332
Willemse, Tim A. C. II-137
Wißmann, Thorsten I-223
Wu, Haoze I-143
Wu, Tong II-484
Wu, Wenhao I-106
Xie, Tao II-217
Xie, Zhunyi II-408
Xu, Meng I-183
Yi, Pu II-217
Youssef, Abdou I-87
Yu, Nengkun II-3
Zamboni, Marco I-125
Zaoral, Lukáš II-462
Zeljić, Aleksandar I-143
Zhao, Jianhua II-408
Zhong, Emma I-183
Zilio, Silvano Dal I-505
Zingg, Sheila II-236
Zlatkin, Ilia II-254
Zohar, Yoni I-415