About the efficient reduction of lambda terms
Andrea Asperti
DISI: Dipartimento di Informatica - Scienza e Ingegneria
Mura Anteo Zamboni 7
40127, Bologna, Italy
There is still a lot of confusion about “optimal” sharing in the lambda calculus, and its actual
efficiency. In this article, we shall try to clarify some of these issues.
Categories and Subject Descriptors: [Theory of computation]: Lambda calculus; [Theory of
computation]: Abstract machines; [Theory of computation]: Equational logic and rewriting;
[Software and its engineering]: Functional languages
1. INTRODUCTION
In relation to rewriting techniques, sharing is the ability to avoid duplication of
reduction work, due to duplication of subterms. The issue is relatively trivial at
first order, but it becomes much more entangled as soon as we pass to a higher
order framework, for which the lambda calculus provides a paradigmatic example.
Consider the well known beta rule
(λx.M) N → M[N/x]
If the argument N gets duplicated and it contains a reducible expression, its reduction will be duplicated too.
It may seem that an eager strategy (possibly delayed “on demand”, as in the
“call by need” strategy) could do the job. Unfortunately, this is not the case.
Let us consider first the case of weak frameworks. In this case, functions are
treated as values and reduction is never pursued under a λ-abstraction. So, if the
argument N is a lambda expression containing a redex R, and N is duplicated, the
reduction of R will be repeated in each instance. A typical situation is when the
argument N is obtained as a partial instantiation of some functional F. To make
things very simple, let us suppose F = two = λxy.x(x y) (the Church integer) and
let us instantiate it with the identity I = λx.x:

N = two I → λy.I (I y)

that is a weak normal form. If N gets duplicated, the two internal applications of
the identity will be duplicated too.
This may have very nasty effects. Consider the following weak reduction

two two I → two (two I)
          → two (λy.I (I y))
          → λy.(λy1.I (I y1)) ((λy2.I (I y2)) y)

where we renamed variables for the sake of readability. We have just doubled the
number of internal applications of the identity! If we start with n applications of two,
two two . . . two I    (n times)
we end up with a term containing 2^n applications of the identity, and all of them
will need to be reduced when the term is fed with an extra argument (e.g.
an additional identity).
We warmly invite the reader to write and evaluate the term

n two I I    (1)

(where n and two are Church integers) in their favorite (weak) functional programming
language, and observe the exponential explosion of the complexity when n
grows (no matter if the language is lazy or strict, or if it adopts combinators or
closures). On the other side, innermost reduction of the previous term is just linear
in n.
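For the sake of illustration, here is a minimal Haskell sketch of this experiment (the encoding of Church numerals and the helper names church and i are ours, not taken from the paper): forcing the result of example (1), fed with one extra argument, makes the running time grow roughly like 2^n.

-- Church numerals as ordinary Haskell functions (assumed encoding).
two :: (a -> a) -> a -> a
two f x = f (f x)

-- church k is the Church numeral for k, obtained by iterating f k times.
church :: Int -> (a -> a) -> a -> a
church 0 _ x = x
church k f x = f (church (k - 1) f x)

i :: a -> a
i x = x

-- The term  n two I I  of example (1), applied to one extra argument so
-- that all the accumulated applications of the identity are forced.
-- Increasing the first argument makes the running time explode, whether
-- evaluation is lazy or strict.
main :: IO ()
main = print (church 25 two i i (+ 1) (0 :: Int))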
So, is rightmost innermost reduction the correct solution? Of course not. As a
trivial example, consider the term

I (n two) I I    (2)

Rightmost innermost reduction would start by normalizing (n two), which is the Church
integer for 2^n and has exponential size, hence the whole reduction would be
exponential too.
What happens in (the innermost reduction of) example (2) is that the term I
inside λy.I (I y) of example (1) is replaced by a local variable, postponing the instantiation
with the identity to a later stage. That is to say, it is not the duplication
of redexes that matters, but the unnecessary, blind duplication of applications. For
instance, with an environment machine, any time we open a closure and the internal
code contains an application, we are possibly duplicating reduction work.
Fig. 1. Forbidden duplication of applications
But applications and lambda abstractions are just dual operators, so is the du-
plication of lambda abstractions dangerous too, from the point of view of sharing?
In principle, no, it is not. The point is that if the abstraction node is shared,
there are already two (or more) different calls to the function, that will give rise
to different redexes. The big challenge, however, is to duplicate the abstraction
node without jointly duplicating the whole body of the function (that could contain
applications). The really delicate part is to understand what happens at the level of
variables, since they can now be bound by one or the other of the two abstractions,
requiring some form of “unsharing” (see Figure 2).

Fig. 2. Legal duplication of λ-abstractions

The correct management of sharing and unsharing is not trivial. It was solved for the first time by Lamping
[Lamping 1990], and later revised and improved by many other people. One usually
refers to this part of the algorithm as “bookkeeping” work, to distinguish it from
duplication work and the actual firing of β-redexes.
Let us also observe that, in the terminology of interaction nets [Lafont 1990], the
difference in behavior between the duplication of applications and that of lambda abstractions
resides in the fact that in the latter case (Figure 2) duplication is requested at the
principal port of the node, while in the case of the application (Figure 1), it is
requested at an auxiliary port.
2. REDUCTION BY FAMILIES
Lévy developed the theory of optimality long before an implementation for it was
available (in fact, the problem remained open for quite a long time). The precise
definition of optimal sharing is not simple, and we shall postpone it for a moment.
Two redexes that are sharable according to Lévy are said to belong to the same
family, and optimal reduction is simulated on lambda terms by firing “in parallel”
all redexes in the same family. Family reduction has very nice properties: the most
interesting one is that it satisfies a one-step diamond property. As a consequence,
as long as we reduce needed redexes, the length of a normalizing reduction (if it
exists) does not depend on the strategy. This fact supported the conjecture that
family reduction could provide an interesting measure of the “intrinsic complexity”
of lambda terms, i.e. the cost required to compute the normal form of a lambda
term independently from the reduction technique.
Before addressing this issue, let us consider a different, simple reduction technique:
parallel β-reduction in Takahashi’s sense [Takahashi 1995], which allows us
to fire in parallel (in a single step) all redexes in a given term. Clearly, this is
a superoptimal reduction technique: all redexes in a Lévy family are parallel in
Takahashi’s sense, but not all parallel redexes eventually belong to the same family
(that is, not all of them are sharable).
The potential parallelism inherent in λ-terms can be very easily understood by
restricting the attention to the simply typed case (the following argument was
spelled out for the first time in the appendix to [Asperti and Lévy 2013]).
Working with simple types, it is traditional to define a notion of degree of a redex
R in the following way (see e.g. [Girard et al. 1989]).

Definition 2.1 (degree). The degree ∂(T) of a type T is defined by:
∂(A) = 1 if A is atomic
∂(U → V) = max{∂(U), ∂(V)} + 1
The degree of a redex (λx:U.M)N is ∂(U → V), where V is the type of M.
The degree ∂(M) of a term M is the maximum among the degrees of all its redexes.
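As a small, purely illustrative sketch of Definition 2.1, the following Haskell fragment (the datatype and the names are ours, not part of the formal development) computes the degree of a simple type:

-- Simple types: atoms and arrows (assumed representation).
data SimpleType = Atom String | Arrow SimpleType SimpleType

-- The degree of Definition 2.1: atoms have degree 1, and the degree of
-- U -> V is max(degree U, degree V) + 1.
degree :: SimpleType -> Int
degree (Atom _)    = 1
degree (Arrow u v) = max (degree u) (degree v) + 1

-- Example: the type (o -> o) -> (o -> o) of Church integers over an
-- atomic type o has degree 3.
exampleDegree :: Int
exampleDegree = degree (Arrow (Arrow o o) (Arrow o o)) where o = Atom "o"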
A crucial property of the simply typed lambda calculus is that a redex R of
type U → V may only create redexes of type U or of type V, hence with a degree
strictly less than that of R. As a consequence, each simply typed lambda term M
can be reduced to its normal form with a number of parallel reduction steps bounded
by its degree ∂(M). On the other side, we can encode complex (arbitrarily large
Kalmar-elementary) computations in λ-terms with low degrees (see [Mayer 1974;
Statman 1977]). So, these two facts together prove that the amount of parallelism
in λ-terms is not elementary recursive.
Does this say anything bad about parallelism? No. On the contrary, there is a
huge amount of parallelism in lambda terms (more than one could have expected),
so it seems to be rather a good idea to try to exploit it. Of course, the speed up we
may expect is never larger than the degree of parallelism, and if it is finite (or even
elementary in the size of the term!) the execution of large elementary computations
(with an exponential height larger than that of the available parallelism) will remain
elementary.
Coming back to optimality, the important result proved in [Asperti and Mairson
2001] was that most of these parallel redexes are actually sharable in Lévy’s sense,
so that, again, you may reduce a simply typed lambda term in a number of family
reductions that is approximately linear in its size (!!). Technically, this implies that
(on a sequential machine) the cost of sharing a single redex cannot be bounded by
any elementary function, but this is merely due to the enormous amount of sharing
that is inherent in lambda terms.
Stated another way, we already concluded that parallel reduction does not look
like a bad idea. Then we discovered that most of the parallel redexes can actually be
shared, which looks like an even better idea: why waste parallelism by duplicating
work if you can share it? However, the amount of sharing can be so (inconceivably)
large that, in the worst, pathological cases, it cannot be handled in elementary time in
the size of the term. That’s all.
The result in [Asperti and Mairson 2001] tells you nothing about the efficiency
of optimal reduction. The surprising result is that in lambda terms, due to higher
order, we have much more sharing (in Lévy’s sense) than one might expect. As a consequence:
—the computational cost per family may be huge
—the length of family reduction is not a good measure of the intrinsic complexity
of terms
3. EFFICIENCY, IN THEORY
Intuitively, sharing graph reduction à la Lamping performs the minimum amount
of duplication required by the computation. However, as we already explained, in
addition to this duplication work, there is also an additional “bookkeeping” work
required to enforce the correct matching between sharing and unsharing. This is
usually implemented by means of different levels of sharing, and the introduction of
suitable operators acting as brackets in the graph to delimit the scope of duplicators,
dynamically changing their levels. This part of the algorithm is pretty complex, and
its cost is not so clear yet. In particular, as proved in [Asperti and Chroboczek 1997],
if you are not careful in the management of brackets, they can easily accumulate,
resulting in an exponential overhead. For instance, Gonthier’s implementations
[Gonthier et al. 1992a; 1992b] are just wrong in this respect.
The accumulation problem described in [Asperti and Chroboczek 1997] was not
present in Lamping’s original algorithm [Lamping 1990], nor in the Bologna
Optimal Higher Order Machine (BOHM) [Asperti et al. 1996], nor in later implementations
such as Lambdascope [van Oostrom and van de Looij 2010]. It is con-
jectured that bookkeeping only adds a polynomial overhead to the reduction cost,
but there is no proof of this fact.
To avoid taking bookkeeping into consideration, it was natural to look for frameworks
where there is no need for it. A particularly interesting case was provided by
elementary linear logic [Girard 1998], a logic with boxes but no dereliction,
expressive enough to encode all elementary functions. The sharing graph reduction
of lambda terms typable in elementary linear logic can be done without the use of
brackets, and hence without bookkeeping.
Rephrasing [Asperti and Mairson 2001] in this context, [Asperti et al. 2004]
showed that the non-elementary cost of optimal reduction is not due to bookkeeping
(which one might suspect of adding superfluous work), but to the (apparently unavoidable)
duplication work. If you accept the fact that optimal reduction performs the
minimal amount of duplication, then any reduction technique will perform at least
the same operations, and hence incur at least the same computational cost.
The efficient nature of optimal reduction in the absence of bookkeeping was confirmed
by [Baillot et al. 2011], who considered a class of λ-terms of known bounded
complexity (polynomial and elementary time) and investigated the cost of their
normalization via sharing graphs: the cost stays within the expected complexity class.
More recently, still working in a “bookkeeping free” framework, and making a
direct syntactical comparison with a standard graph rewriting machine, [Guerrini
et al. 2012] showed that sharing graphs can only improve performance.
In conclusion, while there are several examples of classes of lambda terms where
optimal reduction outperforms standard techniques, there is so far no known coun-
terexample to its computational efficiency.
4. EFFICIENCY, IN PRACTICE
So, if optimal reduction is so good, and apart from the benighted ostracism of
traditional schools, why are functional programming languages not yet implemented
in this way?
First of all, we should make a distinction according to the intended use of the
normalization algorithm. There are essentially two different settings where normal-
ization of λ-terms plays a role: the first one is in higher order logical frameworks
based on Martin-Löf type theory (e.g. for type-checking of dependent types, or when
deploying reflection); the second setting is as the core of real functional languages. We
shall discuss them separately.
4.1 Higher order logical frameworks
The most important use of reduction in this context is to check convertibility of
λ-terms: since the calculus is confluent and normalizing, two terms are convertible
if and only if their normal forms are equal. However, this is just an extrema ratio:
there is no evidence at all that the best way to check convertibility is via normal-
ization, and in fact, to our knowledge, no logical framework implements it in
such a brute-force way. In the vast majority of cases, two terms are convertible
simply because they are equal (even if not normal), and it would be a major waste of time
to normalize them. Even if they are not equal, they may be just a few reduction
steps apart (e.g. one could be obtained from the other by folding/unfolding a few
definitions). In this case, the use of suitable convertibility heuristics, or a tighter
control of constant unfolding could be substantially more beneficial than improving
the efficiency of reduction.
In the case of optimality, the use of normalization for comparing terms poses a
few additional problems, since there is the need to inspect the normal form1. This
can be done in two ways: either by traversing the resulting graph, computing paths
in it, or via a readback procedure that reconstructs the λ-term out of the graph.
At present, no precise bound on the complexity of these operations is known, but
they do not look too complex. The delicate point is that, in this case, it does not
make sense to compute complexity in terms of the size of the input, since a small
sharing graph may result in a huge lambda term [Lawall and Mairson 1996]. It is
conjectured that, starting from a sharing graph in normal form, the complexity of
the readback procedure is just linear in the size of the resulting term (which, for the
sake of comparing terms, is the best we may expect), but there is no proof of this
fact.
Reduction is also a key ingredient of the reflection technique [Boutin 1997; Baren-
dregt and Barendsen 2002], whose basic idea is to check a property by running a
suitable certified decision procedure. For instance, in order to compare two regular
expressions, we can build the corresponding automata and execute a bisimulation
algorithm over them. In this case, having an efficient way of evaluating lambda
expressions may be important; however, for the most typical uses of reflection,
and especially for small scale reflection [Gonthier and Mahboubi 2010], optimal
reduction looks a bit like overkill.
1Note that no functional programming language gives you the ability to inspect higher order
values, e.g. you cannot read back a closure: this is just an issue for convertibility.
There is a final point that, at present, may advise against the adoption of op-
timal reduction in logical frameworks. Reduction is one of the most primitive
operations in higher order logical frameworks, and a basic component of the type-
checking/verification algorithm. So, it is part of the so called kernel of these sys-
tems: a component whose correctness must be trusted. To this aim, it has been
argued that kernels should be small (in terms of lines of code), in order to improve
confidence in their implementation2. While it is possible to implement abstract re-
duction machines for lambda terms in a few lines, sharing graphs eventually require
a bit more code, and maybe it is not such a good idea to try to put this machinery
in the kernel.

2This conception is possibly a bit outdated. Instead of having a small kernel, it would be better
to have a verified kernel, of course, no matter what its size could be.
4.2 Functional programming
The first issue to face, when considering optimal reduction for the implementation
of a real functional programming language, is to understand if the technique can be
generalized to a larger and more flexible calculus (coding everything as pure lambda
terms is, of course, not a feasible solution). Since sharing graphs can be expressed
in terms of interaction nets, the natural idea is to generalize the logical operators
from the application-lambda abstraction pair, to a generic setting of (higher order)
interaction operators. This naturally leads to interaction systems [Asperti and Laneve
1994], which are an elegant synthesis between interaction nets and Klop’s higher
order combinatory reduction systems [Klop 1980]. Interaction nets are expressive
enough to cover all inductive data structures, primitive fix-points and recursion,
and also effective numerical computations where each integer is treated as a differ-
ent constructor processed via primitive arithmetical operations. Interaction systems
can be implemented by means of sharing graphs with no additional burden with
respect to the lambda-calculus [Asperti and Laneve 1996], demonstrating that sharing
graphs just provide the abstract machinery for dealing with (optimal) sharing in a
higher-order setting, independently of the rewriting rules.
Fig. 3. Sharing machinery: first order, directed acyclic graphs (dags); higher order, sharing graphs.
The Bologna Optimal Higher-order Machine (BOHM) [Asperti et al. 1996] provided
a prototype implementation of the above ideas. BOHM was written in C,
and aimed at efficiency, in order to compare with real implementations. Several
benchmarks are given in [Asperti and Guerrini 1998]. On pure lambda terms (see
pp. 296-230) BOHM outperformed both Caml Light and Haskell, while remaining
competitive on typical symbolic computations. On more numerical computations
Caml Light was noticeably faster (up to one order of magnitude), which was not
surprising given the underlying overhead of graph rewriting.
The main problem we faced when implementing sharing graphs was not related
to performance but to memory consumption. This may look surprising since the
point of optimality is precisely to be as parsimonious as possible in the duplication
of data structures. However, the two things have very little in common. In general,
there is a well-known tension between time and space: you may improve time by
sacrificing space, and conversely you may save space by spending more time. For instance,
Savitch’s algorithm for graph reachability (implying PSPACE = NPSPACE)
works in space O(log^2(n)), where n is the number of nodes of the graph, but its
time complexity is O(n^(log n)); this is to be compared with the best algorithms in
time, which have time complexity O(n^2) (linear in the size of the graph) but require
O(n log(n)) space. In many interesting cases, a data type can be more compactly
encoded in terms of a procedure producing it3: a zipped file saves space at the
cost of unzipping the information when required. As another example, all program
transformations meant to improve performance, such as inlining, unfolding or loop
unrolling, typically increase the size of the code.

3This is the case for all non random numbers according to Kolmogorov complexity.
To give an example related to sharing graphs, consider a fixpoint definition

F = Θ M → M (Θ M)

where Θ is some fixpoint operator. An invocation of F will result in a lazy unfolding
and partial evaluation of its body, as required by the computation. To avoid
repeating work, this unfolded form must be saved as a new, optimized version of F:

F = M(M . . . (M(Θ M)))

For instance, after invoking a recursive definition of a factorial function on the
number 20, the new definition of the factorial will look like a sort of case switch for
the first 20 integers, followed by a recursive call to deal with the remaining cases.
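For illustration only, the following Haskell sketch (ordinary recursive definitions, not sharing graphs; the names are ours) shows the shape such an unfolded definition assumes: the first few cases have been expanded into explicit branches, while a residual recursive call handles the remaining ones.

-- The original recursive definition.
fact :: Integer -> Integer
fact 0 = 1
fact n = n * fact (n - 1)

-- Roughly the shape of the definition after the first invocations have
-- unfolded it (here up to 3): a case switch on the small arguments,
-- followed by a recursive call for the remaining cases.
factUnfolded :: Integer -> Integer
factUnfolded 0 = 1
factUnfolded 1 = 1
factUnfolded 2 = 2
factUnfolded 3 = 6
factUnfolded n = n * factUnfolded (n - 1)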
This may look like a desirable effect (a sort of naive form of memoization), but in
many situations things are not so clear, possibly leading to a large consumption of
memory space. Of course, you may renounce sharing global definitions with their
invocation instances, making local copies instead, but this clearly goes against the
very idea of optimality.
Twenty years ago, this looked like a serious problem; since then, memory has
become much cheaper and maybe, in the Big Data era we are entering, this is not
a real issue any more.
5. SUPER OPTIMAL STRATEGIES
To address the possibility of super optimal reduction techniques for lambda
terms, we need to better understand the definition of optimal sharing according
to Lévy. Let us start with an example. Consider the development of the term
M = ∆(F I) described in Figure 4, where ∆ = λx.x x, F = λz.z y and I = λx.x.
Firing R, S1 and S3 we obtain the term P = (I y)(I y); the two redexes T3 and
T4 inside P look sharable, although they have no ancestor in common: T3 is a
residual of T1, which in turn was created by S1, while T4 has just been created by
S3. In order to relate T3 and T4, we need to consider a different reduction for M,
in this case the innermost reduction of S leading to ∆(I y), and observe that both
T3 and T4 are residuals (w.r.t. R1) of the same redex T.
In general (see Figure 5), we say that a redex S with history σ is a copy of a redex
Fig. 4. ∆ = λx.x x, F = λz.z y and I = λx.x

Fig. 5. ∆ = λx.x x, F = λz.z y and I = λx.x
R with history ρ, written ρR ≤ σS, if and only if there is a derivation τ such that
ρτ is permutation equivalent to σ (ρτ ≡ σ) and S is a residual of R with respect
to τ (S ∈ R/τ).
The symmetric and transitive closure of the copy relation is called the family
relation, and will be denoted by ≃.
Two redexes are sharable according to Lévy if and only if they belong to the same
family in the above sense.
It is important to observe that the family relation is not just defined over redexes,
but is relativized with respect to a reduction (the redex history) from some initial
expression; as a consequence, we will only be able to relate redexes originating from
the same term M, and the choice of the initial term is relevant in determining sharing.
For instance, in the case of the example in Figure 4, if instead of starting the reduction
from ∆(F I) we start from (F I)(F I), then, according to Lévy, we lose the possibility
of sharing T3 and T4 inside P. Lévy’s notion aims to preserve the sharing
“inherent” in the initial λ-term, not to recognize common subexpressions generated
along the reduction (see [Grabmayer and Rochel 2014] for an investigation
of incremental sharing). Two redexes can be shared when they have been created
in essentially the same way, and not when they happen to look similar due to
“syntactical coincidences”.
The critical situation is described in Figure 6.
Fig. 6. An example of super optimal sharing
This kind of configuration may be addressed, to some extent, by memoization
techniques: if we cache the result of the first redex, and we meet the “same” configuration
again, then we can reuse the previous result for the second computation. The
delicate point is to understand what we mean by “same”: intensional equality may
be too restrictive, and at the same time it may clutter the memoization table with
too many terms; on the other side, as explained in Section 4.1, there is no obvious
strategy to address convertibility: in particular, the obvious approach consisting in
normalizing arguments may be in conflict with other optimality constraints (without
considering the possibility of divergence).
So, while memoization is definitely not a panacea, it is true that in some situations
it can be more efficient than optimal sharing à la Lévy.
A context where memoization turns out to be particularly effective is on finite
structures [Asperti 2015]. The advantage of working in a finite setting is that instead
of performing memoization “on demand”, we can work in parallel on all possible
inputs, unfolding a function into a finite vector of cases (that is, essentially, its
graph). Moreover, in this setting, types are strictly related to the dimension of
data: this provides guidelines for the use of memoization, preventing the construction of huge
hash tables. The resulting calculus offers an efficient framework for the evaluation
of finite terms, in conjunction with a reasonably simple meta-theory, permitting
a detailed and formal investigation of the complexity of reduction.
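As a rough illustration of this idea in a conventional language, the following Haskell sketch (names and representation are ours; a realistic implementation would use an array rather than a list) unfolds a function over the finite type Word8 into the table of its 256 results, i.e. essentially its graph.

import Data.Word (Word8)

-- Unfold a function over a finite type into the list of all its results.
tabulate :: (Word8 -> a) -> [a]
tabulate f = map (f . fromIntegral) [0 .. 255 :: Int]

-- Memoization by tabulation: the table is built once and shared by all
-- subsequent calls.
memoize :: (Word8 -> a) -> (Word8 -> a)
memoize f = \x -> table !! fromIntegral x
  where
    table = tabulate f

-- Example: an (artificially) expensive function over a finite domain,
-- whose cost is paid at most once per input value.
slowParity :: Word8 -> Bool
slowParity x = odd (sum [1 .. toInteger x])

fastParity :: Word8 -> Bool
fastParity = memoize slowParity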
6. DO WE NEED HIGHER ORDER?
The real question, however, is whether we really need higher order. As a matter of fact,
functional programming makes a very modest use of it. Passing functions is used as
a way to improve the parametricity of programs, and not as a computational device.
Higher-order structures are hardly ever used as a datatype, and dynamically
synthesizing functions is much less frequent than expected. The fact that functional
languages survive without the need for optimal reduction techniques is merely due
to this fact.
The danger inherent in higher order programming is well attested by a long series
of studies relating complexity classes to hierarchies of terms with increasing type
rank (see e.g. [Gurevich 1983; Goerdt 1992; Goerdt and Seidl 1990; Hillebrand and
Kanellakis 1996; Asperti 2015]). For instance, even working in a restricted finite
setting, terms of system T of rank 2 are already polynomially complete, and their
complexity rapidly becomes unfeasible at higher ranks.
Even the recent result in [Accattoli and Dal Lago 2016] can be understood in this
sense. In order to simulate a (bounded) Turing machine you just need to encode
the transition function between configurations, which is a linear function, and to have
the possibility to iterate it. On these trivial lambda terms even a silly strategy like
leftmost-outermost reduction turns out to be effective. Of course, this tells you
nothing about the best way to evaluate lambda terms. If there is a lesson to learn
from this result, it is that, in order to encode Turing machines, you do not really
need the full expressive power of lambda terms, and in particular you do not need
higher order (except to build sufficiently large “clocks”). This is not surprising: in
fact, to efficiently compute a Turing machine, you just need . . . a Turing machine.
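The following Haskell sketch (with an entirely illustrative, assumed notion of configuration) makes the point concrete: the transition function is first order, and the only higher-order ingredient is the numeral used as a clock to iterate it.

-- An assumed, toy notion of machine configuration: a state and a tape.
type Config = (Int, [Int])

-- A first-order transition function between configurations (purely
-- illustrative; a real encoding would implement the machine's table).
step :: Config -> Config
step (state, tape) = (state + 1, map (+ 1) tape)

-- A Church-style numeral used only as a clock: it iterates step n times.
clock :: Int -> (a -> a) -> a -> a
clock 0 _ x = x
clock n f x = f (clock (n - 1) f x)

runMachine :: Int -> Config -> Config
runMachine n = clock n step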
ACKNOWLEDGMENTS
This short note was mostly motivated by a recent Haskell discussion thread debating
why isn’t anyone talking about optimal lambda calculus implementations?4.
Unfortunately, the thread was already archived when I noticed it, so I did not have
the opportunity to post my contribution.

4https://www.reddit.com/r/haskell/comments/2zqtfk/why_isnt_anyone_talking_about_optimal_lambda/
REFERENCES
Beniamino Accattoli and Ugo Dal Lago. 2016. (Leftmost-Outermost) Beta Reduction is Invariant,
Indeed. Logical Methods in Computer Science 12, 1 (2016). DOI:http://dx.doi.org/10.2168/
LMCS-12(1:4)2016
Andrea Asperti. 2015. Computational Complexity Via Finite Types. ACM Trans. Comput. Log.
16, 3 (2015), 26. DOI:http://dx.doi.org/10.1145/2764906
Andrea Asperti and Juliusz Chroboczek. 1997. Safe Operators: Brackets Closed Forever - Optimizing Optimal lambda-Calculus
Implementations. Appl. Algebra Eng. Commun. Comput. 8, 6 (1997), 437–468. DOI:http:
//dx.doi.org/10.1007/s002000050083
Andrea Asperti, Paolo Coppola, and Simone Martini. 2004. (Optimal) duplication is not ele-
mentary recursive. Inf. Comput. 193, 1 (2004), 21–56. DOI:http://dx.doi.org/10.1016/j.ic.
2004.05.001
Andrea Asperti, Cecilia Giovanetti, and Andrea Naletto. 1996. The Bologna Optimal Higher-
Order Machine. J. Funct. Program. 6, 6 (1996), 763–810. DOI:http://dx.doi.org/10.1017/
S0956796800001994
Andrea Asperti and Stefano Guerrini. 1998. The Optimal Implementation of Functional Pro-
gramming Languages. Cambridge Tracts in Theoretical Computer Science, Vol. 45. Cambridge
University Press.
Andrea Asperti and Cosimo Laneve. 1994. Interaction Systems I: The Theory of Optimal
Reductions. Mathematical Structures in Computer Science 4, 4 (1994), 457–504. DOI:
http://dx.doi.org/10.1017/S0960129500000566
Andrea Asperti and Cosimo Laneve. 1996. Interaction Systems II: The Practice of Optimal
Reductions. Theor. Comput. Sci. 159, 2 (1996), 191–244. DOI:http://dx.doi.org/10.1016/
0304-3975(95)00062- 3
Andrea Asperti and Jean-Jacques Lévy. 2013. The Cost of Usage in the Lambda-Calculus. In 28th
Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA,
USA, June 25-28, 2013. IEEE Computer Society, 293–300. DOI:http://dx.doi.org/10.1109/
LICS.2013.35
Andrea Asperti and Harry G. Mairson. 2001. Parallel Beta Reduction Is Not Elementary Recur-
sive. Inf. Comput. 170, 1 (2001), 49–80. DOI:http://dx.doi.org/10.1006/inco.2001.2869
Patrick Baillot, Paolo Coppola, and Ugo Dal Lago. 2011. Light logics and optimal reduction:
Completeness and complexity. Inf. Comput. 209, 2 (2011), 118–142. DOI:http://dx.doi.org/
10.1016/j.ic.2010.10.002
Henk Barendregt and Erik Barendsen. 2002. Autarkic Computations in Formal Proofs. J. Autom.
Reasoning 28, 3 (2002), 321–336. DOI:http://dx.doi.org/10.1023/A:1015761529444
Samuel Boutin. 1997. Using Reflection to Build Efficient and Certified Decision Procedures.
In Theoretical Aspects of Computer Software, TACS’97 (Lecture Notes in Computer Science),
Martin Abadi and Takayasu Ito (Eds.), Vol. 1281. Springer-Verlag, 515–529. DOI:
http://dx.doi.org/10.1007/BFb0014565
Jean-Yves Girard. 1998. Light Linear Logic. Inf. Comput. 143, 2 (1998), 175–204. DOI:http:
//dx.doi.org/10.1006/inco.1998.2700
Jean-Yves Girard, Yves Lafont, and Paul Taylor. 1989. Proofs and Types. Cambridge Tracts in
Theoretical Computer Science, Vol. 7. Cambridge University Press.
Andreas Goerdt. 1992. Characterizing Complexity Classes by Higher Type Primitive Recursive
Definitions. Theor. Comput. Sci. 100, 1 (1992), 45–66. DOI:http://dx.doi.org/10.1016/
0304-3975(92)90363- K
Andreas Goerdt and Helmut Seidl. 1990. Characterizing Complexity Classes by Higher Type Prim-
itive Recursive Definitions, Part II. In Aspects and Prospects of Theoretical Computer Science,
6th International Meeting of Young Computer Scientists (IMYCS), Smolenice, Czechoslovakia,
November 19-23, 1990, Proceedings (Lecture Notes in Computer Science), Vol. 464. Springer,
148–158. DOI:http://dx.doi.org/10.1007/3-540- 53414-8_37
Georges Gonthier, Martín Abadi, and Jean-Jacques Lévy. 1992a. The Geometry of Optimal
Lambda Reduction. In Conference Record of the Nineteenth Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, Albuquerque, New Mexico, USA, Jan-
uary 19-22, 1992. ACM Press, 15–26. DOI:http://dx.doi.org/10.1145/143165.143172
Georges Gonthier, Martín Abadi, and Jean-Jacques Lévy. 1992b. Linear Logic Without Boxes.
In Proceedings of the Seventh Annual Symposium on Logic in Computer Science (LICS ’92),
Santa Cruz, California, USA, June 22-25, 1992. IEEE Computer Society, 223–234. DOI:
http://dx.doi.org/10.1109/LICS.1992.185535
Georges Gonthier and Assia Mahboubi. 2010. An introduction to small scale reflection in Coq.
Journal of Formalized Reasoning 3, 2 (2010), 95–152. DOI:http://dx.doi.org/10.6092/issn.
1972-5787/1979
Clemens Grabmayer and Jan Rochel. 2014. Maximal sharing in the Lambda calculus with letrec. In
Proceedings of the 19th ACM SIGPLAN international conference on Functional programming,
Gothenburg, Sweden, September 1-3, 2014. ACM, 67–80. DOI:http://dx.doi.org/10.1145/
2628136.2628148
Stefano Guerrini, Thomas Leventis, and Marco Solieri. 2012. Deep into optimality: complexity
and correctness of sharing implementation of bounded logics. In Proceedings of DICE 2012,
Tallinn, Estonia, 2012.
Yuri Gurevich. 1983. Algebras of Feasible Functions. In 24th Annual Symposium on Foundations
of Computer Science (FOCS), Tucson, Arizona, USA. IEEE Computer Society, 210–214. DOI:
http://dx.doi.org/10.1109/SFCS.1983.5
Gerd G. Hillebrand and Paris C. Kanellakis. 1996. On the Expressive Power of Simply Typed
and Let-Polymorphic Lambda Calculi. In Proceedings, 11th Annual IEEE Symposium on Logic
in Computer Science, New Brunswick, New Jersey, USA, July 27-30, 1996. IEEE Computer
Society, 253–263. DOI:http://dx.doi.org/10.1109/LICS.1996.561337
Jan W. Klop. 1980. Combinatory Reduction Systems. Ph.D. Dissertation. CWI, Amsterdam.
Yves Lafont. 1990. Interaction Nets. In Conference Record of the Seventeenth Annual ACM Sym-
posium on Principles of Programming Languages, San Francisco, California, USA, January
1990. ACM Press, 95–108. DOI:http://dx.doi.org/10.1145/96709.96718
John Lamping. 1990. An Algorithm for Optimal Lambda Calculus Reduction. In Conference
Record of the Seventeenth Annual ACM Symposium on Principles of Programming Languages,
San Francisco, California, USA, January 1990. ACM Press, 16–30. DOI:http://dx.doi.org/
10.1145/96709.96711
Julia L. Lawall and Harry G. Mairson. 1996. Optimality and Inefficiency: What Isn’t a Cost
Model of the Lambda Calculus?. In Proceedings of the 1996 ACM SIGPLAN International
Conference on Functional Programming (ICFP ’96), Philadelphia, Pennsylvania, May 24-26,
1996. ACM, 92–101. DOI:http://dx.doi.org/10.1145/232627.232639
Albert R. Mayer. 1974. The Inherent Computational Complexity of Theories of Ordered Sets. In
Proceedings of the International Congress of Mathematicians, Vancouver. 477–482.
Richard Statman. 1977. The Typed lambda-Calculus Is not Elementary Recursive. In 18th An-
nual Symposium on Foundations of Computer Science, Providence, Rhode Island, USA. IEEE
Computer Society, 90–94. DOI:http://dx.doi.org/10.1109/SFCS.1977.34
Masako Takahashi. 1995. Parallel Reductions in λ-Calculus. Information and Computation 118,
1 (1995), 120–127. DOI:http://dx.doi.org/10.1006/inco.1995.1057
Vincent van Oostrom and Kees-Jan van de Looij. 2010. Lambdascope. Another optimal imple-
mentation of the lambda-calculus. (2010). http://www.phil.uu.nl/~oostrom/publication/
pdf/lambdascope.pdf