About the efficient reduction of lambda terms
Andrea Asperti
DISI: Dipartimento di Informatica - Scienza e Ingegneria
Mura Anteo Zamboni 7
40127, Bologna, Italy
There is still a lot of confusion about “optimal” sharing in the lambda calculus, and its actual
efficiency. In this article, we shall try to clarify some of these issues.
Categories and Subject Descriptors: [Theory of computation]: Lambda calculus; [Theory of
computation]: Abstract machines; [Theory of computation]: Equational logic and rewriting;
[Software and its engineering]: Functional languages
1. INTRODUCTION
In relation to rewriting techniques, sharing is the ability to avoid duplication of
reduction work, due to duplication of subterms. The issue is relatively trivial at
first order, but it becomes much more entangled as soon as we pass to a higher
order framework, for which the lambda calculus provides a paradigmatic example.
Consider the well known beta rule
(λx.M) N → M[N/x]
If the argument N gets duplicated and it contains a reducible expression, its reduction will be duplicated too.
It may seem that an eager strategy (possibly delayed “on demand”, as in the
“call by need” strategy) could do the job. Unfortunately, this is not the case.
Let us consider first the case of weak frameworks. In this case, functions are
treated as values and reduction is never pursued under a λ-abstraction. So, if the
argument N is a lambda expression containing a redex R, and N is duplicated, the
reduction of R will be repeated in each instance. A typical situation is when the
argument N is obtained as a partial instantiation of some functional F. To make
things very simple, let us suppose F = two = λxy.x(x y) (the Church integer) and
let us instantiate it with the identity I = λx.x:

N = two I → λy.I (I y)

that is a weak normal form. If N gets duplicated, the two internal applications of
the identity will be duplicated too.
This may have very nasty effects. Consider the following weak reduction

two two I → two (two I)
          → two (λy.I (I y))
          → λy.(λy1.I (I y1)) ((λy2.I (I y2)) y)

where we renamed variables for the sake of readability. We have just doubled the
number of internal applications of the identity! If we start with n applications of two,
two two . . . two I    (n times)
we end up with a term containing 2^n applications of the identity, and all of them
will need to be reduced when the term is fed with an extra argument (e.g.
an additional identity).
We warmly invite the reader to write and evaluate the term

n two I I    (1)

(where n and two are Church integers) in their favorite (weak) functional programming
language, and observe the exponential explosion of the complexity when n
grows (no matter if the language is lazy or strict, or if it adopts combinators or
closures). On the other side, innermost reduction of the previous term is just linear
in n.
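For the sake of illustration, here is a minimal Haskell sketch of this experiment (the encoding of Church numerals and the helper names church and i are ours, not taken from the paper): forcing the result of example (1), fed with one extra argument, makes the running time grow roughly like 2^n.

-- Church numerals as ordinary Haskell functions (assumed encoding).
two :: (a -> a) -> a -> a
two f x = f (f x)

-- church k is the Church numeral for k, obtained by iterating f k times.
church :: Int -> (a -> a) -> a -> a
church 0 _ x = x
church k f x = f (church (k - 1) f x)

i :: a -> a
i x = x

-- The term  n two I I  of example (1), applied to one extra argument so
-- that all the accumulated applications of the identity are forced.
-- Increasing the first argument makes the running time explode, whether
-- evaluation is lazy or strict.
main :: IO ()
main = print (church 25 two i i (+ 1) (0 :: Int))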
So, is rightmost innermost reduction the correct solution? Of course not. As a
trivial example, consider the term

I (n two) I I    (2)

Rightmost innermost reduction would start by normalizing (n two), which is the Church
integer for 2^n and has exponential size, hence the whole reduction would be
exponential too.
What happens in (the innermost reduction of) example (2) is that the term I
inside λy.I (I y) of example (1) is replaced by a local variable, postponing the instantiation
with the identity to a later stage. That is to say, it is not the duplication
of redexes that matters, but the unnecessary, blind duplication of applications. For
instance, with an environment machine, any time we open a closure and the internal
code contains an application, we are possibly duplicating reduction work.
Fig. 1. Forbidden duplication of applications
But applications and lambda abstractions are just dual operators, so is the du-
plication of lambda abstractions dangerous too, from the point of view of sharing?
In principle, no, it is not. The point is that if the abstraction node is shared,
there are already two (or more) different calls to the function, that will give rise
to different redexes. The big challenge, however, is to duplicate the abstraction
node without jointly duplicating the whole body of the function (that could contain
applications). The really delicate part is to understand what happens at the level of
variables, since they can now be bound by one or the other of the two abstractions,
requiring some form of “unsharing” (see Figure 2).

Fig. 2. Legal duplication of λ-abstractions

The correct management of sharing and unsharing is not trivial. It was solved for the first time by Lamping
[Lamping 1990], and later revised and improved by many other people. One usually
refers to this part of the algorithm as “bookkeeping” work, to distinguish it from
duplication work and the actual firing of β-redexes.
Let us also observe that, in the terminology of interaction nets [Lafont 1990], the
difference in behavior between the duplication of applications and that of lambda abstractions
resides in the fact that in the latter case (Figure 2) duplication is requested at the
principal port of the node, while in the case of the application (Figure 1), it is
requested at an auxiliary port.
2. REDUCTION BY FAMILIES
Lévy developed the theory of optimality long before an implementation for it was
available (in fact, the problem remained open for quite a long time). The precise
definition of optimal sharing is not simple, and we shall postpone it for a moment.
Two redexes that are sharable according to Lévy are said to belong to the same
family, and optimal reduction is simulated on lambda terms by firing “in parallel”
all redexes in the same family. Family reduction has very nice properties: the most
interesting one is that it satisfies a one-step diamond property. As a consequence,
as long as we reduce needed redexes, the length of a normalizing reduction (if it
exists) does not depend on the strategy. This fact supported the conjecture that
family reduction could provide an interesting measure of the “intrinsic complexity”
of lambda terms, i.e. the cost required to compute the normal form of a lambda
term independently from the reduction technique.
Before addressing this issue, let us consider a different, simple reduction technique:
parallel β-reduction in Takahashi’s sense [Takahashi 1995], which allows us
to fire in parallel (in a single step) all redexes in a given term. Clearly, this is
a superoptimal reduction technique: all redexes in a Lévy family are parallel in
Takahashi’s sense, but not all parallel redexes eventually belong to the same family
(that is, not all of them are sharable).
The potential parallelism inherent in λ-terms can be very easily understood by
restricting the attention to the simply typed case (the following argument was
spelled out for the first time in the appendix to [Asperti and Lévy 2013]).
Working with simple types, it is traditional to define a notion of degree of a redex
R in the following way (see e.g. [Girard et al. 1989]).

Definition 2.1 (degree). The degree ∂(T) of a type T is defined by:
∂(A) = 1 if A is atomic
∂(U → V) = max{∂(U), ∂(V)} + 1
The degree of a redex (λx:U.M)N is ∂(U → V), where V is the type of M.
The degree ∂(M) of a term M is the maximum among the degrees of all its redexes.
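As a small, purely illustrative sketch of Definition 2.1, the following Haskell fragment (the datatype and the names are ours, not part of the formal development) computes the degree of a simple type:

-- Simple types: atoms and arrows (assumed representation).
data SimpleType = Atom String | Arrow SimpleType SimpleType

-- The degree of Definition 2.1: atoms have degree 1, and the degree of
-- U -> V is max(degree U, degree V) + 1.
degree :: SimpleType -> Int
degree (Atom _)    = 1
degree (Arrow u v) = max (degree u) (degree v) + 1

-- Example: the type (o -> o) -> (o -> o) of Church integers over an
-- atomic type o has degree 3.
exampleDegree :: Int
exampleDegree = degree (Arrow (Arrow o o) (Arrow o o)) where o = Atom "o"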
A crucial property of the simply typed lambda calculus is that a redex R of
type U → V may only create redexes of type U or of type V, hence with a degree
strictly less than that of R. As a consequence, each simply typed lambda term M
can be reduced to its normal form with a number of parallel reduction steps bounded
by its degree ∂(M). On the other side, we can encode complex (arbitrarily large
Kalmar-elementary) computations in λ-terms with low degrees (see [Mayer 1974;
Statman 1977]). So, these two facts together prove that the amount of parallelism
in λ-terms is not elementary recursive.
Does this say anything bad about parallelism? No. On the contrary, there is a
huge amount of parallelism in lambda terms (more than one could have expected),
so it seems to be rather a good idea to try to exploit it. Of course, the speed up we
may expect is never larger than the degree of parallelism, and if it is finite (or even
elementary in the size of the term!) the execution of large elementary computations
(with an exponential height larger than that of the available parallelism) will remain
elementary.
Coming back to optimality, the important result proved in [Asperti and Mairson
2001] was that most of these parallel redexes are actually sharable in Lévy’s sense,
so that, again, you may reduce a simply typed lambda term in a number of family
reductions that is approximately linear in its size (!!). Technically, this implies that
(on a sequential machine) the cost of sharing a single redex cannot be bounded by
any elementary function, but this is merely due to the enormous amount of sharing
that is inherent in lambda terms.
Stated another way, we already concluded that parallel reduction does not look
like a bad idea. Then we discovered that most of the parallel redexes can actually be
shared, which looks like an even better idea: why waste parallelism by duplicating
work if you can share it? However, the amount of sharing can be so (inconceivably)
large that, in the worst, pathological cases, it cannot be handled in elementary time in
the size of the term. That’s all.
The result in [Asperti and Mairson 2001] tells you nothing about the efficiency
of optimal reduction. The surprising result is that in lambda terms, due to higher
order, we have much more sharing (in Lévy’s sense) than one might expect. As a consequence:
—the computational cost per family may be huge
—the length of family reduction is not a good measure of the intrinsic complexity
of terms
3. EFFICIENCY, IN THEORY
Intuitively, sharing graph reduction à la Lamping performs the minimum amount
of duplication required by the computation. However, as we already explained, in
addition to this duplication work, there is also an additional “bookkeeping” work
required to enforce the correct matching between sharing and unsharing. This is
usually implemented by means of different levels of sharing, and the introduction of
suitable operators acting as brackets in the graph to delimit the scope of duplicators,
dynamically changing their levels. This part of the algorithm is pretty complex, and
its cost is not so clear yet. In particular, as proved in [Asperti and Chroboczek 1997],
if you are not careful in the management of brackets, they can easily accumulate,
resulting in an exponential overhead. For instance, Gonthier’s implementations
[Gonthier et al. 1992a; 1992b] are just wrong in this respect.
The accumulation problem described in [Asperti and Chroboczek 1997] was not
present in Lamping’s original algorithm [Lamping 1990], nor in the Bologna
Optimal Higher Order Machine (BOHM) [Asperti et al. 1996], nor in later implementations
such as Lambdascope [van Oostrom and van de Looij 2010]. It is con-
jectured that bookkeeping only adds a polynomial overhead to the reduction cost,
but there is no proof of this fact.
To avoid taking bookkeeping into consideration, it was natural to look for frameworks
where there is no need for it. A particularly interesting case was provided by
elementary linear logic [Girard 1998], a logic with boxes but no dereliction,
expressive enough to encode all elementary functions. The sharing graph reduction
of lambda terms typable in elementary linear logic can be done without the use of
brackets, and hence without bookkeeping.
Rephrasing [Asperti and Mairson 2001] in this context, [Asperti et al. 2004]
showed that the non-elementary cost of optimal reduction is not due to bookkeeping
(which one might suspect of adding superfluous work), but to the (apparently unavoidable)
duplication work. If you accept the fact that optimal reduction performs the
minimal amount of duplication, then any reduction technique will perform at least
the same operations, and hence incur at least the same computational cost.
The efficient nature of optimal reduction in the absence of bookkeeping was confirmed
by [Baillot et al. 2011], who considered a class of λ-terms of known bounded
complexity (polynomial and elementary time) and investigated the cost of their
normalization via sharing graphs: the cost stays within the expected complexity class.
More recently, still working in a “bookkeeping free” framework, and making a
direct syntactical comparison with a standard graph rewriting machine, [Guerrini
et al. 2012] showed that sharing graphs can only improve performance.
In conclusion, while there are several examples of classes of lambda terms where
optimal reduction outperforms standard techniques, there is so far no known coun-
terexample to its computational efficiency.
4. EFFICIENCY, IN PRACTICE
So, if optimal reduction is so good, and apart from the benighted ostracism of
traditional schools, why are functional programming languages not yet implemented
in this way?
First of all, we should make a distinction according to the intended use of the
normalization algorithm. There are essentially two different settings where normal-
ization of λ-terms plays a role: the first one is in higher order logical frameworks
based on Martin-Löf type theory (e.g. for type-checking of dependent types, or when
deploying reflection); the second setting is as the core of real functional languages. We
shall discuss them separately.
4.1 Higher order logical frameworks
The most important use of reduction in this context is to check convertibility of
λ-terms: since the calculus is confluent and normalizing, two terms are convertible
if and only if their normal forms are equal. However, this is just an extrema ratio:
there is no evidence at all that the best way to check convertibility is via normal-
ization, and in fact, to our knowledge, no logical framework implements it in
such a brute-force way. In the vast majority of cases, two terms are convertible
simply because they are equal (even if not normal), and it would be a major waste of time
to normalize them. Even if they are not equal, they may be just a few reduction
steps apart (e.g. one could be obtained from the other by folding/unfolding a few
definitions). In this case, the use of suitable convertibility heuristics, or a tighter
control of constant unfolding could be substantially more beneficial than improving
the efficiency of reduction.
In the case of optimality, the use of normalization for comparing terms poses a
few additional problems, since there is the need to inspect the normal form1. This
can be done in two ways: either by traversing the resulting graph, computing paths
in it, or via a readback procedure that reconstructs the λ-term out of the graph.
At present, no precise bound on the complexity of these operations is known, but
they do not look too complex. The delicate point is that, in this case, it does not
make sense to compute complexity in terms of the size of the input, since a small
sharing graph may result in a huge lambda term [Lawall and Mairson 1996]. It is
conjectured that, starting from a sharing graph in normal form, the complexity of
the readback procedure is just linear in the size of the resulting term (which, for the
sake of comparing terms, is the best we may expect), but there is no proof of this
fact.
Reduction is also a key ingredient of the reflection technique [Boutin 1997; Baren-
dregt and Barendsen 2002], whose basic idea is to check a property by running a
suitable certified decision procedure. For instance, in order to compare two regular
expressions, we can build the corresponding automata and execute a bisimulation
algorithm over them. In this case, having an efficient way of evaluating lambda
expressions may be important; however, for the most typical uses of reflection,
and especially for small scale reflection [Gonthier and Mahboubi 2010], optimal
reduction looks a bit like overkill.
1Note that no functional programming language gives you the ability to inspect higher order
values, e.g. you cannot read back a closure: this is just an issue for convertibility.
There is a final point that, at present, may advise against the adoption of op-
timal reduction in logical frameworks. Reduction is one of the most primitive
operations in higher order logical frameworks, and a basic component of the type-
checking/verification algorithm. So, it is part of the so called kernel of these sys-
tems: a component whose correctness must be trusted. To this aim, it has been
argued that kernels should be small (in terms of lines of code), in order to improve
confidence in their implementation2. While it is possible to implement abstract re-
duction machines for lambda terms in a few lines, sharing graphs eventually require
a bit more code, and maybe it is not such a good idea to try to put this machinery
in the kernel.

2This conception is possibly a bit outdated. Instead of having a small kernel, it would be better
to have a verified kernel, of course, no matter what its size could be.
4.2 Functional programming
The first issue to face, when considering optimal reduction for the implementation
of a real functional programming language, is to understand if the technique can be
generalized to a larger and more flexible calculus (coding everything as pure lambda
terms is, of course, not a feasible solution). Since sharing graphs can be expressed
in terms of interaction nets, the natural idea is to generalize the logical operators
from the application-lambda abstraction pair, to a generic setting of (higher order)
interaction operators. This naturally leads to interaction systems [Asperti and Laneve
1994], which are an elegant synthesis between interaction nets and Klop’s higher
order combinatory reduction systems [Klop 1980]. Interaction nets are expressive
enough to cover all inductive data structures, primitive fix-points and recursion,
and also effective numerical computations where each integer is treated as a differ-
ent constructor processed via primitive arithmetical operations. Interaction systems
can be implemented by means of sharing graphs with no additional burden with
respect to the lambda-calculus [Asperti and Laneve 1996], demonstrating that sharing
graphs just provide the abstract machinery for dealing with (optimal) sharing in a
higher-order setting, independently of the rewriting rules.
Fig. 3. Sharing machinery: first order, directed acyclic graphs (dags); higher order, sharing graphs.
The Bologna Optimal Higher-order Machine (BOHM) [Asperti et al. 1996] provided
a prototype implementation of the above ideas. BOHM was written in C,
and aimed at efficiency, in order to compare with real implementations. Several
benchmarks are given in [Asperti and Guerrini 1998]. On pure lambda terms (see
pp. 296-230) BOHM outperformed both Caml Light and Haskell, while remaining
competitive on typical symbolic computations. On more numerical computations
Caml Light was noticeably faster (up to one order of magnitude), which was not
surprising given the underlying overhead of graph rewriting.
The main problem we faced when implementing sharing graphs was not related
to performance but to memory consumption. This may look surprising since the
point of optimality is precisely to be as parsimonious as possible in the duplication
of data structures. However, the two things have very little in common. In general,
there is a well-known tension between time and space: you may improve time by
sacrificing space, and conversely you may save space by spending more time. For instance,
Savitch’s algorithm for graph reachability (implying PSPACE = NPSPACE)
works in space O(log^2(n)), where n is the number of nodes of the graph, but its
time complexity is O(n^(log n)); this is to be compared with the best algorithms in
time, which have time complexity O(n^2) (linear in the size of the graph) but require
O(n log(n)) space. In many interesting cases, a data type can be more compactly
encoded in terms of a procedure producing it3: a zipped file saves space at the
cost of unzipping the information when required. As another example, all program
transformations meant to improve performance, such as inlining, unfolding or loop
unrolling, typically increase the size of the code.

3This is the case for all non random numbers according to Kolmogorov complexity.
To give an example related to sharing graphs, consider a fixpoint definition

F = Θ M → M (Θ M)

where Θ is some fixpoint operator. An invocation of F will result in a lazy unfolding
and partial evaluation of its body, as required by the computation. To avoid
repeating work, this unfolded form must be saved as a new, optimized version of F:

F = M(M . . . (M(Θ M)))

For instance, after invoking a recursive definition of a factorial function on the
number 20, the new definition of the factorial will look like a sort of case switch for
the first 20 integers, followed by a recursive call to deal with the remaining cases.
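For illustration only, the following Haskell sketch (ordinary recursive definitions, not sharing graphs; the names are ours) shows the shape such an unfolded definition assumes: the first few cases have been expanded into explicit branches, while a residual recursive call handles the remaining ones.

-- The original recursive definition.
fact :: Integer -> Integer
fact 0 = 1
fact n = n * fact (n - 1)

-- Roughly the shape of the definition after the first invocations have
-- unfolded it (here up to 3): a case switch on the small arguments,
-- followed by a recursive call for the remaining cases.
factUnfolded :: Integer -> Integer
factUnfolded 0 = 1
factUnfolded 1 = 1
factUnfolded 2 = 2
factUnfolded 3 = 6
factUnfolded n = n * factUnfolded (n - 1)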
This may look like a desirable effect (a sort of naive form of memoization), but in
many situations things are not so clear, possibly leading to a large consumption of
memory space. Of course, you may renounce sharing global definitions with their
invocation instances, making local copies instead, but this clearly goes against the
very idea of optimality.
Twenty years ago, this looked like a serious problem; since then, memory has
become much cheaper and maybe, in the Big Data era we are entering, this is not
a real issue any more.
5. SUPER OPTIMAL STRATEGIES
To address the possibility of super optimal reduction techniques for lambda
terms, we need to better understand the definition of optimal sharing according
to Lévy. Let us start with an example. Consider the development of the term
M = ∆(F I) described in Figure 4, where ∆ = λx.x x, F = λz.z y and I = λx.x.
Firing R, S1 and S3 we obtain the term P = (I y)(I y); the two redexes T3 and
T4 inside P look sharable, although they have no ancestor in common: T3 is a
residual of T1, which in turn was created by S1, while T4 has just been created by
S3. In order to relate T3 and T4, we need to consider a different reduction for M,
in this case the innermost reduction of S leading to ∆(I y), and observe that both
T3 and T4 are residuals (w.r.t. R1) of the same redex T.
In general (see Figure 5), we say that a redex S with history σ is a copy of a redex
Fig. 4. ∆ = λx.x x, F = λz.z y and I = λx.x

Fig. 5. ∆ = λx.x x, F = λz.z y and I = λx.x
R with history ρ, written ρR ≤ σS, if and only if there is a derivation τ such that
ρτ is permutation equivalent to σ (ρτ ≡ σ) and S is a residual of R with respect
to τ (S ∈ R/τ).
The symmetric and transitive closure of the copy relation is called the family
relation, and will be denoted by ≃.
Two redexes are sharable according to Lévy if and only if they belong to the same
family in the above sense.
It is important to observe that the family relation is not just defined over redexes,
but is relativized with respect to a reduction (the redex history) from some initial
expression; as a consequence, we will only be able to relate redexes originating from
the same term M, and the choice of the initial term is relevant in determining sharing.
For instance, in the case of the example in Figure 4, if instead of starting the reduction
from ∆(F I) we start from (F I)(F I), then, according to Lévy, we lose the possibility
of sharing T3 and T4 inside P. Lévy’s notion aims to preserve the sharing
“inherent” in the initial λ-term, not to recognize common subexpressions generated
along the reduction (see [Grabmayer and Rochel 2014] for an investigation
of incremental sharing). Two redexes can be shared when they have been created
in essentially the same way, and not when they happen to look similar due to
“syntactical coincidences”.
The critical situation is described in Figure 6.
Fig. 6. An example of super optimal sharing
This kind of configuration may be addressed, to some extent, by memoization
techniques: if we cache the result of the first redex, and we meet the “same” configuration
again, then we can reuse the previous result for the second computation. The
delicate point is to understand what we mean by “same”: intensional equality may
be too restrictive, and at the same time it may clutter the memoization table with
too many terms; on the other side, as explained in Section 4.1, there is no obvious
strategy to address convertibility: in particular, the obvious approach consisting in
normalizing arguments may be in conflict with other optimality constraints (without
considering the possibility of divergence).
So, while memoization is definitely not a panacea, it is true that in some situations
it can be more efficient than optimal sharing à la Lévy.
A context where memoization turns out to be particularly effective is on finite
structures [Asperti 2015]. The advantage of working in a finite setting is that instead
of performing memoization “on demand”, we can work in parallel on all possible
inputs, unfolding a function into a finite vector of cases (that is, essentially, its
graph). Moreover, in this setting, types are strictly related to the dimension of
data: this provides guidelines for the use of memoization, preventing the construction of huge
hash tables. The resulting calculus offers an efficient framework for the evaluation
of finite terms, in conjunction with a reasonably simple meta-theory, permitting
a detailed and formal investigation of the complexity of reduction.
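As a rough illustration of this idea in a conventional language, the following Haskell sketch (names and representation are ours; a realistic implementation would use an array rather than a list) unfolds a function over the finite type Word8 into the table of its 256 results, i.e. essentially its graph.

import Data.Word (Word8)

-- Unfold a function over a finite type into the list of all its results.
tabulate :: (Word8 -> a) -> [a]
tabulate f = map (f . fromIntegral) [0 .. 255 :: Int]

-- Memoization by tabulation: the table is built once and shared by all
-- subsequent calls.
memoize :: (Word8 -> a) -> (Word8 -> a)
memoize f = \x -> table !! fromIntegral x
  where
    table = tabulate f

-- Example: an (artificially) expensive function over a finite domain,
-- whose cost is paid at most once per input value.
slowParity :: Word8 -> Bool
slowParity x = odd (sum [1 .. toInteger x])

fastParity :: Word8 -> Bool
fastParity = memoize slowParity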
6. DO WE NEED HIGHER ORDER?
The real question, however, is whether we really need higher order. As a matter of fact,
functional programming makes a very modest use of it. Passing functions is used as
a way to improve the parametricity of programs, and not as a computational device.
Higher-order structures are hardly ever used as a datatype, and dynamically
synthesizing functions is much less frequent than expected. The fact that functional
languages survive without the need for optimal reduction techniques is merely due
to this fact.
The danger inherent in higher order programming is well attested by a long series
of studies relating complexity classes to hierarchies of terms with increasing type
rank (see e.g. [Gurevich 1983; Goerdt 1992; Goerdt and Seidl 1990; Hillebrand and
Kanellakis 1996; Asperti 2015]). For instance, even working in a restricted finite
setting, terms of system T of rank 2 are already polynomially complete, and their
complexity rapidly becomes unfeasible at higher ranks.
Even the recent result in [Accattoli and Dal Lago 2016] can be understood in this
sense. In order to simulate a (bounded) Turing machine you just need to encode
the transition function between configurations, which is a linear function, and to have
the possibility to iterate it. On these trivial lambda terms even a silly strategy like
leftmost-outermost reduction turns out to be effective. Of course, this tells you
nothing about the best way to evaluate lambda terms. If there is a lesson to learn
from this result, it is that, in order to encode Turing machines, you do not really
need the full expressive power of lambda terms, and in particular you do not need
higher order (except to build sufficiently large “clocks”). This is not surprising: in
fact, to efficiently compute a Turing machine, you just need . . . a Turing machine.
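The following Haskell sketch (with an entirely illustrative, assumed notion of configuration) makes the point concrete: the transition function is first order, and the only higher-order ingredient is the numeral used as a clock to iterate it.

-- An assumed, toy notion of machine configuration: a state and a tape.
type Config = (Int, [Int])

-- A first-order transition function between configurations (purely
-- illustrative; a real encoding would implement the machine's table).
step :: Config -> Config
step (state, tape) = (state + 1, map (+ 1) tape)

-- A Church-style numeral used only as a clock: it iterates step n times.
clock :: Int -> (a -> a) -> a -> a
clock 0 _ x = x
clock n f x = f (clock (n - 1) f x)

runMachine :: Int -> Config -> Config
runMachine n = clock n step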
ACKNOWLEDGMENTS
This short note was mostly motivated by a recent Haskell discussion thread debating
why isn’t anyone talking about optimal lambda calculus implementations?4.
Unfortunately, the thread was already archived when I noticed it, so I did not have
the opportunity to post my contribution.

4https://www.reddit.com/r/haskell/comments/2zqtfk/why_isnt_anyone_talking_about_optimal_lambda/
REFERENCES
Beniamino Accattoli and Ugo Dal Lago. 2016. (Leftmost-Outermost) Beta Reduction is Invariant,
Indeed. Logical Methods in Computer Science 12, 1 (2016). DOI:http://dx.doi.org/10.2168/
LMCS-12(1:4)2016
Andrea Asperti. 2015. Computational Complexity Via Finite Types. ACM Trans. Comput. Log.
16, 3 (2015), 26. DOI:http://dx.doi.org/10.1145/2764906
Andrea Asperti and Juliusz Chroboczek. 1997. Safe Operators: Brackets Closed Forever - Optimizing Optimal lambda-Calculus
Implementations. Appl. Algebra Eng. Commun. Comput. 8, 6 (1997), 437–468. DOI:http:
//dx.doi.org/10.1007/s002000050083
Andrea Asperti, Paolo Coppola, and Simone Martini. 2004. (Optimal) duplication is not ele-
mentary recursive. Inf. Comput. 193, 1 (2004), 21–56. DOI:http://dx.doi.org/10.1016/j.ic.
2004.05.001
Andrea Asperti, Cecilia Giovanetti, and Andrea Naletto. 1996. The Bologna Optimal Higher-
Order Machine. J. Funct. Program. 6, 6 (1996), 763–810. DOI:http://dx.doi.org/10.1017/
S0956796800001994
Andrea Asperti and Stefano Guerrini. 1998. The Optimal Implementation of Functional Pro-
gramming Languages. Cambridge Tracts in Theoretical Computer Science, Vol. 45. Cambridge
University Press.
Andrea Asperti and Cosimo Laneve. 1994. Interaction Systems I: The Theory of Optimal
Reductions. Mathematical Structures in Computer Science 4, 4 (1994), 457–504. DOI:
http://dx.doi.org/10.1017/S0960129500000566
Andrea Asperti and Cosimo Laneve. 1996. Interaction Systems II: The Practice of Optimal
Reductions. Theor. Comput. Sci. 159, 2 (1996), 191–244. DOI:http://dx.doi.org/10.1016/
0304-3975(95)00062- 3
Andrea Asperti and Jean-Jacques Lévy. 2013. The Cost of Usage in the Lambda-Calculus. In 28th
Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA,
USA, June 25-28, 2013. IEEE Computer Society, 293–300. DOI:http://dx.doi.org/10.1109/
LICS.2013.35
Andrea Asperti and Harry G. Mairson. 2001. Parallel Beta Reduction Is Not Elementary Recur-
sive. Inf. Comput. 170, 1 (2001), 49–80. DOI:http://dx.doi.org/10.1006/inco.2001.2869
Patrick Baillot, Paolo Coppola, and Ugo Dal Lago. 2011. Light logics and optimal reduction:
Completeness and complexity. Inf. Comput. 209, 2 (2011), 118–142. DOI:http://dx.doi.org/
10.1016/j.ic.2010.10.002
Henk Barendregt and Erik Barendsen. 2002. Autarkic Computations in Formal Proofs. J. Autom.
Reasoning 28, 3 (2002), 321–336. DOI:http://dx.doi.org/10.1023/A:1015761529444
Samuel Boutin. 1997. Using Reflection to Build Efficient and Certified Decision Procedures.
In Theoretical Aspects of Computer Software, TACS’97 (Lecture Notes in Computer Science),
Martin Abadi and Takayasu Ito (Eds.), Vol. 1281. Springer-Verlag, 515–529. DOI:
http://dx.doi.org/10.1007/BFb0014565
Jean-Yves Girard. 1998. Light Linear Logic. Inf. Comput. 143, 2 (1998), 175–204. DOI:http:
//dx.doi.org/10.1006/inco.1998.2700
Jean-Yves Girard, Yves Lafont, and Paul Taylor. 1989. Proofs and Types. Cambridge Tracts in
Theoretical Computer Science, Vol. 7. Cambridge University Press.
Andreas Goerdt. 1992. Characterizing Complexity Classes by Higher Type Primitive Recursive
Definitions. Theor. Comput. Sci. 100, 1 (1992), 45–66. DOI:http://dx.doi.org/10.1016/
0304-3975(92)90363- K
Andreas Goerdt and Helmut Seidl. 1990. Characterizing Complexity Classes by Higher Type Prim-
itive Recursive Definitions, Part II. In Aspects and Prospects of Theoretical Computer Science,
6th International Meeting of Young Computer Scientists (IMYCS), Smolenice, Czechoslovakia,
November 19-23, 1990, Proceedings (Lecture Notes in Computer Science), Vol. 464. Springer,
148–158. DOI:http://dx.doi.org/10.1007/3-540- 53414-8_37
Georges Gonthier, Martín Abadi, and Jean-Jacques Lévy. 1992a. The Geometry of Optimal
Lambda Reduction. In Conference Record of the Nineteenth Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, Albuquerque, New Mexico, USA, Jan-
uary 19-22, 1992. ACM Press, 15–26. DOI:http://dx.doi.org/10.1145/143165.143172
Georges Gonthier, Martín Abadi, and Jean-Jacques Lévy. 1992b. Linear Logic Without Boxes.
In Proceedings of the Seventh Annual Symposium on Logic in Computer Science (LICS ’92),
Santa Cruz, California, USA, June 22-25, 1992. IEEE Computer Society, 223–234. DOI:
http://dx.doi.org/10.1109/LICS.1992.185535
Georges Gonthier and Assia Mahboubi. 2010. An introduction to small scale reflection in Coq.
Journal of Formalized Reasoning 3, 2 (2010), 95–152. DOI:http://dx.doi.org/10.6092/issn.
1972-5787/1979
Clemens Grabmayer and Jan Rochel. 2014. Maximal sharing in the Lambda calculus with letrec. In
Proceedings of the 19th ACM SIGPLAN international conference on Functional programming,
Gothenburg, Sweden, September 1-3, 2014. ACM, 67–80. DOI:http://dx.doi.org/10.1145/
2628136.2628148
Stefano Guerrini, Thomas Leventis, and Marco Solieri. 2012. Deep into optimality: complexity
and correctness of sharing implementation of bounded logics. In Proceedings of DICE 2012,
Tallinn, Estonia, 2012.
Yuri Gurevich. 1983. Algebras of Feasible Functions. In 24th Annual Symposium on Foundations
of Computer Science (FOCS), Tucson, Arizona, USA. IEEE Computer Society, 210–214. DOI:
http://dx.doi.org/10.1109/SFCS.1983.5
Gerd G. Hillebrand and Paris C. Kanellakis. 1996. On the Expressive Power of Simply Typed
and Let-Polymorphic Lambda Calculi. In Proceedings, 11th Annual IEEE Symposium on Logic
in Computer Science, New Brunswick, New Jersey, USA, July 27-30, 1996. IEEE Computer
Society, 253–263. DOI:http://dx.doi.org/10.1109/LICS.1996.561337
Jan W. Klop. 1980. Combinatory Reduction Systems. Ph.D. Dissertation. CWI, Amsterdam.
Yves Lafont. 1990. Interaction Nets. In Conference Record of the Seventeenth Annual ACM Sym-
posium on Principles of Programming Languages, San Francisco, California, USA, January
1990. ACM Press, 95–108. DOI:http://dx.doi.org/10.1145/96709.96718
John Lamping. 1990. An Algorithm for Optimal Lambda Calculus Reduction. In Conference
Record of the Seventeenth Annual ACM Symposium on Principles of Programming Languages,
San Francisco, California, USA, January 1990. ACM Press, 16–30. DOI:http://dx.doi.org/
10.1145/96709.96711
Julia L. Lawall and Harry G. Mairson. 1996. Optimality and Inefficiency: What Isn’t a Cost
Model of the Lambda Calculus?. In Proceedings of the 1996 ACM SIGPLAN International
Conference on Functional Programming (ICFP ’96), Philadelphia, Pennsylvania, May 24-26,
1996. ACM, 92–101. DOI:http://dx.doi.org/10.1145/232627.232639
Albert R. Mayer. 1974. The Inherent Computational Complexity of Theories of Ordered Sets. In
Proceedings of the International Congress of Mathematicians, Vancouver. 477–482.
Richard Statman. 1977. The Typed lambda-Calculus Is not Elementary Recursive. In 18th An-
nual Symposium on Foundations of Computer Science, Providence, Rhode Island, USA. IEEE
Computer Society, 90–94. DOI:http://dx.doi.org/10.1109/SFCS.1977.34
Masako Takahashi. 1995. Parallel Reductions in λ-Calculus. Information and Computation 118,
1 (1995), 120–127. DOI:http://dx.doi.org/10.1006/inco.1995.1057
Vincent van Oostrom and Kees-Jan van de Looij. 2010. Lambdascope. Another optimal imple-
mentation of the lambda-calculus. (2010). http://www.phil.uu.nl/~oostrom/publication/
pdf/lambdascope.pdf