Garbage Collection in LOGFLOW *
Norbert Podhorszki, Peter Kacsuk
KFKI Research Institute for Measurement and Computing Techniques of the
Hungarian Academy of Sciences
H-1525 Budapest, P.O.Box 49. Hungary
{pnorbert, kacsuk}@sunserv.kfki.hu
1. Introduction
LOGFLOW is a distributed Prolog system running on multi-transputer machines and workstation clusters. The engines (Distributed Data Driven Prolog Abstract Machines, 3DPAM) execute the Prolog program in parallel with fine-grain execution. Coarse-grain pieces of work are executed by traditional sequential WAM (Warren Abstract Machine) engines.
In the execution of logic programming languages the memory consumption rate is very high because a storage cell of the memory cannot be reused once it has been filled. The memory fills up very quickly, yet most of the allocated storage cells are 'garbage', i.e., they are no longer used by the system.
The 3DPAM engine does not use any space reclamation technique. Our goal is to select an appropriate garbage collection method for it to support the execution of larger programs. In order to find an efficient one, an analysis of the system is presented in this paper. After the analysis several algorithms are described and the most appropriate ones are suggested for implementation.
This paper is a continuation of [2], which gives an overview of the existing garbage collection methods. Section 2 briefly describes the data structures of the 3DPAM. Section 3 analyses the system: it describes how the program structures are created in memory and how they become garbage. Section 4 describes collection methods. The conclusion suggests the appropriate methods for both implementations of the LOGFLOW system.
2. The memory structure of the 3DPAM
In 3DPAM the following data areas are used: Code, Context tables, Token Queues, Heap, Token tables. The Code area is simply an array for the 3DPAM code. The Heap is used conventionally, except that its first part holds the ground structures for the sake of optimization. The other data areas are new and specific to 3DPAM and are explained in the following sections. A detailed description of the structure and handling of these data areas can be found in [1].
2.1. The heap
The heap is a contiguous data area in memory. It consists of cells (with the size of the integer data type). A structure with N arguments occupies N+1 cells in the heap: the descriptor cell [FUNC | Functor ID | Arity] and a reference cell for each argument, see Fig. 1. The heap is consumed continuously by the engine; therefore, the structures of different Prolog search paths are mixed.
The garbage collector has to concentrate on the heap because other data areas are cleared
automatically.
* The work presented in this paper was done in the framework of the Hungarian-Portuguese joint project 'PAR-PROLOPPE', registered under No. H-9305-02/1095.
Fig. 1. The heap of the 3DPAM
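As an illustration, the following sketch shows how a structure with N arguments could be written onto such a heap. The cell type, the tag values and the bit positions of the descriptor fields are assumptions made for this example only, not the actual 3DPAM encoding.

    #include <stdint.h>

    typedef uint32_t cell;                 /* one heap cell, integer sized      */

    enum tag { TAG_FUNC = 0, TAG_ATOM = 1, TAG_STR = 2 };   /* assumed values   */

    #define MAKE_FUNC(functor_id, arity) \
        ((cell)TAG_FUNC | ((cell)(functor_id) << 8) | ((cell)(arity) << 24))

    /* Writes a structure with 'arity' arguments at heap[top]: one descriptor
     * cell followed by one reference cell per argument (N+1 cells in total).
     * Returns the new heap top. */
    static int write_structure(cell *heap, int top, int functor_id, int arity,
                               const cell *arg_refs)
    {
        int i;
        heap[top] = MAKE_FUNC(functor_id, arity);
        for (i = 0; i < arity; i++)
            heap[top + 1 + i] = arg_refs[i];
        return top + 1 + arity;
    }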
2.2. Token queues
The execution is governed by tokens. Each token holds at least the following information:
identifier, target PE and starting address in the code. Depending on the type, some tokens hold more
information. There are three groups of tokens:
1. request token: DO (<environment>,<arguments>)
2. reply token: SUCC (<environment>), FAIL, CFAIL
3. request/reply token: SUB (<environment>) (only virtual; they never appear in the system)
Identifiers of tokens not currently processed are placed into a token queue. There are two types of token queues: Local Token Queue and Remote Token Queue. Tokens in the LTQ will be processed by the local engine, whereas tokens in the RTQ will be transmitted to another PE. The engine takes the tokens from the LTQ (only the identifiers; the token data are taken from one of the token tables). The resulting tokens (if any) can be placed into the LTQ, or the RTQ, or can be processed by the engine.
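For illustration, a token record could look like the following sketch. The field names are hypothetical; the paper only states that every token carries an identifier, a target PE and a code address, plus type-dependent data such as an environment and arguments.

    /* Illustrative token record; field names are assumptions. */
    typedef enum { TOK_DO, TOK_SUCC, TOK_FAIL, TOK_CFAIL } token_type;

    typedef struct {
        token_type type;
        int        id;          /* token identifier (colour)                    */
        int        target_pe;   /* processing element that will handle it       */
        int        code_addr;   /* starting address in the 3DPAM code           */
        int        env;         /* heap reference to the environment (DO, SUCC) */
        int        args;        /* heap reference to the arguments (DO only)    */
    } token;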
2.3. Token tables
Tokens are kept in token tables. There are three kinds of token tables: Do, Fail and Sub/Succ.
The token tables are arrays of records which contain a pointer and a token slot. The pointer is used to maintain the list of free token slots (see Section 2.4.4, where its usage is described). The token slot holds the relevant information of a given token; its structure equals that of a token of the given type.
When a token is processed, its record is freed; therefore, the garbage collector does not find any garbage in these tables. Nevertheless, the examination of these tables during the collection is important because the stored tokens contain references to the data structures on the heap.
The same holds for the following data areas, too.
2.4. Context tables
Context tables are used to distinguish different token streams. Their usage is fundamental to realizing pipeline AND-parallelism. Every operator type (Unify, And, Or, Cut) has its own context table. The theoretical basis is explained in [6], where an example can also be found.
2.4.1. The Unify and Or context tables
The Unify and Or context tables are similar. The tables are arrays of records which contain a pointer and a context slot. The pointers are used to form a chain of free context slots (similarly to the token tables) and the context slots hold the relevant information.
Every time a token arrives at a Unify or Or node, its information content is saved, and a token with a new identifier (new colour) is emitted. When a corresponding token (representing a partial solution) of this colour arrives at the appropriate point, its original colour is restored, and the saved information is updated.
2.4.2. The And context table
The And context table differs from the previous ones since it must handle multiple overlapping SUB token streams. The And context table is therefore implemented in a different way; the details of the current implementation can be found in [5].
The And context table consists of records which contain an array of counters and a state field. The context table is indexed with the colour of the incoming tokens, and the counters are indexed with the identifier of the And node within a Unify/And node.
2.4.3. The Cut context table
The Cut context table consists of a pointer to chain the free items together, and a field for the token identifier. It is used to implement the 'cut' in the distributed environment, where the system looks for all solutions while the 'cut' ignores all solutions except the first one (first in terms of the sequential execution!).
2.4.4. The dynamic handling of context and token tables
Token tables and Context tables are handled in such a way that, although the whole arrays are statically allocated data objects, the elements of these arrays can be allocated and freed dynamically. This feature is obtained by means of pointers which are in fact array indices. Whenever a new item is to be allocated, the current head of the free list is taken and the head moves on to the item it points to; thus the allocated item is removed from the list. Adding an item to the free list means setting the pointer of the given element to the current head of the list, and the given item becomes the new head, see Fig. 2. This scheme combines the flexible and efficient use of pointers with the benefits of arrays, i.e., no run-time memory allocation and error handling are needed.
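A minimal sketch of this free-list scheme on a statically allocated array is given below. The names and the table size are illustrative, not the actual 3DPAM identifiers; the info field stands for the token or context slot data.

    #define TABLE_SIZE 256
    #define NIL (-1)

    typedef struct {
        int next;                 /* index of the next free slot, or NIL        */
        int info;                 /* stands for the token or context slot data  */
    } slot;

    static slot table[TABLE_SIZE];
    static int  free_head = NIL;

    static void table_init(void)
    {
        int i;
        for (i = 0; i < TABLE_SIZE - 1; i++)
            table[i].next = i + 1;
        table[TABLE_SIZE - 1].next = NIL;
        free_head = 0;
    }

    /* Allocation: the current head is removed from the free list and the head
     * moves on to the slot it pointed to. NIL means the table is full. */
    static int slot_alloc(void)
    {
        int idx = free_head;
        if (idx != NIL)
            free_head = table[idx].next;
        return idx;
    }

    /* Freeing: the slot points to the old head and becomes the new head. */
    static void slot_free(int idx)
    {
        table[idx].next = free_head;
        free_head = idx;
    }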
3. Analysis of the 3DPAM heap usage
3.1. 3DPAM specialities
The LOGFLOW system is a special distributed Prolog system: no remote references are used. The processing elements (PE) store all the needed data in their local memory. New pieces of work are transferred in tokens among the PEs, including all data needed for the processing. Therefore, the garbage collector of a PE can be a local process which has to handle only the local state of the PE. Generally, garbage collectors in a distributed system use complex algorithms to handle remote references. Note, however, that it does not follow that the simpler algorithms of sequential Prolog systems can be used in LOGFLOW: the types of garbage are different from those in a sequential system, see Section 3.2.
The other speciality is that only structures (including lists) are stored in the main memory area
(heap). Variables are stored in environments (and in special tables in the hybrid binding environment
scheme).
The third speciality comes from the distributed and fine-grained computation. Since many tokens (pieces of work or results) are stored on a PE at one time, a very large number of heap references is stored in the various information tables of the 3DPAM engine. This means that the root set of the live objects is very large (see the root set notion of the marking-and-reclaiming garbage collectors).
3.2. The types of garbage creation
3.2.1. Prolog speciality
During the sequential execution a branch of the Search Graph of the Prolog program is explored looking for a solution. If the program fails, another branch is explored, and the structures created on the first branch can be reclaimed. The backtracking feature of the sequential Prolog interpreters solves this problem; in a parallel system, however, there is no backtracking, all branches are explored at the same time, see Fig. 3.
Fig. 2. Dynamically allocated static table
Fig. 3. Garbage from failed search
Structure sharing among branches helps a little in a parallel system. A structure is used by several branches of the Prolog program; therefore, it lives for a long time during the execution. The handling of these structures is hard, see the hybrid binding environment scheme. It is important to take into account that sharing does not solve the problem.
3.2.2. Temporary structures
Considering only one branch of the Prolog execution, the garbage consists of the temporarily used structures. The analysis of sequential Prolog systems and applications shows that most structures are alive only for a short period. They are used for (see Fig. 4.):
- Controlling the search by unification successes and failures. The structures are used to make a call, enabling the selection of only some of the available predicates.
- Computing a sub-result. The situation is the opposite of other languages where explicit storage management is allowed: a C programmer can allocate and free memory areas, a Prolog programmer cannot. All structures used to compute a partial solution remain in the memory.
The sequential Prolog systems have problems 'only' with the reclamation of the temporary structures; the first type of garbage is reclaimed at backtracking, while the third type occurs only in distributed systems.
3.2.3. Garbage creation in distributed systems
In a distributed system the processing elements communicate with each other, sharing the work. The data are stored in the local memories of the PEs. A PE uses mainly the data placed in its local memory and sometimes sends requests for data stored in the memory of another PE. The ratio of local and remote data accesses depends on the system. A remote access has a large communication overhead, while a local access is possible only if the needed data is placed locally. A balance between the two types of accesses should be reached in a distributed system. (LOGFLOW is a very special distributed system where the PEs use only local data. This is possible due to the careful organization of the system.)
When a PE executes a piece of work, the needed data from other PEs are transferred to the PE (i.e. data are copied into its local memory). The PE executes a code fragment (creating the result data in the local memory). Finally, it might send the result to another PE (sending the result data). Generally, the created structures are no longer used by the PE; they are garbage, see Fig. 5. In the LOGFLOW system the pieces of work are transferred by tokens among the PEs. A token contains the entry point of the code (each PE has the whole program code) and all data needed for the processing.
Fig. 4. Creation of temporary structures
Fig. 5. Garbage from data distribution
3.3. The types of the heap consumption of 3DPAM
3.3.1. Structure creation
Predicate call and unify
The most common data creation in a Prolog program is the explicit data generation by the program. The structures defined in the head and the body of a clause are stored in the heap when the clause is executed, see Fig. 6. This is the only possible method for the user to create data structures in the Prolog application. (There is no dynamic data creation in LOGFLOW yet.)
Data transfer in a distributed system
The data transferred from a PE to another PE is copied into the heap of the target PE. The data
are transferred when a subgoal is sent to a remote PE or the solutions of the subgoal are sent back to
the caller. Since the 3DPAM executes fine-grained computation and all the needed data are
transferred with the subgoal requests, the data transfer and heap consumption in the LOGFLOW
system are very high.
Results from a sequential Prolog engine
The 3DPAM engine is based on a fine-grained computation model. To control the granularity, the LOGFLOW system also contains conventional Warren Abstract Machines (WAM). A subgoal given to a WAM is executed sequentially on the local PE. The solutions for the subgoal are sent back to the caller engine, see Fig. 7. They are stored in tokens, like the other solutions produced by the distributed engines. The data structures contained in these tokens are copied into the heap. The temporary structures created by the WAM engine are stored in the WAM memory, and their garbage collection is the task of the sequential engine.
3.3.2. Structure copying
Structures may be shared by several environments. In LOGFLOW structures are copied when a
variable in their arguments has been bound or a reference item in their arguments has been changed
and there is a possibility that the structure is referenced from more than one environment. Obviously,
it can occur when the structure contains unbound variables. Therefore, in LOGFLOW the structure
copying is optimized in such a way that they are copied only if they contain unbound variables.
OR branches in the program
Each time a token is processed by an OR-node, the structures are copied (if they contain
unbound variables). For each solution there is a new copy of the structures.
Fig. 6. Explicit structure creation
Fig. 7. Structures from the sequential engine
Back-unification
At back-unification each token represents a different solution (see Fig. 8.); thus, structures are copied and the variables are instantiated in them unless they are already ground.
The hybrid binding environment scheme (see Section 4.1.2.) avoids the structure copying; therefore, the garbage collector has to concentrate on the first type of structures.
3.3.3. Tests and results
The 8-queens and the Hamiltonian-path-search programs were executed. The results clearly show that most of the heap is consumed by the data transfer among the processors, see Fig. 9. (Larger problems result in an even higher ratio!) The consequence is that the garbage collector should concentrate on these types of structure creation first.
3.4. Analysis of the data exchange among the processing elements
As the previous section showed, most garbage is created by the data distribution. Two different
types of tokens are in the system: query tokens representing a piece of work and answer tokens
representing solutions. In this section the ratio of the data transfer of the two kinds of tokens is
examined.
Query tokens
A query (DO) token contains a code address in the program code - representing a subgoal - and all data needed for the execution on a remote PE, which has no access to the memory of other PEs.
Answer tokens
An answer token (SUCC or FAIL) contains a return address to the caller - the caller PE will execute the program from this point - and all data representing the found solution. (One SUCC token represents one solution; the FAIL token terminates the stream of solutions and contains no data.)
Tests and results
In order to know whether one kind of token dominates in transferring data structures, we examined the programs executed in the previous test. The results (see Fig. 10.) show that there is no significant difference between the amounts of data transferred by DO tokens and SUCC tokens.
The processors 'near' to the start processor (which executes the initial query) have more data
transferred by answer tokens than by query tokens while this is reversed on the 'far' processors. The
reason is that the 15-transputer architecture was too large for these problems and the execution was
not well balanced.
Fig. 8. The problem of structure sharing
Fig. 9. The heap consumption by the 8-queens program on 15 processors
4. Proposed garbage collection techniques
4.1. The aspects of an appropriate garbage collection method
The main goal of implementing a garbage collection method is to get a fast and effective collector: it should be as fast as possible and it should collect as much garbage as possible. However, these two goals work against each other. Other systems generally use hybrid methods, providing a fast (but 'weak') collector for general use and a heavyweight, very effective one that cleans up the memory less frequently.
4.1.1. Implementation on transputers and workstation clusters
The transputers are relatively simple from the point of view of garbage collection. They do not use caches or virtual memory; therefore, any place in the memory can be accessed in the same amount of time. A workstation uses both cache and virtual memory, which increases its performance and expands its capacity.
The processors (and the UNIX operating system) handle processes, i.e. the system consists of processes working in parallel (with time slicing) on the processor. The processes can share memory areas, and by the use of semaphores data access violations can be avoided. Hence, the garbage collector can be a separate process working in parallel with the engines, accessing the data of the 3DPAM from outside. Alternatively, it can simply be a procedure called by the engine, suspending the program execution while the collection is running.
4.1.2. Avoiding structure copying: hybrid binding environment
The hybrid binding environment stores the bound variables of the structures in a separate (hash) table. A structure containing an unbound variable can be shared among many threads of the Prolog search. When a thread binds the variable, its value is stored in the hash table and the thread has exclusive access to it. Therefore, the original structure need not be copied.
The use of the hybrid binding environment implies that no structures (and no garbage) are created on the heap by copying. However, as the tests showed, this does not decrease the amount of garbage significantly.
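One plausible realisation of this idea is sketched below: a binding made by one thread is stored in a hash table keyed by the address of the shared variable cell and the colour of the binding thread, so the shared structure itself is never touched. All names, the keying and the table sizes are assumptions for illustration; the actual scheme of the hybrid binding environment may differ in its details.

    #define BIND_TABLE_SIZE 1024
    #define BIND_POOL_SIZE  4096

    typedef struct {
        int var_addr;            /* heap address of the shared unbound variable */
        int colour;              /* identifier of the binding thread / token    */
        int value;               /* reference to the bound value                */
        int next;                /* index of the next entry in the same bucket  */
    } binding;

    static int     bind_bucket[BIND_TABLE_SIZE];   /* bucket heads, -1 if empty */
    static binding bind_pool[BIND_POOL_SIZE];
    static int     bind_used;

    static void bind_init(void)
    {
        int i;
        for (i = 0; i < BIND_TABLE_SIZE; i++)
            bind_bucket[i] = -1;
        bind_used = 0;
    }

    static unsigned bind_hash(int var_addr, int colour)
    {
        return ((unsigned)var_addr * 31u + (unsigned)colour) % BIND_TABLE_SIZE;
    }

    /* Records a binding made by one thread without touching the shared structure. */
    static void bind_store(int var_addr, int colour, int value)
    {
        unsigned h = bind_hash(var_addr, colour);
        binding *b;
        if (bind_used >= BIND_POOL_SIZE)
            return;                                /* overflow handling omitted */
        b = &bind_pool[bind_used];
        b->var_addr = var_addr;
        b->colour = colour;
        b->value = value;
        b->next = bind_bucket[h];
        bind_bucket[h] = bind_used++;
    }

    /* Returns the value bound by the given thread, or -1 if still unbound for it. */
    static int bind_lookup(int var_addr, int colour)
    {
        int i = bind_bucket[bind_hash(var_addr, colour)];
        while (i >= 0 && (bind_pool[i].var_addr != var_addr ||
                          bind_pool[i].colour   != colour))
            i = bind_pool[i].next;
        return i >= 0 ? bind_pool[i].value : -1;
    }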
Fig. 10. The amount of transferred structures delivered by the query and the answer tokens. 8-queens program on 15 processors
4.1.3. Comments on marking-and-compacting garbage collection methods
The marking-and-compacting methods perform the collection of living objects in several passes. First, all accessible objects are marked; then, in two or three further passes, they are moved into a contiguous area and their references are modified accordingly.
In the LOGFLOW system, because of the many tokens (which are stored in queues before processing and in context tables after that), the number of references pointing into the heap from outside is very large. Therefore, updating these references would take a very long time with these methods, because when a structure is moved, all references to it must be found and modified. The copying methods update the references when they are accessed; therefore, they are more efficient here than the marking-and-compacting methods.
4.2. Segmenting the heap by the use of pages
One idea for supporting the garbage collection is the following: the large number of tokens should be grouped and the memory should be segmented in order to separate the memory areas used by the groups of tokens. The collector can then examine a separated area and a group of tokens without changing anything in other groups and memory areas. This enables the Prolog engine and the garbage collector to run concurrently: while the engine processes a token belonging to one group, the collector can examine the areas of another token group.
Because of the large number of tokens, the root set of the live objects is huge. By grouping the tokens, a marking collector can start the search with a much smaller root set.
The following subsections describe how the tokens can be grouped, how the memory can be segmented, and how a garbage collection based on this segmentation can be implemented.
4.2.1. Token families
The idea of separating the tokens into groups is the following: the tokens belonging to the same branch of the Prolog Search Graph are the members of a family. Since the collector is a local process, only the tokens in a given processing element are considered. The definition of the family from the point of view of the implementation: the incoming DO token representing a sub-query 'founds' the family. Any token which is a successor of that token and is processed on the same PE belongs to the family, see Fig. 11.
The handling of families should be easy to implement, but this definition introduces a difficulty. When a DO token (belonging to family A and created on PE X) is transferred to another PE, it founds a family (family B on PE Y). If a successor DO token from this family is transferred back to PE X, it should be a member of family A (because it is a successor of the founder of family A). In order to implement family handling easily, such a token founds a new family on PE X instead.
Fig. 11. The token family
The number of token families during execution
In order to know how many families are founded during the execution and how many of them
exist in one time, we examined the programs executed in the previous test. The results (see Fig. 12.)
show that some tens of families are founded and the maximum number of them is low. However, the
ratio is not so good generally as the chart shows. When the most work is executed by WAM tokens,
the ratio is very low. The number of the families is much lower in this case (the Search Graph is
mostly explored by the sequential engines). The worst result was the execution of the Hamiltonian
Path Search program on 2 processors. Only one family was founded on the first processor (founded by
the main query). The reason was that the program was little and it was executed mostly by the WAM.
4.2.2. Memory pages
The token family concept provides an abstract separation of groups of structures. In order to
separate them physically, the heap is segmented by pages. A page is used exclusively by a family but a
family can use more than one page. This implies that heap memory areas used by different families
are separated from each other, see Fig. 13.
The usage of a page is not interesting from the point of view of the segmentation; it can be designed appropriately for the selected collection method. For the sake of simplicity let us assume that the usage is the same as the original usage of the heap: the structures created during the processing of different tokens (of the same family) are stored mixed.
The pages used by a family can be considered as a logically contiguous heap separated from other heaps, see Fig. 13. The boundaries of the pages should be handled only at the implementation level; a heap manager can solve this problem.
4.2.3. Heap manager for handling paged memory
The tasks of a heap manager are:
- providing new pages for the token families,
- handling structure writing and reading at page boundaries.
The engine sees a contiguous heap area; it reads and writes the structures with the help of manager functions (or macros, to decrease the overhead). The heap manager maintains a list of free pages, it provides new free memory from these pages, and unused pages are linked back to this list.
Fig. 12. Token families during the execution of the 8-queens program
Fig. 13. The pages of the memory and the logically contiguous memory areas of the families
The simplest implementation of the heap manager is a macro or function package hiding the heap from the engine. The read and write functions can detect the special events in heap handling and can call the heap management procedures if needed. The special events are: the page boundary fault (when a structure is sliced and stored in two non-contiguous pages) and running out of memory. At the first event the macros can hide the slicing from the engine.
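A sketch of such a heap manager interface is given below. The page size, the array sizes and all names are assumptions, and, for simplicity, the sketch starts a new page instead of slicing a structure across a page boundary.

    #define NUM_PAGES  64
    #define PAGE_SIZE  1024                       /* cells per page (assumed)    */
    #define NO_PAGE    (-1)

    static int page_next[NUM_PAGES];              /* links pages into lists      */
    static int free_page_head = NO_PAGE;          /* list of free pages          */
    static int heap_cells[NUM_PAGES * PAGE_SIZE];

    typedef struct {
        int first_page;        /* head of the page list used by the family       */
        int last_page;         /* page currently being filled                    */
        int top;               /* next free cell index within last_page          */
    } family_heap;

    static int *page_base(int page) { return &heap_cells[page * PAGE_SIZE]; }

    /* Returns 'ncells' contiguous free cells for the family, taking a new page
     * from the free list when the current page cannot hold them. Returns 0 when
     * no free page is left, which is the point where garbage collection must be
     * forced. */
    static int *heap_alloc(family_heap *f, int ncells)
    {
        if (f->last_page == NO_PAGE || f->top + ncells > PAGE_SIZE) {
            int p = free_page_head;
            if (p == NO_PAGE)
                return 0;                         /* out of memory               */
            free_page_head = page_next[p];
            page_next[p] = NO_PAGE;
            if (f->last_page != NO_PAGE)
                page_next[f->last_page] = p;
            else
                f->first_page = p;
            f->last_page = p;
            f->top = 0;
        }
        f->top += ncells;
        return page_base(f->last_page) + f->top - ncells;
    }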
The second event is more complex. The simple case is when there is a free page available: the manager can give this free page to the corresponding family (whose member is being processed at that moment). The main problem is the full use of the memory, when the heap manager cannot provide any free pages. At this point, the engine should be stopped while the garbage collector frees at least one page by compacting the used structures of other families (not the one being processed by the engine!).
However, this is not enough yet. It is possible that no pages can be reclaimed from families other than the one processed by the engine. A possible solution is that the collector is allowed to reclaim the garbage of that family in this case, but with a restriction. During the processing of a token the engine uses internal data structures (temporary variables, copies of the current environment), and the references stored in these structures cannot be explored by the collector. The engine should therefore be able to restore the state of the system to the point of the start of the processing of the last token, and it should also be able to restart the interrupted processing. If it has this ability, the heap manager can:
1. force the engine to restore the system state to the point of the beginning of the last
token processing,
2. force the collector to reclaim the garbage of the currently processed family,
3. force the engine to restart the last token processing.
The implementation of this ability of the engine is described later in this section.
A partial garbage collection based on the family concept
A family is founded by an incoming query token. The query is executed (while the family grows) and the solutions are searched for. When all possible solutions for a subquery are found, the subgraph of the subquery is fully explored. That means that no tokens belonging to this subgraph exist. At this moment the query token also disappears from the system (it is not stored or transferred anymore). However, the answer tokens contain information about the solutions and can refer to structures stored on the heap.
In the special case when all solutions for the incoming query are found and they are transferred back to the processing element which sent the query, the family 'dies out' on the local processing element. The answer tokens are transferred to the other processing element with all referenced data. The whole list of memory pages used by this family is no longer referenced. That means that at the moment when the last solution is sent back to the caller, the memory used by this family can be considered a free memory area and the heap manager can give these pages to other families.
In the implementation, this important event can be detected easily. Because of the 3DPAM system and the concept of token streams, the stream of the solutions belonging to the same query (each of them stored in a SUCC token) is terminated by a FAIL token. (FAIL: if there is no SUCC token before the FAIL, it means that no solution was found for the query.) When a FAIL token is sent from the processing element, the corresponding query is answered and the family dies out.
Explicitly, when a FAIL token is sent out by the Output Manager, it has to report this event to the Heap Manager (by calling a function), which frees the memory used by the corresponding family (links the used pages into the list of free pages).
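Building on the heap manager sketch of the previous subsection, the hook could look like the following; the function names are hypothetical.

    /* Frees all pages of a family whose last solution (the FAIL token) has just
     * been sent out: the family's whole page list is linked into the free list. */
    static void heap_manager_family_died(family_heap *f)
    {
        if (f->first_page != NO_PAGE) {
            page_next[f->last_page] = free_page_head;
            free_page_head = f->first_page;
            f->first_page = f->last_page = NO_PAGE;
            f->top = 0;
        }
    }

    /* Hook in the Output Manager: called after a token has been transmitted. */
    static void output_manager_sent(int token_is_fail, family_heap *family)
    {
        if (token_is_fail)
            heap_manager_family_died(family);
    }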
The maximum number of families existing at one time can be significantly lower than the number of families founded during the whole execution, see Fig. 12. Therefore, this partial collection can be very efficient in most cases (although there may be programs and execution conditions under which most founded families survive for a long time).
The modification of the engine
As pointed out above, the engine should be able to restore the system state to the point of the start of the processing of the last token and to restart the interrupted processing. The implementation of this ability would be difficult; however, a relaxed condition can be given:
The engine should be able to restore the state of the system to a point from which the engine can continue its work after garbage collection.
The restoration should include the following tasks:
a) the heap pointer of the current family should be restored
b) the context tables should be restored
c) the token queues should be restored (?)
a) The heap pointer can be stored in a variable at the start of a new code block. If new pages were allocated during the processing, they should be linked back to the list of free pages. The identifier of the last page used at the start can also be stored in a variable; the newly allocated pages are linked as a list to this page, therefore their reclamation is easy.
b) The processing of the DO and the SUCC tokens should be examined here. The FAIL tokens do not cause heap consumption; therefore, forced garbage collection cannot occur during the processing of a FAIL token. Records of the context tables can be removed only during the processing of FAIL tokens; therefore, the restoration of a removed record is not needed.
A DO token can be received at an OR-branch (call of a predicate with multiple clauses), at a UNIT node (fact call) or at a UNIFY node (call of a single predicate). When the token is processed at a UNIFY or UNIT node, only the unify_context_table and the heap can be modified. The restoration of the heap is solved in a). The unify_context_table can only be expanded: a new record is filled with the information about the incoming DO token. Only a pointer to the table should be stored at the start of the processing, and at the (forced) garbage collection the table can be restored by linking the allocated record back to the free records (see Section 2.4.4.).
When the DO token is processed at an OR node, two phases of processing should be examined. First, the token is copied because the two possible search paths are explored in parallel. The OR context table is expanded with a new record and the copy of the DO token is placed into one of the token queues. If it is placed into the Remote Token Queue and the Output Manager sends it to another PE, the restoration is impossible. In the older implementation of the LOGFLOW system, the structures containing unbound variables are copied (these structures cannot be shared by the two DO tokens). The hybrid binding environment avoids the copying; therefore, the heap is not expanded here (i.e. forced garbage collection does not occur). The second phase is the processing of the DO token (the engine continues the search on the left branch of the Search Graph). This is the same case as in the previous paragraph.
The two phases should be handled separately at restoration. When the processing is in the second phase, the system should be restored to the point of the start of the second phase (the other DO token placed into a queue remains there; only the processing of the first DO token is suspended).
The SUCC tokens can arrive at OR, UNIFY, AND or CUT nodes. The records of the Or and the Unify context tables are not modified, but the and_context_table can be expanded (theoretically this happens during the processing of SUB tokens, but since these kinds of tokens are virtual, the processing of a SUB token can cause modifications in the AND context table here). The CUT node can store the token temporarily (suspend it), but at this point forced garbage collection does not occur (the heap is not consumed).
c) The restoration of the token queues would be hard (an element would have to be inserted at the beginning of the queue). The solution described above avoids the restoration of the token queues.
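Reusing the hypothetical heap manager types introduced in Section 4.2.3, the checkpoint discussed in points a) and b) could be sketched as follows; all names are illustrative and the sketch assumes that the family already owned at least one page when the checkpoint was taken.

    typedef struct {
        int heap_top;            /* f->top of the current family at the start     */
        int last_page;           /* last page used by the family at the start     */
        int unify_free_head;     /* free-list head of the unify context table     */
    } checkpoint;

    static void save_checkpoint(checkpoint *c, const family_heap *f,
                                int unify_free_head)
    {
        c->heap_top = f->top;
        c->last_page = f->last_page;
        c->unify_free_head = unify_free_head;
    }

    static void restore_checkpoint(const checkpoint *c, family_heap *f,
                                   int *unify_free_head)
    {
        /* Give back the pages allocated since the checkpoint. */
        if (c->last_page != NO_PAGE && f->last_page != c->last_page) {
            page_next[f->last_page] = free_page_head;
            free_page_head = page_next[c->last_page];
            page_next[c->last_page] = NO_PAGE;
            f->last_page = c->last_page;
        }
        f->top = c->heap_top;
        /* Context table records are freed only while FAIL tokens are processed,
         * and a forced collection cannot happen there, so restoring the saved
         * free-list head re-links the records allocated since the checkpoint. */
        *unify_free_head = c->unify_free_head;
    }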
4.2.4. Garbage collection techniques in pages of the memory
The previous subsections provided a heap management scheme. This is a 'frame' which makes it possible to design a garbage collection method that works in parallel with the engine. One could also implement a run-time system based on this frame, but this is not among our goals.
Notice the situation at this point: a garbage collector can reclaim the garbage in a list of memory pages which is accessed exclusively by the collector. All references into this memory area are local; therefore, a uniprocessor algorithm can be used.
Copying garbage collection
The original copying collection method (see [2][3][4]) cannot be used for lists of memory pages directly, but it can be slightly modified. Consider that the list L of pages should be collected. The list F of free pages is used by the collector (leaving some pages for the engine). Let us say that M free pages are available for the collection at its start, while N pages should be examined. The steps of the algorithm are the following, see Fig. 14:
1. The original copying method is used, but only the living objects of the last M pages of the list L are moved into the M free pages. The objects in the other pages are only marked, and the references into the moved structures are updated.
2. The evacuated M pages are now handled as free pages. At this time, some part of the originally reserved M free pages and the new M free pages can be used by the collector (or it can give some of them to the engine). The copying method is used again: as many pages are copied into the free area as are guaranteed to fit. Since all the living objects are already marked by the first step, the number of pages which can be copied can be counted easily. (Notice that the references of previously copied structures pointing into the newly copied structures are updated, too.)
3. The second step is repeated until the whole list L of pages is collected.
Notice that older structures (created earlier) can be placed by the collector behind younger ones. The order of the structures becomes totally mixed, but this does not cause any problem in LOGFLOW.
The selection of the last M pages in the first step is not obligatory. The most recently created structures are stored there, and after collections this is where the most garbage may be found. Therefore, the first step can free more pages for the next steps than if the collector selected the first M pages.
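For reference, a compact sketch of the basic copying step that the paged algorithm above builds on is given below: live structures reachable from a root reference are copied into the destination area, references are updated as they are traversed, and a forwarding cell left in the old descriptor prevents double copying. The cell encoding is the illustrative one assumed in Section 2.1, not the actual 3DPAM encoding, and the sketch copies depth-first for brevity.

    #include <stdint.h>

    typedef uint32_t cell;
    #define TAG(c)      ((c) & 0xff)
    #define TAG_STR     2                 /* reference: payload is a heap index */
    #define TAG_FWD     7                 /* forwarding pointer left after copy */
    #define PAYLOAD(c)  ((c) >> 8)
    #define ARITY(d)    (((d) >> 24) & 0xff)
    #define MAKE(tag,p) ((cell)(tag) | ((cell)(p) << 8))

    /* Copies the structure at from[idx] into 'to' (next free slot *top) and
     * returns its new index; if it was copied already, the forwarding cell is
     * followed instead. */
    static int copy_structure(cell *from, int idx, cell *to, int *top)
    {
        int i, arity, new_idx;
        if (TAG(from[idx]) == TAG_FWD)
            return (int)PAYLOAD(from[idx]);
        arity = (int)ARITY(from[idx]);
        new_idx = *top;
        *top += arity + 1;
        to[new_idx] = from[idx];                     /* descriptor cell          */
        for (i = 1; i <= arity; i++)
            to[new_idx + i] = from[idx + i];         /* argument reference cells */
        from[idx] = MAKE(TAG_FWD, new_idx);          /* leave forwarding pointer */
        for (i = 1; i <= arity; i++)                 /* follow copied children   */
            if (TAG(to[new_idx + i]) == TAG_STR)
                to[new_idx + i] = MAKE(TAG_STR,
                    copy_structure(from, (int)PAYLOAD(to[new_idx + i]), to, top));
        return new_idx;
    }

    /* Root references (e.g. those held in tokens and context tables) are
     * updated in place while the structures they reach are evacuated. */
    static void copy_roots(cell *from, cell *to, int *top, cell *roots, int nroots)
    {
        int r;
        for (r = 0; r < nroots; r++)
            if (TAG(roots[r]) == TAG_STR)
                roots[r] = MAKE(TAG_STR,
                    copy_structure(from, (int)PAYLOAD(roots[r]), to, top));
    }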
"Semi"-generational garbage collection
The generational collectors (see [2][4]) divide the structures into generations. When a structure survives long enough, it is placed into an older generation. The collector examines the youngest generation most frequently and the oldest one most rarely. The simplest type of these collectors is the following:
- two generations are used (young and old),
- a structure surviving at least one collection is a member of the old generation.
Fig. 14. Copying garbage collection on memory pages
The previous algorithm can be modified to be 'nearly' this kind of generational garbage collector. The main difference between our method and the original ones is that our algorithm also parses the old generation (updating all references pointing into copied structures), it is just not copied (the collection stops at step 3, when the young pages are copied). Therefore, our algorithm is not a true generational collector.
For space optimization, the generations are stored contiguously in the pages; therefore, the last page of the older generation is also the first page of the younger one (except if that page is completely filled with old structures). This page is considered a 'young page' and it is always collected.
Consider that the first O pages of the list L (used by the family) contain the old structures (except for the last old page). The remaining pages are collected by the previous algorithm. The references pointing from the old pages into the young ones are updated, too. When the whole young generation is collected, the new pages are linked behind the old ones. The collection is finished and the whole page list becomes the old generation.
The old generation is collected rarely. The known algorithms use a fixed collection ratio between the two generations. Notice that our 'semi' solution provides a very simple and dynamic decision criterion for the collection of the older pages (we pay for it with time). The first step of the algorithm marks the structures of the old generation, too. After the first step, the ratio of the living old structures to the size of the pages allocated by the old generation is known. This ratio can be the basis of the decision.
This algorithm can be considered only as an optimization of the previous one rather than as a generational collector. The space overhead is a single variable which stores the page pointer separating the two generations. The collection time is decreased because the collector can halt without collecting the old generation. In this case not all garbage is reclaimed; however, the decision method ensures good efficiency.
Generational garbage collection
The old generation should not be parsed frequently in a generational collector (as is done in our semi-generational algorithm). If the old pages are not examined, the references must be updated in another way. The usual solution of the implementors of such collectors is that the references pointing from the old generation into the young generation are stored and used as part of the root set of the young generation. Since the number of such references is generally relatively small, this is a good and efficient solution. The references can be stored in the pages of the heap, so no additional data storage needs to be created.
The order of the structures can be mixed only within a generation; therefore, the pages of different generations should be separated.
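Reusing the illustrative cell encoding of the previous sketch, a write barrier recording such old-to-young references could look like the following; the boundary test stands in for the per-page generation test and all names are assumptions.

    #define REMEMBERED_MAX 4096

    static int old_young_boundary;         /* cells below this index count as old
                                              here; in the paged scheme this would
                                              be a per-page generation test       */
    static int remembered[REMEMBERED_MAX]; /* heap indices of old-to-young refs   */
    static int remembered_count;

    /* Write barrier used for every reference store into the heap: an old-to-young
     * reference is recorded so that it can be added to the root set of the next
     * young-generation collection. */
    static void store_reference(cell *heap, int dst, cell ref)
    {
        heap[dst] = ref;
        if (TAG(ref) == TAG_STR &&
            dst < old_young_boundary && (int)PAYLOAD(ref) >= old_young_boundary &&
            remembered_count < REMEMBERED_MAX)
            remembered[remembered_count++] = dst;
    }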
The implementation is more complex than that of the simple copying collector. Statistics show that generational collectors are more efficient than copying ones. However, the algorithm proposed above seems to be efficient enough for our system; therefore, its implementation is suggested. The measurements of the working collector will clarify whether a better algorithm should be implemented or not.
4.3. Copying garbage collection method without memory management
The use of memory pages and the copying collector on them seems to be a very efficient solution. However, in a relatively simple system, such as the multi-transputer machines, a simpler collector with similar efficiency can be designed.
The following solution leaves the memory in its original state: it is a contiguous area. No pages are used and heap management is avoided entirely.
The algorithm is nearly the same as in the paged environment, but it is a little slower. It cannot be predicted which one is the more efficient collector for the transputer-based LOGFLOW; only measurements on the implementations can show that.
Fig. 15. Logically segmented memory
The original copying garbage collection method (see [2][3][4]) provides a very fast collection, but its main disadvantage is that half of the memory is reserved for the algorithm. In order to decrease the size of the reserved area, a slower kind of copying algorithm is proposed here.
Let us consider that the heap is segmented: it consists of N contiguous segments. The last segment is reserved for the collector while the others are used by the engine. That is, an (N-1)/N part of the heap is usable contiguously, see Fig. 15.
When the N-1 segments are full of data, the engine stops and the garbage collector is invoked. The steps of the collector are as follows, see Fig. 16:
1. The original copying method is used, but only the living objects of the 1st segment are moved into the Nth segment. The addresses in the references are set as if the objects had been moved within the 1st segment (see the second step). The objects in the other segments (2..N-1) are only marked.
2. The Nth segment is copied into the 1st segment. That is, the objects of the 1st segment remain in that segment, only their positions change.
3. At this time, some part of the 1st segment and the whole Nth segment can be used by the collector. The copying method is used again: as many segments are copied into the free area as are guaranteed to fit into it. Since all the living objects are already marked by the 1st step, the number of segments which can be copied can be counted easily.
4. The Nth segment is copied behind the living objects.
5. The third and fourth steps are repeated until the whole heap is collected.
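A small concrete piece of this algorithm is the decision in steps 3 and 5 of how many segments fit into the currently free area; since the live size of every segment is known after the marking in step 1, it reduces to the following sketch (the bookkeeping array and the names are illustrative).

    /* Decides how many of the segments starting at 'first' can safely be copied
     * into a free area of 'free_cells' cells, given the number of live cells in
     * each segment as recorded by the marking pass. */
    static int segments_that_fit(const int *live_cells, int nsegments,
                                 int first, int free_cells)
    {
        int count = 0;
        while (first + count < nsegments &&
               live_cells[first + count] <= free_cells) {
            free_cells -= live_cells[first + count];
            count++;
        }
        return count;
    }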
Fig. 16. The modified copying garbage collection algorithm
The back-copy of the Nth segment is needed to provide a contiguous area for the engine. Thus, no heap management functions or macros are needed.
The algorithm seems to be much slower than the original method, but consider the following. The statistics show that more than 90 percent of the objects are not accessible when the collector is invoked. Consider that the heap consists of 10 segments (one of them is reserved for the collector, so 9/10 of the heap is usable). The first two steps collect the living objects of the 1st segment. At that time all living objects are marked and, according to the statistics, the total size of the living objects in the remaining 8 (2nd..9th) segments is less than the size of the 10th segment. Therefore, the 8 segments can be copied in the 3rd step and the garbage collection terminates after the 4th step. That is, the algorithm can be expected to be only about twice as slow as the original.
An optimization can be applied in the second and later cycles by marking the structures (and root set elements) which contain references into a not-yet-copied area. In the second cycle only these structures need to be examined, avoiding the re-exploration of completely moved structures.
The main advantage of this solution is its efficiency: it provides a fast collector, all garbage is thrown away, and the contiguous heap can be used by the engine without any overhead.
A disadvantage is that the engine is stopped during the collection. This does not seem to be a big problem in our system because there is no real-time requirement and the system is running on multi-computers; therefore, an engine can be stopped for a while. Another disadvantage is the handling of the heap as one whole memory structure: if the system uses virtual memory or a cache, the access of related structures may be very costly (they can be stored in different areas of the heap). The transputer machines do not have virtual memory or cache; therefore, this method can be used there.
5. Conclusion
The paper described an analysis of the LOGFLOW system and proposed some garbage collection algorithms. The algorithm described in Section 4.3 is suggested for implementation on the transputer-based LOGFLOW system. It is a simple collector, but it seems to be efficient enough in the environment of the transputers. The more complex algorithm described in Section 4.2.4 is suggested for implementation both for the transputer-based LOGFLOW and for the workstation-cluster-based WS-LOGFLOW system. The measurements of the collectors will show which one has the better performance in the LOGFLOW system.
References
[1] Zs. Németh, P. Kacsuk, Sz. Ferenczi, G. Dózsa: Technical Description of Implementing the 3DPAM on a Multitransputer System. Technical Report.
[2] N. Podhorszki: Garbage Collection Techniques. PHARE report.
[3] J. Cohen: Garbage Collection of Linked Data Structures. Computing Surveys, Vol. 13, No. 3, September 1981.
[4] P.R. Wilson: Uniprocessor Garbage Collection Techniques. Proc. of the 1992 Intl. Workshop on Memory Management, Springer-Verlag Lecture Notes in Computer Science.
[5] G. Dózsa: A 3DPAM új típusú AND context tábla kezelése (Handling of the new type of AND context table of the 3DPAM). Technical Report.
[6] P. Kacsuk: Execution of Prolog on Massively Parallel Distributed Systems. SERC Research Grant Report.