Garbage Collection in LOGFLOW *
Norbert Podhorszki, Peter Kacsuk
KFKI Research Institute for Measurement and Computing Techniques of the
Hungarian Academy of Sciences
H-1525 Budapest, P.O.Box 49. Hungary
{pnorbert, kacsuk}@sunserv.kfki.hu
1. Introduction
LOGFLOW is a distributed Prolog system running on multi-transputer machines and workstation clusters. The engines (Distributed Data Driven Prolog Abstract Machines, 3DPAM) execute the Prolog program in parallel with fine-grain execution. Coarse-grain pieces of work are executed by traditional sequential WAM (Warren Abstract Machine) engines.
In the execution of logic programming languages the memory consumption rate is very high because a storage cell of the memory cannot be reused once it has been filled. The memory fills up very quickly, yet most of the allocated storage cells are 'garbage', i.e., they are no longer used by the system.
The 3DPAM engine does not use any space reclamation technique. Our goal is to select an appropriate garbage collection method for it to support the execution of larger programs. In order to find an efficient one, an analysis of the system is presented in this paper. After the analysis several algorithms are described and the most appropriate ones are suggested for implementation.
This paper is a continuation of [2], which gives an overview of the existing garbage collection methods. Section 2 briefly describes the data structures of the 3DPAM. Section 3 analyses the system: it describes how the program structures are created in memory and how they become garbage. Section 4 describes collection methods. The conclusion suggests the appropriate methods for both implementations of the LOGFLOW system.
2. The memory structure of the 3DPAM
In 3DPAM the following data areas are used: Code, Context tables, Token Queues, Heap, Token tables. The Code area is simply an array for the 3DPAM code. The Heap is used conventionally, except that its first part holds the ground structures for the sake of optimization. The other data areas are new and specific to 3DPAM and are explained in the following sections. A detailed description of the structure and handling of these data areas can be found in [1].
2.1. The heap
The heap is a contiguous data area in memory. It consists of cells (with the size of the integer data type). A structure with N arguments occupies N+1 cells in the heap: the descriptor cell [FUNC | Functor ID | Arity] and a reference cell for each argument, see Fig. 1. The heap is consumed continuously by the engine; therefore, the structures of different Prolog search paths are mixed.
The garbage collector has to concentrate on the heap because other data areas are cleared
automatically.
* The work presented in this paper was done in the framework of the Hungarian-Portuguese joint project 'PAR-PROLOPPE', registered under No. H-9305-02/1095.
Fig. 1. The heap of the 3DPAM
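As an illustration, the following sketch shows how a structure with N arguments could be written onto such a heap. The cell type, the tag values and the bit positions of the descriptor fields are assumptions made for this example only, not the actual 3DPAM encoding.

    #include <stdint.h>

    typedef uint32_t cell;                 /* one heap cell, integer sized      */

    enum tag { TAG_FUNC = 0, TAG_ATOM = 1, TAG_STR = 2 };   /* assumed values   */

    #define MAKE_FUNC(functor_id, arity) \
        ((cell)TAG_FUNC | ((cell)(functor_id) << 8) | ((cell)(arity) << 24))

    /* Writes a structure with 'arity' arguments at heap[top]: one descriptor
     * cell followed by one reference cell per argument (N+1 cells in total).
     * Returns the new heap top. */
    static int write_structure(cell *heap, int top, int functor_id, int arity,
                               const cell *arg_refs)
    {
        int i;
        heap[top] = MAKE_FUNC(functor_id, arity);
        for (i = 0; i < arity; i++)
            heap[top + 1 + i] = arg_refs[i];
        return top + 1 + arity;
    }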
2.2. Token queues
The execution is governed by tokens. Each token holds at least the following information:
identifier, target PE and starting address in the code. Depending on the type, some tokens hold more
information. There are three groups of tokens:
1. request token: DO (<environment>,<arguments>)
2. reply token: SUCC (<environment>), FAIL, CFAIL
3. request/reply token: SUB (<environment>) (only virtual; they never appear in the system)
Identifiers of tokens not currently processed are placed into a token queue. There are two types of token queues: Local Token Queue and Remote Token Queue. Tokens in the LTQ will be processed by the local engine, whereas tokens in the RTQ will be transmitted to another PE. The engine takes the tokens from the LTQ (only the identifiers; the token data are taken from one of the token tables). The resulting tokens (if any) can be placed into the LTQ, or the RTQ, or can be processed by the engine.
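For illustration, a token record could look like the following sketch. The field names are hypothetical; the paper only states that every token carries an identifier, a target PE and a code address, plus type-dependent data such as an environment and arguments.

    /* Illustrative token record; field names are assumptions. */
    typedef enum { TOK_DO, TOK_SUCC, TOK_FAIL, TOK_CFAIL } token_type;

    typedef struct {
        token_type type;
        int        id;          /* token identifier (colour)                    */
        int        target_pe;   /* processing element that will handle it       */
        int        code_addr;   /* starting address in the 3DPAM code           */
        int        env;         /* heap reference to the environment (DO, SUCC) */
        int        args;        /* heap reference to the arguments (DO only)    */
    } token;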
2.3. Token tables
Tokens are kept in token tables. There are three kinds of token tables: Do, Fail and Sub/Succ.
The token tables are arrays of records which contain a pointer and a token slot. The pointer is used to maintain the list of free token slots (see Section 2.4.4, where its usage is described). The token slot holds the relevant information of a given token; its structure equals that of a token of the given type.
When a token is processed, its record is freed; therefore, the garbage collector does not find any garbage in these tables. Nevertheless, the examination of these tables during the collection is important because the stored tokens contain references to the data structures on the heap.
The same holds for the following data areas, too.
2.4. Context tables
Context tables are used to distinguish different token streams. Their usage is fundamental to realizing pipeline AND-parallelism. Every operator type (Unify, And, Or, Cut) has its own context table. The theoretical basis is explained in [6], where an example can also be found.
2.4.1. The Unify and Or context tables
The Unify and Or context tables are similar. The tables are arrays of records which contain a pointer and a context slot. The pointers are used to form a chain of free context slots (similarly to the token tables) and the context slots hold the relevant information.
Every time a token arrives at a Unify or Or node, its information content is saved, and a token with a new identifier (new colour) is emitted. When a corresponding token (representing a partial solution) of this colour arrives at the appropriate point, its original colour is restored, and the saved information is updated.
2.4.2. The And context table
The And context table differs from the previous ones since it must handle multiple overlapping SUB token streams. The And context table is therefore implemented in a different way; the details of the current implementation can be found in [5].
The And context table consists of records which contain an array of counters and a state field. The context table is indexed with the colour of the incoming tokens, and the counters are indexed with the identifier of the And node within a Unify/And node.
2.4.3. The Cut context table
The Cut context table consists of a pointer to chain the free items together, and a field for the token identifier. It is used to implement the 'cut' in the distributed environment, where the system looks for all solutions while the 'cut' ignores all solutions except the first one (first in terms of the sequential execution!).
2.4.4. The dynamic handling of context and token tables
Token tables and Context tables are handled in such a way that, although the whole arrays are statically allocated data objects, the elements of these arrays can be allocated and freed dynamically. This feature is obtained by means of pointers which are in fact array indices. Whenever a new item is to be allocated, the current head of the free list is taken and the head moves on to the item it points to; thus the allocated item is removed from the list. Adding an item to the free list means setting the pointer of the given element to the current head of the list, and the given item becomes the new head, see Fig. 2. This scheme combines the flexible and efficient use of pointers with the benefits of arrays, i.e., no run-time memory allocation and error handling are needed.
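A minimal sketch of this free-list scheme on a statically allocated array is given below. The names and the table size are illustrative, not the actual 3DPAM identifiers; the info field stands for the token or context slot data.

    #define TABLE_SIZE 256
    #define NIL (-1)

    typedef struct {
        int next;                 /* index of the next free slot, or NIL        */
        int info;                 /* stands for the token or context slot data  */
    } slot;

    static slot table[TABLE_SIZE];
    static int  free_head = NIL;

    static void table_init(void)
    {
        int i;
        for (i = 0; i < TABLE_SIZE - 1; i++)
            table[i].next = i + 1;
        table[TABLE_SIZE - 1].next = NIL;
        free_head = 0;
    }

    /* Allocation: the current head is removed from the free list and the head
     * moves on to the slot it pointed to. NIL means the table is full. */
    static int slot_alloc(void)
    {
        int idx = free_head;
        if (idx != NIL)
            free_head = table[idx].next;
        return idx;
    }

    /* Freeing: the slot points to the old head and becomes the new head. */
    static void slot_free(int idx)
    {
        table[idx].next = free_head;
        free_head = idx;
    }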
3. Analysis of the 3DPAM heap usage
3.1. 3DPAM specialities
The LOGFLOW system is a special distributed Prolog system: no remote references are used. The processing elements (PE) store all the needed data in their local memory. New pieces of work are transferred in tokens among the PEs, including all data needed for the processing. Therefore, the garbage collector of a PE can be a local process which has to handle only the local state of the PE. Generally, garbage collectors in a distributed system use complex algorithms to handle remote references. Note, however, that it does not follow that the simpler algorithms of sequential Prolog systems can be used in LOGFLOW: the types of garbage are different from those in a sequential system, see Section 3.2.
The other speciality is that only structures (including lists) are stored in the main memory area
(heap). Variables are stored in environments (and in special tables in the hybrid binding environment
scheme).
The third speciality comes from the distributed and fine-grained computation. Since many tokens (pieces of work or results) are stored on a PE at one time, a very large number of heap references is stored in the various information tables of the 3DPAM engine. This means that the root set of the live objects is very large (see the root set notion of the marking-and-reclaiming garbage collectors).
3.2. The types of garbage creation
3.2.1. Prolog speciality
During the sequential execution a branch of the Search Graph of the Prolog program is explored looking for a solution. If the program fails, another branch is explored, and the structures created on the first branch can be reclaimed. The backtracking feature of the sequential Prolog interpreters solves this problem; in a parallel system, however, there is no backtracking, all branches are explored at the same time, see Fig. 3.
Fig. 2. Dynamically allocated static table
Fig. 3. Garbage from failed search
Structure sharing among branches helps a little in a parallel system. A structure is used by several branches of the Prolog program; therefore, it lives for a long time during the execution. The handling of these structures is hard, see the hybrid binding environment scheme. It is important to take into account that sharing does not solve the problem.
3.2.2. Temporary structures
Considering only one branch of the Prolog execution, the garbage consists of the temporarily used structures. The analysis of sequential Prolog systems and applications shows that most structures are alive only for a short period. They are used for (see Fig. 4.):
- Controlling the search by unification successes and failures. The structures are used to make a call, enabling the selection of only some of the available predicates.
- Computing a sub-result. The situation is the opposite of other languages where explicit storage management is allowed: a C programmer can allocate and free memory areas, a Prolog programmer cannot. All structures used to compute a partial solution remain in the memory.
The sequential Prolog systems have problems 'only' with the reclamation of the temporary structures; the first type of garbage is reclaimed at backtracking, while the third type occurs only in distributed systems.
3.2.3. Garbage creation in distributed systems
In a distributed system the processing elements communicate with each other, sharing the work. The data are stored in the local memories of the PEs. A PE uses mainly the data placed in its local memory and sometimes sends requests for data stored in the memory of another PE. The ratio of local and remote data accesses depends on the system. A remote access has a large communication overhead, while a local access is possible only if the needed data is placed locally. A balance between the two types of accesses should be reached in a distributed system. (LOGFLOW is a very special distributed system where the PEs use only local data. This is possible due to the careful organization of the system.)
When a PE executes a piece of work, the needed data from other PEs are transferred to the PE (i.e. data are copied into its local memory). The PE executes a code fragment (creating the result data in the local memory). Finally, it might send the result to another PE (sending the result data). Generally, the created structures are no longer used by the PE; they are garbage, see Fig. 5. In the LOGFLOW system the pieces of work are transferred by tokens among the PEs. A token contains the entry point of the code (each PE has the whole program code) and all data needed for the processing.
Fig. 4. Creation of temporary structures
Fig. 5. Garbage from data distribution
3.3. The types of the heap consumption of 3DPAM
3.3.1. Structure creation
Predicate call and unify
The most common data creation in a Prolog program is the explicit data generation by the program. The structures defined in the head and the body of a clause are stored in the heap when the clause is executed, see Fig. 6. This is the only possible method for the user to create data structures in the Prolog application. (There is no dynamic data creation in LOGFLOW yet.)
Data transfer in a distributed system
The data transferred from a PE to another PE is copied into the heap of the target PE. The data
are transferred when a subgoal is sent to a remote PE or the solutions of the subgoal are sent back to
the caller. Since the 3DPAM executes fine-grained computation and all the needed data are
transferred with the subgoal requests, the data transfer and heap consumption in the LOGFLOW
system are very high.
Results from a sequential Prolog engine
The 3DPAM engine is based on a fine-grained computation model. To control the granularity, the LOGFLOW system also contains conventional Warren Abstract Machines (WAM). A subgoal given to a WAM is executed sequentially on the local PE. The solutions for the subgoal are sent back to the caller engine, see Fig. 7. They are stored in tokens, like the other solutions produced by the distributed engines. The data structures contained in these tokens are copied into the heap. The temporary structures created by the WAM engine are stored in the WAM memory, and their garbage collection is the task of the sequential engine.
3.3.2. Structure copying
Structures may be shared by several environments. In LOGFLOW structures are copied when a
variable in their arguments has been bound or a reference item in their arguments has been changed
and there is a possibility that the structure is referenced from more than one environment. Obviously,
it can occur when the structure contains unbound variables. Therefore, in LOGFLOW the structure
copying is optimized in such a way that they are copied only if they contain unbound variables.
OR branches in the program
Each time a token is processed by an OR-node, the structures are copied (if they contain
unbound variables). For each solution there is a new copy of the structures.
Fig. 6. Explicit structure creation
Fig. 7. Structures from the sequential engine
Back-unification
At back-unification each token represents a different solution (see Fig. 8.); thus, structures are copied and the variables are instantiated in them unless they are already ground.
The hybrid binding environment scheme (see Section 4.1.2.) avoids the structure copying; therefore, the garbage collector has to concentrate on the first type of structures.
3.3.3. Tests and results
The 8-queens and the Hamiltonian-path-search programs were executed. The results clearly show that most of the heap is consumed by the data transfer among the processors, see Fig. 9. (Larger problems result in an even higher ratio!) The consequence is that the garbage collector should concentrate on these types of structure creation first.
3.4. Analysis of the data exchange among the processing elements
As the previous section showed, most garbage is created by the data distribution. Two different
types of tokens are in the system: query tokens representing a piece of work and answer tokens
representing solutions. In this section the ratio of the data transfer of the two kinds of tokens is
examined.
Query tokens
A query (DO) token contains a code address in the program code - representing a subgoal - and all data needed for the execution on a remote PE, which has no access to the memory of other PEs.
Answer tokens
An answer token (SUCC or FAIL) contains a return address to the caller - the caller PE will execute the program from this point - and all data representing the found solution. (One SUCC token represents one solution; the FAIL token terminates the stream of solutions and contains no data.)
Tests and results
In order to know whether one kind of token dominates in transferring data structures, we examined the programs executed in the previous test. The results (see Fig. 10.) show that there is no significant difference between the amounts of data transferred by DO tokens and SUCC tokens.
The processors 'near' to the start processor (which executes the initial query) have more data
transferred by answer tokens than by query tokens while this is reversed on the 'far' processors. The
reason is that the 15-transputer architecture was too large for these problems and the execution was
not well balanced.
Fig. 8. The problem of structure sharing
Fig. 9. The heap consumption by the 8-queens program on 15 processors
4. Proposed garbage collection techniques
4.1. The aspects of an appropriate garbage collection method
The main goal of implementing a garbage collection method is to get a fast and effective collector: it should be as fast as possible and it should collect as much garbage as possible. However, these two goals work against each other. Other systems generally use hybrid methods, providing a fast (but 'weak') collector for general use and a heavyweight, very effective one that cleans up the memory less frequently.
4.1.1. Implementation on transputers and workstation clusters
The transputers are relatively simple from the point of view of garbage collection. They do not use caches or virtual memory; therefore, any place in the memory can be accessed in the same amount of time. A workstation uses both cache and virtual memory, which increases its performance and expands its capacity.
The processors (and the UNIX operating system) handle processes, i.e. the system consists of processes working in parallel (with time slicing) on the processor. The processes can share memory areas, and by the use of semaphores data access violations can be avoided. Hence, the garbage collector can be a separate process working in parallel with the engines, accessing the data of the 3DPAM from outside. Alternatively, it can simply be a procedure called by the engine, suspending the program execution while the collection is running.
4.1.2. Avoiding structure copying: hybrid binding environment
The hybrid binding environment stores the bound variables of the structures in a separate (hash) table. A structure containing an unbound variable can be shared among many threads of the Prolog search. When a thread binds the variable, its value is stored in the hash table and the thread has exclusive access to it. Therefore, the original structure need not be copied.
The use of the hybrid binding environment implies that no structures (and no garbage) are created on the heap by copying. However, as the tests showed, this does not decrease the amount of garbage significantly.
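One plausible realisation of this idea is sketched below: a binding made by one thread is stored in a hash table keyed by the address of the shared variable cell and the colour of the binding thread, so the shared structure itself is never touched. All names, the keying and the table sizes are assumptions for illustration; the actual scheme of the hybrid binding environment may differ in its details.

    #define BIND_TABLE_SIZE 1024
    #define BIND_POOL_SIZE  4096

    typedef struct {
        int var_addr;            /* heap address of the shared unbound variable */
        int colour;              /* identifier of the binding thread / token    */
        int value;               /* reference to the bound value                */
        int next;                /* index of the next entry in the same bucket  */
    } binding;

    static int     bind_bucket[BIND_TABLE_SIZE];   /* bucket heads, -1 if empty */
    static binding bind_pool[BIND_POOL_SIZE];
    static int     bind_used;

    static void bind_init(void)
    {
        int i;
        for (i = 0; i < BIND_TABLE_SIZE; i++)
            bind_bucket[i] = -1;
        bind_used = 0;
    }

    static unsigned bind_hash(int var_addr, int colour)
    {
        return ((unsigned)var_addr * 31u + (unsigned)colour) % BIND_TABLE_SIZE;
    }

    /* Records a binding made by one thread without touching the shared structure. */
    static void bind_store(int var_addr, int colour, int value)
    {
        unsigned h = bind_hash(var_addr, colour);
        binding *b;
        if (bind_used >= BIND_POOL_SIZE)
            return;                                /* overflow handling omitted */
        b = &bind_pool[bind_used];
        b->var_addr = var_addr;
        b->colour = colour;
        b->value = value;
        b->next = bind_bucket[h];
        bind_bucket[h] = bind_used++;
    }

    /* Returns the value bound by the given thread, or -1 if still unbound for it. */
    static int bind_lookup(int var_addr, int colour)
    {
        int i = bind_bucket[bind_hash(var_addr, colour)];
        while (i >= 0 && (bind_pool[i].var_addr != var_addr ||
                          bind_pool[i].colour   != colour))
            i = bind_pool[i].next;
        return i >= 0 ? bind_pool[i].value : -1;
    }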
Fig. 10. The amount of transferred structures delivered by the query and the answer tokens. 8-queens program on 15 processors
4.1.3. Comments on marking-and-compacting garbage collection methods
The marking-and-compacting methods perform the collection of living objects in several passes. First, all accessible objects are marked; then, in two or three further passes, they are moved into a contiguous area and their references are modified accordingly.
In the LOGFLOW system, because of the many tokens (which are stored in queues before processing and in context tables after that), the number of references pointing into the heap from outside is very large. Therefore, updating these references would take a very long time with these methods, because when a structure is moved, all references to it must be found and modified. The copying methods update the references when they are accessed; therefore, they are more efficient here than the marking-and-compacting methods.
4.2. Segmenting the heap by the use of pages
One idea for supporting the garbage collection is the following: the large number of tokens should be grouped and the memory should be segmented in order to separate the memory areas used by the groups of tokens. The collector can then examine a separated area and a group of tokens without changing anything in other groups and memory areas. This enables the Prolog engine and the garbage collector to run concurrently: while the engine processes a token belonging to one group, the collector can examine the areas of another token group.
Because of the large number of tokens, the root set of the live objects is huge. By grouping the tokens, a marking collector can start the search with a much smaller root set.
The following subsections describe how the tokens can be grouped, how the memory can be segmented, and how a garbage collection based on this segmentation can be implemented.
4.2.1. Token families
The idea of separating the tokens into groups is the following: the tokens belonging to the same branch of the Prolog Search Graph are the members of a family. Since the collector is a local process, only the tokens in a given processing element are considered. The definition of the family from the point of view of the implementation: the incoming DO token representing a sub-query 'founds' the family. Any token which is a successor of that token and is processed on the same PE belongs to the family, see Fig. 11.
The handling of families should be easy to implement, but this definition introduces a difficulty. When a DO token (belonging to family A and created on PE X) is transferred to another PE, it founds a family (family B on PE Y). If a successor DO token from this family is transferred back to PE X, it should be a member of family A (because it is a successor of the founder of family A). In order to implement family handling easily, such a token founds a new family on PE X instead.
Fig. 11. The token family
The number of token families during execution
In order to know how many families are founded during the execution and how many of them
exist in one time, we examined the programs executed in the previous test. The results (see Fig. 12.)
show that some tens of families are founded and the maximum number of them is low. However, the
ratio is not so good generally as the chart shows. When the most work is executed by WAM tokens,
the ratio is very low. The number of the families is much lower in this case (the Search Graph is
mostly explored by the sequential engines). The worst result was the execution of the Hamiltonian
Path Search program on 2 processors. Only one family was founded on the first processor (founded by
the main query). The reason was that the program was little and it was executed mostly by the WAM.
4.2.2. Memory pages
The token family concept provides an abstract separation of groups of structures. In order to
separate them physically, the heap is segmented by pages. A page is used exclusively by a family but a
family can use more than one page. This implies that heap memory areas used by different families
are separated from each other, see Fig. 13.
The usage of a page is not interesting from the point of view of the segmentation; it can be designed appropriately for the selected collection method. For the sake of simplicity let us assume that the usage is the same as the original usage of the heap: the structures created during the processing of different tokens (of the same family) are stored mixed.
The pages used by a family can be considered as a logically contiguous heap separated from other heaps, see Fig. 13. The boundaries of the pages should be handled only at the implementation level; a heap manager can solve this problem.
4.2.3. Heap manager for handling paged memory
The tasks of a heap manager are:
- providing new pages for the token families,
- handling structure writing and reading at page boundaries.
The engine sees a contiguous heap area; it reads and writes the structures with the help of manager functions (or macros, to decrease the overhead). The heap manager maintains a list of free pages, it provides new free memory from these pages, and unused pages are linked back to this list.
Fig. 12. Token families during the execution of the 8-queens program
Fig. 13. The pages of the memory and the logically contiguous memory areas of the families
The simplest implementation of the heap manager is a macro or function package hiding the heap from the engine. The read and write functions can detect the special events in heap handling and can call the heap management procedures if needed. The special events are: the page boundary fault (when a structure is sliced and stored in two non-contiguous pages) and running out of memory. At the first event the macros can hide the slicing from the engine.
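A sketch of such a heap manager interface is given below. The page size, the array sizes and all names are assumptions, and, for simplicity, the sketch starts a new page instead of slicing a structure across a page boundary.

    #define NUM_PAGES  64
    #define PAGE_SIZE  1024                       /* cells per page (assumed)    */
    #define NO_PAGE    (-1)

    static int page_next[NUM_PAGES];              /* links pages into lists      */
    static int free_page_head = NO_PAGE;          /* list of free pages          */
    static int heap_cells[NUM_PAGES * PAGE_SIZE];

    typedef struct {
        int first_page;        /* head of the page list used by the family       */
        int last_page;         /* page currently being filled                    */
        int top;               /* next free cell index within last_page          */
    } family_heap;

    static int *page_base(int page) { return &heap_cells[page * PAGE_SIZE]; }

    /* Returns 'ncells' contiguous free cells for the family, taking a new page
     * from the free list when the current page cannot hold them. Returns 0 when
     * no free page is left, which is the point where garbage collection must be
     * forced. */
    static int *heap_alloc(family_heap *f, int ncells)
    {
        if (f->last_page == NO_PAGE || f->top + ncells > PAGE_SIZE) {
            int p = free_page_head;
            if (p == NO_PAGE)
                return 0;                         /* out of memory               */
            free_page_head = page_next[p];
            page_next[p] = NO_PAGE;
            if (f->last_page != NO_PAGE)
                page_next[f->last_page] = p;
            else
                f->first_page = p;
            f->last_page = p;
            f->top = 0;
        }
        f->top += ncells;
        return page_base(f->last_page) + f->top - ncells;
    }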
The second event is more complex. The simple case is when there is a free page available: the manager can give this free page to the corresponding family (whose member is being processed at that moment). The main problem is the full use of the memory, when the heap manager cannot provide any free pages. At this point, the engine should be stopped while the garbage collector frees at least one page by compacting the used structures of other families (not the one being processed by the engine!).
However, this is not enough yet. It is possible that no pages can be reclaimed from families other than the one processed by the engine. A possible solution is that the collector is allowed to reclaim the garbage of that family in this case, but with a restriction. During the processing of a token the engine uses internal data structures (temporary variables, copies of the current environment), and the references stored in these structures cannot be explored by the collector. The engine should therefore be able to restore the state of the system to the point of the start of the processing of the last token, and it should also be able to restart the interrupted processing. If it has this ability, the heap manager can:
1. force the engine to restore the system state to the point of the beginning of the last
token processing,
2. force the collector to reclaim the garbage of the currently processed family,
3. force the engine to restart the last token processing.
The implementation of this ability of the engine is described later in this section.
A partial garbage collection based on the family concept
A family is founded by an incoming query token. The query is executed (while the family grows) and the solutions are searched for. When all possible solutions for a subquery are found, the subgraph of the subquery is fully explored. That means that no tokens belonging to this subgraph exist. At this moment the query token also disappears from the system (it is not stored or transferred anymore). However, the answer tokens contain information about the solutions and can refer to structures stored on the heap.
In the special case when all solutions for the incoming query are found and they are transferred back to the processing element which sent the query, the family 'dies out' on the local processing element. The answer tokens are transferred to the other processing element with all referenced data. The whole list of memory pages used by this family is no longer referenced. That means that at the moment when the last solution is sent back to the caller, the memory used by this family can be considered a free memory area and the heap manager can give these pages to other families.
In the implementation, this important event can be detected easily. Because of the 3DPAM system and the concept of token streams, the stream of the solutions belonging to the same query (each of them stored in a SUCC token) is terminated by a FAIL token. (FAIL: if there is no SUCC token before the FAIL, it means that no solution was found for the query.) When a FAIL token is sent from the processing element, the corresponding query is answered and the family dies out.
Explicitly, when a FAIL token is sent out by the Output Manager, it has to report this event to the Heap Manager (by calling a function), which frees the memory used by the corresponding family (links the used pages into the list of free pages).
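Building on the heap manager sketch of the previous subsection, the hook could look like the following; the function names are hypothetical.

    /* Frees all pages of a family whose last solution (the FAIL token) has just
     * been sent out: the family's whole page list is linked into the free list. */
    static void heap_manager_family_died(family_heap *f)
    {
        if (f->first_page != NO_PAGE) {
            page_next[f->last_page] = free_page_head;
            free_page_head = f->first_page;
            f->first_page = f->last_page = NO_PAGE;
            f->top = 0;
        }
    }

    /* Hook in the Output Manager: called after a token has been transmitted. */
    static void output_manager_sent(int token_is_fail, family_heap *family)
    {
        if (token_is_fail)
            heap_manager_family_died(family);
    }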
The maximum number of families existing at one time can be significantly lower than the number of families founded during the whole execution, see Fig. 12. Therefore, this partial collection can be very efficient in most cases (although there may be programs and execution conditions under which most founded families survive for a long time).
The modification of the engine
As pointed out above, the engine should be able to restore the system state to the point of the start of the processing of the last token and to restart the interrupted processing. The implementation of this ability would be difficult; however, a relaxed condition can be given:
The engine should be able to restore the state of the system to a point from which the engine can continue its work after garbage collection.
The restoration should include the following tasks:
a) the heap pointer of the current family should be restored
b) the context tables should be restored
c) the token queues should be restored (?)
a) The heap pointer can be stored in a variable at the start of a new code block. If new pages were allocated during the processing, they should be linked back to the list of free pages. The identifier of the last page used at the start can also be stored in a variable; the newly allocated pages are linked as a list to this page, therefore their reclamation is easy.
b) The processing of the DO and the SUCC tokens should be examined here. The FAIL tokens do not cause heap consumption; therefore, forced garbage collection cannot occur during the processing of a FAIL token. Records of the context tables can be removed only during the processing of FAIL tokens; therefore, the restoration of a removed record is not needed.
A DO token can be received at an OR-branch (call of a predicate with multiple clauses), at a UNIT node (fact call) or at a UNIFY node (call of a single predicate). When the token is processed at a UNIFY or UNIT node, only the unify_context_table and the heap can be modified. The restoration of the heap is solved in a). The unify_context_table can only be expanded: a new record is filled with the information about the incoming DO token. Only a pointer to the table should be stored at the start of the processing, and at the (forced) garbage collection the table can be restored by linking the allocated record back to the free records (see Section 2.4.4.).
When the DO token is processed at an OR node, two phases of processing should be examined. First, the token is copied because the two possible search paths are explored in parallel. The OR context table is expanded with a new record and the copy of the DO token is placed into one of the token queues. If it is placed into the Remote Token Queue and the Output Manager sends it to another PE, the restoration is impossible. In the older implementation of the LOGFLOW system, the structures containing unbound variables are copied (these structures cannot be shared by the two DO tokens). The hybrid binding environment avoids the copying; therefore, the heap is not expanded here (i.e. forced garbage collection does not occur). The second phase is the processing of the DO token (the engine continues the search on the left branch of the Search Graph). This is the same case as in the previous paragraph.
The two phases should be handled separately at restoration. When the processing is in the second phase, the system should be restored to the point of the start of the second phase (the other DO token placed into a queue remains there; only the processing of the first DO token is suspended).
The SUCC tokens can arrive at OR, UNIFY, AND or CUT nodes. The records of the Or and the Unify context tables are not modified, but the and_context_table can be expanded (theoretically this happens during the processing of SUB tokens, but since these kinds of tokens are virtual, the processing of a SUB token can cause modifications in the AND context table here). The CUT node can store the token temporarily (suspend it), but at this point forced garbage collection does not occur (the heap is not consumed).
c) The restoration of the token queues would be hard (an element would have to be inserted at the beginning of the queue). The solution described above avoids the restoration of the token queues.
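Reusing the hypothetical heap manager types introduced in Section 4.2.3, the checkpoint discussed in points a) and b) could be sketched as follows; all names are illustrative and the sketch assumes that the family already owned at least one page when the checkpoint was taken.

    typedef struct {
        int heap_top;            /* f->top of the current family at the start     */
        int last_page;           /* last page used by the family at the start     */
        int unify_free_head;     /* free-list head of the unify context table     */
    } checkpoint;

    static void save_checkpoint(checkpoint *c, const family_heap *f,
                                int unify_free_head)
    {
        c->heap_top = f->top;
        c->last_page = f->last_page;
        c->unify_free_head = unify_free_head;
    }

    static void restore_checkpoint(const checkpoint *c, family_heap *f,
                                   int *unify_free_head)
    {
        /* Give back the pages allocated since the checkpoint. */
        if (c->last_page != NO_PAGE && f->last_page != c->last_page) {
            page_next[f->last_page] = free_page_head;
            free_page_head = page_next[c->last_page];
            page_next[c->last_page] = NO_PAGE;
            f->last_page = c->last_page;
        }
        f->top = c->heap_top;
        /* Context table records are freed only while FAIL tokens are processed,
         * and a forced collection cannot happen there, so restoring the saved
         * free-list head re-links the records allocated since the checkpoint. */
        *unify_free_head = c->unify_free_head;
    }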
4.2.4. Garbage collection techniques in pages of the memory
The previous subsections provided a heap management scheme. This is a 'frame' which makes it possible to design a garbage collection method that works in parallel with the engine. One could also implement a run-time system based on this frame, but this is not among our goals.
Notice the situation at this point: a garbage collector can reclaim the garbage in a list of memory pages which is accessed exclusively by the collector. All references into this memory area are local; therefore, a uniprocessor algorithm can be used.
Copying garbage collection
The original copying collection method (see [2][3][4]) cannot be used for lists of memory pages directly, but it can be slightly modified. Consider that the list L of pages should be collected. The list F of free pages is used by the collector (leaving some pages for the engine). Let us say that M free pages are available for the collection at its start, while N pages should be examined. The steps of the algorithm are the following, see Fig. 14:
1. The original copying method is used, but only the living objects of the last M pages of the list L are moved into the M free pages. The objects in the other pages are only marked, and the references into the moved structures are updated.
2. The evacuated M pages are now handled as free pages. At this time, some part of the originally reserved M free pages and the new M free pages can be used by the collector (or it can give some of them to the engine). The copying method is used again: as many pages are copied into the free area as are guaranteed to fit. Since all the living objects are already marked by the first step, the number of pages which can be copied can be counted easily. (Notice that the references of previously copied structures pointing into the newly copied structures are updated, too.)
3. The second step is repeated until the whole list L of pages is collected.
Notice that older structures (created earlier) can be placed by the collector behind younger ones. The order of the structures becomes totally mixed, but this does not cause any problem in LOGFLOW.
The selection of the last M pages in the first step is not obligatory. The most recently created structures are stored there, and after collections this is where the most garbage may be found. Therefore, the first step can free more pages for the next steps than if the collector selected the first M pages.
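For reference, a compact sketch of the basic copying step that the paged algorithm above builds on is given below: live structures reachable from a root reference are copied into the destination area, references are updated as they are traversed, and a forwarding cell left in the old descriptor prevents double copying. The cell encoding is the illustrative one assumed in Section 2.1, not the actual 3DPAM encoding, and the sketch copies depth-first for brevity.

    #include <stdint.h>

    typedef uint32_t cell;
    #define TAG(c)      ((c) & 0xff)
    #define TAG_STR     2                 /* reference: payload is a heap index */
    #define TAG_FWD     7                 /* forwarding pointer left after copy */
    #define PAYLOAD(c)  ((c) >> 8)
    #define ARITY(d)    (((d) >> 24) & 0xff)
    #define MAKE(tag,p) ((cell)(tag) | ((cell)(p) << 8))

    /* Copies the structure at from[idx] into 'to' (next free slot *top) and
     * returns its new index; if it was copied already, the forwarding cell is
     * followed instead. */
    static int copy_structure(cell *from, int idx, cell *to, int *top)
    {
        int i, arity, new_idx;
        if (TAG(from[idx]) == TAG_FWD)
            return (int)PAYLOAD(from[idx]);
        arity = (int)ARITY(from[idx]);
        new_idx = *top;
        *top += arity + 1;
        to[new_idx] = from[idx];                     /* descriptor cell          */
        for (i = 1; i <= arity; i++)
            to[new_idx + i] = from[idx + i];         /* argument reference cells */
        from[idx] = MAKE(TAG_FWD, new_idx);          /* leave forwarding pointer */
        for (i = 1; i <= arity; i++)                 /* follow copied children   */
            if (TAG(to[new_idx + i]) == TAG_STR)
                to[new_idx + i] = MAKE(TAG_STR,
                    copy_structure(from, (int)PAYLOAD(to[new_idx + i]), to, top));
        return new_idx;
    }

    /* Root references (e.g. those held in tokens and context tables) are
     * updated in place while the structures they reach are evacuated. */
    static void copy_roots(cell *from, cell *to, int *top, cell *roots, int nroots)
    {
        int r;
        for (r = 0; r < nroots; r++)
            if (TAG(roots[r]) == TAG_STR)
                roots[r] = MAKE(TAG_STR,
                    copy_structure(from, (int)PAYLOAD(roots[r]), to, top));
    }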
"Semi"-generational garbage collection
The generational collectors (see [2][4]) divide the structures into generations. When a structure survives long enough, it is placed into an older generation. The collector examines the youngest generation most frequently and the oldest one most rarely. The simplest type of these collectors is the following:
- two generations are used (young and old),
- a structure surviving at least one collection is a member of the old generation.
Fig. 14. Copying garbage collection on memory pages
The previous algorithm can be modified to be 'nearly' this kind of generational garbage collector. The main difference between our method and the original ones is that our algorithm also parses the old generation (updating all references pointing into copied structures), it is just not copied (the collection stops at step 3, when the young pages are copied). Therefore, our algorithm is not a true generational collector.
For space optimization, the generations are stored contiguously in the pages; therefore, the last page of the older generation is also the first page of the younger one (except if that page is completely filled with old structures). This page is considered a 'young page' and it is always collected.
Consider that the first O pages of the list L (used by the family) contain the old structures (except for the last old page). The remaining pages are collected by the previous algorithm. The references pointing from the old pages into the young ones are updated, too. When the whole young generation is collected, the new pages are linked behind the old ones. The collection is finished and the whole page list becomes the old generation.
The old generation is collected rarely. The known algorithms use a fixed collection ratio between the two generations. Notice that our 'semi' solution provides a very simple and dynamic decision criterion for the collection of the older pages (we pay for it with time). The first step of the algorithm marks the structures of the old generation, too. After the first step, the ratio of the living old structures to the size of the pages allocated by the old generation is known. This ratio can be the basis of the decision.
This algorithm can be considered only as an optimization of the previous one rather than as a generational collector. The space overhead is a single variable which stores the page pointer separating the two generations. The collection time is decreased because the collector can halt without collecting the old generation. In this case not all garbage is reclaimed; however, the decision method ensures good efficiency.
Generational garbage collection
The old generation should not be parsed frequently in a generational collector (as is done in our semi-generational algorithm). If the old pages are not examined, the references must be updated in another way. The usual solution of the implementors of such collectors is that the references pointing from the old generation into the young generation are stored and used as part of the root set of the young generation. Since the number of such references is generally relatively small, this is a good and efficient solution. The references can be stored in the pages of the heap, so no additional data storage needs to be created.
The order of the structures can be mixed only within a generation; therefore, the pages of different generations should be separated.
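Reusing the illustrative cell encoding of the previous sketch, a write barrier recording such old-to-young references could look like the following; the boundary test stands in for the per-page generation test and all names are assumptions.

    #define REMEMBERED_MAX 4096

    static int old_young_boundary;         /* cells below this index count as old
                                              here; in the paged scheme this would
                                              be a per-page generation test       */
    static int remembered[REMEMBERED_MAX]; /* heap indices of old-to-young refs   */
    static int remembered_count;

    /* Write barrier used for every reference store into the heap: an old-to-young
     * reference is recorded so that it can be added to the root set of the next
     * young-generation collection. */
    static void store_reference(cell *heap, int dst, cell ref)
    {
        heap[dst] = ref;
        if (TAG(ref) == TAG_STR &&
            dst < old_young_boundary && (int)PAYLOAD(ref) >= old_young_boundary &&
            remembered_count < REMEMBERED_MAX)
            remembered[remembered_count++] = dst;
    }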
The implementation is more complex than that of the simple copying collector. Statistics show that generational collectors are more efficient than copying ones. However, the algorithm proposed above seems to be efficient enough for our system; therefore, its implementation is suggested. The measurements of the working collector will clarify whether a better algorithm should be implemented or not.
4.3. Copying garbage collection method without memory management
The use of memory pages and the copying collector on them seems to be a very efficient solution. However, in a relatively simple system, such as the multi-transputer machines, a simpler collector with similar efficiency can be designed.
The following solution leaves the memory in its original state: it is a contiguous area. No pages are used and heap management is avoided entirely.
The algorithm is nearly the same as in the paged environment, but it is a little slower. It cannot be predicted which one is the more efficient collector for the transputer-based LOGFLOW; only measurements on the implementations can show that.
Fig. 15. Logically segmented memory
The original copying garbage collection method (see [2][3][4]) provides a very fast collection, but its main disadvantage is that half of the memory is reserved for the algorithm. In order to decrease the size of the reserved area, a slower kind of copying algorithm is proposed here.
Let us consider that the heap is segmented: it consists of N contiguous segments. The last segment is reserved for the collector while the others are used by the engine. That is, an (N-1)/N part of the heap is usable contiguously, see Fig. 15.
When the N-1 segments are full of data, the engine stops and the garbage collector is invoked. The steps of the collector are as follows, see Fig. 16:
1. The original copying method is used, but only the living objects of the 1st segment are moved into the Nth segment. The addresses in the references are set as if the objects had been moved within the 1st segment (see the second step). The objects in the other segments (2..N-1) are only marked.
2. The Nth segment is copied into the 1st segment. That is, the objects of the 1st segment remain in that segment, only their positions change.
3. At this time, some part of the 1st segment and the whole Nth segment can be used by the collector. The copying method is used again: as many segments are copied into the free area as are guaranteed to fit into it. Since all the living objects are already marked by the 1st step, the number of segments which can be copied can be counted easily.
4. The Nth segment is copied behind the living objects.
5. The third and fourth steps are repeated until the whole heap is collected.
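A small concrete piece of this algorithm is the decision in steps 3 and 5 of how many segments fit into the currently free area; since the live size of every segment is known after the marking in step 1, it reduces to the following sketch (the bookkeeping array and the names are illustrative).

    /* Decides how many of the segments starting at 'first' can safely be copied
     * into a free area of 'free_cells' cells, given the number of live cells in
     * each segment as recorded by the marking pass. */
    static int segments_that_fit(const int *live_cells, int nsegments,
                                 int first, int free_cells)
    {
        int count = 0;
        while (first + count < nsegments &&
               live_cells[first + count] <= free_cells) {
            free_cells -= live_cells[first + count];
            count++;
        }
        return count;
    }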
Fig. 16. The modified copying garbage collection algorithm
The back-copy of the Nth segment is needed to provide a contiguous area for the engine. Thus, no heap management functions or macros are needed.
The algorithm seems to be much slower than the original method, but consider the following. The statistics show that more than 90 percent of the objects are not accessible when the collector is invoked. Consider that the heap consists of 10 segments (one of them is reserved for the collector, so 9/10 of the heap is usable). The first two steps collect the living objects of the 1st segment. At that time all living objects are marked and, according to the statistics, the total size of the living objects in the remaining 8 (2nd..9th) segments is less than the size of the 10th segment. Therefore, the 8 segments can be copied in the 3rd step and the garbage collection terminates after the 4th step. That is, the algorithm can be expected to be only about twice as slow as the original.
An optimization can be applied in the second and later cycles by marking the structures (and root set elements) which contain references into a not-yet-copied area. In the second cycle only these structures need to be examined, avoiding the re-exploration of completely moved structures.
The main advantage of this solution is its efficiency: it provides a fast collector, all garbage is thrown away, and the contiguous heap can be used by the engine without any overhead.
A disadvantage is that the engine is stopped during the collection. This does not seem to be a big problem in our system because there is no real-time requirement and the system is running on multi-computers; therefore, an engine can be stopped for a while. Another disadvantage is the handling of the heap as one whole memory structure: if the system uses virtual memory or a cache, the access of related structures may be very costly (they can be stored in different areas of the heap). The transputer machines do not have virtual memory or cache; therefore, this method can be used there.
5. Conclusion
The paper described an analysis of the LOGFLOW system and proposed some garbage collection algorithms. The algorithm described in Section 4.3 is suggested for implementation on the transputer-based LOGFLOW system. It is a simple collector, but it seems to be efficient enough in the environment of the transputers. The more complex algorithm described in Section 4.2.4 is suggested for implementation both for the transputer-based LOGFLOW and for the workstation-cluster-based WS-LOGFLOW system. The measurements of the collectors will show which one has the better performance in the LOGFLOW system.
References
[1] Zs. Németh, P. Kacsuk, Sz. Ferenczi, G. Dózsa: Technical Description of Implementing the 3DPAM on a Multitransputer System. Technical Report.
[2] N. Podhorszki: Garbage Collection Techniques. PHARE report.
[3] J. Cohen: Garbage Collection of Linked Data Structures. Computing Surveys, Vol. 13, No. 3, September 1981.
[4] P.R. Wilson: Uniprocessor Garbage Collection Techniques. Proc. of the 1992 Intl. Workshop on Memory Management, Springer-Verlag Lecture Notes in Computer Science.
[5] G. Dózsa: A 3DPAM új típusú AND context tábla kezelése (Handling of the new type of AND context table of the 3DPAM). Technical Report.
[6] P. Kacsuk: Execution of Prolog on Massively Parallel Distributed Systems. SERC Research Grant Report.