Journal of Computer Science
Year: 2009  |  Volume: 5  |  Issue: 11  |  Page No.: 778 - 782

An Improved Lazy Release Consistency Model

Chapram Sudhakar and T. Ramesh    

Abstract: Problem statement: A network of workstations, viewed as a distributed shared memory system, can be used to develop and test parallel algorithms. Approach: For implementing parallel algorithms on such DSMs, the shared memory consistency model plays a vital role. Results: On a LAN, strict consistency models such as Sequential Consistency (SC) are not useful since communication is slow. In such environments, relaxed models like Entry Consistency (EC), Release Consistency (RC), or their variations such as Lazy Release Consistency (LRC) are generally used. Conclusion/Recommendations: In this study an Improved Lazy Release Consistency (ILRC) model is proposed. The model is studied with standard parallel algorithms. In many cases the ILRC model is shown to perform better than the LRC model.

PageArray : Array with one entry per shared page
ProcArray : Array with one list of interval records per process
DirtyList : Identifiers of pages that were modified during the current interval
VC : Local vector clock
Pid : Local process identifier

Each shared page entry has fields for the twin page, write notices, page manager, copy set and current status of the page. The twin page of a shared page stores the old contents of the page before any modifications are attempted; it is used later, at release time (or when the next page difference request arrives), to compute the page differences. The write notices field in the page entry describes modifications to the page. The entry for process i in the write notices array contains a list of all the write notices created by i for the page that are known to the local process. Each of these write notice records describes the updates performed by i to the page in a given interval. A write notice record contains a pointer to the interval record describing that interval and a pointer to the differences containing the words of the page that were updated in the interval. Each interval record contains a back pointer to a list with one write notice for each page that was modified during the interval. Whenever an interval record is created, it is tagged with the vector time and the identity of its creator. The page manager is the process that owns the page and acts as a home machine for it. Page ownership may change dynamically, depending on which process has modified the page most recently. The copy set lists the processes that hold a copy of the shared page and is used to recall the page from other processes when exclusive access to the page is required. The status field of a page entry is the operating system protection status for the page: if the status is no-access, any access to the page triggers a page fault; if the status is read-only, a write access triggers a page fault.
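The page entry described above can be sketched as a C structure. This is a minimal sketch, not the paper's code: the field names follow the description, while MAX_PROCS, the bitmask copy set and the helper function are assumptions made for illustration.

```c
#include <assert.h>

#define MAX_PROCS 32

typedef enum { NO_ACCESS, READ_ONLY, READ_WRITE } PageStatus;

struct interval_record;               /* described in the procArray section */

/* A write notice: one per (process, page, interval) modification. */
struct write_notice {
    struct interval_record *interval; /* interval in which the writes occurred */
    void *diff;                       /* changed words, or NULL until computed */
    struct write_notice *next;        /* next notice by the same process */
};

/* One entry of PageArray, i.e., per shared page. */
struct page_entry {
    unsigned char *twin;                     /* copy taken before the first write */
    struct write_notice *notices[MAX_PROCS]; /* per-process write notice lists */
    int manager;                             /* current owner, the page's "home" */
    unsigned copyset;                        /* bitmask of processes with a copy */
    PageStatus status;                       /* drives page-fault behaviour */
};

/* Illustrative helper: membership test on the copy set bitmask. */
int in_copyset(const struct page_entry *p, int pid) {
    return (p->copyset >> pid) & 1u;
}
```

Representing the copy set as a bitmask is one plausible choice for a small, fixed number of workstations; a list would serve equally well.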

The procArray has an entry for each process. The entry for process i contains a list of interval records describing the intervals created by i that the local process knows about. This list is ordered by decreasing interval logical time. We refer to the value of VCi as i's vector time and to the value of VCi[i] as i's logical time. Similarly, the vector time of an interval created by i is the value of VCi when the interval is created, and the logical time of the interval is the value of VCi[i].
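The partial order on intervals induced by vector times can be sketched as a component-wise comparison. This is a generic happens-before check, not the paper's code; the array representation and MAX_PROCS bound are assumptions.

```c
#include <assert.h>

#define MAX_PROCS 32

/* Returns 1 if every component of a is <= the corresponding component of b
   and at least one is strictly smaller, i.e., the interval with vector time a
   precedes the one with vector time b. Returns 0 otherwise (including when
   the two intervals are concurrent). */
int vc_before(const unsigned a[], const unsigned b[], int n) {
    int strictly = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > b[i]) return 0;   /* a has a larger component: not before b */
        if (a[i] < b[i]) strictly = 1;
    }
    return strictly;
}
```

Note that vc_before(a, b) and vc_before(b, a) can both be 0, which is exactly the concurrent case a partial order allows.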

The storage for the differences, the write notice records and the interval records is not freed until garbage collection is performed, i.e., a process effectively maintains a log of all shared memory accesses since the last garbage collection. This is necessary because another process that requires a page it last referenced long ago may ask for the entire history of differences for that page. In that case, all the differences for that page, starting from the oldest interval, must be included in the reply to the page difference request.


Weaknesses of LRC model: The original Lazy Release Consistency implementation has several drawbacks when applied to real parallel applications. If the application's data is very large and is modified frequently, the difference representation for a single interval can exceed the size of the original page itself. Even when individual differences are small, the differences accumulated across the processes that acquired the lock in sequence before the current process may together exceed the size of the original page. The original LRC implementation is useful when a page holds one or a few scalar variables that are modified in a critical section, where the resulting differences are small. It is not suitable for large data arrays that are modified frequently by a large number of processes. The space occupied by twin pages, which may be kept for long periods if differences are not requested, is an additional overhead on top of the write notice records, interval records and page differences.

Computing page differences on request from other processes is time consuming, and the requests get delayed. If page differences are computed in advance instead, the effort is wasted whenever those differences are never requested. The improved implementation of LRC overcomes both the memory space and the CPU time problems.
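The costly step, comparing a page against its twin word by word, can be sketched as follows. This is a simplification: the output here is a plain index/value list, whereas real LRC implementations typically use a run-length encoding of the changed words.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long word_t;

/* Compares a page against its twin and writes the changed (index, value)
   pairs into out_idx/out_val; returns the number of changed words. The
   caller must size the output arrays for the worst case (nwords entries). */
size_t compute_diff(const word_t *twin, const word_t *page, size_t nwords,
                    size_t out_idx[], word_t out_val[]) {
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (twin[i] != page[i]) {   /* word modified since the twin was taken */
            out_idx[n] = i;
            out_val[n] = page[i];
            n++;
        }
    }
    return n;
}
```

The loop touches every word of both copies, which is why the text calls diff computation time consuming: the cost is paid per page per interval regardless of how few words actually changed.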

Improved LRC implementation: The improved LRC implementation takes an annotated input source program. All the variables of the source program are divided into two categories: small data items and large data items. Small data items that are related through the same synchronization are kept together in one page, and different such sets are placed in separate pages. The compiler records a size element for each related data item set, which indicates how much memory in that page is actually used by the variables. Even though most of such a page is wasted, a parallel application usually has very few data items of this type, so the overall wastage is small. Large data items that may span multiple pages require no additional information; their pages are treated normally with the write-invalidation protocol.

When a lock is acquired, the process gets the list of current managers of the recently modified pages. The acquiring process updates its data structures to record the manager of each such modified page. Pages that have not been modified since they were last requested are still up to date. If an out-of-date page is referenced, a page fault occurs. The page fault handler determines whether the page is a small data item set page or a normal page and sends a partial page request or a full page request to the manager of that page accordingly. It can also request ownership if the faulting instruction is a write operation.
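The fault handler's dispatch on page type can be sketched as below. The request names and the page_info layout are assumptions based on the description in the text, not the paper's actual code; a write fault would additionally carry an ownership-transfer flag, omitted here for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SMALL_ITEM_SET, NORMAL_PAGE } PageType;
typedef enum { REQ_PARTIAL_PAGE, REQ_FULL_PAGE } RequestKind;

struct page_info {
    PageType type;       /* small data item set page or normal page */
    size_t   used_size;  /* for small-item-set pages: bytes actually in use */
    int      manager;    /* process the request is sent to */
};

/* Decide what to ask the page manager for: a partial page request fetches
   only the used portion of a small data item set page, while a normal page
   is fetched whole. */
RequestKind choose_request(const struct page_info *p) {
    return (p->type == SMALL_ITEM_SET) ? REQ_PARTIAL_PAGE : REQ_FULL_PAGE;
}
```

The used_size field is what makes the partial request possible: only that many bytes need to cross the network instead of a full page.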

This method eliminates the need for computing page differences and for maintaining write notice records, page differences and interval records. For each page it maintains only the page type, the page manager, the last acquired interval and state information, which occupies very little space. This is a real advantage for parallel applications that require huge amounts of memory. Since related small variables are kept in one page, they can be fetched at once with a single partial page request, reducing communication. Implementation details of this method, with its data structures, are given below.

ILRC data structures: Each process maintains the following data structures:

Array of NormalSharedPages : {manager, last interval, state}
Array of SmallDataItemSet : {Page number, size, manager, last interval, state}
IntervalRecord : Interval record holding write notices for small data item set pages
VectorClock : To order the intervals

A NormalSharedPage entry contains information about the page: the current manager, the interval in which the page was last requested and its state (read-only, writable or dirty). A SmallDataItemSet entry contains, in addition to the NormalSharedPage information, the number of the page the data items are allocated to and their size. An IntervalRecord holds, for a given interval of vector time, the write notices for the small data item set pages that were modified. The vector clock is used for partial ordering of the intervals across all processes.
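The two per-page entries above can be sketched as C structures. The field names mirror the list in the text; the concrete types and the staleness helper are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_READ_ONLY, ST_WRITABLE, ST_DIRTY } IlrcState;

/* One entry of the NormalSharedPages array. */
struct normal_shared_page {
    int       manager;        /* process currently managing the page */
    unsigned  last_interval;  /* interval when the page was last requested */
    IlrcState state;
};

/* One entry of the SmallDataItemSet array: NormalSharedPage info plus
   the page number the items are allocated to and their combined size. */
struct small_data_item_set {
    unsigned  page_number;
    size_t    size;           /* bytes actually used within the page */
    int       manager;
    unsigned  last_interval;
    IlrcState state;
};

/* A local copy is still up to date if the page has not been modified
   since it was last fetched (a simplification of the check in the text). */
int is_up_to_date(const struct normal_shared_page *p, unsigned modified_interval) {
    return p->last_interval >= modified_interval;
}
```

Compared with the LRC page entry, there is no twin pointer and no write notice list per process, which is where the space saving claimed in the text comes from.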

The prototype system is designed to support any type of memory consistency model. It includes a generic memory consistency manager that supports any model through a common interface. The common interface contains a set of operations for all memory consistency related events. Some of the operations of the memory consistency model interface are given below:

Interface of memory consistency model:
The first operation initializes the memory consistency model specific data structures. It is called automatically when the process is created, or when the process explicitly calls set_mctype() to select a specific memory consistency model.

PageFault: This performs memory consistency model specific operations, such as invalidation, update, or requesting a page copy, when a page fault occurs.

LockAcquire: This is called when a lock/semaphore is locked by the current process.

LockRelease: This is called when a lock/semaphore is released by the current process. In the case of a barrier, LockRelease is called when joining and LockAcquire when leaving.

InvalidatePage: This is called when an invalidation request for a page is received from another process.

UpdatePage: This is called when an update request is received.

EnlargeConsistencyData: This is called to expand or shrink the memory consistency specific data structures when the process address space size changes.

ProcessMCRequest: This is called when any other type of memory consistency request is received from another process. It is provided to extend support for other memory consistency model specific events.

ProcessMCReply: This is also provided for extending the support for handling any consistency model specific reply message.

ProcessExit: This is called when a process is terminating, to upload the locally made changes to the data items to their respective home locations.
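The common interface above can be modelled as a table of function pointers, one per event. This is a sketch: the operation names follow the list above, but all signatures are assumptions, and the paper does not state that set_mctype() takes this form.

```c
#include <assert.h>
#include <stddef.h>

/* One consistency model = one table of event handlers. */
struct mc_model_ops {
    void (*init)(void);                                   /* initialize model data */
    void (*page_fault)(void *addr, int is_write);         /* PageFault */
    void (*lock_acquire)(int lock_id);                    /* LockAcquire */
    void (*lock_release)(int lock_id);                    /* LockRelease */
    void (*invalidate_page)(unsigned page);               /* InvalidatePage */
    void (*update_page)(unsigned page, const void *data, size_t len);
    void (*enlarge_consistency_data)(size_t new_size);    /* EnlargeConsistencyData */
    void (*process_mc_request)(const void *msg, size_t len);
    void (*process_mc_reply)(const void *msg, size_t len);
    void (*process_exit)(void);                           /* ProcessExit */
};

static const struct mc_model_ops *current_model;

/* Selecting a model installs its table and runs its initializer, matching
   the behaviour described for set_mctype() in the text. */
void set_mctype(const struct mc_model_ops *ops) {
    current_model = ops;
    if (current_model->init)
        current_model->init();
}
```

Dispatching every consistency event through such a table is what lets the generic manager support SC, LRC, ILRC or EC behind one interface.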

Environment and software: The prototype system currently runs on a set of 32 x86-based workstations connected by a high-speed Ethernet LAN. All workstations execute the same copy of the operating system. Applications can be developed using the standard pthread library, and any pthread-based parallel application can be executed without modification. To use the improved features of the LRC and Entry Consistency models, however, the source program must be annotated with the required information, which is a minor change. An application can be started from any workstation. When it creates threads, corresponding processes are spawned on other workstations, where they run in parallel. All these processes use the same selected memory consistency model, or a default consistency model assigned by the system.


The developed memory consistency model has been tested with well-known parallel algorithms such as reduction, sieve (finding prime numbers), matrix multiplication, merge sort, quick sort[8], bucket sort, a branch and bound algorithm for the travelling salesperson problem[12] and SPLASH-2 benchmark programs. All of the programs were tested using eight workstations. The data set sizes for each of the algorithms and the comparative execution times of the old LRC implementation and the improved LRC implementation are shown in Table 1.

The merge sort algorithm changes most of the contents of the data frequently, so, as expected, ILRC shows considerable performance improvement in this case. In the sieve algorithm, the amount of marking decreases as the current prime number increases, which effectively reduces the amount of change in the flag array.

Table 1: Performance of ILRC and LRC compared

Even so, ILRC shows a slight improvement over the original LRC model, due to the high amount of change in the flags array in the initial stages. The reduction algorithm is very simple: contention for the lock (on the global sum) occurs only at the end. But twin page creation for very short periods of use, performed sequentially by one processor after another, makes a comparatively large difference, so some difference in execution times can be observed in this case as well. The matrix multiplication problem is not really affected by any particular model, because all its computations are independent and no propagation of changes in the data arrays takes place; there should therefore be no difference in execution time. In ILRC, however, improved communication primitives are used for propagating matrix rows, which accounts for the difference shown in the table. In the travelling salesperson problem, since the algorithm follows a lexicographic search order, almost all processors search independently. The only data item shared for both reading and writing is the current known bound value. As it is a simple small data item, there is no considerable difference between the two consistency models. Similar improvements can be observed in the other SPLASH-2 applications and kernels.


The Improved Lazy Release Consistency model has been implemented and tested with standard parallel algorithms. In many cases the ILRC model has shown better performance than the original LRC model. Its only drawback is that the program needs to be annotated to indicate the small and large data item sets. The model could be improved to identify small and large data item sets dynamically by prediction. Another possible improvement is to choose adaptively between the whole-page approach and the incremental modification approach, depending on the amount of changes.


The researchers thank the MHRD for financial support through the research project entitled "Implementation of distributed shared memory system using page based memory management system".

" target="_blank">View Fulltext    |   Related Articles   |   Back
  Related Articles

No Article Found
Copyright   |   Desclaimer   |    Privacy Policy   |   Browsers   |   Accessibility