STUDIES IN COMPUTER SCIENCE
AND ARTIFICIAL INTELLIGENCE 3
Editors:
H. Kobayashi
IBM Japan Ltd.
Tokyo
M. Nivat
Université Paris VII
Paris
NORTH-HOLLAND - AMSTERDAM · NEW YORK · OXFORD · TOKYO
CONCURRENCY CONTROL IN
DISTRIBUTED DATABASE SYSTEMS
Wojciech CELLARY
Politechnika Poznanska
Poznan, Poland
Erol GELENBE
École des Hautes Études en Informatique
Université René Descartes, Paris, France
Tadeusz MORZY
Politechnika Poznanska
Poznan, Poland
1988
© Elsevier Science Publishers B.V., 1988
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, without the prior permission of the copyright owner.
ISBN: 0 444 70409 4
Publishers:
ELSEVIER SCIENCE PUBLISHERS B.V.
P.O. BOX 1991
1000 BZ AMSTERDAM
THE NETHERLANDS
Sole distributors for the U.S.A. and Canada:
ELSEVIER SCIENCE PUBLISHING COMPANY, INC.
52 VANDERBILT AVENUE
NEW YORK, N.Y. 10017
U.S.A.
PRINTED IN THE NETHERLANDS
To those whose patience and encouragement
simplified our task:
Daromira, Kasia, Marcin and Przemko
Deniz, Pamir and Pasha
Anna and Mikolaj
Preface
In recent years research and practical applications in the area of distri-
buted systems have developed rapidly, stimulated by several factors.
In the first place, this is a consequence of the significant progress in two
fields of computer science, namely computer networks and database systems.
These areas have constituted the technical and scientific foundations for the
development of distributed systems.
This rapid development is also the result of the need for such systems
for management and control applications. Distributed computer systems
arise mainly because of the distributed nature of many engineering and
management systems such as banking systems, systems for production line
automation, service systems, reservation systems, inventory systems, infor-
mation retrieval systems, military systems, etc. The distributed nature of
these applications is better satisfied by distributed computer systems than
by centralized configurations.
The third motivation for distributed systems is of an economic nature. The
development of VLSI circuit technology has considerably reduced the cost of
computer equipment, and changed the orders of magnitude of its price/perfor-
mance ratio. Present day "small" computers offer at far lower cost many of
the capabilities which were previously provided only by large mainframes.
Thus it has become possible to construct distributed computer systems con-
sisting of many small yet powerful coupled computers instead of installing
one large mainframe.
This evolution of computer technology has also lowered the relative cost
of computing versus communication and has provided high speed local area
networks. The cost of communication facilities is now comparable with that
of powerful mini-computers. Therefore, again it is worth constructing a
distributed system in which computers interchange computation results be-
tween them via communication links instead of installing one large main-
frame in which data is gathered via communication links.
Another attractive aspect of distributed systems is the possibility of sim-
pler software design. Individual processors can thus be dedicated to par-
ticular functions of the system leading to the elimination of very complex
multiprogramming software usually associated with large mainframes.
Furthermore, the most important indices characterizing the quality of
computer systems can be improved by distributed systems. In particular
they allow:
• increased reliability and accessibility of the system due to the physical
replication and distribution of computer resources (i.e. data, comput-
ing power, etc.), since the crash of a single site does not necessarily
affect the other sites and does not lead to the unavailability of the
whole system;
• better system performance as a consequence of the increased level of
parallel processing, also obtained by bringing the system resources
closer to data sources and users;
• increased flexibility of the system resulting from its modular and open
structure which allows growth and smoother change of functions and
capacity;
• increased data security due to better protection in the case of hardware
and software failures or attempts to destroy data.
Some of the most advanced types of distributed systems are Distributed
Database Systems (DDBSs), which may be defined as integrated database
systems composed of autonomous local databases geographically distributed
and interconnected by a computer network.
Research in the field of DDBSs has been experiencing rapid growth ever since
the mid-seventies. At present DDBSs are in the initial stages of commercialization.
Experimental DDBSs such as SDD-1, R*, SIRIUS-DELTA,
Distributed-Ingres, DDM and POREL have been tested and evaluated. Some
commercial systems such as ENCOMPASS from Tandem, and CICS/ISC
from IBM, are already available.
New problems arise in distributed database systems, in comparison with
centralized database systems, with respect to their management. A DDBS is
managed by a Distributed Database Management System (DDBMS) whose
main task is to give the users a "transparent" view of the distributed struc-
ture of the database, i.e. the illusion of having a monolithic and centralized
database at their disposal. Distribution transparency, i.e. location and repli-
cation transparency, implies that the conceptual and external-level problems
(using the ANSI/SPARC terminology) of distributed databases do not es-
sentially differ from similar issues in centralized database systems. On the
other hand, the internal-level problems, i.e. physical database design and
DDBS management, are specific and qualitatively new.
The main issues in DDBS management can be classified in three principal
groups: concurrency control, query processing optimization, and reliability.
Solutions to these problems in a centralized environment are inappropri-
ate for a distributed environment because of differences in the internal-level
structure of the databases. An effective solution to them is a prerequisite for
taking full advantage of DDBS structure and applications.
The fundamental problem facing the designers of DDBMSs is that of
the correct control of concurrent access to the distributed database by many
different users. This can be viewed as the design of an appropriate concur-
rency control algorithm. The construction of concurrency control algorithms
is of key importance to the whole management of distributed databases.
A solution of this problem will influence the solution of the two remaining
issues.
The general aim of a concurrency control algorithm is to ensure consis-
tency of the distributed database and the correct completion of each tran-
saction initiated in the system. An obvious additional requirement is to
minimize overhead and transaction response time, and to maximize DDBS
throughput.
In the study of concurrency control in DDBSs three successive phases
can be distinguished. Initially, there was an attempt to adopt concurrency
control algorithms designed for multiaccess but centralized database
systems. This attempt was not successful for the following reasons. In
a DDBS every transaction can request simultaneous access to many local
databases located on physically dispersed computer sites. Thus, the concur-
rency control problem for DDBSs is more general than the similar problem
for centralized database systems. Moreover, in centralized database systems
concurrency control has to ensure the internal consistency of one single lo-
cal database. In DDBSs concurrency control has to guarantee the internal
consistency of several different local databases and the external consistency
of the distributed database understood as the identity of copies of the data
items.
Furthermore, in DDBSs no computer site will in general hold full infor-
mation on the global state of the whole system. Hence, all control decisions
taken at a site of a DDBS have to be made on the basis of incomplete and not
entirely up-to-date information on the activities of the remaining sites. This
fact must be taken into consideration in every concurrency control method.
More recently, three basic methods were designed in relation to
the syntactic model of concurrency control, i.e. the model in which no semantic
information on transactions or data is assumed. These are locking,
timestamp ordering and validation. Studies related to the syntactic model of
concurrency control are still of interest. At present, research focuses on inte-
grating the basic concurrency control methods and the construction of hybrid
methods. Algorithms using hybrid methods, e.g. the bi-ordering locking al-
gorithm, guarantee better performance and eliminate system performance
failures (deadlock, permanent blocking, and cyclic and infinite restarting),
which can prevent some transactions from completing. The global resolu-
tion of the problems of DDBS consistency and DDBS performance failures
is one of the major advantages of hybrid methods.
In the next phase of studies on concurrency control problems a multiver-
sion data model was assumed. In this model every data item is a sequence of
versions created as a result of successive updates. Thus, the history of data
updates is stored in the database. Multiversion DDBSs are attractive to
DDBS designers for several reasons. They allow a higher degree of concur-
rency, they can be combined with reliability mechanisms in a natural way,
and they can be easily designed so that no queries are delayed or rejected.
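
The multiversion data model described above can be illustrated with a
minimal sketch. It assumes integer timestamps and a single site; the class
name MultiversionItem and its methods are hypothetical names chosen for
illustration and are not taken from the algorithms presented in this book.
Each write appends a version to the item's history, and each read is served
from the latest version written at or before the reader's timestamp, which
suggests why a query need not be delayed or rejected.

```python
from bisect import bisect_right


class MultiversionItem:
    """A data item kept as a sequence of versions ordered by write timestamp.

    Hypothetical illustration only: integer timestamps, one site, no locking.
    """

    def __init__(self, initial_value, timestamp=0):
        self._timestamps = [timestamp]  # kept sorted in ascending order
        self._values = [initial_value]

    def write(self, value, timestamp):
        """Add a new version; older versions remain in the history."""
        pos = bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(pos, timestamp)
        self._values.insert(pos, value)

    def read(self, timestamp):
        """Return the latest version written at or before `timestamp`."""
        pos = bisect_right(self._timestamps, timestamp)
        if pos == 0:
            raise KeyError("no version exists at or before this timestamp")
        return self._values[pos - 1]

    def history(self):
        """Return the stored update history as (timestamp, value) pairs."""
        return list(zip(self._timestamps, self._values))


item = MultiversionItem("v0", timestamp=0)
item.write("v1", timestamp=10)
item.write("v2", timestamp=20)

old_read = item.read(15)  # a query with an older timestamp sees an older version
new_read = item.read(25)  # a later query sees the most recent version
```

In this sketch a query with timestamp 15 is still answered after the update
at timestamp 20 has been stored, because the earlier version is retained.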
In recent years there has been a new trend in research characterized
by a shift from the syntactic model of concurrency control to the semantic
model. Semantic information can concern data (e.g. physical structure of
the database, consistency constraints, etc.) or the set of transactions. At
present only preliminary results in this area are available. However, this
trend seems to be very promising and will presumably provide significant
results in the near future.
The purpose of this monograph is to present DDBS concurrency control
algorithms and their related performance issues. The most recent results
have been taken into consideration. A detailed analysis and selection of these
results has been made so as to include those which, in the authors' opinion,
will promote applications and progress in the field. It can also be said that
the application of the methods and algorithms presented in the book is not
limited to DDBSs but also relates to centralized database systems and to
database machines which can often be considered as particular examples of
DDBSs.
The book is intended primarily for DDBMS designers, but can also be
of use to those who are engaged in the design and management of databases
in general, as well as in problems of distributed system management such as
distributed operating systems, computer networks, etc.
This text consists of five parts. Part I is devoted to basic definitions and
models. In Chapter 1 a model of DDBSs is presented and its components,
the distributed database model and the transaction model, are discussed.
Distributed database consistency is introduced next. This chapter ends with
a description of DDBS architecture. In Chapter 2 definitions are given of
syntactic and semantic concurrency control models. For the syntactic model
the serializability criterion of transaction execution correctness is discussed
in relation to both mono- and multiversion DDBSs. Garcia-Molina's and
Lynch's approaches are presented for the semantic model. Chapter 3 covers
issues related to DDBS performance failures: deadlock, permanent blocking,
cyclic and infinite restarting.
In Part II, Chapters 4, 5, 6, and 7 discuss concurrency control methods in
monoversion DDBSs: the locking method, the timestamp ordering method,
the validation method and hybrid methods. For each method the concept,
the basic algorithms, a hierarchical version of the basic algorithms, and
methods for avoiding performance failures are given.
In Part III, Chapters 8, 9 and 10 cover concurrency control methods in multiversion
DDBSs: the multiversion locking method, the multiversion timestamp
ordering method and the multiversion validation method.
Concurrency control methods for the semantic concurrency model are
given in Part IV. In Chapter 11, Garcia-Molina's locking algorithm which
uses the semantic criterion of transaction execution correctness is presented
and discussed. Chapter 12 is devoted to a locking algorithm which uses the
abstract data type approach.
Part V is composed of five chapters concerning performance issues. Chap-
ter 13 presents a general statement of the issue of performance in a DDBS
with respect to the service received by a particular transaction. Chapter
14 discusses the effect of concurrency control algorithms in general on the
DDBS's transaction processing capacity. Chapter 15 is devoted to the per-
formance evaluation of global locking policies, while Chapter 16 analyses the
performance issues related to locking policies based on individual data items
or granules. Finally, in Chapter 17 recent results on the performance of resequencing
algorithms such as timestamp ordering are presented. The whole
of Part V uses the performance evaluation methodology based on queuing
models and simulation tools.
The book concludes with a comprehensive bibliography on the subject.
Acknowledgements
We would like to thank Dr. Geneviève Jomier from the University of
Paris Sud (Orsay) in France and Prof. Jan Węglarz from the Technical University
of Poznan in Poland who have helped in organizing the cooperation
which made this book possible.
We also thank those who have contributed to the preparation of the camera-ready
form of this book: Catherine Vinet, Marisela Hernandez, Gilbert
Harrus, Jerzy Strojny and Michal Jankowski.