Designing Memory Consistency Models For
Shared-Memory Multiprocessors
Sarita V. Adve
Computer Sciences Technical Report #1198
University of Wisconsin-Madison
December 1993
DESIGNING MEMORY CONSISTENCY MODELS FOR
SHARED-MEMORY MULTIPROCESSORS
by
SARITA VIKRAM ADVE
A thesis submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
(Computer Sciences)
at the
UNIVERSITY OF WISCONSIN-MADISON
1993
© Copyright by Sarita Vikram Adve 1993
All Rights Reserved
Abstract
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences
both the performance and the programmability of the system. The simplest and most intuitive model for
programmers, sequential consistency, restricts the use of many performance-enhancing optimizations
exploited by uniprocessors. For higher performance, several alternative models have been proposed.
However, many of these are hardware-centric in nature and difficult to program. Further, the multitude
of seemingly unrelated memory models inhibits portability. We use the 3P criteria of programmability,
portability, and performance to assess memory models, and find current models lacking in one or more of
these criteria. This thesis establishes a unifying framework for reasoning about memory models that
leads to models that adequately satisfy the 3P criteria.
The first contribution of this thesis is a programmer-centric methodology, called sequential consistency
normal form (SCNF), for specifying memory models. This methodology is based on the observation that
performance-enhancing optimizations can be allowed without violating sequential consistency if the
system is given some information about the program. An SCNF model is a contract between the system and
the programmer, where the system guarantees both high performance and sequential consistency only if the
programmer provides certain information about the program. Insufficient information gives lower
performance, but incorrect information violates sequential consistency. This methodology adequately
satisfies the 3P criteria of programmability (by providing sequential consistency), portability (by
providing a common interface of sequential consistency across all models), and performance (by only
requiring the appearance of sequential consistency for correct programs).
The second contribution demonstrates the effectiveness of the SCNF approach by applying it to the
optimizations of several previous hardware-centric models. We propose four SCNF models that unify many
hardware-centric models. Although based on intuition similar to the hardware-centric models, the SCNF
models are easier to program, enhance portability, and allow more implementations (and potentially
higher performance) than the corresponding hardware-centric models.
The third contribution culminates the above work by exposing a large part of the design space of SCNF
models. The SCNF models satisfy the 3P criteria well, but are difficult to design. The complexity arises
because the relationship between system optimizations and programmer-provided information that allows
system optimizations without violating sequential consistency is complex. We simplify this relationship
and use the simple relationship to characterize and explore the design space of SCNF models. In doing
so, we show the unexploited potential in the design space, leading to several new memory models.
The final contribution concerns debugging programs on SCNF models. While debugging, the programmer
may unknowingly provide incorrect information, leading to a violation of sequential consistency. We apply
debugging techniques for sequential consistency to two of the SCNF models to alleviate this problem.
Acknowledgements
Many people have contributed to this work. Here, I can only mention a few.
Mark Hill, my advisor, has provided invaluable technical and professional advice, which has guided me
throughout my graduate school years and will continue to guide me in the future. Also, his constant
support and encouragement made the Ph.D. process so much less overwhelming.
I am indebted to Jim Goodman for the support and encouragement he provided, especially in my initial
years at Wisconsin. I have also enjoyed many stimulating discussions with him on memory consistency
models in particular, and computer architecture in general.
I am especially grateful to Guri Sohi for raising many incisive questions on this work, and then for his
patience in many long discussions to address those questions.
Jim Larus and David Wood have on many occasions provided valuable feedback on this work.
I have enjoyed many, many hours of discussion on memory consistency models with Kourosh Gharachorloo.
Some parts of this thesis have evolved from our joint work.
I have often looked towards Mary Vernon for professional advice, and am grateful to her for always making
the time for me.
The support and encouragement I have received from my parents, Vikram’s parents, and our brothers and
sisters is immeasurable. Their pride in our work made each little achievement so much more pleasurable,
and the final goal even more desirable.
Finally, my husband, Vikram, has been both colleague and companion. I have been able to rely on his frank
appraisals of my work, and his technical advice and support have come to my rescue umpteen times. His
constant presence through all the ups and downs of graduate school has made the ride so much smoother.
Table of Contents
Abstract
Acknowledgements
1. Introduction
1.1. Motivation
1.2. Summary of Contributions
1.3. Thesis Organization
2. Related Work
2.1. Sequential Consistency
2.1.1. Definition
2.1.2. Sequential Consistency vs. Cache Coherence
2.1.3. Implementations That Disobey Sequential Consistency
2.1.4. Implementations that Obey Sequential Consistency
2.1.5. Why Relaxed Memory Models?
2.2. Relaxed Memory Models
2.2.1. Weak Ordering And Related Models
2.2.2. Processor Consistency And Related Models
2.2.3. Release Consistency And Related Models
2.2.4. Other Relaxed Models
2.3. Formalisms For Specifying Memory Models
2.4. Performance Benefits of Relaxed Memory Systems
2.5. Correctness Criteria Stronger Than or Similar to Sequential Consistency
3. A Programmer-Centric Methodology for Specifying Memory Models
3.1. Motivation for a Programmer-Centric Methodology
3.2. Sequential Consistency Normal Form (SCNF)
3.3. Concepts and Terminology for Defining SCNF Models
3.3.1. Dichotomy Between Static and Dynamic Aspects of a System
3.3.2. Terminology for Defining SCNF Models
4. An SCNF Memory Model: Data-Race-Free-0
4.1. Definition of the Data-Race-Free-0 Memory Model
4.1.1. Motivation for Data-Race-Free-0
4.1.2. Definition of Data-Race-Free-0
4.1.3. Distinguishing Memory Operations
4.2. Programming With Data-Race-Free-0
4.3. Implementations of Data-Race-Free-0
4.4. Comparison of Data-Race-Free-0 with Weak Ordering
5. A Formalism To Describe System Implementations and Implementations of Data-Race-Free-0
5.1. A Formalism for Shared-Memory System Designers
5.1.1. A Formalism for Shared-Memory Systems and Executions
5.1.2. Using The Formalism to Describe System Implementations
5.1.3. An Assumption and Terminology for System-Centric Specifications
5.2. A High-Level System-Centric Specification for Data-Race-Free-0
5.3. Low-Level System-Centric Specifications and Implementations of Data-Race-Free-0
5.3.1. The Data Requirement
5.3.2. The Synchronization Requirement
5.3.3. The Control Requirement
5.4. Implementations for Compilers
5.5. All Implementations of Weak Ordering Obey Data-Race-Free-0
6. Three More SCNF Models: Data-Race-Free-1, PLpc1, and PLpc2
6.1. The Data-Race-Free-1 Memory Model
6.1.1. Motivation of Data-Race-Free-1
6.1.2. Definition of Data-Race-Free-1
6.1.3. Programming with Data-Race-Free-1
6.1.4. Implementing Data-Race-Free-1
6.1.5. Comparison of Data-Race-Free-1 with Release Consistency (RCsc)
6.2. The PLpc1 Memory Model
6.2.1. Motivation of PLpc1
6.2.2. Definition of PLpc1
6.2.3. Programming With PLpc1
6.2.4. Implementing PLpc1
6.2.5. Comparison of PLpc1 with SPARC V8 and Data-Race-Free Systems
6.3. The PLpc2 Memory Model
6.3.1. Motivation of PLpc2
6.3.2. Definition of PLpc2
6.3.3. Programming with PLpc2
6.3.4. Implementing PLpc2
6.3.5. Comparison of PLpc2 with Processor Consistency, RCpc, and PLpc1 Systems
6.4. The PLpc Memory Model
6.5. Comparison of PLpc Models with IBM 370 and Alpha
6.6. Discussion
7. The Design Space of SCNF Memory Models
7.1. A Condition for Sequential Consistency
7.1.1. A Simple Condition
7.1.2. Modifications to the Condition for Sequential Consistency
7.2. Designing An SCNF Memory Model
7.3. New SCNF Memory Models
7.3.1. Models Motivated by Optimizations on Base System
7.3.1.1. Out-Of-Order Issue
7.3.1.2. Pipelining Operations (With In-Order Issue)
7.3.1.3. Single Phase Update Protocols
7.3.1.4. Eliminating Acknowledgements
7.3.2. Models Motivated by Common Programming Constructs
7.3.2.1. Producer-Consumer Synchronization
7.3.2.2. Barriers
7.3.2.3. Locks and Unlocks
7.3.2.4. Constructs to Decrease Lock Contention
7.4. Characterization of the Design Space of SCNF Memory Models
7.5. Implementing An SCNF Memory Model
7.5.1. Motivation for the Control Requirement
7.5.2. Low-Level System-Centric Specifications and Hardware Implementations
7.5.3. Compiler Implementations
7.6. Relation with Previous Work
7.6.1. Relation with Work by Shasha and Snir
7.6.2. Relation with Work by Collier
7.6.3. Relation with Work by Bitar
7.6.4. Relation with Data-Race-Free and PLpc models
7.6.5. Relation with a Framework for Specifying System-Centric Requirements
7.6.6. Relation of New Models with Other Models
7.7. Conclusions
8. Detecting Data Races On Data-Race-Free Systems
8.1. Problems in Applying Dynamic Techniques to Data-Race-Free Systems
8.2. A System-Centric Specification for Dynamic Data Race Detection
8.3. Data-Race-Free Systems Often Obey Condition for Dynamic Data Race Detection
8.4. Detecting Data Races on Data-Race-Free Systems
8.5. Related Work
8.6. Conclusions
9. Conclusions
9.1. Thesis Summary
9.2. What next?
References
Appendix A: Equivalence of Definitions of Race for Data-Race-Free-0 Programs
Appendix B: Modified Uniprocessor Correctness Condition
Appendix C: Correctness of Condition 7.12 for Sequential Consistency
Appendix D: A Constructive Form of the Control Relation
Appendix E: Correctness of Low-Level System-Centric Specification of Control Requirement
Appendix F: Correctness of Low-Level System-Centric Specification of Data-Race-Free-0
Appendix G: System-Centric Specifications of PLpc1 And PLpc2
Appendix H: Porting PLpc1 and PLpc2 Programs to Hardware-Centric Models
Appendix I: Correctness of Theorem 8.5 for Detecting Data Races
Chapter 1
Introduction
1.1. Motivation
Parallel systems can potentially use multiple uniprocessors to provide orders of magnitude higher
performance than state-of-the-art uniprocessor systems at comparable cost. Parallel systems that provide
an abstraction of a single address space (or shared-memory systems) simplify many aspects of programming,
compared to the alternative of message-passing systems. For example, the shared-memory abstraction
facilitates load balancing through processor-independent data structures, allows pointer-based data
structures, and allows the effective use of the entire system memory [LCW93, LeM92]. Shared memory also
permits decoupling the correctness of a program from its performance, allowing incremental tuning of
programs for higher performance.
The shared-memory abstraction can be provided by hardware, software, or a combination of both.
Pure hardware configurations include uniform [GGK83] and non-uniform access machines [ReT86]. Most
configurations employ caches to reduce memory latency, and use either snooping [Bel85] or directory-based
protocols [ASH88, Gus92, LLG90] to keep the caches up-to-date. Runtime software-based systems typically
use virtual memory hardware to trap on non-local references, and then invoke system software to handle
the references [CBZ91, Li88]. Other software-based systems depend on compilers that detect non-local
shared-memory accesses in high-level code, and convert them into appropriate messages for the underlying
message-passing machine [HKT92]. Several systems employ a combination of hardware and software
techniques. For example, in systems using software-based cache coherence [BMW85, ChV88, PBG85], hardware
provides a globally addressable memory, but the compiler is responsible for ensuring that shared data in
a cache is up-to-date when required by its processor. The Alewife system [ALK90] and the Cooperative
Shared Memory scheme [HLR92] manage cacheable shared data in hardware for the common cases, but invoke
runtime software for the less frequent cases.
Building scalable, high-performance shared-memory systems that are also easy to program, however, has
remained an elusive goal. This thesis examines one aspect of shared-memory system design that affects
both the performance and programmability of all of the above types of shared-memory systems. This aspect
is called the memory consistency model or memory model.
A memory model for a shared-memory system is an architectural specification of how memory operations
of a program will appear to execute to the programmer. The memory model, therefore, specifies the values that
read operations of a program executed on a shared-memory system may return. The terms system, program, and
programmer can be used at several levels. At the lowest level, the system is the machine hardware, a
program is the machine code, and a programmer is any person who writes or reasons about such machine code.
At a higher level, the system may be the machine hardware along with the software that converts high-level
language code into machine-level code, a program may be the high-level language code, and a programmer may
be the writer of such programs. A memory model specification is required for every level of the system,
and affects the programs and programmers of that level.
The memory model affects the ease of programming since programmers must use the model to reason about
the results their programs will produce. The memory model also affects the performance of the system
because it determines when a processor can execute two memory operations in parallel or out of program
order, when a value written by one processor can be made visible to other processors, and how much
communication a memory operation will incur. Thus, the memory model forms an integral part of the entire
system design (including the processor, interconnection network, compiler, and programming language) and
the process of writing parallel programs.
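As a rough illustration of how a memory model determines the values reads may return, the following
Python sketch enumerates the outcomes of a small two-processor program. The program, the names P1, P2,
r1, and r2, and the helper functions are all hypothetical and chosen only for this example; they are not
taken from the thesis. The sketch assumes sequential consistency means that operations execute one at a
time in some interleaving that respects each processor's program order, and it models just one possible
relaxation: letting a processor's read be performed before its own earlier write.

```python
# Hypothetical two-processor program (illustrative only); X and Y start at 0.
#   P1: X = 1; r1 = Y        P2: Y = 1; r2 = X
P1 = [("write", "X", 1), ("read", "Y", "r1")]
P2 = [("write", "Y", 1), ("read", "X", "r2")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(ops):
    """Execute operations one at a time against a single shared memory."""
    memory = {"X": 0, "Y": 0}
    regs = {}
    for kind, loc, arg in ops:
        if kind == "write":
            memory[loc] = arg          # arg is the value written
        else:
            regs[arg] = memory[loc]    # arg is the destination register name
    return regs["r1"], regs["r2"]


def outcomes(p1_orders, p2_orders):
    """Collect (r1, r2) over all interleavings of the allowed per-processor orders."""
    results = set()
    for a in p1_orders:
        for b in p2_orders:
            for ops in interleavings(a, b):
                results.add(run(ops))
    return results


# Program order only: the outcomes that sequential consistency permits.
sc_outcomes = outcomes([P1], [P2])

# Assumed relaxation: each processor may also perform its read before its
# earlier write (roughly the visible effect of buffering the write).
relaxed_outcomes = outcomes([P1, P1[::-1]], [P2, P2[::-1]])

print("sequentially consistent outcomes:", sorted(sc_outcomes))
print("extra outcomes when reordering is allowed:", sorted(relaxed_outcomes - sc_outcomes))
```

Under these assumptions, the reordering lets both reads return 0, a result that no interleaving in
program order can produce. This is the kind of trade-off between optimization and ease of reasoning that
the memory model fixes.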