Re: Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)

From: Michael Tautschnig <mt(at)debian(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jade Alglave <jade(dot)alglave(at)cs(dot)ox(dot)ac(dot)uk>, Vincent Nimal <vincent(dot)nimal(at)balliol(dot)ox(dot)ac(dot)uk>, Daniel Kroening <kroening(at)cs(dot)ox(dot)ac(dot)uk>
Subject: Re: Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)
Date: 2012-03-24 17:01:32
Message-ID: 20120324170131.GB8779@l04.local
Lists: pgsql-hackers

Hi again,

[...]
>
> However, your example is enough unlike the actual code that the
> conclusion you state following the word "clearly" isn't actually clear
> to me. According to latch.h, the correct method of using a latch is
> like this:
>
> * for (;;)
> * {
> * ResetLatch();
> * if (work to do)
> * Do Stuff();
> * WaitLatch();
> * }
>
> Meanwhile, anyone who is creating additional work to do should add the
> work to the queue and then set the latch.

When writing the above statement, including the "clearly", we were probably
thinking too much of the above usage hint, which only uses ResetLatch and
WaitLatch, and did not consider that SetLatch is meant to be part of Do Stuff().

So here, once again, are our version and a more faithful translation into
PostgreSQL-function terms, this time including SetLatch (in line 16).

Our version:

   1  #define WORKERS 2
   2  volatile _Bool latch[WORKERS];
   3  volatile _Bool flag[WORKERS];
   4
   5  void worker(int i)
   6  {
   7    while(!latch[i]);
   8    for(;;)
   9    {
  10      assert(!latch[i] || flag[i]);
  11      latch[i] = 0;
  12      if(flag[i])
  13      {
  14        flag[i] = 0;
  15        flag[(i+1) % WORKERS] = 1;
  16        latch[(i+1) % WORKERS] = 1;
  17      }
  18
  19      while(!latch[i]);
  20    }
  21  }

In PostgreSQL-function terms:

   1  #define WORKERS 2
   2  volatile _Bool latch[WORKERS];
   3  volatile _Bool flag[WORKERS];
   4
   5  void worker(int i)
   6  {
   7    WaitLatch(i);
   8    for(;;)
   9    {
  10      assert(!latch[i] || flag[i]);
  11      ResetLatch(i);
  12      if(flag[i])
  13      {
  14        flag[i] = 0;
  15      flag[(i+1) % WORKERS] = 1;
  16      SetLatch((i+1) % WORKERS);
  17      }
  18
  19      WaitLatch(i);
  20    }
  21  }

>
> So it seems to me that we could potentially fix this by inserting
> barriers at the end of ResetLatch and at the beginning of SetLatch and
> WaitLatch. Then the latch has to get reset before we check whether
> there's work to do; and we've got to finish checking for work before
> we again try to wait for the latch. Similarly, any work that was in
> progress before SetLatch was called will be forced to be committed to
> memory before SetLatch does anything else. Adding that many barriers
> might not be very good for performance but it seems OK from a
> correctness point of view, unless I am missing something, which is
> definitely possible. I'd appreciate any thoughts you have on this, as
> this is clearly subtle and tricky to get exactly right.
>

So we had suggested the following bugfixes:

>
> > In [3] it was suggested to fix the problem by placing a barrier in ResetLatch,
> > which corresponds to placing it between lines 11 and 12 in the code above. This
> > amounts to placing a barrier between the two reads (lines 7/19 and 12, i.e.,
> > between WaitLatch and the if(flag[1]) ) of Worker 1.
> >
> > Placing a sync (i.e., the strongest Power barrier) accordingly would, however,
> > still be insufficient for the second problem, as it would only fix the
> > reordering of read-read pairs by Worker 1 and the store atomicity issue from
> > Worker 0. But the writes on Worker 0 could still be reordered (problem number
> > 2). One possible fix consists of placing a sync between the two writes on Worker
> > 0, and an address dependency between the two reads on Worker 1. Clearly,
> > however, these are changes that cannot any longer be hidden behind the
> > ResetLatch/WaitLatch interface, but rather go in the code using these.
>
>

Here, "the two writes on Worker 0" corresponds to lines 15 and 16, and line 16
is exactly the call to SetLatch. For solving problem 1, the MP (message
passing) idiom, the following options are possible (in all cases a stronger
synchronisation primitive may be substituted, i.e., the strongest Power
barrier, sync, may replace lwsync, and lwsync may replace an address
dependency):

1. An lwsync at the beginning of SetLatch, and lwsync in ResetLatch (preferably
after the write).
2. An lwsync at the beginning of SetLatch, and an address dependency in
ResetLatch.

To address the second problem, the LB (load buffering) idiom, an address
dependency has to be placed in either WaitLatch or SetLatch.
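An address dependency cannot be expressed portably in plain C; the following
sketch (our own illustration, not proposed code) shows the idea: the value
loaded from the latch feeds the address computation of the flag load, so the
Power hardware must perform the latch read before the flag read. Since
(v & 0) is always 0, a C compiler may legally optimise the dependency away,
which is exactly why real code would need inline assembly here.

```c
#include <stdbool.h>

#define WORKERS 2

volatile _Bool latch[WORKERS];
volatile _Bool flag[WORKERS];

/* Illustration only: an artificial address dependency between the read of
   latch[i] and the read of flag[i].  The index of the flag load depends on
   the value v loaded from the latch, which keeps the hardware from
   reordering the two reads (but not the compiler). */
_Bool flag_after_latch(int i)
{
    _Bool v = latch[i];
    return flag[(i + (v & 0)) % WORKERS];
}
```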

To fix both problems, the cheapest option performance-wise would thus be an
address dependency in ResetLatch and an lwsync in SetLatch. For practical
reasons, however, placing an lwsync in both places (at the beginning of
SetLatch and after the write in ResetLatch) might be preferable: address
dependencies may be optimised away by the C compiler, and they require inline
assembly in a form that is not as easy to factor out as lwsync; furthermore,
the interface of ResetLatch would have to be amended.
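As a sketch of the "lwsync in both places" variant, again in portable C11
terms: atomic_thread_fence(memory_order_acq_rel) is used below as the closest
portable stand-in for lwsync, and the function names merely mirror
PostgreSQL's latch API with simplified signatures for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define WORKERS 2

volatile _Bool latch[WORKERS];

/* lwsync at the beginning of SetLatch: orders the caller's preceding
   stores (e.g. to the work flag/queue) before the store to latch[i] */
void SetLatch(int i)
{
    atomic_thread_fence(memory_order_acq_rel);
    latch[i] = 1;
}

/* lwsync after the write in ResetLatch: orders the latch reset before
   the caller's subsequent reads of the work flag/queue */
void ResetLatch(int i)
{
    latch[i] = 0;
    atomic_thread_fence(memory_order_acq_rel);
}
```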

In summary, we were thus able to show that both points marked with "XXX there
really ought to be a memory barrier" in

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4e15a4db5e65e43271f8d20750d6500ab12632d0

are the appropriate points to place memory synchronisation primitives, and
picking an lwsync-equivalent in both cases is sound and does not require any
other modifications.

Best,
Michael
