Re: Latches with weak memory ordering (Re: max_wal_senders must die)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Latches with weak memory ordering (Re: max_wal_senders must die)
Date: 2010-11-15 13:22:32
Message-ID: AANLkTikU5n9q4-3sCP-EwUThMV2WutpMMeCqBjnY9hYt@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 15, 2010 at 2:15 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Can you elaborate?
>>
>> Weak memory ordering means that stores into shared memory initiated by
>> one processor are not guaranteed to be observed to occur in the same
>> sequence by another processor.  This implies first that the latch code
>> could malfunction all by itself, if two processes manipulate a latch at
>> about the same time, and second (probably much less likely) that there
>> could be a malfunction involving a process that's waited on a latch not
>> seeing the shared-memory status updates that another process did "before"
>> setting the latch.
>>
>> This is not at all hypothetical --- my first attempt at rewriting the
>> sinval signaling code, a couple years back, failed on PPC machines in
>> the buildfarm because of exactly this type of issue.
>
> Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
> all by itself. And I would've thought that declaring the Latch variable
> "volatile" prevents rearrangements.

It's not a question of code rearrangement. Suppose at time zero, the
latch is unset, but owned. At approximately the same time, SetLatch()
is called in one process and WaitLatch() in another process.
SetLatch() sees that the latch is not set and sends SIGUSR1 to the
other process. The other process receives the signal but, since its
"waiting" flag is not yet set, it ignores the signal. It then drains the
self-pipe and examines latch->is_set. But as it turns out, the update
by the process which called SetLatch() isn't yet visible to this
process, because this process has a copy of those bytes in some
internal cache that isn't guaranteed to be fully coherent. So even
though SetLatch() already changed latch->is_set to true, it still
looks false here. Therefore, we go to sleep on the latch.

At this point, we are very likely screwed. If we're lucky, yet a
third process will come along, also see the latch as still unset (even
though it has been set), and set it again, waking up the owner. But if we're
unlucky, by the time that third process comes along, the memory update
will have become visible everywhere and all future calls to SetLatch()
will exit quickly, leaving the poor shmuck who waited on the latch
sleeping for all eternity.
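
To make the interleaving concrete, here is a small single-threaded toy
model of the unlucky case. It is only a sketch, not the latch code:
ToyLatch, toy_set_latch(), toy_propagate() and the "owner view" field
are made-up names, and the stale cache is simulated by keeping a
separate copy of is_set that is only updated when toy_propagate() is
called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not PostgreSQL's Latch struct. */
typedef struct
{
    bool is_set_in_memory;   /* value after the setter's store */
    bool is_set_owner_view;  /* what the latch owner currently observes */
    int  signals_sent;       /* number of SIGUSR1-style wakeups sent */
} ToyLatch;

/* Setter side: exit quickly if already set, else set the flag and signal. */
static void
toy_set_latch(ToyLatch *latch)
{
    if (latch->is_set_in_memory)
        return;
    latch->is_set_in_memory = true;  /* not yet visible to the owner */
    latch->signals_sent++;
}

/* Models the store finally becoming visible to the owner's processor. */
static void
toy_propagate(ToyLatch *latch)
{
    latch->is_set_owner_view = latch->is_set_in_memory;
}

/*
 * Replays the scenario step by step.  Returns true if the owner ends up
 * asleep with no further wakeup on the way.
 */
static bool
toy_lost_wakeup_scenario(void)
{
    ToyLatch latch = {false, false, 0};
    int      signals_ignored;
    bool     owner_sleeps;

    /* First setter sees the latch unset, sets it, and sends a signal. */
    toy_set_latch(&latch);
    assert(latch.is_set_in_memory);

    /* The owner is not yet marked as waiting, so that signal is ignored. */
    signals_ignored = latch.signals_sent;

    /* The owner checks is_set through its stale view and decides to sleep. */
    owner_sleeps = !latch.is_set_owner_view;

    /* Only now does the update become visible everywhere. */
    toy_propagate(&latch);

    /* A later setter sees the latch already set and exits quickly. */
    toy_set_latch(&latch);

    return owner_sleeps && latch.signals_sent == signals_ignored;
}
```

Calling toy_lost_wakeup_scenario() from a test program returns true in
this model: the only signal ever sent was the one the owner ignored, and
nothing after that will wake it.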

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
