Re: Latches with weak memory ordering (Re: max_wal_senders must die)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Latches with weak memory ordering (Re: max_wal_senders must die)
Date: 2010-11-15 13:45:40
Message-ID: 4CE13984.1070900@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15.11.2010 15:22, Robert Haas wrote:
> On Mon, Nov 15, 2010 at 2:15 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Can you elaborate?
>>>
>>> Weak memory ordering means that stores into shared memory initiated by
>>> one processor are not guaranteed to be observed to occur in the same
>>> sequence by another processor. This implies first that the latch code
>>> could malfunction all by itself, if two processes manipulate a latch at
>>> about the same time, and second (probably much less likely) that there
>>> could be a malfunction involving a process that's waited on a latch not
>>> seeing the shared-memory status updates that another process did "before"
>>> setting the latch.
>>>
>>> This is not at all hypothetical --- my first attempt at rewriting the
>>> sinval signaling code, a couple years back, failed on PPC machines in
>>> the buildfarm because of exactly this type of issue.
>>
>> Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
>> all by itself. And I would've thought that declaring the Latch variable
>> "volatile" prevents rearrangements.
>
> It's not a question of code rearrangement.

Rearrangement of code, rearrangement of CPU instructions, or
rearrangement of the order the changes in the memory become visible to
other processes. The end result is the same.

> Suppose at time zero, the
> latch is unset, but owned. At approximately the same time, SetLatch()
> is called in one process and WaitLatch() in another process.
> SetLatch() sees that the latch is not set and sends SIGUSR1 to the
> other process. The other process receives the signal but, since
> waiting is not yet set, it ignores the signal. It then drains the
> self-pipe and examines latch->is_set. But as it turns out, the update
> by the process which called SetLatch() isn't yet visible to this
> process, because this process has a copy of those bytes in some
> internal cache that isn't guaranteed to be fully coherent. So even
> though SetLatch() already changed latch->is_set to true, it still
> looks false here. Therefore, we go to sleep on the latch.

Surely marking the latch pointer volatile would force the store to
is_set to be flushed, if not immediately, at least before the kill()
system call. No?

Looking at Tom's patch in sinvaladt.c, the problem there was that the
the store of the shared maxMsgNum variable could become visible to other
processes after the store of the message itself. Using a volatile
pointer for maxMsgNum would not have helped with that, because the
operations on other variables might still be rearranged with respect to
the store of maxMsgNum. SetLatch is simpler, there is only one variable
(ok, two, but your scenario didn't involve a change in owner_pid). It
seems safe to assume that the store becomes visible before the system call.

Tom's other scenario, where changing some other variable in shared
memory might not have become visible to other processes when SetLatch()
runs, seems more plausible (if harder to run into in practice). But if
the variable is meant to be examined by other processes, then you should
use a lock to protect it. Otherwise you'll have concurrency issues
anyway. Or at least use a volatile pointer, I believe it's safe to
assume that two operations using a volatile pointer will not be
rearranged wrt. each other.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-11-15 13:55:01 Re: Latches with weak memory ordering (Re: max_wal_senders must die)
Previous Message Robert Haas 2010-11-15 13:22:32 Re: Latches with weak memory ordering (Re: max_wal_senders must die)