Re: Latches with weak memory ordering (Re: max_wal_senders must die)

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Latches with weak memory ordering (Re: max_wal_senders must die)
Date:	2010-11-24 03:13:08
Message-ID:	AANLkTimC+tpwAqePGQQcctmbQ_SBeTBLMHLnh_3yTTNB@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Nov 22, 2010 at 6:54 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 21.11.2010 15:18, Robert Haas wrote:
>>
>> On Sat, Nov 20, 2010 at 4:07 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>>>>
>>>> So what DO we need to guard against here?
>>>
>>> I think the general problem can be stated as "process A changes two or
>>> more values in shared memory in a fairly short span of time, and process
>>> B, which is concurrently examining the same variables, sees those
>>> changes occur in a different order than A thought it made them in".
>>>
>>> In practice we do not need to worry about changes made with a kernel
>>> call in between, as any sort of context swap will cause the kernel to
>>> force cache synchronization.
>>>
>>> Also, the intention is that the locking primitives will take care of
>>> this for any shared structures that are protected by a lock. (There
>>> were some comments upthread suggesting maybe our lock code is not
>>> bulletproof; but if so that's something to fix in the lock code, not
>>> a logic error in code using the locks.)
>>>
>>> So what this boils down to is being an issue for shared data structures
>>> that we access without using locks. As, for example, the latch
>>> structures.
>>
>> So is the problem case a race involving owning/disowning a latch vs.
>> setting that same latch?
>
> No. (or maybe that as well, but that's not what we've been concerned about
> here). As far as I've understood correctly, the problem is that process A
> does something like this:
>
> /* set a shared variable */
> ((volatile bool *) shmem)->variable = true;
> /* Wake up process B to notice that we changed the variable */
> SetLatch();
>
> And process B does this:
>
> for (;;)
> {
>     ResetLatch();
>     if (((volatile bool *) shmem)->variable)
>         DoStuff();
>
>     WaitLatch();
> }
>
> This is the documented usage pattern of latches. The problem arises if
> process A runs just before ResetLatch, but the effect of setting the shared
> variable doesn't become visible until after the if-test in process B.
> Process B will clear the is_set flag in ResetLatch(), but it will not
> DoStuff(), so it in effect misses the wakeup from process A and goes back to
> sleep even though it would have work to do.
>
> This situation doesn't arise in the current use of latches, because the
> shared state comparable to shmem->variable in the above example is protected
> by a spinlock. But it might become an issue in some future use case.

Eh, so, should we do anything about this?
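
One possible direction, as a rough sketch only: put a full memory barrier between the two stores on the setting side, and between the latch reset and the flag test on the waiting side. The code below uses C11 atomics rather than anything in our tree. SharedState, shared_state_init, producer_signal and consumer_check are hypothetical names invented for this illustration, and is_set merely stands in for the latch flag. It is not the actual latch code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared memory area in the example. */
typedef struct SharedState
{
    atomic_bool variable;   /* the shared flag ("shmem->variable") */
    atomic_bool is_set;     /* stand-in for the latch's is_set flag */
} SharedState;

void
shared_state_init(SharedState *s)
{
    assert(s != NULL);
    atomic_init(&s->variable, false);
    atomic_init(&s->is_set, false);
}

/* Process A: publish the work, then set the latch. */
void
producer_signal(SharedState *s)
{
    assert(s != NULL);
    atomic_store_explicit(&s->variable, true, memory_order_relaxed);
    /* Full fence: order the store to variable before the store to is_set. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&s->is_set, true, memory_order_relaxed);
}

/*
 * Process B, one loop iteration up to the point where it would wait:
 * reset the latch, then test the flag.  Returns true if work was found.
 */
bool
consumer_check(SharedState *s)
{
    assert(s != NULL);
    atomic_store_explicit(&s->is_set, false, memory_order_relaxed);
    /* Full fence: order the latch reset before the read of variable. */
    atomic_thread_fence(memory_order_seq_cst);
    /* Read and clear the flag in one step; true means there is work. */
    return atomic_exchange_explicit(&s->variable, false,
                                    memory_order_relaxed);
}
```

With both fences in place, either process B sees variable set after resetting the latch, or process A's store to is_set lands after the reset and the subsequent wait returns immediately. Either way the wakeup is not lost. Whether the barriers belong inside SetLatch() and ResetLatch() or in the callers is a separate question that this sketch does not settle.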

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Latches with weak memory ordering (Re: max_wal_senders must die) at 2010-11-22 11:54:55 from Heikki Linnakangas
