
Re: Latches with weak memory ordering (Re: max_wal_senders must die)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Latches with weak memory ordering (Re: max_wal_senders must die)
Date: 2010-11-22 11:54:55
Message-ID: 4CEA5A0F.1030602@enterprisedb.com
Lists: pgsql-hackers
On 21.11.2010 15:18, Robert Haas wrote:
> On Sat, Nov 20, 2010 at 4:07 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us>  wrote:
>> Robert Haas<robertmhaas(at)gmail(dot)com>  writes:
>>> So what DO we need to guard against here?
>>
>> I think the general problem can be stated as "process A changes two or
>> more values in shared memory in a fairly short span of time, and process
>> B, which is concurrently examining the same variables, sees those
>> changes occur in a different order than A thought it made them in".
>>
>> In practice we do not need to worry about changes made with a kernel
>> call in between, as any sort of context swap will cause the kernel to
>> force cache synchronization.
>>
>> Also, the intention is that the locking primitives will take care of
>> this for any shared structures that are protected by a lock.  (There
>> were some comments upthread suggesting maybe our lock code is not
>> bulletproof; but if so that's something to fix in the lock code, not
>> a logic error in code using the locks.)
>>
>> So what this boils down to is being an issue for shared data structures
>> that we access without using locks.  As, for example, the latch
>> structures.
>
> So is the problem case a race involving owning/disowning a latch vs.
> setting that same latch?

No (or maybe that as well, but that's not what we've been concerned
about here). As far as I understand it, the problem is that process A
does something like this:

/* set a shared variable (shmem points to a volatile shared struct) */
shmem->variable = true;
/* Wake up process B to notice that we changed the variable */
SetLatch();

And process B does this:

for (;;)
{
   ResetLatch();
   if (shmem->variable)
     DoStuff();

   WaitLatch();
}

This is the documented usage pattern of latches. The problem arises if
process A runs just before process B calls ResetLatch(), but the effect
of setting the shared variable doesn't become visible to process B until
after its if-test. Process B will clear the is_set flag in ResetLatch(),
but it will not DoStuff(), so it in effect misses the wakeup from process
A and goes back to sleep even though it has work to do.

This situation doesn't arise in the current use of latches, because the 
shared state comparable to shmem->variable in the above example is 
protected by a spinlock. But it might become an issue in some future use 
case.
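
To make the ordering requirement concrete, here is a minimal sketch of the
same pattern written with C11 sequentially consistent atomics instead of a
spinlock. It is not the latch implementation; SketchShared, sketch_notify,
sketch_poll and sketch_would_sleep are hypothetical names invented for this
illustration, and the latch is reduced to a bare is_set flag. The point is
only that if every access to the variable and to the flag takes part in one
total order, then whenever process B's test misses the variable, process A's
store to the flag must come after B's reset, so the flag is still set when B
goes to wait.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared-memory area in the example above. */
typedef struct SketchShared
{
    atomic_bool variable;       /* "there is work to do" */
    atomic_bool latch_is_set;   /* stand-in for the latch's is_set flag */
} SketchShared;

/* Process A: publish the work, then set the latch. */
static void
sketch_notify(SketchShared *s)
{
    /* Both stores are seq_cst (the default), so they cannot be reordered. */
    atomic_store(&s->variable, true);
    atomic_store(&s->latch_is_set, true);
}

/*
 * Process B: one iteration of the loop body, up to but not including the
 * wait. Returns true if work was found (the caller would then DoStuff()).
 */
static bool
sketch_poll(SketchShared *s)
{
    /* Reset first, then test: the same order as ResetLatch() + if-test. */
    atomic_store(&s->latch_is_set, false);

    /* Test and clear the variable in a single atomic step. */
    return atomic_exchange(&s->variable, false);
}

/* Process B: would a wait on the latch block right now? */
static bool
sketch_would_sleep(SketchShared *s)
{
    return !atomic_load(&s->latch_is_set);
}
```

Process B would call sketch_poll() and only go to sleep if
sketch_would_sleep() returns true. Sequentially consistent atomics are the
bluntest tool for this; explicit memory barriers in the set and reset paths
are the finer-grained alternative, at the price of having to get the
placement right on every platform.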

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
