Re: max_wal_senders must die

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: max_wal_senders must die
Date: 2010-11-14 20:55:33
Message-ID: 10468.1289768133@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 13.11.2010 17:07, Tom Lane wrote:
>> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>>> Come to think of it, I'm not really sure I understand what protects
>>> SetLatch() against memory ordering hazards. Is that actually safe?
>>
>> Hmm ... that's a good question. It certainly *looks* like it could
>> malfunction on machines with weak memory ordering.

> Can you elaborate?

Weak memory ordering means that stores into shared memory initiated by
one processor are not guaranteed to be observed to occur in the same
sequence by another processor. This implies first that the latch code
could malfunction all by itself, if two processes manipulate a latch at
about the same time, and second (probably much less likely) that there
could be a malfunction involving a process that's waited on a latch not
seeing the shared-memory status updates that another process did "before"
setting the latch.

This is not at all hypothetical --- my first attempt at rewriting the
sinval signaling code, a couple years back, failed on PPC machines in
the buildfarm because of exactly this type of issue.

The quick-and-dirty way to fix this is to attach a spinlock to each
latch, because we already have memory ordering sync instructions in
the spinlock primitives. Doing better would probably involve developing
a new set of processor-specific primitives --- which would be pretty
easy for the processors where we have gcc inline asm, but it would take
some research for the platforms where we're relying on magic OS-provided
subroutines.

regards, tom lane
