Re: sinval synchronization considered harmful

From: Noah Misch <noah(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-27 03:35:38
Message-ID: 20110727033537.GB18910@tornado.leadboat.com
Lists: pgsql-hackers

On Tue, Jul 26, 2011 at 06:04:16PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)2ndQuadrant(dot)com> writes:
> > On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote:
> >> Dirty cache line, maybe not, but what if the assembly code commands the
> >> CPU to load those variables into CPU registers before doing the
> >> comparison? If they're loaded with maxMsgNum coming in last (or at
> >> least after resetState), I think you can have the problem without any
> >> assumptions about cache line behavior at all. You just need the process
> >> to lose the CPU at the right time.
>
> > True. If the compiler places the resetState load first, you could hit the
> > anomaly by "merely" setting a breakpoint on the next instruction, waiting for
> > exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue.
> > I think, though, we should either plug that _and_ the cache incoherency case or
> > worry about neither.
>
> How do you figure that? The poor-assembly-code-order risk is both a lot
> easier to fix and a lot higher probability. Admittedly, it's still way
> way down there, but you only need a precisely-timed sleep, not a
> precisely-timed sleep *and* a cache line that somehow remained stale.

I think both probabilities are too low to usefully distinguish. An sinval
wraparound takes a long time even in a deliberate test setup: almost 30 hours @
10k messages/sec. To get a backend to sleep that long, you'll probably need
something like SIGSTOP or a debugger attach. The sleep has to fall within the
space of no more than a few instructions. Then, you'd need to release the
process at the exact moment for it to observe wrapped equality. In other words,
you get one sub-millisecond opportunity (about 100 microseconds wide at that
rate) per 30 hours of process sleep time. If your backends don't have multi-hour
sleeps, it can't ever happen.

Even so, all the better if we settle on an approach that has neither hazard.

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
