Re: cheaper snapshots

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: cheaper snapshots
Date: 2011-07-28 13:05:54
Message-ID: CA+Tgmoa-=+XzifRTeCCtzWjpkFHwWeSqdWwAAupgBr3D4R5PcQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 28, 2011 at 4:16 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jul28, 2011, at 04:51 , Robert Haas wrote:
>> One fly in the ointment is that 8-byte
>> stores are apparently done as two 4-byte stores on some platforms.
>> But if the counter runs backward, I think even that is OK.  If you
>> happen to read an 8 byte value as it's being written, you'll get 4
>> bytes of the intended value and 4 bytes of zeros.  The value will
>> therefore appear to be less than what it should be.  However, if the
>> value was in the midst of being written, then it's still in the midst
>> of committing, which means that that XID wasn't going to be visible
>> anyway.  Accidentally reading a smaller value doesn't change the
>> answer.
>
> That only works if the update of the most-significant word is guaranteed
> to be visible before the update to the least-significant one. Which
> I think you can only enforce if you update the words individually
> (and use a fence on e.g. PPC32). Otherwise you're at the mercy of the
> compiler.
>
> Otherwise, the following might happen. To keep the numbers smaller, I use
> a 2-byte value instead of an 8-byte one and assume that 1-byte stores are
> atomic while 2-byte ones aren't. The machine is assumed to be
> big-endian.
>
> The counter is at 0xff00
> Backend 1 decrements, i.e. does
> (1)  STORE [counter+1], 0xff
> (2)  STORE [counter], 0xfe
>
> Backend 2 reads
> (1')  LOAD [counter+1]
> (2')  LOAD [counter]
>
> If the sequence of events is (1), (1'), (2'), (2), backend 2 will read
> 0xffff which is higher than it should be.

You're confusing two different things - I agree that you need a
spinlock around reading the counter, unless 8-byte loads and stores
are atomic.

What I'm saying can be done without a lock is reading the commit-order
value for a given XID. If that's in the middle of being updated, then
the old value was zero, so the scenario you describe can't occur.
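
To make the difference between the two cases concrete, here is a rough
sketch (not PostgreSQL code; torn_reads is a made-up helper) that
enumerates every value a reader could see while a two-half store is in
flight. It assumes each half is stored and loaded atomically but the two
halves are in no guaranteed order, and it uses 16-bit values as in the
example above.

```python
def torn_reads(old, new, bits=16):
    """Return every value a reader might observe while `new` is being
    stored over `old` as two separate half-width stores.

    Assumption: each half is read and written atomically, but the reader
    may see any mix of old and new halves.
    """
    half = bits // 2
    mask = (1 << half) - 1
    old_hi, old_lo = old >> half, old & mask
    new_hi, new_lo = new >> half, new & mask
    return {
        (hi << half) | lo
        for hi in (old_hi, new_hi)
        for lo in (old_lo, new_lo)
    }


# Shared counter being decremented from 0xff00 to 0xfeff: a reader can
# see 0xffff, which is larger than both the old and the new value.
counter_views = torn_reads(0xff00, 0xfeff)

# Per-XID commit-order slot going from zero to its final value: every
# torn read is either zero or no larger than the final value.
slot_views = torn_reads(0x0000, 0xfeff)

print(sorted(hex(v) for v in counter_views))
print(sorted(hex(v) for v in slot_views))
```

The same enumeration works for the real 8-byte case with bits=64. Because
the slot starts at zero, a mixed read can only drop bits from the final
value, never add any, which is why that read does not need the lock while
the counter read does.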

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
