Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Date: 2016-05-10 16:19:16
Message-ID: 20160510161916.23wwzxn26y5ou2ez@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-05-10 08:09:02 -0400, Robert Haas wrote:
> On Tue, May 10, 2016 at 3:05 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > The easy way to trigger this problem would be to have an oid wraparound
> > - but the WAL shows that that's not the case here. I've not figured
> > that one out entirely (and won't tonight). But I do see WAL records
> > like:
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
> > i.e. two NEXTOID records allocating the same range, which obviously
> > doesn't seem right. There's also every now and then close by ranges:
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 1/9A404DB8, prev 1/9A404270, desc: NEXTOID 3311455
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 7814505, lsn: 1/9A4EC888, prev 1/9A4EB9D0, desc: NEXTOID 3311461
> >
> >
> > As far as I can see something like the above, or an oid wraparound, are
> > pretty much deadly for toast.
> >
> > Is anybody ready with a good defense for SatisfiesToast not doing any
> > actual liveliness checks?
>
> I assume that this was installed as a performance optimization, and I
> don't really see why it shouldn't be or be able to be made safe. I
> assume that the wraparound case was deemed safe because at that time
> the idea of 4 billion OIDs getting used with old transactions still
> active seemed inconceivable.

It's not super likely, yea. But you don't really need to "use" 4 billion
oids to get a wraparound. Once you have a significant number of values
in various toast tables, the oid counter progresses really rather fast,
without many writes. That's because the oid counter is global, but each
individual toast write (and other things) performs its checks via
GetNewOidWithIndex().
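
To illustrate the point about the global counter, here is a toy model in Python. It is a sketch, not PostgreSQL code: the class OidCounter, both function names, and the sizes used are made up for the example, and the real allocator has more rules than shown. The only behaviour modeled is that a per-table lookup keeps drawing from the one shared counter until it finds a value unused in that table.

```python
# Toy model (hypothetical names, not PostgreSQL source): one global OID
# counter shared by every table, with a per-table uniqueness check.
OID_SPACE = 2**32


class OidCounter:
    """Stand-in for the single, cluster-wide OID counter."""

    def __init__(self, start):
        self.next_oid = start
        self.consumed = 0  # how many values have been handed out in total

    def get_new_object_id(self):
        oid = self.next_oid
        # Wrap around at the end of the 32-bit space (simplified).
        self.next_oid = (self.next_oid + 1) % OID_SPACE
        self.consumed += 1
        return oid


def get_new_oid_with_index(counter, existing):
    """Draw from the global counter until the value is unused in this table."""
    while True:
        oid = counter.get_new_object_id()
        if oid not in existing:
            return oid


def demo():
    # Assumed setup: a toast table already holding 100,000 contiguous
    # chunk ids, and a counter that has just reached the start of that range.
    existing = set(range(1000, 101000))
    counter = OidCounter(start=1000)
    oid = get_new_oid_with_index(counter, existing)
    return oid, counter.consumed
```

In this model a single toast write burns through the whole occupied range before it gets a usable value, so the counter moves by far more than the number of rows written.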

I'm not sure why you think it's safe? Consider the following scenario:

BEGIN;
-- nextoid: 1
INSERT toastval = chunk_id = 1;
ROLLBACK;
...
-- oid counter wraps around
-- nextoid: 1
INSERT toastval = chunk_id = 1;

-- crash, losing all pending hint bits

SELECT toastval;

The last SELECT might find either of the toasted data chunks, depending
on heap ordering. As they're not hinted as invalid due to the crash,
HeapTupleSatisfiesToast() will return both as visible.
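
The scenario above can be played through with a small Python model. This is only a sketch under stated assumptions: ToastTuple, satisfies_toast, fetch, crash and run_scenario are hypothetical names, the visibility rule is reduced to "reject only tuples already hinted as invalid" as described in this mail, and the crash is assumed to lose every hint bit because none had been written out.

```python
from dataclasses import dataclass


@dataclass
class ToastTuple:
    chunk_id: int
    data: str
    hinted_invalid: bool = False  # in-memory hint: inserting xact aborted


def satisfies_toast(tup):
    # Simplified rule taken from the description in this mail: no liveness
    # check, a tuple is rejected only if it is already hinted as invalid.
    return not tup.hinted_invalid


def fetch(heap, chunk_id):
    """Return the data of every tuple the toast check lets through."""
    return [t.data for t in heap
            if t.chunk_id == chunk_id and satisfies_toast(t)]


def crash(heap):
    # Assumption: no hint bit reached disk, so all of them are lost.
    for t in heap:
        t.hinted_invalid = False


def run_scenario():
    heap = []
    # BEGIN; INSERT with chunk_id = 1; ROLLBACK;
    heap.append(ToastTuple(1, "rolled back"))
    # A later scan notices the abort and sets the hint bit, in memory only.
    heap[0].hinted_invalid = True
    # The oid counter wraps around and chunk_id = 1 is handed out again.
    heap.append(ToastTuple(1, "committed"))
    before = fetch(heap, 1)
    crash(heap)
    after = fetch(heap, 1)
    return before, after
```

Before the crash the lookup in this model returns only the committed chunk; afterwards it returns both, and which one a reader sees first depends on heap order.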

To make that safe we'd at least have to make hint bit writes by the scan
in GetNewOidWithIndex() durable, and likely also disable the killtuples
optimization, to prevent a plain SELECT of the toast table from making
some tuples unreachable, but not durably hinted.

That seems fairly fragile. I've a significant amount of doubt that toast
reads are bottlenecked by visibility routines.

> It seems to me that the real question
> here is how you're getting two calls to XLogPutNextOid() with the same
> value of ShmemVariableCache->nextOid, and the answer, as it seems to
> me, must be that LWLocks are broken.

There likely were a bunch of crashes in between; Jeff's test suite
triggers them at a high rate. That seems a lot more likely than an
lwlock bug that only materializes in the oid counter. Investigating.

Greetings,

Andres Freund
