Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-11-09 22:24:31
Message-ID: 20171109222431.qwj6dtl5chumc6lt@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The reason for that is that I hadn't yet quite figured out how the bug I
> described in the commit message (and the previously committed testcase)
> would cause that. I was planning to diagnose / experiment more with this
> and write an email if I couldn't come up with an explanation. The
> committed test does *not* actually trigger that.
>
> The reason I couldn't quite figure out how the problem triggers is that
> [ long explanation ]

Attached is a version of the already existing regression test that both
reproduces the broken hot chain (and thus failing index lookups) and
then also the tuple reviving. I don't see any need for letting this run
with arbitrary permutations.

Thanks to whoever allowed isolationtester permutations to go over
multiple lines and allow comments. I was wondering about adding that as
a feature just to discover it's already there ;)

What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing. There's no way we can fix hot chains after the fact, but
preventing dead rows from reapparing seems important. A minimal version
of that is fairly easy - we slap a bunch of if if
!TransactionIdDidCommit() elog(ERROR) at various code paths. But that'll
often trigger clog access errors when the problem occurred - if we want
to do better we need to pass down relfrozenxid/relminmxid to a few
functions. I'm inclined to do so, but it'll make the patch larger...

Comments?

Greetings,

Andres Freund

Attachment Content-Type Size
freeze-the-dead.spec text/plain 2.2 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Robert Haas 2017-11-09 23:10:11 pgsql: Add hash partitioning.
Previous Message Peter Eisentraut 2017-11-09 22:12:42 pgsql: Remove junk left from DSSSL to XSL conversion

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-11-09 22:26:16 Re: OpeSSL - PostgreSQL
Previous Message Andres Freund 2017-11-09 22:21:45 Re: pageinspect option to forgo buffer locking?