Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject: Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-10-03 23:10:20
Message-ID: CAH2-WznBLP5s2fvUuwUyNJC6sSRUDkVCMcXjCuL7nTShqi4jig@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> But that still doesn't fix the problem;
> as far as I can see, vacuum removes the root of the chain, not yet sure
> why, and then things are just as corrupted as before.

Are you sure it's not opportunistic pruning? Another thing I've
noticed with this problem is that the relevant IndexTuple vanishes
pretty quickly, presumably due to LP_DEAD setting (though possibly
not).

(Studies the problem some more...)

I now think that it actually is a VACUUM problem, specifically a
problem with VACUUM pruning. The HOT xmin-to-xmax check pattern that
you mentioned appears within heap_prune_chain(), which looks like the
place where the incorrect tuple prune (or possibly, at times,
redirect?) takes place. (I'm referring to the prune/kill that you
mentioned today, which frustrated your first attempt at a fix: "I
modified the multixact freeze code...".)
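To make the xmin-to-xmax check concrete, here is a minimal, hypothetical Python model of walking a HOT chain on one page. It is not heap_prune_chain() itself: every name below (HeapTupleModel, walk_hot_chain) is invented for illustration, and the model ignores line pointer redirects, LP_DEAD items, multixacts and visibility. It keeps only the rule that a successor counts as a chain member when its xmin equals the update XID of the version that pointed to it, and that the walk stops on a mismatch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeapTupleModel:
    """Hypothetical stand-in for one heap tuple version on a single page."""
    xmin: int          # inserting transaction
    xmax: int          # updating transaction, 0 if none
    next_off: int      # offset taken from t_ctid (same page, since HOT)
    hot_updated: bool  # models HEAP_HOT_UPDATED in t_infomask2


def walk_hot_chain(page: Dict[int, HeapTupleModel], root: int) -> List[int]:
    """Follow a HOT chain from root and return the offsets of its members."""
    members: List[int] = []
    seen = set()
    off = root
    prior_xmax: Optional[int] = None  # None: no predecessor yet
    while off in page and off not in seen:
        tup = page[off]
        # xmin-to-xmax check: trust the link only if this version was
        # created by the transaction that updated its predecessor.
        if prior_xmax is not None and tup.xmin != prior_xmax:
            break
        members.append(off)
        seen.add(off)
        if not tup.hot_updated:
            break  # last member of the chain
        prior_xmax = tup.xmax
        off = tup.next_off
    return members


if __name__ == "__main__":
    page = {
        1: HeapTupleModel(xmin=100, xmax=101, next_off=2, hot_updated=True),
        2: HeapTupleModel(xmin=101, xmax=102, next_off=3, hot_updated=True),
        3: HeapTupleModel(xmin=102, xmax=0, next_off=3, hot_updated=False),
    }
    print(walk_hot_chain(page, 1))  # [1, 2, 3]
    # Clobber the root's xmax: the check now rejects offset 2.
    page[1].xmax = 0
    print(walk_hot_chain(page, 1))  # [1]
```

In this model, losing or changing the xmax of an earlier member makes every later version unreachable from the root, which is the kind of chain breakage a pruning pass relying on this check would then act on.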

The attached patch "fixes" the problem: I cannot get amcheck to
complain about corruption with it applied, and "make check-world"
passes. Hopefully it goes without saying that this isn't actually my
proposed fix. Still, the fact that it at least *masks* the problem
tells us something; it's a start.

FYI, with the patch applied, the repro case's page contents look like this:

postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
  1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
  2 |        2 |         |        |        |          |
  3 |        0 |         |        |        |          |
  4 |        0 |         |        |        |          |
  5 |        0 |         |        |        |          |
  6 |        0 |         |        |        |          |
  7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
(7 rows)
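The lp_flags, infomask and infomask2 columns above are raw bit values. The small helper below decodes them; it is a hypothetical sketch (the function names are invented), with bit values copied from PostgreSQL 10's src/include/storage/itemid.h and src/include/access/htup_details.h. HEAP_XMIN_FROZEN is not a separate bit: it is reported when both HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID are set, as in 9.4 and later.

```python
from typing import List, Tuple

# Line pointer states (lp_flags), from src/include/storage/itemid.h.
LP_FLAGS = {0: "LP_UNUSED", 1: "LP_NORMAL", 2: "LP_REDIRECT", 3: "LP_DEAD"}

# t_infomask bits, from src/include/access/htup_details.h (PostgreSQL 10).
INFOMASK_BITS = [
    (0x0001, "HEAP_HASNULL"),
    (0x0002, "HEAP_HASVARWIDTH"),
    (0x0004, "HEAP_HASEXTERNAL"),
    (0x0008, "HEAP_HASOID"),
    (0x0010, "HEAP_XMAX_KEYSHR_LOCK"),
    (0x0020, "HEAP_COMBOCID"),
    (0x0040, "HEAP_XMAX_EXCL_LOCK"),
    (0x0080, "HEAP_XMAX_LOCK_ONLY"),
    (0x0100, "HEAP_XMIN_COMMITTED"),
    (0x0200, "HEAP_XMIN_INVALID"),
    (0x0400, "HEAP_XMAX_COMMITTED"),
    (0x0800, "HEAP_XMAX_INVALID"),
    (0x1000, "HEAP_XMAX_IS_MULTI"),
    (0x2000, "HEAP_UPDATED"),
    (0x4000, "HEAP_MOVED_OFF"),
    (0x8000, "HEAP_MOVED_IN"),
]
HEAP_XMIN_FROZEN = 0x0300  # XMIN_COMMITTED | XMIN_INVALID

# t_infomask2: the low 11 bits hold the attribute count; high bits are flags.
HEAP_NATTS_MASK = 0x07FF
INFOMASK2_BITS = [
    (0x2000, "HEAP_KEYS_UPDATED"),
    (0x4000, "HEAP_HOT_UPDATED"),
    (0x8000, "HEAP_ONLY_TUPLE"),
]


def decode_infomask(hex_str: str) -> List[str]:
    """Name every t_infomask bit set in a to_hex() string."""
    mask = int(hex_str, 16)
    names = [name for bit, name in INFOMASK_BITS if mask & bit]
    if (mask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN:
        names.append("HEAP_XMIN_FROZEN")  # derived from the two bits above
    return names


def decode_infomask2(hex_str: str) -> Tuple[int, List[str]]:
    """Return (number of attributes, flag names) for a t_infomask2 string."""
    mask = int(hex_str, 16)
    flags = [name for bit, name in INFOMASK2_BITS if mask & bit]
    return mask & HEAP_NATTS_MASK, flags


if __name__ == "__main__":
    # The two items with tuple storage in the output above.
    for lp, lp_flags, im, im2 in [(1, 1, "b02", "3"), (7, 1, "2b02", "8003")]:
        natts, flags2 = decode_infomask2(im2)
        print(lp, LP_FLAGS[lp_flags], decode_infomask(im), natts, flags2)
```

Applied to the rows above, this reads lp 1 as a normal, frozen tuple with an invalid xmax and 3 attributes; lp 2 as a redirect; lp 3 to 6 as unused; and lp 7 as a frozen heap-only tuple (HEAP_ONLY_TUPLE) carrying HEAP_UPDATED.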

--
Peter Geoghegan

Attachment: suppress-bad-prune.patch (text/x-patch, 662 bytes)
