Quick Links

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 01:09:53
Message-ID:	10CE2AE7-B1AC-4E81-BF00-E9EF4BA855EE@amazon.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-committers pgsql-hackers

I’ve just started looking at this again after a few weeks break.

There is a tangled web of issues here. With the community fix we get a corrupted page(invalid redirect ptr from indexed item). The cause of that is:
pruneheap.c:

/*
* Check the tuple XMIN against prior XMAX, if any
*/
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
break;

chainitems[nchain++] = offnum;

The priorXmax is a multixact key share lock, thus not equal to xmin. This terminates the scan. Unlike the scan termination with “if (!recent_dead) break;” below the switch (...SatisiesVacuum…), this “break” does not put the offnum into the chain even though it is in the chain. If the first not-deleted item isn’t put in the chain then we’ll not call heap_prune_record_redirect().

I do not know what the above ‘if’ test is protecting. Normally the xmin is equal to the priorXmax. Other than protecting against corruption a key share lock can cause this. So, I tried a fix which does the “if” check after adding it to chainitems. This will break whatever real situation this IF was protecting. Tom Lane put this in.

OK: With that hack of a fix the redirect now works correctly. However, we still get the reindex problem with not finding the parent. That problem is caused by:
Pruneheap.c:heap_get_root_tuples()

if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;

In this case, instead of these not being equal because of a multixact key share lock, it is because XMIN is frozen and FrozenTransactionId doesn’t equal the priorXmax. Thus, we don’t build the entire chain from root to most current row version and this causes the reindex failure.

If we disable this ‘if’ as a hack then we no longer get a problem on the reindex. However, YiWen reported that at the end of an install check out index checking reported corruption in the system catalogues. So we are still looking.

We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?

FYI, someone should look at the same ”if” test in heapam.c: heap_lock_updated_tuple_rec(). Also, I hope there are no strange issues with concurrent index builds.

Finally, the idea behind the original fix was to simply NOT to do an unsupported freeze on a dead tuple. It had two drawbacks:
1) CLOG truncation. This could have been handled by keeping track of the old unpruned item found and using that to update the table’s/DB’s freeze xid.
2) Not making freeze progress. The only reason the prune would fail should be because of an open TXN. Unless that TXN was so old such that it’s XID was as old as the ?min freeze threshold? then we would make progress. If we were doing TXN’s that old then we’d be having problems anyway.

On 10/3/17, 5:15 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:

On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I now think that it actually is a VACUUM problem, specifically a
> problem with VACUUM pruning. You see the HOT xmin-to-xmax check
> pattern that you mentioned within heap_prune_chain(), which looks like
> where the incorrect tuple prune (or possibly, at times, redirect?)
> takes place. (I refer to the prune/kill that you mentioned today, that
> frustrated your first attempt at a fix -- "I modified the multixact
> freeze code...".)

My lookup of the problem converges to the same conclusion. Something
is wrong with the vacuum's pruning. I have spent some time trying to
design a patch, all the solutions I tried have proved to make the
problem harder to show up, but it still showed up, sometimes after
dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents looks like this with the patch applied:
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
> to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
> lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+-----------
> 1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
> 2 | 2 | | | | |
> 3 | 0 | | | | |
> 4 | 0 | | | | |
> 5 | 0 | | | | |
> 6 | 0 | | | | |
> 7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
> (7 rows)

Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
what would look correct to me.

- * Check the tuple XMIN against prior XMAX, if any
- */
- if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
- break;
If you remove this check, you could also remove completely priorXmax.

Actually, I may be missing something, but why is priorXmax updated
even for dead tuples? For example just doing that is also taking care
of the problem:
@@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer,
OffsetNumber rootoffnum,
Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
BufferGetBlockNumber(buffer));
offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
- priorXmax = HeapTupleHeaderGetUpdateXid(htup);
+ if (!tupdead)
+ priorXmax = HeapTupleHeaderGetUpdateXid(htup);
}
--
Michael

In response to

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple at 2017-10-04 00:15:44 from Michael Paquier

Responses

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple at 2017-10-04 01:19:56 from Peter Geoghegan
Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple at 2017-10-04 09:51:59 from Alvaro Herrera

Browse pgsql-committers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2017-10-04 01:13:24	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message	Michael Paquier	2017-10-04 00:15:44	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2017-10-04 01:13:24	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message	Michael Paquier	2017-10-04 00:54:24	Re: [Proposal] Allow users to specify multiple tables in VACUUM commands