Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-10-04 03:43:00
Message-ID: 1507088579842.74063@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
> break;

So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened

The interesting consequence of changing that is the prune seems to get the entire chain altogether with Dan's repro... I've run it a couple of times and have consistently gotten the following page

lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
----+--------+--------+----------+------------+-------------
1 | (0,1) | 8152 | 1 | 2818 | 3
2 | | 7 | 2 | |
3 | | 0 | 0 | |
4 | | 0 | 0 | |
5 | | 0 | 0 | |
6 | | 0 | 0 | |
7 | (0,7) | 8112 | 1 | 11010 | 32771
(7 rows)

I've made this change to conditions in both heap_prune_chain and heap_get_root_tuples and this seems to cause things to pass my smoke tests.

I'll look into this deeper tomorrow.

Yi Wen
________________________________________
From: Peter Geoghegan <pg(at)bowt(dot)ie>
Sent: Tuesday, October 3, 2017 6:19 PM
To: Wood, Dan
Cc: Michael Paquier; Alvaro Herrera; PostgreSQL Hackers; Wong, Yi Wen
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> I’ve just started looking at this again after a few weeks break.

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
> break;

> We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?

I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".

--
Peter Geoghegan

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2017-10-04 04:45:20 pgsql: Adjust git_changelog for new-style release tags.
Previous Message Wood, Dan 2017-10-04 01:25:12 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-10-04 04:37:42 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Tom Lane 2017-10-04 01:46:26 Re: Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)