Re: [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
Date: 2017-10-19 16:03:47
Message-ID: CAH2-Wz=xb_7cSHeqG7h8-h=941Yq7oPx7v8Rojv=sxJ=_-yBjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Thu, Oct 19, 2017 at 7:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The commit message for a5736bf7 doesn't say anything about a race; it
> just claims that it is fixing traversal of half-frozen update chains,
> without making any reference to how such a thing as a half-frozen
> update chain came to exist in the first place. I think the commit
> that talks about that race condition is 20b65522.

Well, the race was that the wrong tuple is pruned, because of naivety
in the heap_prune_chain() update chain handling, which is what started
work that became commit a5736bf7. That's by far the worst consequence
of the traversal of half-frozen update chain business that a5736bf7's
commit message describes, since we know it leads to wrong answers to
index scans (and REINDEX fails with "failed to find parent tuple").
It's the only consequence that's been directly observed, but probably
not the only one that's possible. Most other places that suffer the
problem are in obscure-to-users paths like EvalPlanQual().

> /me studies the problem for a while.
>
> What's bothering me about this is: how is cutoff_xid managing to be a
> new enough transaction ID for this to happen in the first place? The
> cutoff XID should certainly be older than anything any current
> snapshot can see, because it's computed using GetOldestXmin() -- which
> means that XIDs older than the cutoff can't still be running (except
> maybe if old_snapshot_threshold is set to a non-default value).

The problem that we directly observed during pruning, which caused
"failed to find parent tuple" even after the first freeze-the-dead
commit (commit 20b65522), was that the mixture of frozen and unfrozen
tuples in a hot chain caused heap_prune_chain() to make an incorrect
conclusion about the continuity of a HOT chain. We'd prune the wrong
tuples -- it failed to find the visible tuple at the end of the HOT
chain, and respect that that meant it could not prune down to a stub
ItemId (blow away what it *incorrectly* thought was an entire HOT
chain).

Now that the problem is fixed by a5736bf7, this test case will prune
and get an LP_REDIRECT ItemId (not an LP_DEAD one), as we're no longer
confused about the continuity of HOT chains within heap_prune_chain().

--
Peter Geoghegan

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Geoghegan 2017-10-19 20:44:01 Re: [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
Previous Message Tom Lane 2017-10-19 15:16:23 pgsql: Fix incorrect link in v10 release notes.

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-10-19 16:46:36 Re: Fix performance degradation of contended LWLock on NUMA
Previous Message Robert Haas 2017-10-19 15:44:26 Re: Supporting Windows SChannel as OpenSSL replacement