Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-11-02 16:44:34
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-committers pgsql-hackers

On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Pushed the reverts.
> I noticed while doing so that REL_10_STABLE contains the bogus commits.
> Does that change our opinion regarding what to do for people upgrading
> to a version containing the broken commits? I don't think so, because
> 1) we hope that not many people will trust their data to 10.0
> immediately after release
> 2) the bug is very low probability
> 3) it doesn't look like we can do a lot about it anyway.

Just to be clear, it looks like "Fix freezing of a dead HOT-updated
tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
was stamped, but "Fix traversal of half-frozen update chains"
(22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
therefore unreleased at present.

Users of 10.0 who hit the code introduced by
46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the
xmax fields of tuples that predate relfrozenxid. Those tuples will be
hinted-committed. That's not good, but it might not really have much
in the way of consequences. *IF* the next VACUUM doesn't get confused
by the old XID, then it will prune the tuple then and I think we'll be
OK. And I think it won't, because it should just call
HeapTupleSatisfiesVacuum() and that should see that
HEAP_XMAX_COMMITTED is set and not actually try to consult the old
CLOG. If that hint bit can ever get lost - or fail to propagate to a
standby - then we have more trouble, but the fact that it's set by a
logged operation makes me hope that can't happen. Furthermore, that
follow-on VACUUM should indeed arrive in due time, because we will not
have marked the page all-visible -- HeapTupleSatisfiesVacuum() will
NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(),
and therefore we will have set all_visible = false.

The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
I think things get a lot more dangerous. The problem (as Andres
pointed out to me this afternoon) is that it seems possible to end up
with a situation where there should be two HOT chains on a page, and
because of the weakened xmin/xmax checking rules, we end up thinking
that the second one is a continuation of the first one, which will be
all kinds of bad news. That would be particularly likely to happen in
releases from before we invented HEAP_XMIN_FROZEN, when there's no
xmin/xmax matching at all, but could happen on later releases if we
are extraordinarily unlucky (i.e. xmin of the first tuple in the
second chain happens to be the same as the pre-freeze xmax in the old
chain, probably because the same XID was used to update the page in
two consecutive epochs). Fortunately, that commit is (I think) not
released anywhere.

Personally, I think it would be best to push the release out a week.
I think we understand this well enough now that we can fix it
relatively easily, but haste makes bugs, and (I know you're all tired
of hearing me say this) patches that implicate the on-disk format are

Robert Haas
The Enterprise PostgreSQL Company

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-11-02 16:47:05 Re: Removing wal_keep_segments as default configuration in
Previous Message Tom Lane 2017-11-02 16:26:33 Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2017-11-02 16:45:42 pgsql: Remove wal_keep_segments from default configuration in PostgresN
Previous Message Peter Eisentraut 2017-11-02 16:13:51 pgsql: doc: Clarify pgstattuple privileges information