Re: First-draft release notes for back-branch releases

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: First-draft release notes for back-branch releases
Date: 2018-11-06 23:44:56
Message-ID: 87zhulzsr5.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> You could be bit by any shutdown of the old code, no, whether it's
Tom> part of a pg_upgrade or not?

This has nothing to do with pg_upgrade; it is likely to bite people just
doing an update from the previous minor release.

Tom> Also, it looks like the bug only affects standbys (or at least
Tom> that's what the commit message seems to imply), which makes it
Tom> less of a data-loss hazard than it might've been.

The commit message doesn't really show the severity of the problem at
all.

The problem is this: the updating of minRecoveryPoint in the control
file is almost completely broken in the last point releases. It's not an
"incorrect calculation" as the commit message says; rather, the
bgwriter and checkpointer _do not update the value at all_ except
immediately after a checkpoint. As a result, it is common to have a
situation where the recovery restartpoint is at lsn X, the
minRecoveryPoint is at a slightly later lsn Y, but there are on-disk
data pages with a _much_ later lsn Z.
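
To make the X/Y/Z relationship concrete, here is a minimal sketch in
Python (not PostgreSQL code). It assumes the usual "hi/lo" hexadecimal
LSN notation; the LSN values, the page names and the helper
pages_beyond_min_recovery_point() are all made up for illustration. It
just checks the rule that should hold on a standby: no page reaches disk
with an lsn later than minRecoveryPoint.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def pages_beyond_min_recovery_point(min_recovery_point: str, page_lsns: dict) -> list:
    """Return the names of pages whose on-disk LSN is later than minRecoveryPoint.

    On a healthy standby this list should be empty, because minRecoveryPoint
    is meant to be advanced before a page with a later LSN is written out.
    """
    limit = parse_lsn(min_recovery_point)
    return sorted(name for name, lsn in page_lsns.items() if parse_lsn(lsn) > limit)


# Hypothetical values: restartpoint X, minRecoveryPoint Y, stray page at Z.
restartpoint_x = "0/3000028"
min_recovery_point_y = "0/3000110"
on_disk_pages = {
    "idx_block_7": "0/30000F8",  # at or before Y: fine
    "idx_block_9": "0/5A3C2D0",  # Z, much later than Y: breaks the rule
}

violations = pages_beyond_min_recovery_point(min_recovery_point_y, on_disk_pages)
print(violations)  # ['idx_block_9']
```

With the broken releases, pages like the hypothetical idx_block_9 above
are the ones that matter for what follows.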

If such a data page was the subject of a Btree/DELETE record, then any
attempt to do recovery will potentially PANIC with a (false) "WAL
contains references to invalid pages" error. This happens if, and only
if, at least one client (e.g. a monitoring system) is connected when the
record is replayed; it is the incorrect minRecoveryPoint that makes it
possible for a client to be connected at that point.

The users whose case I was diagnosing on IRC were finding that their
monitoring system was sufficient to trigger the problem at least 80% of
the time. Consider that the broken minRecoveryPoint can be quite a long
way in the past relative to on-disk data pages, so the window of
vulnerability isn't necessarily small.

So while there _probably_ isn't any data corruption, the standby can get
into a state that isn't restartable unless you know to block client
connections to it until it has caught up. Rebuilding the standby from
the master will work but that may be a significant practical problem if
the data is large.
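
As a rough sketch of the "has it caught up" test (Python, not anything
shipped with PostgreSQL): compare the standby's current replay lsn with
a target lsn that is known to be at or beyond the newest on-disk page.
Where the two values come from is left open here; the function names
and the sample lsns are hypothetical, and picking a safe target is an
assumption the operator has to justify for their own setup.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def caught_up(replay_lsn: str, target_lsn: str) -> bool:
    """True once replay has reached or passed the target LSN."""
    return parse_lsn(replay_lsn) >= parse_lsn(target_lsn)


def bytes_remaining(replay_lsn: str, target_lsn: str) -> int:
    """WAL bytes still to be replayed before the target is reached."""
    return max(0, parse_lsn(target_lsn) - parse_lsn(replay_lsn))


# Hypothetical sample: replay is still well short of the target.
replay = "0/3000110"
target = "0/5A3C2D0"
print(caught_up(replay, target), bytes_remaining(replay, target))
```

Only once caught_up() returns true for a conservatively chosen target
would client connections be let back in.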

--
Andrew (irc:RhodiumToad)
