Re: First-draft release notes for back-branch releases

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: First-draft release notes for back-branch releases
Date: 2018-11-07 01:17:37
Message-ID: 20181107011737.GD1677@paquier.xyz
Lists: pgsql-hackers

On Tue, Nov 06, 2018 at 11:44:56PM +0000, Andrew Gierth wrote:
> The commit message doesn't really show the severity of the problem at
> all.

I take the blame for that. And my apologies for what it's worth.

> The users whose case I was diagnosing on IRC were finding that their
> monitoring system was sufficient to trigger the problem at least 80% of
> the time. Consider that the broken minRecoveryPoint can be quite a long
> way in the past relative to on-disk data pages, so the window of
> vulnerability isn't necessarily small.

The first report after the last point release on the matter is here, and
those folks had exactly the same symptoms with clients aggressively
connecting to the standby:
https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org

And this came out pretty quickly.

> So while there _probably_ isn't any data corruption, the standby can get
> into a state that isn't restartable unless you know to block client
> connections to it until it has caught up. Rebuilding the standby from
> the master will work but that may be a significant practical problem if
> the data is large.

The problem shows up if you force a crash recovery when restarting the
standby, not when you let it shut down cleanly. Corruption could
actually happen if you promote the standby before it reaches the actual
recovery LSN, in the case where it failed to update minRecoveryPoint
after performing a crash recovery. However, this is a problem only if
you have a standby do a crash recovery and then promote it immediately
afterwards. It does not happen when recovering from a backup either, as
the minimum recovery LSN then comes from the backup end record, not from
the control file.
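
As a rough illustration of the "do not promote before replay has caught
up" condition, here is a minimal sketch in Python. It only compares two
LSNs written in the usual "high/low" hexadecimal text form. The function
names and the idea of an operator-supplied target LSN are assumptions
made for this example, not anything in the tree. The target would have
to come from somewhere other than the standby's control file, since that
is the value which may be stale here; the replay position could be read
with pg_last_wal_replay_lsn() on v10 and later.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '16/B374D848' to a 64-bit integer.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def safe_to_promote(replay_lsn: str, target_lsn: str) -> bool:
    """Hypothetical check: True once replay has reached target_lsn.

    target_lsn is an assumption of this sketch: a position the operator
    knows the standby must replay up to before promotion.
    """
    return parse_lsn(replay_lsn) >= parse_lsn(target_lsn)


if __name__ == "__main__":
    # Replay is still behind the target: hold off on promotion.
    print(safe_to_promote("0/3000060", "0/5000000"))
    # Replay has crossed into the next 4GB range, past the target.
    print(safe_to_promote("1/0", "0/FFFFFFFF"))
```

Comparing the parsed integers rather than the strings matters, as the
text form does not sort correctly across the slash.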
--
Michael
