Re: First-draft release notes for back-branch releases

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: First-draft release notes for back-branch releases
Date: 2018-11-07 04:49:15
Message-ID: 20181107044915.GF1677@paquier.xyz
Lists: pgsql-hackers

On Wed, Nov 07, 2018 at 02:06:13AM +0000, Andrew Gierth wrote:
> So the minimum recovery location is recorded as 0x201FCFE0, but there
> are data pages on disk with LSNs as recent as 0x25BAFE80. That's a whole
> lot of daylight that could contain a btree delete.

Sorry for the time it took. I was just testing what you reported by
myself, and indeed I can see that even with a clean shutdown on the
standby there can be a gap between minRecoveryPoint and the newest
LSNs of the pages flushed to disk. I have hacked pg_verify_checksums
a bit so that it can list the LSNs of all relation pages on disk, and
compared that with what the control file reports: there are some
mismatches.

I can also see that HEAD is able to handle things correctly. How to
warn about that in the release notes though? What really matters is
being careful with clients connecting aggressively to a hot standby if
the problem shows up.

I have designed a test case which allows me to reproduce the problem
easily, but it requires a small patch on pg_verify_checksums to check
that no on-disk page has an LSN newer than minRecoveryPoint. This tool
could be useful for other purposes, like checking the sanity of a data
folder.
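
For illustration only, the check itself can be sketched outside of
pg_verify_checksums. This is not the patch mentioned above, just a
rough standalone equivalent. It relies on the fact that each page
begins with pd_lsn stored as two 32-bit integers (high half first),
and it assumes the default 8kB block size and a little-endian machine.
The function names are made up for this sketch.

```python
import struct

BLCKSZ = 8192  # assumption: default PostgreSQL block size


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/201FCFE0') to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Format an integer LSN back into the usual 'hi/lo' hex notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def page_lsns(path, byteorder="<"):
    """Yield (block_number, lsn) for each page of one relation segment file.

    byteorder="<" assumes a little-endian data directory; use ">" otherwise.
    """
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < 8:
                break  # end of file, or too short to hold pd_lsn
            # pd_lsn is the first field of the page header: xlogid, xrecoff.
            xlogid, xrecoff = struct.unpack_from(byteorder + "II", page, 0)
            yield blkno, (xlogid << 32) | xrecoff
            blkno += 1


def pages_newer_than(path, min_recovery_point):
    """Return [(block_number, lsn_text)] for pages whose LSN exceeds the limit."""
    limit = parse_lsn(min_recovery_point)
    return [(blkno, format_lsn(lsn))
            for blkno, lsn in page_lsns(path)
            if lsn > limit]
```

Calling pages_newer_than() on a relation file of the stopped standby,
with the minimum recovery location taken from the control file, lists
the blocks that should not be there after a clean shutdown.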

So a TAP test could actually consist of the following:
1) On primary
create table aa (a int) with (fillfactor = 10);
insert into aa values (generate_series(1,1000));
checkpoint;
-- generates post-checkpoint FPWs which standby replays.
update aa set a = a + 1;
2) On standby, fill in buffers:
select count(*) from aa;
3) Primary again, no FPWs this time:
update aa set a = a + 1;
4) Standby, restart point which flushes control file:
checkpoint;
5) Shut down primary in immediate mode.
6) Shut down standby in fast mode.
7) Check state of control file against on-disk files on standby.
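
For step 7, the control file side of the comparison can be obtained
from the text printed by pg_controldata. Here is a minimal sketch,
assuming the output contains the usual "Minimum recovery ending
location:" line; the function name is made up, and the output is
passed in as a string so that nothing here runs the tool itself.

```python
import re


def min_recovery_point(controldata_output):
    """Extract minRecoveryPoint as an integer from pg_controldata text output.

    Returns None if the expected line is not present.
    """
    m = re.search(
        r"^Minimum recovery ending location:\s*([0-9A-Fa-f]+)/([0-9A-Fa-f]+)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if m is None:
        return None
    # An LSN is printed as two hex halves: high 32 bits, then low 32 bits.
    return (int(m.group(1), 16) << 32) | int(m.group(2), 16)


# Sample text using the value quoted upthread (hypothetical output excerpt).
sample = (
    "Minimum recovery ending location:     0/201FCFE0\n"
    "Min recovery ending loc's timeline:   1\n"
)
limit = min_recovery_point(sample)
```

Any page LSN found on disk that is greater than this value after the
standby has been stopped cleanly points at the problem.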
--
Michael
