Re: [bug fix] Cascaded standby cannot start after a clean shutdown

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] Cascaded standby cannot start after a clean shutdown
Date: 2018-02-26 08:08:49
Message-ID: 20180226080849.GB1960@paquier.xyz
Lists: pgsql-hackers

On Mon, Feb 26, 2018 at 07:25:46AM +0000, Tsunakawa, Takayuki wrote:
> From: Michael Paquier [mailto:michael(at)paquier(dot)xyz]
>> The WAL receiver approach also has a drawback. If WAL is streamed at full
>> speed, then the primary sends data with a maximum of 6 WAL pages.
>> When beginning streaming with a new segment, the WAL sent stops at a
>> page boundary. But if you stop once in the middle of a page, then you need
>> to zero-fill the page until the current segment is finished streaming. So
>> if the workload generates spiky WAL, then the WAL receiver would do a lot
>> of extra lseek() calls with the patch applied, while all the writes would
>> be sequential on HEAD, so that's not good performance-wise IMO.
>
> Does even the non-cascading standby stop in the middle of a page? I
> thought the master always sends whole WAL blocks without stopping in the
> middle of a page.

You even have problems on normal standbys. I have a small script which
is able to reproduce that if you want (it needs a small rewrite as it is
adapted to my test framework); it introduces a garbage set of WAL
segments on a stopped standby. With the small monitoring patch I
mentioned upthread, you can then see the XLOG reader finding garbage data
before validating the record header. With any fix on the WAL receiver,
your first patch included, the garbage read goes away, and the XLOG
reader complains about a record with an incorrect length
(invalid record length at XX/YYY: wanted 24, got 0) instead of
complaining from the header validation part. One key point is to cleanly
stop the primary, as that forces the standby's WAL receiver to stop
writing to its WAL segment in the middle of a page.
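
To illustrate why zero-filled space produces that particular complaint
rather than random header-validation failures, here is a rough sketch of
the length check involved (illustrative only, not the actual xlogreader.c
code; the structure and constant below are simplified stand-ins):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the fixed part of a WAL record header. */
typedef struct
{
    uint32_t    xl_tot_len;     /* total length of the record */
    /* ... remaining header fields omitted ... */
} SketchXLogRecord;

/* Assume a 24-byte fixed header, matching the "wanted 24" above. */
#define SKETCH_SIZE_OF_XLOG_RECORD 24

static bool
sketch_check_record(const SketchXLogRecord *record)
{
    /*
     * On a zero-filled page xl_tot_len reads as 0, so the reader stops
     * here with "invalid record length: wanted 24, got 0" instead of
     * tripping on random junk later during header validation.
     */
    if (record->xl_tot_len < SKETCH_SIZE_OF_XLOG_RECORD)
    {
        fprintf(stderr, "invalid record length: wanted %d, got %u\n",
                SKETCH_SIZE_OF_XLOG_RECORD, (unsigned) record->xl_tot_len);
        return false;
    }
    return true;
}

int
main(void)
{
    SketchXLogRecord zeroed = {0};      /* what a zero-filled page yields */

    return sketch_check_record(&zeroed) ? 0 : 1;
}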

>> Another idea I am thinking about would be to zero-fill the segments when
>> they are recycled, instead of just renaming them in InstallXLogFileSegment().
>> This would also have the advantage of making the segments ahead more
>> compressible, which is a gain for custom backups, and the WAL receiver does
>> not need any tweaks as it would write the data onto a clean file. Zero-filling
>> the segments is currently done only when a new segment is created (see
>> XLogFileInit).
>
> Yes, I was (and am) inclined to take this approach; this is easy and
> clean, but not good for performance... I hope there's something which
> justifies this approach.

InstallXLogFileSegment() uses a plain durable_link_or_rename() to recycle
past segments, which syncs the old segment before the rename anyway,
so the I/O effort will be there, no?
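
To make the I/O comparison concrete, here is a rough sketch of the two
costs being weighed: the fsync work that a durable rename already implies
when recycling, and the extra page-by-page zero-fill the idea above would
add (illustrative only; the real durable_link_or_rename() and
XLogFileInit() have more error handling, use link() where available, and
handle partial writes):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ   8192                    /* stand-in for XLOG_BLCKSZ */
#define SKETCH_SEG_SIZE (16 * 1024 * 1024)      /* stand-in for segment size */

/*
 * Rough shape of a durable rename: flush the old segment, rename it into
 * place, then flush the parent directory so the rename itself survives a
 * crash.  The old segment gets fsync'd no matter what, so recycling is
 * never free of I/O.
 */
int
sketch_durable_rename(const char *oldpath, const char *newpath,
                      const char *parentdir)
{
    int     fd;

    fd = open(oldpath, O_RDWR);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);

    if (rename(oldpath, newpath) != 0)
        return -1;

    fd = open(parentdir, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

/*
 * Extra cost of the zero-on-recycle idea: rewrite the whole recycled
 * segment with zeroed pages and flush it, similarly to what XLogFileInit()
 * already does for freshly created segments.
 */
int
sketch_zero_fill(const char *path)
{
    char    zbuf[SKETCH_BLCKSZ];
    off_t   written;
    int     fd;

    memset(zbuf, 0, sizeof(zbuf));
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    for (written = 0; written < SKETCH_SEG_SIZE; written += sizeof(zbuf))
    {
        if (write(fd, zbuf, sizeof(zbuf)) != (ssize_t) sizeof(zbuf))
        {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

int
main(int argc, char **argv)
{
    /* Usage sketch: recycle OLDSEG as NEWSEG under PARENTDIR, then zero it. */
    if (argc != 4)
    {
        fprintf(stderr, "usage: %s OLDSEG NEWSEG PARENTDIR\n", argv[0]);
        return 1;
    }
    if (sketch_durable_rename(argv[1], argv[2], argv[3]) != 0)
        return 1;
    return sketch_zero_fill(argv[2]) != 0;
}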

This was mentioned back in 2001 by the way, but it did not matter much
for the case discussed here:
https://www.postgresql.org/message-id/24901.995381770%40sss.pgh.pa.us
The issue here is that the streaming case makes it easier to hit the
problem, as it more easily exposes not-completely-written WAL pages,
depending on the message frequency during replication. At the same
time, we are discussing a very low-probability issue. Note that if the
XLOG reader bumps into this problem, then at the next WAL receiver
wake-up recovery would begin from the beginning of the last segment,
and if the primary has produced some more WAL by then, the standby would
actually be able to avoid the random junk. It is also possible to
bypass the problem by manually zeroing the areas in question, or by
waiting for the standby to receive more WAL so that the garbage is
overwritten automatically. And you really need to be very, very unlucky
to have random garbage able to bypass the header validation checks.
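
To give an idea of why that is, the page header validation expects
several fields to match exactly before the reader even looks at a
record. A simplified sketch of the kind of checks involved (not the
actual page-header validation code in xlogreader.c; the magic value
below is a placeholder, not the real XLOG_PAGE_MAGIC):

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the WAL page header. */
typedef struct
{
    uint16_t    xlp_magic;      /* magic value for correctness checks */
    uint16_t    xlp_info;       /* flag bits */
    uint32_t    xlp_tli;        /* timeline the page belongs to */
    uint64_t    xlp_pageaddr;   /* WAL address of this page */
} SketchPageHeader;

#define SKETCH_PAGE_MAGIC 0xD0D0    /* placeholder value for the sketch */

/*
 * Random bytes have to hit the right magic number, the expected timeline
 * and the exact page address to get past this, which is why garbage is
 * normally rejected well before the record itself is examined.
 */
bool
sketch_validate_page(const SketchPageHeader *hdr,
                     uint64_t expected_pageaddr, uint32_t expected_tli)
{
    if (hdr->xlp_magic != SKETCH_PAGE_MAGIC)
        return false;
    if (hdr->xlp_pageaddr != expected_pageaddr)
        return false;
    if (hdr->xlp_tli != expected_tli)
        return false;
    return true;
}

int
main(void)
{
    SketchPageHeader junk = {0};    /* zeroed or garbage header */

    /* Expecting the page at 16MB on timeline 1; the junk is rejected. */
    return sketch_validate_page(&junk, 16 * 1024 * 1024, 1) ? 0 : 1;
}
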
--
Michael
