Re: [BUG] pg_basebackup from disconnected standby fails

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] pg_basebackup from disconnected standby fails
Date: 2016-06-10 08:39:59
Message-ID: CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, I found that pg_basebackup from a replication standby
> fails after the following steps, on 9.3 and the master.
>
> - start a replication master
> - start a replication standby
> - stop the master in the mode other than immediate.
>
> pg_basebackup to the standby will fail with the following error.
>
>> pg_basebackup: could not get transaction log end position from server:
>> ERROR: could not find any WAL files

Indeed. You can reproduce the failure with the recovery test suite
using the following change, so I would suggest adding this test to
the patch:
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
 # pg_basebackup works on a standby).
 $node_standby_1->backup($backup_name);

+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;
+

> After looking more closely, I found that the minRecoveryPoint
> tends to be too small as the backup end point, and up to the
> record at the lastReplayedRecPtr can affect the pages on disk and
> they can go into the backup just taken.
>
> My conclusion here is that do_pg_stop_backup should return
> lastReplayedRecPtr, not minRecoveryPoint.

I have been thinking quite a bit about this patch, and this logic
sounds right to me. When stopping the backup we need to tell the
user up to which point WAL has to be replayed, and relation pages
are touched up to lastReplayedEndRecPtr. This LSN can be greater
than the minimum recovery point, as there is no record to mark the end
of the backup. pg_control is normally, well surely, backed up last,
but that is not a new problem: it could just as well have been backed
up before the minimum recovery point was reached...
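
To make the argument concrete, here is a toy model of it. This is only
a sketch, not the code in xlog.c: the helper names and the LSN values
are made up for illustration, and it assumes a standby whose minimum
recovery point lags behind the end of the last replayed record.

```python
# Toy model only. LSNs are treated as 64-bit integers written "hi/lo" in hex.
# All helper names and LSN values below are hypothetical.

def parse_lsn(text):
    """Convert an LSN such as '0/3000060' to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Convert an integer LSN back to its 'hi/lo' hex form."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def records_not_covered(replayed_record_ends, backup_end):
    """Return the replayed records that end after the reported backup end.

    Any such record may have touched pages that were copied into the
    backup, yet replaying WAL up to backup_end would not include it.
    """
    return [lsn for lsn in replayed_record_ends if lsn > backup_end]


# Assumed state of the standby when the backup stops.
replayed = [parse_lsn(s) for s in ("0/3000060", "0/3000098", "0/3000108")]
min_recovery_point = parse_lsn("0/3000060")   # assumed to lag behind replay
last_replayed_end = replayed[-1]              # end of the last replayed record

# Reporting the minimum recovery point leaves replayed records uncovered.
missed_with_min_recovery_point = records_not_covered(replayed, min_recovery_point)
print([format_lsn(lsn) for lsn in missed_with_min_recovery_point])

# Reporting the end of the last replayed record covers all of them.
missed_with_last_replayed_end = records_not_covered(replayed, last_replayed_end)
print(missed_with_last_replayed_end)
```

With these made-up values, the first list contains the two records past
the minimum recovery point and the second list is empty, which is the
reason for returning lastReplayedEndRecPtr as the backup end position.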

Still, perhaps we could improve the documentation regarding that? Say,
recommend that when taking a backup from a standby, the minimum
recovery point in pg_control be set to the stop-backup LSN, to ensure
that the backup recovers to a consistent state.

I am attaching an updated patch with the test btw.
--
Michael

Attachment: backup-standby-v2.patch (binary/octet-stream, 2.3 KB)
