Re: [BUG] pg_basebackup from disconnected standby fails

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] pg_basebackup from disconnected standby fails
Date: 2016-07-21 08:19:50
Message-ID: 20160721.171950.15026313.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for the absense. I've reached here.

At Thu, 21 Jul 2016 12:20:30 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqR+krKK9fqwquSJnA2oB-RcQ6Mi8ybz3WL0kCOB5SSZEg(at)mail(dot)gmail(dot)com>
> On Thu, Jul 21, 2016 at 11:56 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Thu, Jul 21, 2016 at 7:28 AM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Wed, Jul 20, 2016 at 8:56 PM, Michael Paquier
> >> <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>>>
> >>>> Yeah, I think that is totally different angle to fix this issue, so
> >>>> don't you think it is better to start a separate thread to discuss
> >>>> about it for 10.0 and mark this patch as ready for committer.
> >>>
> >>> I'd like to tackle this problem in 10.0, but that will strongly depend
> >>> on how my patches move on in CF1 and CF2.
> >>
> >> By the way, thank you for taking the time to provide input. I think
> >> we're in good shape here now.
> >>
> >
> > So, if I understand correctly, then we can mark the version posted by
> > you upthread [1] which includes a test along with Kyotaro's fix can be
> > marked as Ready for committer. If so, then please change the status
> > of patch accordingly.
>
> Oops. I thought you did it already. So done.

Thank you very much for the intensive discussion on this.

After some additional consideration, I attached two patches about
comment and documentation. 0001- is a patch related to the
previous patch that adds description on why we use replay end
point instead of minRecoveryPoint. And 0002- is a fix and an
additional mention on what would happen when pg_control is not
copied last during backup.

==== About the first patch.

Related to the previous patch, I found the following comment just
above where it applies.

xlog.c:10412
- * We return the current minimum recovery point as the backup end
- * location. Note that it can be greater than the exact backup end
- * location if the minimum recovery point is updated after the backup of
- * pg_control. This is harmless for current uses.

I haven't gave a notice on this but this seems to be necessary to
edit so as to mention this fix and the gap like the following.

+ * minRecoveryPoint can go behind the last checkpoint's redo location when
+ * the checkpoint writes out no buffer. This does no harm to performing a
+ * recovery but such inversion seems inconsistent from the view of the
+ * callers and prevents them from knowing WAL segments needed by sane
+ * calcuation. For the reason we return the last replayed point as the
+ * backup end location. Note that it can be greater than the exact backup
+ * end location or even the minimum recovery point of pg_control at the
+ * time. This is harmless for current uses.

==== About the second patch.

By the way, related to the following discussion in the upthread,

At Tue, 19 Jul 2016 14:13:36 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqRXMkR2rq7_Wa1aZZqBFVRUwU_5QTBr_kEQqWGUEjAaAQ(at)mail(dot)gmail(dot)com>
> The thing that is really annoying btw is that there will be always a
> gap between minRecoveryPoint and the actual moment where a backup
> finishes because there is no way to rely on the XLOG_BACKUP_END
> record. On top of that we can not be sure if pg_control has been
> backed up last or not. Which is why it would be cool to document that
> gap.

Even with out my previous patch, the gap between minRecoveryPoint
*when pg_control is backed up* and that (or the replay end) at
the end of backup is crucial and should be back-patched.

Addition to that, while I examined the documentation I found the
following description about pg_stop_backup in
continuous-archiving.html, which contradicts to its definition.

> In the same connection as before, issue the command:
>
> SELECT * FROM pg_stop_backup(false);
...
> The pg_stop_backup will return one row with three values.

AFAICS pg_stop_backup returns a single value of LSN. I don't know
where this comes from, but the attached patch fixes this and adds
a mention on the "gap". The original description mentioned
backup_label and tablespace_map but it seems not necessary. The
following is the new content rewritten by the patch.

| The pg_stop_backup will return the LSN when it is called. This
| LSN informs upto where the backup needs WAL segment files to
| complete. For a backup taken from a master, this function leaves
| a backup history file to inform the segment files needed and a
| server starts from the backup performes a recovery up to where a
| backup-end WAL record to reach a consistent state. But things are
| different for a backup taken from a standby. For the case, backup
| history file won't be created and recvoery is perfomed up to the
| Minimum-recovery-ending-location field in the pg_control file
| included in the backup instead. The location is properly stored
| in the backup if you copy the the pg_control file after all the
| other files as pg_basebackup does. Note that a hot standby
| started from a backup not taken in that manner will consider to
| have reached a consistent state mistakenly earlier.
|
| Once the WAL segment files active during the backup are archived,
| you are done. The file contains the LSN returned by
| pg_stop_backup is the last segment that is required to form a
| complete set of backup files.

The "LSN" in the first line is intentionally ambiguated. It may
seems too much to write here.

> Another crazy idea would be to return pg_control as an extra
> return field of pg_stop_backup() and encourage users to write that
> back in the backup itself. This would allow closing any hole in the
> current logic for backups taken from live standbys: minRecoveryPoint
> would be updated directly to the last replayed LSN/TLI in the control
> file.

It seems to be doable at do_pg_*_backup level. non-exclusive
backup code does the same thing to backup_label but no usage
seen.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-the-reason-why-we-use-replayEndRecPtr-as-recover.patch text/x-patch 1.7 KB
0002-Fix-documentation-about-pg_stop_backup.patch text/x-patch 2.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-07-21 09:50:07 Re: asynchronous and vectorized execution
Previous Message Amit Kapila 2016-07-21 07:49:11 Re: PoC: Make it possible to disallow WHERE-less UPDATE and DELETE