Re: [BUG] pg_basebackup from disconnected standby fails

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [BUG] pg_basebackup from disconnected standby fails
Date: 2016-10-24 04:39:08
Message-ID: CAA4eK1+bas_xDo3S5R6o6HKVtQu8K87RcHfo2NrAffaRtyEcSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Thank you for looking and retelling this.
>
> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoaRdyfCqcg9x-E99XCxdC4LW0zygP5qLsPhFsb6BYDKow(at)mail(dot)gmail(dot)com>
>> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> >> So, if I understand correctly, then we can mark the version posted by
>> >> you upthread [1] which includes a test along with Kyotaro's fix can be
>> >> marked as Ready for committer. If so, then please change the status
>> >> of patch accordingly.
>> >
>> > Patch moved to next CF 2016-11, still with status "ready for committer".
>> >
>> > IMO, this thread has reached as conclusion to use this patch to
>> > address the problem reported:
>> > CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q(at)mail(dot)gmail(dot)com
>>
>> I spent some time reading this thread today and trying to understand
>> exactly what's going on here. Here's my attempt to restate the
>> problem:
>>
>> 1. If a restartpoint flushes no dirty buffers, then the redo location
>> advances but the minimum recovery point does not; therefore, the
>> minimum recovery point can fall behind the redo location. That's
>> correct behavior, because if the standby is subsequently restarted, it
>> can correctly begin at the checkpoint record associated with the
>> restartpoint and need not replay any further.
>
> Yes.
>
>> 2. However, it causes a problem when a base backup with the "include
>> WAL" option is taken from a standby because -- on standby servers only
>> -- the end location for a backup as returned by do_pg_stop_backup() is
>> the minimum recovery point. If this precedes the start point for the
>> backup -- which is the restart point -- perform_base_backup() will
>> conclude that no WAL files are required for the backup and fail an
>> internal sanity check.
>
> Yes. The option implies that the copied $PGDATA is a standalone
> backup, that is, usable to start a standby instantly so at least
> one WAL segment is needed, as mentioned in 3 below.
>
>> 3. Removing the sanity check wouldn't help, because it can never be
>> right to end up with zero WAL files as a result of a base backup. At
>> a minimum, we need the WAL file which contains the checkpoint record
>> from which the control file specifies that redo should start.
>> Actually, I think that checkpoint record could be spread across
>> multiple files: it might be big. We need them all, but the current
>> code won't ensure that we get them.
>
> Yes. The "checkpoit records" means that records during a checkpoint.
>
>> The consensus solution on this thread seems to be that we should have
>> pg_do_stop_backup() return the last-replayed XLOG location as the
>> backup end point. If the control file has been updated with a newer
>> redo location, then the associated checkpoint record has surely been
>> replayed, so we'll definitely have enough WAL to replay that
>> checkpoint record (and we don't need to replay anything else, because
>> we're supposing that this is the case where the minimum recovery point
>> precedes the redo location). Although this will work, it might
>> include more WAL in the backup than is strictly necessary. If the
>> additional WAL replayed beyond the minimum recovery point does NOT
>> include a checkpoint, we don't need any of it; if it does, we need
>> only the portion up to and including the last checkpoint record, and
>> not anything beyond that.
>
> StartupXLOG mandates the existence of the start record of the
> latest checkpoint. pg_start_backup() returns the start point
> (redo location) of the restartpoint that it requested and
> minRecoveryPoint is always after the replpayed checkpoint so it
> works if always was going well.
>
> But if the checkpoint contains no update, the restart point
> exceeds minRecoveryPoint. Then no WAL copied.
>
>> I can think of two solutions that would be "tighter":
>>
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record. I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery. A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
>
> This is a candidate that I thought of. But I avoided to change
> the behavior of minRecoveryPoint that (seems to me that) it
> describes only buffer state. So checkpoint with no update doesn't
> advances minRecoveryPoint as my understanding.
>

I think what you are saying is not completely right, because we do
update minRecoveryPoint when we don't perform a new restart point.
When we perform restart point, then it assumes that flushing the
buffers will take care of updating minRecoveryPoint.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-10-24 04:46:39 Re: pg_basebackup stream xlog to tar
Previous Message Andres Freund 2016-10-24 04:38:33 Re: pg_basebackup stream xlog to tar