Re: [BUG] pg_basebackup from disconnected standby fails

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] pg_basebackup from disconnected standby fails
Date: 2016-06-15 03:18:12
Message-ID: 20160615.121812.197115872.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Sorry, I was confused about minRecoveryPoint.

I have reconsidered a bit.

At Tue, 14 Jun 2016 20:31:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160614(dot)203111(dot)229211034(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > After looking more closely, I found that the minRecoveryPoint
> > > tends to be too small as the backup end point, and up to the
> > > record at the lastReplayedRecPtr can affect the pages on disk and
> > > they can go into the backup just taken.
> > >
> > > My conclusion here is that do_pg_stop_backup should return
> > > lastReplayedRecPtr, not minRecoveryPoint.
> >
> > I have been thinking quite a bit about this patch, and this logic
> > sounds quite right to me. When stopping the backup we need to let the
> > user know up to which point it needs to replay WAL, and relation pages
> > are touched up to lastReplayedEndRecPtr.
>
> Yes, but by design, the changes in buffers don't go into disk
> until buffer replacing occurs, which updates minRecoveryPoint. My
> understanding is that the problem is that a restart point that is
> not accompanied with buffer updates advances only the redo point
> of the last checkpoint and doesn't update minRecoveryPoint, which
> may be behind the redo point at the time.
>
> It seems to me that we could more aggressively advance the
> minRecoveryPoint (but must not let it go too far..), but it is
> right for it to aim a bit smaller than the ideal location.

That was wrong. minRecoveryPoint should be greater than or equal to
the maximum buffer-touching LSN reached in previous recoveries. It
can be equal to replayEndRecPtr, but may be behind it if no actual
modification to page files is made after that point. xlog.c works
that way. From this point of view, a minRecoveryPoint smaller than
the redo point of the last checkpoint is allowable when that
checkpoint caused no buffer flush, but it is not suitable as the
end point of a backup.
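
To make the situation concrete, here is a toy model of the state
described above. It is only a sketch, not xlog.c code: the class, its
fields and the LSN values are hypothetical, and advancing
min_recovery_point to the flushed page's LSN is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandbyState:
    """Toy model of a standby's recovery state (hypothetical, not xlog.c)."""
    min_recovery_point: int = 0  # on-disk pages carry no change beyond this LSN
    replay_end: int = 0          # end LSN of the last replayed record
    redo_point: int = 0          # redo point of the last restartpoint

    def replay(self, end_lsn: int) -> None:
        # Replaying a record changes in-memory state only.
        self.replay_end = end_lsn

    def flush_buffer(self, page_lsn: int) -> None:
        # Simplification: writing a page out advances min_recovery_point
        # to at least that page's LSN.
        self.min_recovery_point = max(self.min_recovery_point, page_lsn)

    def restartpoint(self, redo_lsn: int, dirty_page_lsns: List[int]) -> None:
        # Flush whatever is dirty, then record the new redo point.
        for lsn in dirty_page_lsns:
            self.flush_buffer(lsn)
        self.redo_point = redo_lsn


state = StandbyState()
state.replay(100)
state.flush_buffer(100)   # a page touched at LSN 100 reaches disk
state.replay(200)         # later records that dirty no data page
state.restartpoint(redo_lsn=180, dirty_page_lsns=[])  # nothing to flush

# Fine for crash recovery: no page on disk is newer than LSN 100.
on_disk_max_lsn = 100
# Not fine as a backup end point: it lies before the redo point.
lags_behind_redo = state.min_recovery_point < state.redo_point
```

In this model the restartpoint moves the redo point to 180 while
min_recovery_point stays at 100, which is the case reported in this
thread.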

If we skip recording the last checkpoint position when the
restartpoint ends up causing no buffer flush, minRecoveryPoint
becomes usable for this purpose again. However, that makes the
restartpoint be retried repeatedly on the same (skipped) checkpoint
record.

Consequently, we can also solve this problem by explicitly
updating minRecoveryPoint for an executed restartpoint that
caused no buffer flush.
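
The idea can be sketched as follows. This is an illustration only: the
function name and the LSN values are hypothetical, and the choice of
the replay end position as the value to advance to is an assumption
made for the sketch, not something taken from the patch.

```python
def min_recovery_point_after_restartpoint(min_recovery_point: int,
                                          replay_end: int,
                                          buffers_flushed: int) -> int:
    """Return minRecoveryPoint after an executed restartpoint (toy model)."""
    if buffers_flushed == 0:
        # Assumption: an "empty" restartpoint advances the value explicitly,
        # here up to the replay end position, and never moves it backwards.
        return max(min_recovery_point, replay_end)
    # Otherwise the buffer flush path is assumed to have advanced it already.
    return min_recovery_point


redo_point = 180  # hypothetical redo point of the restartpoint
after = min_recovery_point_after_restartpoint(
    min_recovery_point=100, replay_end=200, buffers_flushed=0)
usable_as_backup_end = after >= redo_point
```

With the explicit update, the value no longer falls behind the redo
point after a restartpoint that flushed nothing.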

The attached patch takes this approach and also solves the problem.

Which one do you think is preferable? Or is there another solution?

This patch updates minRecoveryPoint only for restartpoints that
caused no buffer flush, but such a restriction might not be
necessary.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment: 0001-Advancing-minRecoveryPoint-for-executed-empty-restar.patch (text/x-patch, 2.2 KB)
