Re: patch proposal

From: David Steele <david(at)pgmasters(dot)net>
To: Venkata B Nagothi <nag1010(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch proposal
Date: 2017-03-22 17:21:20
Message-ID: f87fa0a5-d721-a883-3e00-a5d44db7547b@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/21/17 8:45 PM, Venkata B Nagothi wrote:
> On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david(at)pgmasters(dot)net
>
> Unfortunately, I don't think the first patch (recoveryStartPoint)
> will work as currently implemented. The problem I see is that the
> new function recoveryStartsHere() depends on pg_control containing a
> checkpoint right at the end of the backup. There's no guarantee
> that this is true, even if pg_control is copied last. That means a
> time, lsn, or xid that occurs somewhere in the middle of the backup
> can be selected without complaint from this code depending on timing.
>
>
> Yes, that is true. The latest best position, checkpoint position, xid
> and timestamp of the restored backup of the backup is shown up in the
> pg_controldata, which means, that is the position from which the
> recovery would start.

Backup recovery starts from the checkpoint in the backup_label, not from
the checkpoint in pg_control. The original checkpoint that started the
backup is generally overwritten in pg_control by the end of the backup.

> Which in-turn means, WALs start getting replayed
> from that position towards --> minimum recovery position (which is the
> end backup, which again means applying WALs generated between start and
> the end backup) all the way through to --> recovery target position.

minRecoveryPoint is only used when recovering a backup that was made
from a standby. For backups taken on the master, the backup end WAL
record is used.

> The best start position to check with would the position shown up in the
> pg_control file, which is way much better compared to the current
> postgresql behaviour.

I don't agree, for the reasons given previously.

> The tests pass (or rather fail as expected) because they are written
> using values before the start of the backup. It's actually the end
> of the backup (where the database becomes consistent on recovery)
> that defines where PITR can start and this distinction definitely
> matters for very long backups. A better test would be to start the
> backup, get a time/lsn/xid after writing some data, and then make
> sure that time/lsn/xid is flagged as invalid on recovery.
>
> The current behaviour is that, if the XID shown-up by the pg_controldata
> is say 1400 and the specified recovery_target_xid is say 200, then,
> postgresql just completes the recovery and promotes at the immediate
> consistent recovery target, which means, the DBA has to all the way
> start the restoration and recovery process again to promote the database
> at the intended position.

I'm not saying that the current behavior is good, only that the proposed
behavior is incorrect as far as I can tell.

> It is also problematic to assume that transaction IDs commit in
> order. If checkPoint.latestCompletedXid contains 5 then a recovery
> to xid 4 may or may not be successful depending on the commit
> order. However, it appears in this case the patch would disallow
> recovery to 4.
>
> If the txid 4 is the recovery target xid, then, the appropriate backup
> taken previous to txid 4 must be used or as an alternative recovery
> target like recovery_target_timestamp must be used to proceed to the
> respective recovery target xid.

I'm not sure I follow you here, but I do know that using the last
committed xid says little about which xids precede or follow it.

> I think this logic would need to be baked into recoveryStopsAfter()
> and/or recoveryStopsBefore() in order to work. It's not clear to me
> what that would look like, however, or if the xid check is even
> practical.
>
>
> The above two functions would determine if the recovery process has to
> stop at a particular position or not and the patch has nothing to do
> with it.
>
> To summarise, this patch only determines if the WAL replay has to start
> at all. In other words, if the specified recovery target falls prior to
> the restored database position, then, the WAL replay should not start,
> which prevent the database from getting promoted at an unintended
> recovery target.

I understand what you are trying to do. My point is that the
information you are working from (whatever checkpoint happens to be in
pg_control when recovery starts) is not sufficient for the task. This
method is inaccurate and would disallow valid recovery scenarios.

I suggest doing a thorough read-through of StartupXLOG(), particularly
the section where the backup_label is loaded.

Thanks,
--
-David
david(at)pgmasters(dot)net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-03-22 17:22:08 Re: increasing the default WAL segment size
Previous Message Robert Haas 2017-03-22 17:16:52 Re: Partition-wise join for join between (declaratively) partitioned tables