Re: patch proposal

From: David Steele <david(at)pgmasters(dot)net>
To: Venkata B Nagothi <nag1010(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch proposal
Date: 2017-03-27 12:34:55
Message-ID: 49689209-4a59-ea37-3f40-5e0021061d88@pgmasters.net
Lists: pgsql-hackers

On 3/26/17 7:34 PM, Venkata B Nagothi wrote:
> Hi David,
>
> On Thu, Mar 23, 2017 at 4:21 AM, David Steele <david(at)pgmasters(dot)net
> <mailto:david(at)pgmasters(dot)net>> wrote:
>
> On 3/21/17 8:45 PM, Venkata B Nagothi wrote:
>
> On Tue, Mar 21, 2017 at 8:46 AM, David Steele
> <david(at)pgmasters(dot)net <mailto:david(at)pgmasters(dot)net>
>
> Unfortunately, I don't think the first patch (recoveryStartPoint)
> will work as currently implemented. The problem I see is that the
> new function recoveryStartsHere() depends on pg_control containing a
> checkpoint right at the end of the backup. There's no guarantee
> that this is true, even if pg_control is copied last. That means a
> time, lsn, or xid that occurs somewhere in the middle of the backup
> can be selected without complaint from this code depending on timing.
>
>
> Yes, that is true. The latest checkpoint position, xid and
> timestamp of the restored backup are shown in the pg_controldata
> output, which means that is the position from which the recovery
> would start.
>
>
> Backup recovery starts from the checkpoint in the backup_label, not
> from the checkpoint in pg_control. The original checkpoint that
> started the backup is generally overwritten in pg_control by the end
> of the backup.
>
>
> Yes, I totally agree. My initial intention was to compare the recovery
> target position(s) with the contents of the backup_label, but then the
> checks would fail if the recovery is performed without the backup_label
> file. So I decided to compare the recovery target positions with the
> contents of the pg_control file.
>
>
> Which in turn means WAL gets replayed from that position towards
> the minimum recovery position (which is the end of the backup, which
> again means applying the WAL generated between the start and the end
> of the backup) and all the way through to the recovery target position.
>
>
> minRecoveryPoint is only used when recovering a backup that was made
> from a standby. For backups taken on the master, the backup end WAL
> record is used.
>
> The best start position to check against would be the position shown
> in the pg_control file, which is much better than the current
> PostgreSQL behaviour.
>
>
> I don't agree, for the reasons given previously.
>
>
> As explained above, my intention was to ensure that the recovery start
> position checks are performed successfully irrespective of the presence
> of the backup_label file.
>
> I did some analysis before deciding to use pg_controldata's output
> instead of the backup_label file contents.
>
> Comparing the output of pg_controldata with the contents of
> backup_label:
>
> *Recovery Target LSN*
>
> START WAL LOCATION (which is 0/9C000028) in the backup_label =
> pg_controldata's latest checkpoint's REDO location (Latest
> checkpoint's REDO location: 0/9C000028)
>
> *Recovery Target TIME*
>
> backup start time in the backup_label (START TIME: 2017-03-26
> 11:55:46 AEDT) = pg_controldata's latest checkpoint time (Time of
> latest checkpoint : Sun 26 Mar 2017 11:55:46 AM AEDT)
>
> *Recovery Target XID*
>
> To begin with, backup_label does not contain any start XID. So, the only
> option is to depend on pg_controldata's output.
> After a few quick tests and close observation, I notice that
> the pg_control file information is copied as it is to the backup
> location at pg_start_backup. I performed some quick tests by
> running a few transactions between pg_start_backup and pg_stop_backup.
> So, I believe this is the ideal start point for WAL replay.
>
> Am I missing anything here?

You are making assumptions about the contents of pg_control vs.
backup_label based on trivial tests. With PG defaults, the backup must
run about five minutes before the values in pg_control and backup_label
will diverge. Even if pg_control and backup_label do match, those are
the wrong values to use, and will get more incorrect the longer the
backup runs.
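
To make the divergence concrete, here is a minimal Python sketch that
extracts the two LSNs being compared and checks whether they match. It
assumes the "START WAL LOCATION:" line of backup_label and the "Latest
checkpoint's REDO location:" line of pg_controldata output, as quoted
above. The helper names and the 0/A1000060 value are hypothetical and
only illustrate a backup that ran long enough for another checkpoint
to complete (the five-minute figure presumably corresponds to the
default checkpoint_timeout).

```python
import re

LSN = r"([0-9A-Fa-f]+/[0-9A-Fa-f]+)"


def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'hi/lo' (two hex numbers) to an integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def backup_label_start_lsn(label_text: str) -> int:
    """Return the START WAL LOCATION recorded in backup_label."""
    m = re.search(r"^START WAL LOCATION:\s*" + LSN, label_text, re.MULTILINE)
    if m is None:
        raise ValueError("no START WAL LOCATION line found")
    return parse_lsn(m.group(1))


def control_redo_lsn(controldata_text: str) -> int:
    """Return the latest checkpoint's REDO location from pg_controldata output."""
    m = re.search(
        r"^Latest checkpoint's REDO location:\s*" + LSN,
        controldata_text,
        re.MULTILINE,
    )
    if m is None:
        raise ValueError("no REDO location line found")
    return parse_lsn(m.group(1))


# Values from the quick test quoted above: the two sources agree.
LABEL = "START WAL LOCATION: 0/9C000028 (file 00000001000000000000009C)\n"
CONTROL_SHORT = "Latest checkpoint's REDO location:    0/9C000028\n"

# Hypothetical pg_control copied after a later checkpoint completed
# during the backup (illustrative value, not taken from a real run).
CONTROL_LONG = "Latest checkpoint's REDO location:    0/A1000060\n"

start = backup_label_start_lsn(LABEL)
print(control_redo_lsn(CONTROL_SHORT) == start)  # sources agree
print(control_redo_lsn(CONTROL_LONG) == start)   # sources have diverged
```

Even when the two values agree, neither marks the end of the backup,
so a match says nothing about whether a recovery target is reachable
from a consistent state.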

I believe there is a correct way to go about this, at least for time and
LSN, and I don't think your very approximate solution will pass muster
with a committer.

Since we are nearly at the end of the CF, I have marked this submission
"Returned with Feedback".

--
-David
david(at)pgmasters(dot)net
