Re: patch proposal

From: Venkata B Nagothi <nag1010(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch proposal
Date: 2017-03-26 23:34:36
Message-ID: CAEyp7J9u8AqLSHK5Ce9bOvz51yG0-R0TJtWfLf353OSLDeb1+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

On Thu, Mar 23, 2017 at 4:21 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 3/21/17 8:45 PM, Venkata B Nagothi wrote:
>
>> On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david(at)pgmasters(dot)net
>>
>> Unfortunately, I don't think the first patch (recoveryStartPoint)
>> will work as currently implemented. The problem I see is that the
>> new function recoveryStartsHere() depends on pg_control containing a
>> checkpoint right at the end of the backup. There's no guarantee
>> that this is true, even if pg_control is copied last. That means a
>> time, lsn, or xid that occurs somewhere in the middle of the backup
>> can be selected without complaint from this code depending on timing.
>>
>>
>> Yes, that is true. The latest best position, checkpoint position, xid
>> and timestamp of the restored backup of the backup is shown up in the
>> pg_controldata, which means, that is the position from which the
>> recovery would start.
>>
>
> Backup recovery starts from the checkpoint in the backup_label, not from
> the checkpoint in pg_control. The original checkpoint that started the
> backup is generally overwritten in pg_control by the end of the backup.

Yes, I totally agree. My initial intention was to compare the recovery
target position(s) with the contents in the backup_label, but, then, the
checks would fails if the recovery is performed without the backup_label
file. Then, i decided to compare the recovery target positions with the
contents in the pg_control file.

> Which in-turn means, WALs start getting replayed
>> from that position towards --> minimum recovery position (which is the
>> end backup, which again means applying WALs generated between start and
>> the end backup) all the way through to --> recovery target position.
>>
>
> minRecoveryPoint is only used when recovering a backup that was made from
> a standby. For backups taken on the master, the backup end WAL record is
> used.
>
> The best start position to check with would the position shown up in the
>> pg_control file, which is way much better compared to the current
>> postgresql behaviour.
>>
>
> I don't agree, for the reasons given previously.
>

As explained above, my intention was to ensure that the recovery start
positions checks are successfully performed irrespective of the presence of
the backup_label file.

I did some analysis before deciding to use pg_controldata's output instead
of backup_label file contents.

Comparing the output of the pg_controldata with the contents of
backup_label contents.

*Recovery Target LSN*

START WAL LOCATION (which is 0/9C000028) in the backup_label =
pg_controldata's latest checkpoint's REDO location (Latest checkpoint's
REDO location: 0/9C000028)

*Recovery Target TIME*

backup start time in the backup_label (START TIME: 2017-03-26 11:55:46
AEDT) = pg_controldata's latest checkpoint time (Time of latest checkpoint
: Sun 26 Mar 2017 11:55:46 AM AEDT)

*Recovery Target XID*

To begin with backup_label does contain any start XID. So, the only option
is to depend on pg_controldata's output.
After a few quick tests and thorough observation, i do notice that, the
pg_control file information is copied as it is to the backup location at
the pg_start_backup. I performed some quick tests by running few
transactions between pg_start_backup and pg_stop_backup. So, i believe,
this is ideal start point for WAL replay.

Am i missing anything here ?

The tests pass (or rather fail as expected) because they are written
>> using values before the start of the backup. It's actually the end
>> of the backup (where the database becomes consistent on recovery)
>> that defines where PITR can start and this distinction definitely
>> matters for very long backups. A better test would be to start the
>> backup, get a time/lsn/xid after writing some data, and then make
>> sure that time/lsn/xid is flagged as invalid on recovery.
>>
>> The current behaviour is that, if the XID shown-up by the pg_controldata
>> is say 1400 and the specified recovery_target_xid is say 200, then,
>> postgresql just completes the recovery and promotes at the immediate
>> consistent recovery target, which means, the DBA has to all the way
>> start the restoration and recovery process again to promote the database
>> at the intended position.
>>
>
> I'm not saying that the current behavior is good, only that the proposed
> behavior is incorrect as far as I can tell.

Please consider my explanation above and share your thoughts.

>
> It is also problematic to assume that transaction IDs commit in
>> order. If checkPoint.latestCompletedXid contains 5 then a recovery
>> to xid 4 may or may not be successful depending on the commit
>> order. However, it appears in this case the patch would disallow
>> recovery to 4.
>
>
>> If the txid 4 is the recovery target xid, then, the appropriate backup
>> taken previous to txid 4 must be used or as an alternative recovery
>> target like recovery_target_timestamp must be used to proceed to the
>> respective recovery target xid.
>>
>
> I'm not sure I follow you here, but I do know that using the last
> committed xid says little about which xids precede or follow it.

Yes, I agree, latestCompletedXID only says about the latest completed XID
and has nothing to do with the order of XIDs being committed. It just shows
the latest completed one and that is the natural and expected behaviour -
isn't it ?

Sure. I will try to explain. The point i wanted to make is that, it is kind
of difficult to identify the XIDs which got committed in the order and the
only option we have is to look at the pg_controldata information and also,
backup_label file does not contain any such information about XID.

Generally, in real-time when DBAs intend to perform recovery until a
particular point-in-time by choosing a recovery target in the form of XID,
TIMESTAMP or LSN depending on what ever information they have or whatever
target they feel is appropriate to use. If the XID seems to be
in-appropriate to use (may be due to the order in which they got
committed), then, the equivalent target like timestamp would be chosen to
perform recovery. Hope i am clear here.

Having said that, the only other options which i see here is to use
oldestXID, oldestActiveXID or NextXID and none of them seem appropriate to
use.

> I think this logic would need to be baked into recoveryStopsAfter()
>> and/or recoveryStopsBefore() in order to work. It's not clear to me
>> what that would look like, however, or if the xid check is even
>> practical.
>>
>>
>> The above two functions would determine if the recovery process has to
>> stop at a particular position or not and the patch has nothing to do
>> with it.
>>
>> To summarise, this patch only determines if the WAL replay has to start
>> at all. In other words, if the specified recovery target falls prior to
>> the restored database position, then, the WAL replay should not start,
>> which prevent the database from getting promoted at an unintended
>> recovery target.
>>
>
> I understand what you are trying to do. My point is that the information
> you are working from (whatever checkpoint happens to be in pg_control when
> recovery starts) is not sufficient for the task. This method is inaccurate
> and would disallow valid recovery scenarios.
>

Please consider my explanations above.

>
> I suggest doing a thorough read-through of StartupXLOG(), particularly the
> section where the backup_label is loaded.
>

As communicated above, not choosing to use the backup_label file contents
for recovery start point check was intentional. I will be happy to hear any
comments/objections on that.

Also, the patch does not apply to the latest master and attached is the
updated patch re-based with the latest master. I have also attached the
re-based 2nd patch.

Thank you again for your comments on my patch.

Regards,

Venkata B N
Database Consultant

Attachment Content-Type Size
recoveryStartPoint.patch-version5 application/octet-stream 14.9 KB
recoveryTargetIncomplete.patch-version5 application/octet-stream 17.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-03-26 23:43:22 Re: crashes due to setting max_parallel_workers=0
Previous Message Tomas Vondra 2017-03-26 23:23:12 Re: crashes due to setting max_parallel_workers=0