Re: patch proposal

From: Venkata B Nagothi <nag1010(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch proposal
Date: 2017-03-22 00:45:40
Message-ID: CAEyp7J-7q1_VmyGtnjMNgQnn6korSzBhRz6SM5EkBPmcScJS2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> Hi Venkata,
>
> On 2/28/17 11:59 PM, Venkata B Nagothi wrote:
>
>> On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi <nag1010(at)gmail(dot)com
>> <mailto:nag1010(at)gmail(dot)com>> wrote:
>> On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david(at)pgmasters(dot)net
>> <mailto:david(at)pgmasters(dot)net>> wrote:
>>
>> Do you know when those will be ready?
>>
>> Attached are both the patches with tests included.
>>
>
> Thanks for adding the tests. They bring clarity to the patch.
>

Thank you again reviewing the patch and your comments !

> Unfortunately, I don't think the first patch (recoveryStartPoint) will
> work as currently implemented. The problem I see is that the new function
> recoveryStartsHere() depends on pg_control containing a checkpoint right at
> the end of the backup. There's no guarantee that this is true, even if
> pg_control is copied last. That means a time, lsn, or xid that occurs
> somewhere in the middle of the backup can be selected without complaint
> from this code depending on timing.
>

Yes, that is true. The latest best position, checkpoint position, xid and
timestamp of the restored backup of the backup is shown up in the
pg_controldata, which means, that is the position from which the recovery
would start. Which in-turn means, WALs start getting replayed from that
position towards --> minimum recovery position (which is the end backup,
which again means applying WALs generated between start and the end backup)
all the way through to --> recovery target position. The best start
position to check with would the position shown up in the pg_control file,
which is way much better compared to the current postgresql behaviour.

The tests pass (or rather fail as expected) because they are written using
> values before the start of the backup. It's actually the end of the backup
> (where the database becomes consistent on recovery) that defines where PITR
> can start and this distinction definitely matters for very long backups. A
> better test would be to start the backup, get a time/lsn/xid after writing
> some data, and then make sure that time/lsn/xid is flagged as invalid on
> recovery.
>

Yes, i agree, the databases restored from the backup would start the
recovery and would become consistent once the end-backup is reached. The
point here is that, when the backup is restored and recovery proceeds
further, there is NO WAY to identify the next consistent position or
end-position of the backup. This patch is only implementing a check to
ensure recovery starts and proceeds at the right position instead of
pre-maturely getting promoted which is the current behaviour.

The current behaviour is that, if the XID shown-up by the pg_controldata is
say 1400 and the specified recovery_target_xid is say 200, then, postgresql
just completes the recovery and promotes at the immediate consistent
recovery target, which means, the DBA has to all the way start the
restoration and recovery process again to promote the database at the
intended position.

It is also problematic to assume that transaction IDs commit in order. If
> checkPoint.latestCompletedXid contains 5 then a recovery to xid 4 may or
> may not be successful depending on the commit order. However, it appears
> in this case the patch would disallow recovery to 4.
>

If the txid 4 is the recovery target xid, then, the appropriate backup
taken previous to txid 4 must be used or as an alternative recovery target
like recovery_target_timestamp must be used to proceed to the respective
recovery target xid.

> I think this logic would need to be baked into recoveryStopsAfter() and/or
> recoveryStopsBefore() in order to work. It's not clear to me what that
> would look like, however, or if the xid check is even practical.
>

The above two functions would determine if the recovery process has to stop
at a particular position or not and the patch has nothing to do with it.

To summarise, this patch only determines if the WAL replay has to start at
all. In other words, if the specified recovery target falls prior to the
restored database position, then, the WAL replay should not start, which
prevent the database from getting promoted at an unintended recovery target.

Any thoughts/comments ?

Regards,

Venkata Balaji N
Database Consultant

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ideriha, Takeshi 2017-03-22 01:02:59 Re: Other formats in pset like markdown, rst, mediawiki
Previous Message Peter Geoghegan 2017-03-22 00:44:35 Re: Parallel tuplesort (for parallel B-Tree index creation)