Re: patch proposal

From: Venkata B Nagothi <nag1010(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch proposal
Date: 2016-08-16 05:08:25
Message-ID: CAEyp7J_2KhY5QzJWAWq-VBBiU6_R+rX8n+-pbYiQxyC6JmAaFQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 16, 2016 at 2:50 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 8/15/16 2:33 AM, Venkata B Nagothi wrote:
>
> > During the recovery process, It would be nice if PostgreSQL generates an
> > error by aborting the recovery process (instead of starting-up the
> > cluster) if the intended recovery target point is not reached and give
> > an option to DBA to resume the recovery process from where it exactly
> > stopped.
>
> Thom wrote a patch [1] recently that gives warnings in this case. You
> might want to have a look at that first.
>

That is good to know. Yes, that patch is about generating more meaningful
output messages for the recovery process, which makes sense.

> > The issue here is, if by any chance, the required WALs are not available
> > or if there is any WAL missing or corrupted at the restore_command
> > location, then PostgreSQL recovers until the end of the last available
> > WAL and starts-up the cluster.
>
> You can use pause_at_recovery_target/recovery_target_action (depending
> on your version) to prevent promotion. That would work for your stated
> scenario but not for the scenario where replay starts (or the database
> reaches consistency) after the recovery target.
>

The parameters mentioned above can be configured to pause, shut down, or
prevent promotion only after the recovery target point has been reached.
To clarify, I am referring to a scenario where the recovery target point is
not reached at all (that is, an incomplete recovery) and there are many WALs
still to be replayed. In this situation, PostgreSQL simply completes archive
recovery at the end of the last available WAL (WAL file
"00000001000000000000001E" in my case) and starts up the cluster after
logging an error message (saying "00000001000000000000001F" not found).

Note: I am testing in PostgreSQL-9.5

LOG: restored log file "00000001000000000000001E" from archive
cp: cannot stat ‘/data/pgrestore9531/00000001000000000000001F’: No such file or directory
LOG: redo done at 0/1EFFDBB8
LOG: last completed transaction was at log time 2016-08-15 11:04:26.795902+10

I have used the following recovery* parameters in the recovery.conf file
here and have intentionally not supplied all the WAL archives needed for
the recovery process to reach the target xid.

recovery_target_xid = xxxx
recovery_target_inclusive = true
recovery_target_action = pause

It would be nice if PostgreSQL paused the recovery when it is not complete
(because of missing or corrupt WAL), shut down the cluster, and allowed the
DBA to restart the replay of the remaining WAL archive files to continue
recovery (from where it stopped previously) until the recovery target point
is reached.
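
Until something like that exists, a DBA could at least check the archive
for gaps before starting recovery. The following is only a rough sketch in
Python, not anything PostgreSQL provides: the function names are made up for
illustration, and it assumes the default 16 MB WAL segment size and compares
segments only within the same timeline.

```python
import re

# Assumption: default 16 MB segments, so 256 segments per 4 GB "xlog id".
SEGMENTS_PER_XLOGID = 0x100000000 // (16 * 1024 * 1024)

# A WAL segment file name is 24 upper-case hex digits:
# 8 for the timeline, 8 for the xlog id, 8 for the segment within it.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def next_wal_segment(name):
    """Return the name of the segment that follows `name` on the same timeline."""
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    segno = log * SEGMENTS_PER_XLOGID + seg + 1
    return "%08X%08X%08X" % (
        tli,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def find_missing_segments(names):
    """List segments absent between consecutive archived segments.

    Names that are not plain segment names (for example .history or
    .backup files) are ignored.
    """
    segments = sorted(n for n in names if WAL_NAME.match(n))
    missing = []
    for current, following in zip(segments, segments[1:]):
        if current[:8] != following[:8]:
            continue  # different timelines: not comparable in this sketch
        expected = next_wal_segment(current)
        while expected < following:
            missing.append(expected)
            expected = next_wal_segment(expected)
    return missing
```

It could be run as find_missing_segments(os.listdir("/data/pgrestore9531"))
after importing os. Note that this only finds holes between segments that
are present; it cannot tell whether the archive ends before the recovery
target, which is exactly the case I would like the server itself to report.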

Regards,
Venkata B N

Fujitsu Australia
