Re: patch proposal

From: Venkata B Nagothi <nag1010(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch proposal
Date: 2017-02-22 02:09:28
Message-ID: CAEyp7J95sMKJam8KeMytDCsR53B5pCsvvbxTRZH8ZWuEBECheA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 1/27/17 3:19 AM, Venkata B Nagothi wrote:
>
> > I will be adding the tests in
> > src/test/recovery/t/003_recovery_targets.pl
> > <http://003_recovery_targets.pl>. My tests would look more or less
> > similar to recovery_target_action. I do not see any of the tests added
> > for the parameter recovery_target_action ? Anyways, i will work on
> > adding tests for recovery_target_incomplete.
>
> Do you know when those will be ready?
>

Apologies for the delayed response. Actually, I am working through to add
the tests for both the patches and started with adding the tests to
recovery_target_incomplete parameter.
I have hit an issue while adding the tests which i have explained below.

> >
> >
> > 4) I'm not convinced that fatal errors by default are the way to go.
> > It's a pretty big departure from what we have now.
> >
> >
> > I have changed the code to generate ERRORs instead of FATALs. Which is
> > in the patch recoveryStartPoint.patch
>
> I think at this point in recovery any ERROR at all will probably be
> fatal. Once you have some tests in place we'll know for sure.
>

Sure. I can add the tests for the first patch.

Either way, the goal would be to keep recovery from proceeding and let
> the user adjust their targets. Since this is a big behavioral change
> which will need buy in from the community.
>

Sure. Agreed.

> > Also, I don't think it's very intuitive how
> recovery_target_incomplete
> > works. For instance, if I set recovery_target_incomplete=promote
> and
> > set recovery_target_time, but pick a backup after the specified time,
> > then there will be an error "recovery_target_time %s is older..."
> rather
> > than a promotion.
> >
> >
> > Yes, that is correct and that is the expected behaviour. Let me explain -
>
> My point was that this combination of options could lead to confusion on
> the part of users. However, I'd rather leave that alone for the time
> being and focus on the first patch.
>

Sure.

> > I have split the patch into two different
> > pieces. One is to determine if the recovery can start at all and other
> > patch deals with the incomplete recovery situation.
>
> I think the first patch looks promising and I would rather work through
> that before considering the second patch. Once there are tests for the
> first patch I will complete my review.
>

Sure. Will submit the first patch along with tests once i can get through
the issue i am facing.

*The issue i am facing while writing the tests*

Below is the problem i am facing while adding the tests to
003_recovery_targets.pl

1. Test case : Test the Incomplete recovery with parameters
restore_command, recovery_target_xid and recovery_target_incomplete
='shutdown' (option 'pause' and 'promote' work fine)

2. Expected test Result : Standby node successfully shuts-down if the
recovery process reaches mid-way and does not reach the intended recovery
target due to unavailability of WALs

3. All good. Everything works as expected. The problem is that, the test
exits with the exit code 256, bails out and shows-up as "pg_ctl failed".

Is there anyway, i can ensure the test does not bail-out and exits with the
exit code '0' when you do "pg_ctl start" and the server starts and
shuts-down for any reason ?

If you look at the log file, pg_ctl actually starts successfully and then
shuts down cleanly because intended recovery target is not reached. The
messages in the log file are also appropriate.
Since it is a clean-start and clean-shutdown (which means the test is
successful), the test should return a success, which is not happening.

Below is the log (a test for the second patch). I believe Perl generates an
error code 256 because pg_ctl responds with an error - "pg_ctl: could not
start the server".
I would need my test case to return a success in this scenario. Any
suggestions would be great.

[dba(at)buildhost ~]$ /data/PostgreSQL-Repo/postgresql/tmp_install/disk1/
pginstall/pgsql-10-dev-master-xlog-2-wal/bin/pg_ctl -D
/data/PostgreSQL-Repo/postgresql/src/test/recovery/
tmp_check/data_pitr_3_U_oA/pgdata start
waiting for server to start....2017-02-22 12:06:41.773 AEDT [23858] LOG:
database system was interrupted while in recovery at log time 2017-02-15
16:40:41 AEDT
2017-02-22 12:06:41.773 AEDT [23858] HINT: If this has occurred more than
once some data might be corrupted and you might need to choose an earlier
recovery target.
2017-02-22 12:06:41.773 AEDT [23858] LOG: starting point-in-time recovery
to XID 559
2017-02-22 12:06:41.782 AEDT [23858] LOG: restored log file
"000000010000000000000002" from archive
2017-02-22 12:06:41.784 AEDT [23858] LOG: redo starts at 0/2000028
2017-02-22 12:06:41.794 AEDT [23858] LOG: restored log file
"000000010000000000000003" from archive
2017-02-22 12:06:41.814 AEDT [23858] LOG: restored log file
"000000010000000000000004" from archive
2017-02-22 12:06:41.835 AEDT [23858] LOG: restored log file
"000000010000000000000005" from archive
2017-02-22 12:06:41.858 AEDT [23858] LOG: restored log file
"000000010000000000000006" from archive
2017-02-22 12:06:41.879 AEDT [23858] LOG: restored log file
"000000010000000000000007" from archive
2017-02-22 12:06:41.891 AEDT [23858] LOG: consistent recovery state
reached at 0/8000000
*2017-02-22 12:06:41.891 AEDT [23857] LOG: database system is ready to
accept read only connections*
cp: cannot stat ‘/data/PostgreSQL-Repo/postgresql/src/test/recovery/
tmp_check/data_master_cyOw/archives/000000010000000000000008’: No such file
or directory
2017-02-22 12:06:41.893 AEDT [23858] LOG: recovery has reached
end-of-the-wal and has not reached the recovery target yet
2017-02-22 12:06:41.893 AEDT [23858] HINT: This could be due to corrupt or
missing WAL files.
All the WAL files needed for the recovery must be available to
proceed to the recovery target Or you might need to choose an earlier
recovery target.
*2017-02-22 12:06:41.893 AEDT [23858] LOG: shutdown at end-of-the-wal*
2017-02-22 12:06:41.893 AEDT [23857] LOG: startup process (PID 23858)
exited with exit code 2
2017-02-22 12:06:41.893 AEDT [23857] LOG: terminating any other active
server processes
2017-02-22 12:06:41.894 AEDT [23857] LOG: database system is shut down
stopped waiting
*pg_ctl: could not start server*
Examine the log output.

I will be facing the same problem while adding the tests for the first
patch. In the first patch, the recovery will not proceed and database will
shutdown if the specified recovery target is not legitimate.

To summarise, i would require the tests to exit with an exit code "0" when
the pg_ctl could not the start the server or shuts down the server when
"pg_ctl start" is issued in certain recovery scenarios.

Thanks,

Venkata B N
Database Consultant

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Venkata B Nagothi 2017-02-22 02:10:32 Re: patch proposal
Previous Message Michael Paquier 2017-02-22 02:07:19 Re: Speedup twophase transactions