Re: Logical decoding on standby

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding on standby
Date: 2016-12-22 05:43:38
Message-ID: CAB7nPqTv9iPo9jpKN3rc64_WZfK_vQBB1TmK+dB0OSu3gMpuRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 20, 2016 at 4:03 PM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> That's about approach, but since there are prerequisite patches in the
> patchset that don't really depend on the approach I will comment about
> them as well.
>
> 0001 and 0002 add testing infrastructure and look fine to me, possibly
> committable.
>
> But in 0003 I don't understand following code:
>> + if (endpos != InvalidXLogRecPtr && !do_start_slot)
>> + {
>> + fprintf(stderr,
>> + _("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
>> + progname);
>> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> + progname);
>> + exit(1);
>> + }
>
> Why is --create-slot and --endpos not allowed together?
>
> 0004 again looks good but depends on 0003.
>
> 0005 is timeline following which is IMHO ready for committer, as is 0006
> and 0008 and I still maintain the opinion that these should go in soon.
>
> 0007 is unfinished as you said in your mail (missing option to specify
> behavior). And the last one 0009 is the implementation discussed above,
> which I think needs rework. IMHO 0007 and 0009 should be ultimately merged.
>
> I think parts of this could be committed separately and are ready for
> committer IMHO, but there is no way in CF application to mark only part
> of patch-set for committer to attract the attention.

Craig has pinged me about looking at 0001, 0002, 0004 and 0006 as
those involve the TAP infrastructure.

So, for 0001:
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -93,6 +93,7 @@ use RecursiveCopy;
use Socket;
use Test::More;
use TestLib ();
+use pg_lsn qw(parse_lsn);
use Scalar::Util qw(blessed);
This depends on 0002, so the order should be reversed.

+sub lsn
+{
+ my $self = shift;
+ return $self->safe_psql('postgres', 'select case when
pg_is_in_recovery() then pg_last_xlog_replay_location() else
pg_current_xlog_insert_location() end as lsn;');
+}
The design of the test should be in charge of choosing which value it
wants to get, and the routine should just blindly do the work. More
flexibility is more useful to design tests. So it would be nice to
have one routine able to switch at will between 'flush', 'insert',
'write', 'receive' and 'replay modes to get values from the
corresponding xlog functions.

- die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
+ die "error running SQL: '$$stderr'\nwhile running
'@psql_params' with sql '$sql'"
if $ret == 3;
That's separate from this patch, and definitely useful.

+ if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) {
+ die "valid modes are restart, confirmed_flush";
+ }
+ if (!defined($target_lsn)) {
+ $target_lsn = $self->lsn;
+ }
That's not really the style followed by the perl scripts present in
the code regarding the use of the brackets. Do we really need to care
about the object type checks by the way?

Regarding wait_for_catchup, there are two ways to do things. Either
query the standby like in the way 004_timeline_switch.pl does it or
the way this routine does. The way of this routine looks more
straight-forward IMO, and other tests should be refactored to use it.
In short I would make the target LSN a mandatory argument, and have
the caller send a standby's application_name instead of a PostgresNode
object, the current way to enforce the value of $standby_name being
really confusing.

+ my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1,
'replay' => 1 );
What's actually the point of 'sent'?

+ my @fields = ('plugin', 'slot_type', 'datoid', 'database',
'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
+ my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ',
@fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name =
'$slot_name'");
+ $result = undef if $result eq '';
+ # hash slice, see http://stackoverflow.com/a/16755894/398670 .
Couldn't this portion be made more generic? Other queries could
benefit from that by having a routine that accepts as argument an
array of column names for example.

Now looking at 0002....
One whitespace:
$ git diff HEAD~1 --check
src/test/perl/pg_lsn.pm:139: trailing whitespace.
+=cut

pg_lsn sounds like a fine name, now we are more used to camel case for
module names. And routines are written as lower case separated by an
underscore.

+++ b/src/test/perl/t/002_pg_lsn.pl
@@ -0,0 +1,68 @@
+use strict;
+use warnings;
+use Test::More tests => 42;
+use Scalar::Util qw(blessed);
Most of the tests added don't have a description. This makes things
harder to debug when tracking an issue.

It may be good to begin using this module within the other tests in
this patch as well. Now do we actually need it? Most of the existing
tests I recall rely on the backend's operators for the pg_lsn data
type, so this is actually duplicating an exiting facility. And all the
values are just passed as-is.

+++ b/src/test/perl/t/001_load.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+use Test::More tests => 5;
I can guess the meaning of this test, having a comment on top of it to
explain the purpose of the test is good practice though.

Looking at 0004...
+Disallows pg_recvlogial from internally retrying on error by passing --no-loop.
s/pg_recvlogial/pg_recvlogical

+sub pg_recvlogical_upto
+{
This looks like a good idea for your tests.

+my $endpos = $node_master->safe_psql('postgres', "SELECT location
FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL) ORDER BY
location DESC LIMIT 1;");
+diag "waiting to replay $endpos";
On the same wave as the pg_recvlogical wrapper, you may want to
consider some kind of wrapper at SQL level calling the slot functions.

And finally 0006.
+$node_standby_1->append_conf('recovery.conf', "primary_slot_name =
$slotname_1\n");
+$node_standby_1->append_conf('postgresql.conf',
"wal_receiver_status_interval = 1\n");
+$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n");
No need to call multiple times this routine.

Increasing the test coverage is definitely worth it.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2016-12-22 06:04:51 Re: Getting rid of "unknown error" in dblink and postgres_fdw
Previous Message Tom Lane 2016-12-22 05:20:30 Re: Getting rid of "unknown error" in dblink and postgres_fdw