Re: pglogical - logical replication contrib module

From: Steve Singer <steve(at)ssinger(dot)info>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: pglogical - logical replication contrib module
Date: 2016-01-09 18:30:45
Message-ID: 569151D5.3060003@ssinger.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/31/2015 06:34 PM, Petr Jelinek wrote:
> Hi,
>
> I'd like to submit the replication solution which is based on the
> pglogical_output [1] module (which is obviously needed for this to
> compile).
>
> The pglogical contrib module provides extension which does the
> master-slave logical replication based on the logical decoding.
>
> The basic documentation is in README.md, I didn't bother making sgml
> docs yet since I expect that there will be ongoing changes happening
> and it's easier for me to update the markdown docs than sgml. I will
> do the conversion once we start approaching committable state.
I am going to send my comments/issues out in batches as I find them
instead of waiting till I look over everything.

I find this part of the documentation a bit unclear

+Once the provider node is setup, subscribers can be subscribed to it.
First the
+subscriber node must be created:
+
+ SELECT pglogical.create_node(
+ node_name := 'subscriber1',
+ dsn := 'host=thishost port=5432 dbname=db'
+ );
+

My initial reading was that I should execute this on the provider node.
Perhaps instead
-----------------
Once the provider node is setup you can then create subscriber nodes.
Create the subscriber nodes and
then execute the following commands on each subscriber node

create extension pglogical

select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost
dbname=db port=5432');

-------------------

Also the documentation for create_subscription talks about

+ - `synchronize_structure` - specifies if to synchronize structure from
+ provider to the subscriber, default true

I did the following

test2=# select pglogical.create_subscription(subscription_name:='default
sub',provider_dsn:='host=localhost dbname=test1 port=5436');
create_subscription
---------------------
247109879

Which then resulted in the following showing up in my PG log

LOG: worker process: pglogical apply 16542:247109879 (PID 4079) exited
with exit code 1
ERROR: replication slot name "pgl_test2_test1_default sub" contains
invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.
FATAL: could not send replication command "CREATE_REPLICATION_SLOT
"pgl_test2_test1_default sub" LOGICAL pglogical_output": status
PGRES_FATAL_ERROR: ERROR: replication slot name
"pgl_test2_test1_default sub" contains invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.

The create_subscription command should check if the subscription name is
valid (meets the rules that will be applied against the slot command).

I wondered how I could fix my mistake.

The docs say

+- `pglogical.pglogical_drop_subscription(subscription_name name,
ifexists bool)`
+ Disconnects the subscription and removes it from the catalog.
+

test2=# select pglogical.pglogical_drop_subscription('default sub', true);
ERROR: function pglogical.pglogical_drop_subscription(unknown, boolean)
does not exist

The command is actually called pglogical.drop_subscription the docs
should be fixed to show the actual command name

I then wanted to add a second table to my database. ('b').

select pglogical.replication_set_add_table('default','public.b',true);
replication_set_add_table
---------------------------
t
(1 row)

In my pglog I then got

LOG: starting sync of table public.b for subscriber defaultsub
ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b"
contains invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.
FATAL: could not send replication command "CREATE_REPLICATION_SLOT
"pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status
PGRES_FATAL_ERROR: ERROR: replication slot name
"pgl_test2_test1_defaultsub_public.b" contains invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.

I then did

test1=# select pglogical.replication_set_remove_table('default','public.b');
replication_set_remove_table
------------------------------
t
(1 row)

but my log still keep repeating the error, so I tried connecting to the
replica and did the same

test2=# select pglogical.replication_set_remove_table('default','public.b');
ERROR: replication set mapping -303842815:16726 not found

Is there any way to recover from this situation?

The documenation says I can drop a replication set, maybe that will let
replication continue.

+- `pglogical.delete_replication_set(set_name text)`
+ Removes the replication set.
+

select pglogical.delete_replication_set('default');
ERROR: function pglogical.delete_replication_set(unknown) does not exist
LINE 1: select pglogical.delete_replication_set('default');
^
HINT: No function matches the given name and argument types. You might
need to add explicit type casts.

The function is actually pglogical.drop_replication_set , the docs
should be updated.
(note that didn't fix my problem either but then dropping the
subscription did seem to work).

I then re-added the default set to the origin and resubscribed my replica

test2=# select
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
dbname=test1 port=5436');
create_subscription
---------------------
2974019075

I then saw a bunch of
LOG: worker process: pglogical apply 16542:2974019075 (PID 26778)
exited with exit code 1
ERROR: subscriber defaultsub initialization failed during
nonrecoverable step (s), please try the setup again
LOG: worker process: pglogical apply 16542:2974019075 (PID 26779)
exited with exit code 1

in the log but then those stopped and I see

test2=# select pglogical.show_subscription_status();
show_subscription_status

--------------------------------------------------------------------------------
--------------------------------------------------
(defaultsub,down,test1,"host=localhost dbname=test1
port=5436",pgl_test2_test1_
defaultsub,"{default,default_insert_only}",{all})
(1 row)

I'm not really sure what to do to 'recover' my cluster at this point so
I'll send this off and rebuild my cluster and start over.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-01-09 19:02:35 Re: WIP: bloom filter in Hash Joins with batches
Previous Message Simon Riggs 2016-01-09 18:27:47 Re: pgsql: Avoid pin scan for replay of XLOG_BTREE_VACUUM