Re: pglogical - logical replication contrib module

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: pglogical - logical replication contrib module
Date: 2016-01-15 17:01:14
Message-ID: 569925DA.7090102@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-01-09 19:30, Steve Singer wrote:\
> I am going to send my comments/issues out in batches as I find them
> instead of waiting till I look over everything.
>

Thanks for looking at this! Yes going in batches/steps makes sense, this
is huge patch.

>
> I find this part of the documentation a bit unclear
>
>
> +Once the provider node is setup, subscribers can be subscribed to it.
> First the
> +subscriber node must be created:
> +
> + SELECT pglogical.create_node(
> + node_name := 'subscriber1',
> + dsn := 'host=thishost port=5432 dbname=db'
> + );
> +
>
> My initial reading was that I should execute this on the provider node.
> Perhaps instead
> -----------------
> Once the provider node is setup you can then create subscriber nodes.
> Create the subscriber nodes and
> then execute the following commands on each subscriber node
>
> create extension pglogical
>
> select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost
> dbname=db port=5432');
>
> -------------------

Makes sense I guess, this is probably relic of how this internally
evolved (we used to have providers and subscribers before we merged them
into nodes).

>
> Also the documentation for create_subscription talks about
>
> + - `synchronize_structure` - specifies if to synchronize structure from
> + provider to the subscriber, default true
>

Not sure what's your comment on this.

>
>
> I did the following
>
> test2=# select pglogical.create_subscription(subscription_name:='default
> sub',provider_dsn:='host=localhost dbname=test1 port=5436');
> create_subscription
> ---------------------
> 247109879
>
>
> Which then resulted in the following showing up in my PG log
>
> LOG: worker process: pglogical apply 16542:247109879 (PID 4079) exited
> with exit code 1
> ERROR: replication slot name "pgl_test2_test1_default sub" contains
> invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
> FATAL: could not send replication command "CREATE_REPLICATION_SLOT
> "pgl_test2_test1_default sub" LOGICAL pglogical_output": status
> PGRES_FATAL_ERROR: ERROR: replication slot name
> "pgl_test2_test1_default sub" contains invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>
>
> The create_subscription command should check if the subscription name is
> valid (meets the rules that will be applied against the slot command).
>

Yes, fixed. Also added some other sensitization code since we also use
dbname in slot name and that can contain whatever.

> I wondered how I could fix my mistake.
>
> The docs say
>
> +- `pglogical.pglogical_drop_subscription(subscription_name name,
> ifexists bool)`
> + Disconnects the subscription and removes it from the catalog.
> +
>
> test2=# select pglogical.pglogical_drop_subscription('default sub', true);
> ERROR: function pglogical.pglogical_drop_subscription(unknown, boolean)
> does not exist
>
>
> The command is actually called pglogical.drop_subscription the docs
> should be fixed to show the actual command name
>

Yep, got this from other people as well, fixed.

>
> I then wanted to add a second table to my database. ('b').
>
> select pglogical.replication_set_add_table('default','public.b',true);
> replication_set_add_table
> ---------------------------
> t
> (1 row)
>
> In my pglog I then got
>
> LOG: starting sync of table public.b for subscriber defaultsub
> ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b"
> contains invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
> FATAL: could not send replication command "CREATE_REPLICATION_SLOT
> "pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status
> PGRES_FATAL_ERROR: ERROR: replication slot name
> "pgl_test2_test1_defaultsub_public.b" contains invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>

Right, needed the sensitization as well (I am actually using the hash
now as there is only 8 chars left anyway).

>
> I then did
>
> test1=# select
> pglogical.replication_set_remove_table('default','public.b');
> replication_set_remove_table
> ------------------------------
> t
> (1 row)
>
>
> but my log still keep repeating the error, so I tried connecting to the
> replica and did the same
>
> test2=# select
> pglogical.replication_set_remove_table('default','public.b');
> ERROR: replication set mapping -303842815:16726 not found
>
> Is there any way to recover from this situation?
>

Not really, there is no api yet to remove table from synchronization
process so you'd have to manually delete row from
pglogical.local_sync_status on subscriber, kill the sync process and
remove the slot. I will think about what would be good api to solve this.

> The documenation says I can drop a replication set, maybe that will let
> replication continue.
>
> +- `pglogical.delete_replication_set(set_name text)`
> + Removes the replication set.
> +
>
> select pglogical.delete_replication_set('default');
> ERROR: function pglogical.delete_replication_set(unknown) does not exist
> LINE 1: select pglogical.delete_replication_set('default');
> ^
> HINT: No function matches the given name and argument types. You might
> need to add explicit type casts.
>
> The function is actually pglogical.drop_replication_set , the docs
> should be updated.
> (note that didn't fix my problem either but then dropping the
> subscription did seem to work).

Yeah it doesn't as the problem is on subscriber, replication sets only
affect provider. And fixed the docs.

>
>
>
> I then re-added the default set to the origin and resubscribed my replica
>
>
>
> test2=# select
> pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
> dbname=test1 port=5436');
> create_subscription
> ---------------------
> 2974019075
>
>
> I then saw a bunch of
> LOG: worker process: pglogical apply 16542:2974019075 (PID 26778)
> exited with exit code 1
> ERROR: subscriber defaultsub initialization failed during
> nonrecoverable step (s), please try the setup again
> LOG: worker process: pglogical apply 16542:2974019075 (PID 26779)
> exited with exit code 1
>
> in the log but then those stopped and I see
>
> test2=# select pglogical.show_subscription_status();
> show_subscription_status
>
> --------------------------------------------------------------------------------
>
> --------------------------------------------------
> (defaultsub,down,test1,"host=localhost dbname=test1
> port=5436",pgl_test2_test1_
> defaultsub,"{default,default_insert_only}",{all})
> (1 row)
>
>
> I'm not really sure what to do to 'recover' my cluster at this point so
> I'll send this off and rebuild my cluster and start over.
>

I think the problem here is that you resubscribed with
syncrhonize_structure := true while the conflicting structure already
existed, that option only works correctly when there is no conflicting
structure (we don't try to make diffs or anything, just dump/restore).
Recovering should be drop the uninitialized subscription and create new
one where you don't synchronize structure.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-01-15 17:03:22 Re: [BUGS] about test_parser installation failure problem(PostgreSQL in 9.5.0)?
Previous Message Benedikt Grundmann 2016-01-15 16:59:46 Re: Death by regexp_replace