From: | Steve Singer <steve(at)ssinger(dot)info> |
---|---|
To: | Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
Subject: | Re: pglogical - logical replication contrib module |
Date: | 2016-01-10 19:57:12 |
Message-ID: | 5692B798.2050002@ssinger.info |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 01/09/2016 01:30 PM, Steve Singer wrote:
> On 12/31/2015 06:34 PM, Petr Jelinek wrote:
>
> I'm not really sure what to do to 'recover' my cluster at this point
> so I'll send this off and rebuild my cluster and start over.
>
>
I had a setup test1--->test2 (with 2 tables in the default set)
I then created a third database (all three hosted on the same PG cluster)
In the third database (test3)
test3=# create extension pglogical;
CREATE EXTENSION
test3=# select pglogical.create_node(node_name:='test3',
dsn:='host=localhost dbname=test3 port=5436');
create_node
-------------
2001662995
(1 row)
test3=# select
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
dbname=test2 port=5436');
create_subscription
---------------------
2974019075
It copied the schema over but not the data (if I use test2 as the
provider_dsn then it does copy the data).
I then tried inserting a row into a table on test1. Things crashed and
after crash recovery I keep getting
2016-01-10 13:03:15 EST LOG: database system is ready to accept connections
2016-01-10 13:03:15 EST LOG: autovacuum launcher started
2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
2016-01-10 13:03:15 EST test2LOG: starting logical decoding for slot
"pgl_test3
_test2_defaultsub"
2016-01-10 13:03:15 EST test2DETAIL: streaming transactions committing
after 0/
18292D8, reading WAL from 0/18292D8
2016-01-10 13:03:15 EST test2LOG: logical decoding found consistent
point at 0/
18292D8
2016-01-10 13:03:15 EST test2DETAIL: Logical decoding will begin using
saved sn
apshot.
TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c",
Line: 1127)
2016-01-10 13:03:15 EST test2LOG: unexpected EOF on standby connection
2016-01-10 13:03:15 EST LOG: worker process: pglogical apply
17016:2974019075 (
PID 24746) was terminated by signal 6: Aborted
The stack trace is
#3 0x00000000007b83af in SearchCatCache (cache=0xe27d18, v1=15015784,
v2=v2(at)entry=0, v3=v3(at)entry=0, v4=v4(at)entry=0) at catcache.c:1127
#4 0x00000000007c503e in SearchSysCache (cacheId=cacheId(at)entry=47,
key1=<optimized out>, key2=key2(at)entry=0, key3=key3(at)entry=0,
key4=key4(at)entry=0) at syscache.c:981
#5 0x00000000006996d4 in replorigin_by_name (
roname=0xe51f30 "pgl_test2_test1_defaultsub",
missing_ok=missing_ok(at)entry=0 '\000') at origin.c:216
#6 0x00007fdb54a908d3 in handle_origin (s=0x7ffd873f6da0)
at pglogical_apply.c:235
#7 replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031
#8 apply_work (streamConn=streamConn(at)entry=0xe84fb0) at
pglogical_apply.c:1309
#9 0x00007fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>)
at pglogical_apply.c:1691
#10 0x0000000000674912 in StartBackgroundWorker () at bgworker.c:726
---Type <return> to continue, or q <return> to quit---
#11 0x000000000067f7e2 in do_start_bgworker (rw=0xe03890) at
postmaster.c:5501
#12 maybe_start_bgworker () at postmaster.c:5676
#13 0x0000000000680206 in sigusr1_handler
(postgres_signal_arg=<optimized out>)
at postmaster.c:4937
#14 <signal handler called>
#15 0x00007fdb54fa2293 in __select_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#16 0x0000000000468285 in ServerLoop () at postmaster.c:1648
#17 0x000000000068161e in PostmasterMain (argc=argc(at)entry=3,
argv=argv(at)entry=0xddede0) at postmaster.c:1292
#18 0x000000000046979d in main (argc=3, argv=0xddede0) at main.c:223
I tried dropping the subscription and re-adding it. I keep getting
2016-01-10 13:21:48 EST test1LOG: logical decoding found consistent
point at 0/1830080
2016-01-10 13:21:48 EST test1DETAIL: There are no running transactions.
2016-01-10 13:21:48 EST test1LOG: exported logical decoding snapshot:
"000004DE-1" with 0 transaction IDs
2016-01-10 13:21:48 EST test3ERROR: relation "a" already exists
2016-01-10 13:21:48 EST test3STATEMENT: CREATE TABLE a (
a integer NOT NULL,
b integer
);
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a
ssinger
pg_restore: [archiver (db)] could not execute query: ERROR: relation "a"
already exists
Command was: CREATE TABLE a (
a integer NOT NULL,
b integer
);
2016-01-10 13:21:48 EST ERROR: could not execute command
"/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data"
--exit-on-error -1 -d "host=localhost dbname=test3 port=5436"
"/tmp/pglogical-28079.dump""
2016-01-10 13:21:48 EST test1LOG: unexpected EOF on client connection
with an open transaction
2016-01-10 13:21:48 EST LOG: worker process: pglogical apply
17016:844915593 (PID 28079) exited with exit code 1
2016-01-10 13:21:48 EST ERROR: subscriber defaultsub4 initialization
failed during nonrecoverable step (s), please try the setup again
Which is probably also the cause of the error I reported yesterday (that
I tried creating a subscription without dropping the tables).
From a usability point of view I think we need a way of making this
errors available in the output of pglogical.show_subscription_status().
I asked to subscribe something through psql, even thought it is
asynchronous, if the async operation fails I should be able to learn
about the problem through psql. If I am writing a script to subscribe a
node it needs a way in my script of checking if the subscription has
failed and reporting the error.
My subscription script might not have easy access to the server log.
+- `pglogical.show_subscription_table(subscription_name name,
+ relation regclass)`
+ Shows synchronization status of a table.
+
+ Parameters:
+ - `subscription_name` - name of the existing subscription
+ - `relation` - name of existing table, optionally qualified
+
It isn't clear from the documentation what the output of this function
means, nor could I tell looking at it. Is this function just supposed
to tell us if a table is part of the replication set or if it is
'up-to-date'. It still reports 'synchornized' when a table is behind.
From | Date | Subject | |
---|---|---|---|
Next Message | Christian Ullrich | 2016-01-10 19:58:26 | Close handle leak in SSPI auth |
Previous Message | Tomas Vondra | 2016-01-10 19:12:39 | Re: WIP: bloom filter in Hash Joins with batches |