Re: Logical Replication WIP

From: Steve Singer <steve(at)ssinger(dot)info>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Replication WIP
Date: 2016-09-05 21:35:32
Message-ID: 57CDE524.3050607@ssinger.info
Lists: pgsql-hackers

On 09/05/2016 03:58 PM, Steve Singer wrote:
> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>> Hi,
>>
>> and one more version with bug fixes, improved code docs and couple
>> more tests, some general cleanup and also rebased on current master
>> for the start of CF.
>>
>>
>>
>

A few more things I noticed when playing with the patches:

1. Creating a subscription to yourself ends pretty badly.
The 'CREATE SUBSCRIPTION' command seems to get stuck, and you can't kill
it. The background process seems to be waiting for a transaction to
commit (I assume the one running the CREATE SUBSCRIPTION command). I had
to kill -9 the various processes to get things to stop. This matters
because getting confused about hostnames and ports is a common operator
error.
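
One way to catch this before running CREATE SUBSCRIPTION is to ask the
server you are connected to where it is listening, and compare that with
the host, port and dbname in the connection string. This is only a
sketch: inet_server_addr() and inet_server_port() return NULL over a
Unix-socket connection, so it will not catch every case.

```sql
-- Run on the would-be subscriber, then compare the result with the
-- host/port/dbname in the subscription's connection string.
SELECT inet_server_addr() AS addr,
       inet_server_port() AS port,
       current_database() AS dbname;

-- The configured port is available even over a Unix socket.
SHOW port;
```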

2. Failures during the initial subscription aren't recoverable.

For example:

on db1
create table a(id serial4 primary key,b text);
insert into a(b) values ('1');
create publication testpub for table a;

on db2
create table a(id serial4 primary key,b text);
insert into a(b) values ('1');
create subscription testsub connection 'host=localhost port=5440 dbname=test' publication testpub;

I then get this in my db2 log:

ERROR: duplicate key value violates unique constraint "a_pkey"
DETAIL: Key (id)=(1) already exists.
LOG: worker process: logical replication worker 16396 sync 16387 (PID
10583) exited with exit code 1
LOG: logical replication sync for subscription testsub, table a started
ERROR: could not crate replication slot "testsub_sync_a": ERROR:
replication slot "testsub_sync_a" already exists

LOG: worker process: logical replication worker 16396 sync 16387 (PID
10585) exited with exit code 1
LOG: logical replication sync for subscription testsub, table a started
ERROR: could not crate replication slot "testsub_sync_a": ERROR:
replication slot "testsub_sync_a" already exists

It keeps looping like this.
Truncating "a" on db2 at that point doesn't help, although I'd expect
the initial subscription to work once the conflicting row is gone.

If I then run this on db2:
drop subscription testsub cascade;

I still see a slot in use on db1:

select * FROM pg_replication_slots ;
   slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40