Re: Logical replication existing data copy

From: Erik Rijkers <er(at)xs4all(dot)nl>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical replication existing data copy
Date: 2017-02-11 10:16:34
Message-ID: 479dceb8c0acc8026ced81caf9cce750@xs4all.nl
Lists: pgsql-hackers

On 2017-02-08 23:25, Petr Jelinek wrote:

> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
> 0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
> 0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
> 0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
> 0001-Logical-replication-support-for-initial-data-copy-v4.patch

Apart from the one failing make check test (test 'object_address'), which
I reported earlier, I find it is easy to 'confuse' the replication.

I attach a script that intends to test the default COPY DATA. There
are two instances, initially without any replication. The script inits
pgbench on the master, adds a serial column to pgbench_history, and
dump-restores the 4 pgbench-tables to the future replica. It then
empties the 4 pgbench-tables on the 'replica'. The idea is that when
logrep is initiated, data will be replicated from master, with the end
result being that there are 4 identical tables on master and replica.
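
For readers without the attachment, the setup described above can be sketched roughly as follows. This is not the attached script: it is a small Python helper that only prints the commands one would run. The port numbers, the database name, the column name 'hid' and the publication name 'pub1' are assumptions made for illustration (only 'sub1' appears in this mail), and the exact publication/subscription syntax may differ under the patch version being tested.

```python
import shlex

# The four pgbench tables mentioned in the mail.
TABLES = ["pgbench_accounts", "pgbench_branches",
          "pgbench_history", "pgbench_tellers"]


def build_steps(scale, master_port=6972, replica_port=6973, db="postgres"):
    """Return the setup commands as shell strings (nothing is executed).

    Ports, db, 'hid' and 'pub1' are hypothetical placeholders.
    """
    def psql(port, sql):
        return shlex.join(["psql", "-p", str(port), "-d", db, "-c", sql])

    # Dump only the four pgbench tables from the master ...
    dump = ["pg_dump", "-p", str(master_port)]
    for table in TABLES:
        dump += ["-t", table]
    dump.append(db)
    # ... and restore them into the future replica.
    restore = ["psql", "-p", str(replica_port), "-d", db]

    return [
        # 1. initialise pgbench on the master
        shlex.join(["pgbench", "-i", "-s", str(scale),
                    "-p", str(master_port), db]),
        # 2. add a serial column to pgbench_history
        psql(master_port, "ALTER TABLE pgbench_history ADD COLUMN hid serial"),
        # 3. dump-restore the tables to the replica
        shlex.join(dump) + " | " + shlex.join(restore),
        # 4. empty the tables on the replica
        psql(replica_port, "TRUNCATE " + ", ".join(TABLES)),
        # 5. start logical replication (names are assumptions)
        psql(master_port, "CREATE PUBLICATION pub1 FOR ALL TABLES"),
        psql(replica_port,
             "CREATE SUBSCRIPTION sub1 "
             f"CONNECTION 'port={master_port} dbname={db}' "
             "PUBLICATION pub1"),
    ]


if __name__ == "__main__":
    for step in build_steps(scale=5):
        print(step)
```

Printing the commands instead of running them keeps the sketch harmless; each line can be pasted into a shell once the ports and names are adjusted.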

This often works, but it also fails far too often (in my hands). I test
whether the tables are identical by comparing an md5 of an ordered
result set from both replica and master. I estimate that 1 in 5 tries
fails; 'fail' meaning a somewhat different table on the replica (compared
to master), most often pgbench_accounts (typically there are 10-30
differing rows). No errors or warnings in either logfile. I'm not sure,
but I think testing on faster machines seems to do somewhat better
('better' being fewer replication errors).
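
The md5 comparison can be illustrated with a minimal Python sketch. It is not the code from the attached script (which presumably does this through psql); it assumes the rows of a table have already been fetched from each instance as tuples by whatever client is at hand. The function names table_md5 and differing_rows are hypothetical.

```python
import hashlib


def table_md5(rows):
    """md5 over the rows in a deterministic order.

    Each row is rendered as a tab-separated line (NULL/None as an empty
    string) and the lines are sorted, so the fetch order does not matter.
    """
    lines = sorted(
        "\t".join("" if value is None else str(value) for value in row)
        for row in rows
    )
    digest = hashlib.md5()
    for line in lines:
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


def differing_rows(master_rows, replica_rows):
    """Rows present on only one side (useful once the md5s disagree)."""
    return sorted(set(master_rows) ^ set(replica_rows), key=repr)


if __name__ == "__main__":
    master = [(1, 10, 0), (2, 10, 5)]
    replica = [(2, 10, 7), (1, 10, 0)]   # one row differs
    print(table_md5(master) == table_md5(replica))
    print(differing_rows(master, replica))
```

A mismatching md5 only says that the tables differ; listing the rows found on one side only, as differing_rows does, is what gives a count like the 10-30 rows mentioned above.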

Another, probably unrelated, problem occurs (but much more rarely) when
executing 'DROP SUBSCRIPTION sub1' on the replica (see the beginning of
the script). Sometimes that command hangs, and refuses to accept
shutdown of the server. I don't know how to recover from this -- I just
have to kill the replica server (master server still obeys normal
shutdown) and restart the instances.

The script accepts 2 parameters, scale and clients (passed to pgbench as
-s and -c, respectively).

I don't think I've managed to successfully run the script with more than
1 client yet.

Can you have a look whether this is reproducible elsewhere?

thanks,

Erik Rijkers

Attachment Content-Type Size
pgbench_derail2.sh text/x-shellscript 6.0 KB
