Re: Logical replication existing data copy

From: Erik Rijkers <er(at)xs4all(dot)nl>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical replication existing data copy
Date: 2017-02-22 17:13:11
Message-ID: b0dbcb2a1066d6728cbf62e391e7edf4@xs4all.nl
Lists: pgsql-hackers

On 2017-02-22 14:48, Erik Rijkers wrote:
> On 2017-02-22 13:03, Petr Jelinek wrote:
>
>> 0001-Skip-unnecessary-snapshot-builds.patch
>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
>> 0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
>> 0002-Fix-after-trigger-execution-in-logical-replication.patch
>> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
>> 0001-Logical-replication-support-for-initial-data-copy-v5.patch
>
> It works well now, or at least my particular test case seems now
> solved.

Cried victory too early, I'm afraid.

The logical replication is now certainly much more stable but there are
still errors, just less often.

I have not yet encountered the rare 'hang' error that I mentioned a few
emails back; I am beginning to trust that it is indeed solved.

But there is still sometimes incorrect replication. The symptoms are
the ones I mentioned earlier:
- incorrect number of rows in one of (mostly) pgbench_accounts or
  pgbench_history. The numbers are always off by a very small amount,
  say fewer than 20, often by only 1 row.
- incorrect content in one of pgbench_accounts or pgbench_history
  (detected via md5). Also mostly the two tables named above.
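
The two checks listed above (row count and md5 of the table content) can be sketched roughly as follows. This is only an illustration of the idea, not the logic of pgbench_derail2.sh: the names `table_fingerprint` and `diff_fingerprints` are hypothetical, and the rows are passed in as plain Python tuples instead of being fetched from a primary and a replica.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# (row count, md5 hex digest of the table content)
Fingerprint = Tuple[int, str]


def table_fingerprint(rows: Iterable[tuple]) -> Fingerprint:
    """Return (row count, md5) for a table's rows, independent of row order."""
    # Sort a textual form of each row so that physical row order does not
    # influence the digest (similar in spirit to an ORDER BY before hashing).
    lines = sorted("|".join(map(str, row)) for row in rows)
    digest = hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()
    return len(lines), digest


def diff_fingerprints(primary: Dict[str, Fingerprint],
                      replica: Dict[str, Fingerprint]) -> List[str]:
    """List human-readable discrepancies between primary and replica."""
    problems = []
    for table in sorted(primary):
        p_count, p_md5 = primary[table]
        if table not in replica:
            problems.append(f"{table}: missing on replica")
            continue
        r_count, r_md5 = replica[table]
        if p_count != r_count:
            # First symptom: the row counts differ, usually by a few rows.
            problems.append(
                f"{table}: row count {p_count} vs {r_count} "
                f"(off by {r_count - p_count:+d})"
            )
        elif p_md5 != r_md5:
            # Second symptom: same number of rows, but different content.
            problems.append(f"{table}: same row count but md5 differs")
    return problems


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    primary = {
        "pgbench_accounts": table_fingerprint([(1, 0), (2, 5)]),
        "pgbench_history": table_fingerprint([(1, 1)]),
    }
    replica = {
        "pgbench_accounts": table_fingerprint([(2, 6), (1, 0)]),
        "pgbench_history": table_fingerprint([(1, 1), (2, 2)]),
    }
    for line in diff_fingerprints(primary, replica):
        print(line)
```

Hashing a sorted textual form keeps the comparison insensitive to the physical order of rows, so a mismatch points to a real difference in content rather than to a different storage order on the replica.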

I also sometimes see primary key violations on the replica. That should
not be possible if I have understood the intent of logical replication
correctly:

  ERROR: duplicate key value violates unique constraint "pgbench_tellers_pkey"

These occur mostly on *_tellers, but I have also seen them on *_branches.

Understandably, the errors become more frequent with higher client
counts: a 25x repeat with 1 client yielded only 1 failed run whereas a
25x repeat with 16 clients gave 16 failures.

I attach once more the current incarnation of my test-bash pgbench
runner, pgbench_derail2.sh.
Easiest to run it yourself, I guess.

I also attach the output (of pgbench_derail2.sh) of those two 25x
repeats:
d2_scale__1_client__1_25x.txt
d2_scale__1_client_16_25x.txt

I worry a bit about the correctness of that test program
(pgbench_derail2.sh). I especially wonder if it should look around
better at startup (e.g., at stuff left over from previous iterations).
If you see any incorrect/dumb things there, or a better way to monitor
(aka pre-flight checks), please let me know.

But the current state is certainly a big step forward -- I guess it's
just your bad luck that I had the afternoon off ;)

thanks,

Erik Rijkers

Attachment                      Content-Type         Size
pgbench_derail2.sh              text/x-shellscript   7.1 KB
d2_scale__1_client__1_25x.txt   text/plain           42.3 KB
d2_scale__1_client_16_25x.txt   text/plain           82.1 KB
