Re: logical replication - still unstable after all these months

From: Erik Rijkers <er(at)xs4all(dot)nl>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: logical replication - still unstable after all these months
Date: 2017-05-26 07:27:16
Message-ID: 2248d971c274c30615254594f5c2dbf0@xs4all.nl
Lists: pgsql-hackers

On 2017-05-26 08:58, Simon Riggs wrote:
> On 26 May 2017 at 07:10, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>
>> - Do you agree this number of failures is far too high?
>> - Am I the only one finding so many failures?
>
> What type of failure are you getting?

The failure is that in the result state the replicated tables differ
from the original tables.

For instance,

-- out_20170525_0944.txt
100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n -- scale 25
93 -- All is well.
7 -- Not good.

These numbers mean: the result state of primary and replica is not the
same, in 7 out of 100 runs.

'not the same state' means: at least one of the 4 md5's of the sorted
content of the 4 pgbench tables on the primary differs from the
corresponding md5 taken from the replica.
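
As an illustration of the comparison described above, here is a minimal
sketch in Python. It is not the actual test script: the function names
and the row-to-text rendering are assumptions made for the example, and
the rows are passed in as plain tuples rather than fetched from the two
servers. Only the four table names are the standard pgbench ones.

```python
import hashlib

# The four standard pgbench tables.
PGBENCH_TABLES = (
    "pgbench_accounts",
    "pgbench_branches",
    "pgbench_tellers",
    "pgbench_history",
)


def table_md5(rows):
    """Return the md5 of a table's rows after sorting them.

    Each row is rendered as one tab-separated text line (an assumed
    format); sorting makes the digest independent of row order.
    """
    digest = hashlib.md5()
    for line in sorted("\t".join(map(str, row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def differing_tables(primary, replica):
    """Return the tables whose md5 differs between primary and replica.

    Both arguments map a table name to a list of row tuples. An empty
    result means the two sides are in the same state.
    """
    return [
        name
        for name in PGBENCH_TABLES
        if table_md5(primary.get(name, [])) != table_md5(replica.get(name, []))
    ]
```

A run counts as 'not the same state' as soon as differing_tables()
returns a non-empty list, i.e. at least one of the 4 md5's differs.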

So, 'failure' means: the 4 pgbench tables on primary and replica are not
exactly the same after the (one-minute) pgbench run has finished and
logical replication has 'finished'. (Plenty of time is given for the
replica to catch up: the test only calls 'failure' after waiting 20x
(15 seconds each time) and finding the same erroneous state 20x,
erroneous because it is not the same as on the primary.)
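
The waiting logic could be sketched roughly as follows. This is an
assumed reconstruction in Python, not the script itself:
replica_caught_up() and its states_match callback are hypothetical
names, and the sleep function is injectable only so that the sketch can
be exercised without really waiting.

```python
import time


def replica_caught_up(states_match, attempts=20, wait_seconds=15,
                      sleep=time.sleep):
    """Poll until primary and replica match, or give up.

    states_match is a hypothetical callback returning True when the
    md5's of all 4 tables agree. The function waits wait_seconds before
    each check and returns True on the first match. It returns False,
    i.e. 'failure', only after `attempts` consecutive mismatches.
    """
    for _ in range(attempts):
        sleep(wait_seconds)
        if states_match():
            return True
    return False
```

With the defaults, a run is reported as 'Not good.' only after 20 waits
of 15 seconds each (five minutes in total) have all ended in a mismatch.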

I would really like to know if you think that that doesn't amount to
'failure'.
