Re: logical replication - still unstable after all these months

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: logical replication - still unstable after all these months
Date: 2017-05-29 01:33:51
Message-ID: CAMkU=1zsThCJV03SvdUtYGapsm+yA_GkVBgm_e+xpb2FEcoEtQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 28, 2017 at 3:17 PM, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> On 28/05/17 19:01, Mark Kirkwood wrote:
>
>
>> So running in cloud land now...so far no errors - will update.
>>
>>
>>
>>
> The framework ran 600 tests last night, and I see 3 'NOK' results, i.e 3
> failed test runs (all scale 25 and 8 pgbench clients). Given the way the
> test decides on failure (gets tired of waiting for the table md5's to
> match) - it begs the question 'What if it had waited a bit longer'? However
> from what I can see in all cases:
>
> - the rowcounts were the same in master and replica
> - the md5 of pgbench_accounts was different
>

All four tables should be wrong if there is still a transaction it is
waiting for, as all the changes happen in a single transaction.
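
To make that expectation concrete, here is a minimal sketch in Python of
the reasoning. It is not the actual test framework: the in-memory "tables",
the helper names (new_db, apply_txn, differing_tables) and the digest
format are all hypothetical stand-ins. It assumes the standard pgbench
transaction, which touches accounts, tellers, branches and history together.

```python
import hashlib

def table_md5(rows):
    """md5 over the sorted text form of the rows (stand-in for a per-table md5)."""
    h = hashlib.md5()
    for row in sorted(rows):
        h.update(repr(row).encode())
    return h.hexdigest()

def new_db():
    # Hypothetical miniature of the four pgbench tables, all balances zero.
    return {"accounts": {1: 0, 2: 0}, "tellers": {1: 0},
            "branches": {1: 0}, "history": []}

def apply_txn(db, aid, tid, bid, delta):
    # One pgbench-style transaction: all four tables change together.
    db["accounts"][aid] += delta
    db["tellers"][tid] += delta
    db["branches"][bid] += delta
    db["history"].append((tid, bid, aid, delta))

def digests(db):
    return {name: table_md5(list(t.items()) if isinstance(t, dict) else t)
            for name, t in db.items()}

def differing_tables(a, b):
    da, db_ = digests(a), digests(b)
    return sorted(name for name in da if da[name] != db_[name])

publisher, subscriber = new_db(), new_db()
apply_txn(publisher, aid=1, tid=1, bid=1, delta=50)

# A whole transaction still missing on the subscriber: every table differs.
print(differing_tables(publisher, subscriber))
```

If only pgbench_accounts differs, as reported above, the subscriber is not
simply one transaction behind; part of a transaction was applied and part
was not.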

I also got a failure, after 87 iterations of a similar test case. It
waited for hours, as mine requires manual intervention to stop waiting. On
the subscriber, one account still had a zero balance, even though it should
not have been zero: the history table on the subscriber agreed with both
history and accounts on the publisher. So transaction atomicity definitely
got broken.
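
The cross-check described here can be sketched roughly as follows. This is
an illustration in Python, not the script I ran: the function name and the
(aid, delta) row shape are hypothetical, and it assumes every balance
started at zero and the history table was never truncated, so each
account's balance should equal the sum of its history deltas.

```python
from collections import defaultdict

def inconsistent_accounts(accounts, history):
    """Return the account ids whose balance disagrees with their history.

    accounts: dict mapping aid -> abalance
    history:  iterable of (aid, delta) rows
    Assumes balances started at zero (hypothetical simplification).
    """
    expected = defaultdict(int)
    for aid, delta in history:
        expected[aid] += delta
    return sorted(aid for aid, balance in accounts.items()
                  if balance != expected.get(aid, 0))

# Subscriber state resembling the failure: the history row for account 1
# arrived, but the matching update to the account's balance did not.
subscriber_accounts = {1: 0, 2: -30}
subscriber_history = [(1, 50), (2, -30)]
print(inconsistent_accounts(subscriber_accounts, subscriber_history))
```

Any account id this returns on the subscriber has a history row whose
matching balance update is missing, or the reverse.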

I altered the script to also save the tellers and branches tables and
repeated the runs, but so far it hasn't failed again in over 800 iterations
using the altered script.

>
> ...so does seem possible that there is some bug being tickled here.
> Unfortunately the test framework blasts away the failed tables and
> subscription and continues on...I'm going to amend it to stop on failure so
> I can have a closer look at what happened.
>

What would you want to look at? Would saving the WAL from the master be
helpful?

Cheers,

Jeff
