Re: logical replication seems broken

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication seems broken
Date: 2021-02-15 06:23:00
Message-ID: CALDaNm1qKxkEc06vxTL+iKQKFf8XGm4qWOA9Z5uNXZHA_qOn4g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 13, 2021 at 5:58 PM Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>
> > On 02/13/2021 11:49 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Feb 12, 2021 at 10:00 PM <er(at)xs4all(dot)nl> wrote:
> > >
> > > > On 02/12/2021 1:51 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Feb 12, 2021 at 6:04 PM Erik Rijkers <er(at)xs4all(dot)nl> wrote:
> > > > >
> > > > > I am seeing errors in replication in a test program that I've been running for years with very little change (since 2017, really [1]).
> > >
> > > Hi,
> > >
> > > Here is a test program. Careful, it deletes stuff. And it will need some changes:
> > >
> >
> > Thanks for sharing the test. I think I have found the problem.
> > Actually, it was an existing code problem exposed by the commit
> > ce0fdbfe97. In pgoutput_begin_txn(), we were sometimes sending the
> > prepare_write ('w') message but then the actual message was not being
> > sent. This was the case when we didn't find the origin of a txn. This
> > can happen after that commit because we have now started using origins
> > for tablesync workers as well and those origins are removed once the
> > tablesync workers are finished. We might want to change the behavior
> > related to the origin messages as indicated in the comments but for
> > now, I am only fixing the existing code.
> >
> > Can you please test if the attached fixes the problem at your end as well?
>
> > [fix_origin_message_1.patch]
>
> I just compiled a binary from HEAD and a binary from HEAD+patch.
>
> HEAD is still broken; your patch rescues it, so yes, fixed.
>
> Maybe a test (check or check-world) should be added to run a second replica? (Assuming that would have caught this bug)
>

+1 for the idea of having a test for this; I have written one.
Thanks for the fix, Amit. I could reproduce the issue without your fix
and verified that it is fixed with the patch you shared.
The test is attached as a patch. Thoughts?
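
For anyone following the thread, here is a minimal, self-contained sketch
of the failure mode Amit describes above: a write is prepared for the origin
message, but no payload follows when the origin cannot be found. This is not
the pgoutput code. Stream, prepare_write(), append(), finish_write() and the
two begin_txn_*() functions are hypothetical stand-ins invented for
illustration, and the real fix may be structured differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an output stream; not a PostgreSQL struct. */
typedef struct {
    char payload[64];  /* body of the message currently being built */
    int  sent;         /* framed messages sent so far */
    int  empty_sent;   /* framed messages that carried no payload */
} Stream;

/* Start a new framed message (models the prepare_write step). */
static void prepare_write(Stream *s) { s->payload[0] = '\0'; }

/* Append text to the message body, never overflowing the buffer. */
static void append(Stream *s, const char *msg)
{
    strncat(s->payload, msg, sizeof(s->payload) - strlen(s->payload) - 1);
}

/* Flush the framed message; the frame goes out even if the body is empty. */
static void finish_write(Stream *s)
{
    s->sent++;
    if (s->payload[0] == '\0')
        s->empty_sent++;
}

/* Buggy pattern: the origin frame is prepared before we know whether
 * the origin can actually be looked up. */
static void begin_txn_buggy(Stream *s, bool has_origin, bool origin_found)
{
    prepare_write(s);
    append(s, "BEGIN");
    finish_write(s);

    if (has_origin) {
        prepare_write(s);
        if (origin_found)
            append(s, "ORIGIN");
        finish_write(s);   /* empty frame when the origin is missing */
    }
}

/* Fixed pattern: only prepare the origin frame once the lookup succeeded. */
static void begin_txn_fixed(Stream *s, bool has_origin, bool origin_found)
{
    prepare_write(s);
    append(s, "BEGIN");
    finish_write(s);

    if (has_origin && origin_found) {
        prepare_write(s);
        append(s, "ORIGIN");
        finish_write(s);
    }
}
```

In this model, calling begin_txn_buggy() on a zero-initialized Stream with
has_origin = true and origin_found = false leaves empty_sent at 1, while
begin_txn_fixed() with the same arguments leaves it at 0. When the origin is
found, both variants send the same two messages.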

Regards,
Vignesh

Attachment Content-Type Size
v1-0001-Test-for-verifying-data-is-replicated-in-cascaded.patch text/x-patch 3.3 KB
