Re: Skipping logical replication transactions on subscriber side

From: Noah Misch <noah(at)leadboat(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2022-04-02 01:59:43
Message-ID: 20220402015943.GB3719101@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 02, 2022 at 06:49:20AM +0530, Amit Kapila wrote:
> On Sat, Apr 2, 2022 at 5:41 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> >
> > On Fri, Apr 01, 2022 at 09:25:52PM +0900, Masahiko Sawada wrote:
> > > > On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > > src/test/subscription/t/029_on_error.pl has been failing reliably on the five
> > > > > AIX buildfarm members:
> > > > >
> > > > > # poll_query_until timed out executing this query:
> > > > > # SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
> > > > > # expecting this output:
> > > > > # t
> > > > > # last actual query output:
> > > > > # f
> > > > > # with stderr:
> > > > > timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.pl line 50.
> > > > >
> > > > > I've posted five sets of logs (2.7 MiB compressed) here:
> > > > > https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing
> >
> > > Given that "SELECT subskiplsn = '0/0'
> > > FROM pg_subscription WHERE subname = 'sub’” didn't return true, some
> > > value was set to subskiplsn even after the unique key error.
> > >
> > > So I'm guessing that the apply worker could not get the updated value
> > > of the subskiplsn or its MySubscription->skiplsn could not match with
> > > the transaction's finish LSN. Also, given that the test is failing on
> > > all AIX buildfarm members, there might be something specific to AIX.
> > >
> > > Noah, to investigate this issue further, is it possible for you to
> > > apply the attached patch and run the 029_on_error.pl test? The patch
> > > adds some logs to get additional information.
> >
> > Logs attached.
>
> Thank you.
>
> By seeing the below Logs:
> ----
> ....
> 2022-04-01 18:19:34.710 CUT [58327402] LOG: not started skipping
> changes: my_skiplsn 14EB7D8/B0706F72 finish_lsn 0/14EB7D8
> ...
> ----
>
> It seems that the value of skiplsn read in GetSubscription is wrong
> which makes the apply worker think it doesn't need to skip the
> transaction. Now, in Alter/Create Subscription, we are using
> LSNGetDatum() to store skiplsn value in pg_subscription but while
> reading it in GetSubscription(), we are not converting back the datum
> to LSN by using DatumGetLSN(). Is it possible that on this machine it
> might be leading to not getting the right value for skiplsn? I think
> it is worth trying to see if this fixes the problem.

After applying datum_to_lsn_skiplsn_1.patch, I get another failure. Logs
attached.

Attachment Content-Type Size
log-subscription-20220401b.tar.xz application/octet-stream 13.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-04-02 02:58:55 Re: PostgreSQL shutdown modes
Previous Message Amit Kapila 2022-04-02 01:19:20 Re: Skipping logical replication transactions on subscriber side