Re: Column Filtering in Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-21 10:38:59
Message-ID: CAA4eK1K79sYJo00YqKJ02yMQUZ44o=HRDrdBA3zaQs8-4m-7Pg@mail.gmail.com

On Sun, Mar 20, 2022 at 4:53 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 3/20/22 07:23, Amit Kapila wrote:
> > On Sun, Mar 20, 2022 at 8:41 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Fri, Mar 18, 2022 at 10:42 PM Tomas Vondra
> >> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >>> So the question is why those two sync workers never complete - I guess
> >>> there's some sort of lock wait (deadlock?) or infinite loop.
> >>>
> >>
> >> It would be a bit tricky to reproduce this even if the above theory is
> >> correct but I'll try it today or tomorrow.
> >>
> >
> > I am able to reproduce it with the help of a debugger. First, I added
> > a LOG message and some while (true) loops to debug the sync and apply
> > workers. Test setup:
> >
> > Node-1:
> > create table t1(c1 int);
> > create table t2(c1 int);
> > insert into t1 values(1);
> > create publication pub1 for table t1;
> > create publication pub2;
> >
> > Node-2:
> > change max_sync_workers_per_subscription to 1 in postgresql.conf
> > create table t1(c1 int);
> > create table t2(c1 int);
> > create subscription sub1 connection 'dbname = postgres' publication pub1;
> >
> > Up to this point, just let the debuggers in both workers continue.
> >
> > Node-1:
> > alter publication pub1 add table t2;
> > insert into t1 values(2);
> >
> > Here, we have to debug the apply worker so that, when it tries to
> > apply the insert, the debugger stops in apply_handle_insert() after
> > begin_replication_step() has run.
> >
> > Node-2:
> > alter subscription sub1 set publication pub1, pub2;
> >
> > Now, continue the debugger of the apply worker; it should first start
> > the sync worker and then exit because of the parameter change. All of
> > these debugging steps are just to ensure that it first starts the sync
> > worker and then exits. After this point, the table sync worker never
> > finishes and the log is filled with messages: "reached
> > max_sync_workers_per_subscription limit" (a message I newly added in
> > the attached debug patch).
> >
> > Now, it is not completely clear to me how exactly '013_partition.pl'
> > leads to this situation but there is a possibility based on the LOGs
> > it shows.
> >
>
> Thanks, I'll take a look later. From the description it seems this is an
> issue that existed before any of the patches, right? It might be more
> likely to hit due to some test changes, but the root cause is older.
>

Yes, your understanding is correct. If my analysis is right, then we
probably just need some changes in the new test to make it behave as
per the current code.
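
For anyone repeating the steps above, the stuck state can also be watched
from Node-2 with the standard catalog and statistics views. This is only a
sketch and is not part of the attached debug patch; it assumes the table
and subscription names from the test setup (t1, t2, sub1).

```sql
-- Run on Node-2 (subscriber). The names t1, t2 and sub1 are taken from the
-- test setup quoted above; nothing here comes from the debug patch.

-- Per-table sync state: 'i' = initialize, 'd' = data is being copied,
-- 'f' = finished table copy, 's' = synchronized, 'r' = ready.
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY srrelid;

-- Workers currently running for the subscription. relid is NULL for the
-- apply worker and set for a table sync worker.
SELECT subname, pid, relid::regclass AS syncing_table
FROM pg_stat_subscription
WHERE subname = 'sub1';
```

If the situation described above is hit, I would expect a table to remain
in a non-ready state while a sync worker row stays around, but I have not
checked which exact states show up.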

--
With Regards,
Amit Kapila.
