Re: Column Filtering in Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-29 10:00:22
Message-ID: CAA4eK1JNrbhSsq4NMbVr+6sBOEaLOssuefLCL1HjKHX0xD5dRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 20, 2022 at 4:53 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 3/20/22 07:23, Amit Kapila wrote:
> > On Sun, Mar 20, 2022 at 8:41 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Fri, Mar 18, 2022 at 10:42 PM Tomas Vondra
> >> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >>> So the question is why those two sync workers never complete - I guess
> >>> there's some sort of lock wait (deadlock?) or infinite loop.
> >>>
> >>
> >> It would be a bit tricky to reproduce this even if the above theory is
> >> correct but I'll try it today or tomorrow.
> >>
> >
> > I am able to reproduce it with the help of a debugger. Firstly, I have
> > added the LOG message and some While (true) loops to debug sync and
> > apply workers. Test setup
> >
> > Node-1:
> > create table t1(c1);
> > create table t2(c1);
> > insert into t1 values(1);
> > create publication pub1 for table t1;
> > create publication pu2;
> >
> > Node-2:
> > change max_sync_workers_per_subscription to 1 in potgresql.conf
> > create table t1(c1);
> > create table t2(c1);
> > create subscription sub1 connection 'dbname = postgres' publication pub1;
> >
> > Till this point, just allow debuggers in both workers just continue.
> >
> > Node-1:
> > alter publication pub1 add table t2;
> > insert into t1 values(2);
> >
> > Here, we have to debug the apply worker such that when it tries to
> > apply the insert, stop the debugger in function apply_handle_insert()
> > after doing begin_replication_step().
> >
> > Node-2:
> > alter subscription sub1 set pub1, pub2;
> >
> > Now, continue the debugger of apply worker, it should first start the
> > sync worker and then exit because of parameter change. All of these
> > debugging steps are to just ensure the point that it should first
> > start the sync worker and then exit. After this point, table sync
> > worker never finishes and log is filled with messages: "reached
> > max_sync_workers_per_subscription limit" (a newly added message by me
> > in the attached debug patch).
> >
> > Now, it is not completely clear to me how exactly '013_partition.pl'
> > leads to this situation but there is a possibility based on the LOGs
> > it shows.
> >
>
> Thanks, I'll take a look later.
>

This is still failing [1][2].

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-28%2005%3A16%3A53
[2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-03-24%2013%3A13%3A08

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 孔凡深 (云梳) 2022-03-29 10:07:56 回复:POC: Cleaning up orphaned files using undo logs
Previous Message gkokolatos 2022-03-29 09:46:27 Re: Add LZ4 compression in pg_dump