Re: Column Filtering in Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-20 06:23:41
Message-ID: CAA4eK1JcQRQw0G-U4A+vaGaBWSvggYMMDJH4eDtJ0Yf2eUYXyA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 20, 2022 at 8:41 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Mar 18, 2022 at 10:42 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> > So the question is why those two sync workers never complete - I guess
> > there's some sort of lock wait (deadlock?) or infinite loop.
> >
>
> It would be a bit tricky to reproduce this even if the above theory is
> correct but I'll try it today or tomorrow.
>

I am able to reproduce it with the help of a debugger. First, I added
a LOG message and some while (true) loops to debug the sync and apply
workers. Test setup:

Node-1:
create table t1(c1 int);
create table t2(c1 int);
insert into t1 values(1);
create publication pub1 for table t1;
create publication pub2;

Node-2:
change max_sync_workers_per_subscription to 1 in postgresql.conf
create table t1(c1 int);
create table t2(c1 int);
create subscription sub1 connection 'dbname = postgres' publication pub1;

Up to this point, just let the debuggers in both workers continue.

Node-1:
alter publication pub1 add table t2;
insert into t1 values(2);

Here, we have to debug the apply worker such that, when it tries to
apply the insert, the debugger stops in function apply_handle_insert()
right after begin_replication_step().

Node-2:
alter subscription sub1 set publication pub1, pub2;

Now, continue the apply worker in the debugger; it should first start
the sync worker and then exit because of the parameter change. All of
these debugging steps are just to ensure that it first starts the sync
worker and then exits. After this point, the table sync worker never
finishes and the log is filled with the message "reached
max_sync_workers_per_subscription limit" (a message newly added by me
in the attached debug patch).
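
As a rough sketch (not part of the steps I ran above), the stuck state
could also be inspected on Node-2 through the pg_subscription_rel
catalog and the pg_stat_subscription view. The subscription name sub1
is taken from the setup above; what the queries return in this
scenario is an assumption on my part and not something I have captured
here.

```sql
-- Per-table sync state for subscription sub1.
-- srsubstate: i = initialize, d = data is being copied,
--             f = finished table copy, s = synchronized, r = ready
select srrelid::regclass as rel, srsubstate, srsublsn
  from pg_subscription_rel
 where srsubid = (select oid from pg_subscription where subname = 'sub1');

-- Workers currently running for sub1.
-- relid is null for the main apply worker and set for a table sync worker.
select subname, pid, relid::regclass as rel
  from pg_stat_subscription
 where subname = 'sub1';
```

A table that stays in a state other than 'r' while a sync worker row
remains in pg_stat_subscription would be consistent with the sync
worker never finishing.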

Now, it is not completely clear to me how exactly '013_partition.pl'
leads to this situation, but based on the LOGs it shows, this is a
possibility.

--
With Regards,
Amit Kapila.

Attachment: debug_sub_workers_1.patch (application/octet-stream, 1.2 KB)
