Re: Handle infinite recursion in logical replication setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2023-01-10 15:17:13
Message-ID: CAA4eK1LRBo1OA3pp1CY7X35uJhLr3fUVkfbe1AzMn8oPKyJo_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 10, 2023 at 8:13 AM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
>
> On 9/12/22 1:23 AM, vignesh C wrote:
> > On Fri, 9 Sept 2022 at 11:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > Thanks for pushing the patch. I have closed this entry in commitfest.
> > I will wait for some more time and see the response regarding the
> > documentation patch and then start a new thread if required.
>
> I've been testing this patch in advancing of working on the
> documentation and came across a behavior I wanted to note. Specifically,
> I am hitting a deadlock while trying to synchronous replicate between
> the two instances at any `synchronous_commit` level above `local`.
>
> Here is my set up. I have two instances, "A" and "B".
>
> On A and B, run:
>
> CREATE TABLE sync (id int PRIMARY KEY, info float);
> CREATE PUBLICATION sync FOR TABLE sync;
>
> On A, run:
>
> CREATE SUBSCRIPTION sync
> CONNECTION 'connstr-to-B'
> PUBLICATION sync
> WITH (
> streaming=true, copy_data=false,
> origin=none, synchronous_commit='on');
>
> On B, run:
>
> CREATE SUBSCRIPTION sync
> CONNECTION 'connstr-to-A'
> PUBLICATION sync
> WITH (
> streaming=true, copy_data=false,
> origin=none, synchronous_commit='on');
>
> On A and B, run:
>
> ALTER SYSTEM SET synchronous_standby_names TO 'sync';
> SELECT pg_reload_conf();
>
> Verify on A and B that pg_stat_replication.sync_state is set to "sync"
>
> SELECT application_name, sync_state = 'sync' AS is_sync
> FROM pg_stat_replication
> WHERE application_name = 'sync';
>
> The next to commands should be run simultaneously on A and B:
>
> -- run this on A
> INSERT INTO sync
> SELECT x, random() FROM generate_series(1,2000000, 2) x;
>
> -- run this on B
> INSERT INTO sync
> SELECT x, random() FROM generate_series(2,2000000, 2) x;
>
> This consistently created the deadlock in my testing.
>
> Discussing with Masahiko off-list, this is due to a deadlock from 4
> processes: the walsenders on A and B, and the apply workers on A and B.
> The walsenders are waiting for feedback from the apply workers, and the
> apply workers are waiting for the walsenders to synchronize (I may be
> oversimplifying).
>
> He suggested I try the above example instead with `synchronous_commit`
> set to `local`. In this case, I verified that there is no more deadlock,
> but he informed me that we would not be able to use cascading
> synchronous replication when "origin=none".
>

This has nothing to do with the origin feature. I mean this should
happen with origin=any or even in PG15 without using origin at all.
Am, I missing something? One related point to note is that in physical
replication cascading replication is asynchronous. See docs [1]
(Cascading replication is currently asynchronous....)

> If we decide that this is a documentation issue, I'd suggest we improve
> the guidance around using `synchronous_commit`[1] on the CREATE
> SUBSCRIPTION page, as the GUC page[2] warns against using `local`:
>

Yeah, but on Create Subscription page, we have mentioned that it is
safe to use off for logical replication. One can use local or higher
for reducing the latency for COMMIT when synchronous replication is
used in the publisher. Won't using 'local' while creating subscription
would suffice the need to consistently replicate the data? I mean it
is equivalent to somebody using levels greater than local in case of
physical replication. I think in the case of physical replication, we
won't wait for standby to replicate to another node before sending a
response, so why to wait in the case of logical replication? If this
understanding is correct, then probably it is sufficient to support
'local' for a subscription.

[1] - https://www.postgresql.org/docs/devel/warm-standby.html

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-01-10 15:17:58 Re: Announcing Release 15 of the PostgreSQL Buildfarm client
Previous Message Andrew Dunstan 2023-01-10 15:03:20 Re: SQL/JSON revisited