Re: Handle infinite recursion in logical replication setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-03-07 12:20:59
Message-ID: CAA4eK1+Mtz+StvNNtTg9=9BTq8=pMu-V5i4yWqs=KJUh0Z_L4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 7, 2022 at 5:01 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> Hi Vignesh,
> I agree with Peter's comment that the changes to
> FilterRemoteOriginData() should be part of FilterByOrigin()
>
> Further, I wonder why "onlylocal_data" is a replication slot's
> property. A replication slot tracks the progress of replication and it
> may be used by different receivers with different options. I could
> start one receiver which wants only local data, say using
> "pg_logical_slot_get_changes" and later start another receiver which
> fetches all the data starting from where the first receiver left. This
> option prevents such flexibility.
>
> As discussed earlier in the thread, local_only can be property of
> publication or subscription, depending upon the use case, but I can't
> see any reason that it should be tied to a replication slot.
>

I thought it should be similar to 'streaming' option of subscription
but may be Vignesh has some other reason which makes it different.

> I have a similar question for "two_phase" but the ship has sailed and
> probably it makes some sense there which I don't know.
>

two_phase is different from some of the other subscription options
like 'streaming' such that it can be enabled only at the time of slot
and subscription creation, we can't change/specify it via
pg_logical_slot_get_changes. This is to avoid the case where we won't
know at the time of the commit prepared whether the prepare for the
transaction has already been sent. For the same reason, we need to
also know the 'two_phase_at' information.

> As for publication vs subscription, I think both are useful cases.
> 1. It will be a publication's property, if we want the node to not
> publish any data that it receives from other nodes for a given set of
> tables.
> 2. It will be the subscription's property, if we want the subscription
> to decide whether it wants to fetch the data changed on only upstream
> or other nodes as well.
>

I think it could be useful to allow it via both publication and
subscription but I guess it is better to provide it via one way
initially just to keep things simple and give users some way to deal
with such cases. I would prefer to allow it via subscription initially
for the reasons specified by Vignesh in his previous email [1]. Now,
if we think those all are ignorable things and it is more important to
allow this option first by publication or we must allow it via both
publication and subscription then it makes sense to change it.

[1] - https://www.postgresql.org/message-id/CALDaNm3jkotRhKfCqu5CXOf36_yiiW_cYE5%3DbG%3Dj6N3gOWJkqw%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2022-03-07 12:27:02 Re: Expose JIT counters/timing in pg_stat_statements
Previous Message Magnus Hagander 2022-03-07 12:10:32 Re: Add parameter jit_warn_above_fraction