Re: replication_origin and replication_origin_lsn usage on subscriber

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, movead(dot)li(at)highgo(dot)ca, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: replication_origin and replication_origin_lsn usage on subscriber
Date: 2020-07-09 12:44:56
Message-ID: 188d15be-8699-c045-486a-f0439c9c2b7d@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 09/07/2020 14:34, Amit Kapila wrote:
> On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>>
>> On 09/07/2020 13:10, Amit Kapila wrote:
>>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>>
>>>> During logical decoding, we send replication_origin and
>>>> replication_origin_lsn when we decode commit. In pgoutput_begin_txn,
>>>> we send values for these two but never used on the subscriber side.
>>>> Though we have provided a function (logicalrep_read_origin) to read
>>>> these two values but that is not used in code anywhere.
>>>>
>>
>> We don't use the origin message anywhere really because we don't support
>> origin forwarding in the built-in replication yet. That part I left out
>> intentionally in the original PG10 patchset as it's mostly useful for
>> circular replication detection when you want to replicate both ways.
>> However that's relatively useless without also having some kind of
>> conflict detection which would be another huge pile of code and I
>> expected we would end up not getting logical replication in PG10 at all
>> if I tried to push conflict detection as well :)
>>
>
> Fair enough. However, without tests and more documentation about this
> concept, it is likely that future development might break it. It is
> good that you and others who know this part well are there to respond
> but still, the more documentation and tests would be preferred.
>

Honestly, that part didn't even need to be committed, given that it's unused.
The protocol supports versioning, so it could have been added at a later time.

>>>
>>> For the purpose of decoding in-progress transactions, I think we can
>>> send replication_origin in the first 'start' message as it is present
>>> with each WAL record, however replication_origin_lsn is only logged at
>>> commit time, so can't send it before commit. The
>>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
>>> but it is not clear how and when that function can be used. Do we
>>> really need replication_origin_lsn before we decode the commit record?
>>>
>>
>> That's the SQL interface, C interface does not require that and I don't
>> think we need to do that.
>>
>
> I think when you are saying SQL interface, you referred to
> pg_replication_origin_xact_setup() but I am not sure which C interface
> you are referring to in the above sentence?
>

I mean all the stuff pg_replication_origin_xact_setup() does internally.
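
As a rough illustration (a standalone sketch, not the actual backend code): the
SQL function essentially stores the given LSN and timestamp in two per-session
variables, and C code can assign those directly when it handles the commit. The
typedefs, the CommitInfo struct and the helper name below are stand-ins invented
for this sketch; only the two replorigin_session_origin_* names mirror the real
variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs so the sketch compiles outside the PostgreSQL tree. */
typedef uint64_t XLogRecPtr;
typedef int64_t  TimestampTz;

/* Stand-ins for the per-session origin state kept by the backend. */
static XLogRecPtr  replorigin_session_origin_lsn = 0;
static TimestampTz replorigin_session_origin_timestamp = 0;

/* Hypothetical container for what a decoded commit message carries. */
typedef struct CommitInfo
{
    XLogRecPtr  end_lsn;     /* end LSN of the remote commit */
    TimestampTz committime;  /* remote commit timestamp */
} CommitInfo;

/*
 * Hypothetical helper: record origin progress when the commit is processed.
 * Nothing here is needed before the commit message arrives.
 */
static void
set_origin_progress_at_commit(const CommitInfo *commit)
{
    assert(commit != NULL);
    replorigin_session_origin_lsn = commit->end_lsn;
    replorigin_session_origin_timestamp = commit->committime;
}
```

The point of the sketch is only that both values are assigned at commit time, so
no earlier message has to carry origin_lsn.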

>> The existing apply code sets the
>> replorigin_session_origin_lsn only when processing commit message IIRC.
>>
>
> That's correct. However, we do send it via 'begin' callback which
> won't be possible with the streaming of in-progress transactions. Do
> we need to send this origin related information (origin, origin_lsn)
> while streaming of in-progress transactions? If so, when? As far as
> I can see, the origin_id can be sent with the first 'start' message.
> The origin_lsn and origin_commit can be sent with the last 'start' of
> streaming commit if we want but not sure if that is of use. If we
> need to send origin_lsn earlier than that then we need to record it
> with other WAL records (other than Commit WAL record).
>

If we were to support origin forwarding, then strictly speaking we need
everything only at commit time from a correctness perspective. Ideally,
though, origin_id would be sent with the first message, because it can then
be used to filter out changes at the decoding stage rather than while we
process the commit; having it set early improves decoding performance.
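
To make the performance argument concrete, here is a minimal standalone sketch,
loosely modeled on the output plugin's filter_by_origin_cb callback. The types,
the "skip everything with a non-local origin" policy and the function names are
assumptions made up for illustration, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not PostgreSQL's actual definitions. */
typedef uint16_t OriginId;            /* plays the role of RepOriginId */
#define NO_ORIGIN ((OriginId) 0)      /* plays the role of InvalidRepOriginId */

typedef struct StreamedTxn
{
    OriginId origin_id;   /* origin the transaction was replayed from */
    size_t   nchanges;    /* number of row changes in the transaction */
} StreamedTxn;

/* Assumed policy: filter out anything that did not originate locally. */
static bool
filter_by_origin(OriginId origin_id)
{
    return origin_id != NO_ORIGIN;
}

/*
 * Return how many changes have to be processed for this transaction.
 * If the origin is known with the first message, a filtered transaction
 * is dropped before any change is handled. If it is known only at commit,
 * every change is handled first and filtering can only happen afterwards.
 */
static size_t
changes_processed(const StreamedTxn *txn, bool origin_known_at_start)
{
    assert(txn != NULL);
    if (origin_known_at_start && filter_by_origin(txn->origin_id))
        return 0;
    return txn->nchanges;
}
```

Either way the filtered transaction ends up discarded, so correctness is the
same; the difference is only how much work is done before that decision.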

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
