Re: replication_origin and replication_origin_lsn usage on subscriber

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, movead(dot)li(at)highgo(dot)ca, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: replication_origin and replication_origin_lsn usage on subscriber
Date: 2020-07-09 13:24:47
Message-ID: CAA4eK1KWNrUQGFeYoLKdHXfGybE62LV1cD1nq7K-N5=xkL=RcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> On 09/07/2020 14:34, Amit Kapila wrote:
> > On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> >>
> >> On 09/07/2020 13:10, Amit Kapila wrote:
> >>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>>>
> >>>> During logical decoding, we send replication_origin and
> >>>> replication_origin_lsn when we decode commit. In pgoutput_begin_txn,
> >>>> we send values for these two but never used on the subscriber side.
> >>>> Though we have provided a function (logicalrep_read_origin) to read
> >>>> these two values but that is not used in code anywhere.
> >>>>
> >>
> >> We don't use the origin message anywhere really because we don't support
> >> origin forwarding in the built-in replication yet. That part I left out
> >> intentionally in the original PG10 patchset as it's mostly useful for
> >> circular replication detection when you want to replicate both ways.
> >> However that's relatively useless without also having some kind of
> >> conflict detection which would be another huge pile of code and I
> >> expected we would end up not getting logical replication in PG10 at all
> >> if I tried to push conflict detection as well :)
> >>
> >
> > Fair enough. However, without tests and more documentation about this
> > concept, it is likely that future development might break it. It is
> > good that you and others who know this part well are there to respond
> > but still, the more documentation and tests would be preferred.
> >
>
> Honestly that part didn't even need to be committed given it's unused.
> Protocol supports versioning so it could have been added at later time.
>
> >>>
> >>> For the purpose of decoding in-progress transactions, I think we can
> >>> send replication_origin in the first 'start' message as it is present
> >>> with each WAL record, however replication_origin_lsn is only logged at
> >>> commit time, so can't send it before commit. The
> >>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
> >>> but it is not clear how and when that function can be used. Do we
> >>> really need replication_origin_lsn before we decode the commit record?
> >>>
> >>
> >> That's the SQL interface, C interface does not require that and I don't
> >> think we need to do that.
> >>
> >
> > I think when you are saying SQL interface, you referred to
> > pg_replication_origin_xact_setup() but I am not sure which C interface
> > you are referring to in the above sentence?
> >
>
> All the stuff pg_replication_origin_xact_setup does internally.
>
> >> The existing apply code sets the
> >> replorigin_session_origin_lsn only when processing commit message IIRC.
> >>
> >
> > That's correct. However, we do send it via 'begin' callback which
> > won't be possible with the streaming of in-progress transactions. Do
> > we need to send this origin related information (origin, origin_lsn)
> > while streaming of in-progress transactions? If so, when? As far as
> > I can see, the origin_id can be sent with the first 'start' message.
> > The origin_lsn and origin_commit can be sent with the last 'start' of
> > streaming commit if we want but not sure if that is of use. If we
> > need to send origin_lsn earlier than that then we need to record it
> > with other WAL records (other than Commit WAL record).
> >
>
> If we were to support the origin forwarding, then strictly speaking we
> need everything only at commit time from correctness perspective,
>

Okay. Anyway streaming mode is optional, so in such cases, we can keep it 'off'

> but
> ideally origin_id would be best sent with first message as it can be
> used to filter out changes at decoding stage rather than while we
> process the commit so having it set early improves performance of decoding.
>

Yeah, makes sense. So, we will just send origin_id (with first
streaming start message) and leave others.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2020-07-09 13:43:53 Re: Is this a bug in pg_current_logfile() on Windows?
Previous Message Fujii Masao 2020-07-09 13:03:10 Re: Implement UNLOGGED clause for COPY FROM