Re: Skipping logical replication transactions on subscriber side

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2022-01-22 07:10:53
Message-ID: CAKFQuwYmGrGu63NZ-xKsb5be53sTyYSdDhaqScMuS=aJs_VF1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 21, 2022 at 10:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> >
> > On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >>
> >> Apart from this, I have changed a few comments and ran pgindent. Do
> >> let me know what you think of the changes?
> >
> >
> > The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily
> repetitive. Consider:
> > """
> > Skips applying all changes of the specified remote transaction, whose
> value should be obtained from pg_stat_subscription_workers.last_error_xid.
> >
>
> Here, you can also say that the value can be found from server logs as
> well.
>

subscriber's server logs, right? I would agree that adding that for
completeness is warranted.

> > Then change the subskipxid column description to be:
> > """
> > ID of the transaction whose changes are to be skipped. It is 0 when
> there are no pending skips. This is set by issuing ALTER SUBSCRIPTION SKIP
> and resets back to 0 when the identified transactions passes through the
> subscription stream and is successfully ignored.
> > """
> >
>
> Users can manually reset it by specifying NONE, so that should be
> covered in the above text, otherwise, looks good.
>

I agree with incorporating "reset" into the paragraph somehow - does not
have to mention NONE, just that ALTER SUBSCRIPTION SKIP (not a family
friendly abbreviation...) is what does it.

> > I don't understand why/how ", if a valid transaction ID;" comes into
> play (how would we know whether it is valid, or if we do ALTER SUBSCRIPTION
> SKIP should prohibit the invalid value from being chosen).
> >
>
> What do you mean by invalid value here? Is it the value lesser than
> FirstNormalTransactionId or a value that is of the non-error
> transaction? For the former, we already have a check in the patch and
> for later we can't identify it with any certainty because the error
> stats are collected by the stats collector.
>

The original proposal qualifies the non-zero transaction id in
subskipxid as being a "valid transaction ID" and that invalid ones (which
is how "otherwise" is interpreted given the "valid" qualification preceding
it) are shown as 0. As an end-user that makes me wonder what it means for
a transaction ID to be invalid. My point is that dropping the mention of
"valid transaction ID" avoids that and lets the reader operate with an
understanding that things should "just work". If I see a non-zero in the
column I have a pending skip and if I see zero I do not. My wording
assumes it is that simple. If it isn't I would need some clarity as to why
it is not in order to write something I could read and understand from my
inexperienced user-centric point-of-view.

I get that I may provide a transaction ID that is invalid such that the
system could never see it (or at least not for a long while) - say we
error on transaction 102 and I typo it as 1002 or 101. But I would expect
either an error where I make the typo or the numbers 1002 or 101 to appear
on the table. I would not expect my 101 typo to result in a 0 appearing on
the table (and if it does so today I argue that is a POLA violation).
Thus, "if a valid transaction ID" from the original text just doesn't make
sense to me.

In typical usage it would seem strange to allow a skip to be recorded if
there is no existing error in the subscription. Should we (do we, haven't
read the code) warn in that situation?

*Or, why even force them to specify a number instead of just saying SKIP
and if there is a current error we skip its transaction, otherwise we warn
them that nothing happened because there is no last error.*

Additionally, the description for pg_stat_subscription_workers should
describe what happens once the transaction represented by last_error_xid
has either been successfully processed or skipped. Does this "last error"
stick around until another error happens (which is hopefully very rare) or
does it reset to blanks? Seems like it should reset, which really makes
this more of an "active_error" instead of a "last_error". This system is
linear, we are stuck until this error is resolved, making it active.

> > I'm against mentioning subtransactions in the skip_option description.
> >
>
> We have mentioned that because currently, we don't support it but in
> the future one can come up with an idea to support it. What problem do
> you see with it?
>

If you ever get around to implementing the feature then by all means add
it. My main issue is that we basically never talk about subtransactions in
the user-facing documentation and it doesn't seem desirable to do so here.
Knowing that a whole transaction is skipped is all I need to care about as
a user. I believe that no users will be asking "what about subtransactions
(savepoints)" but by mentioning it less experienced ones will now have
something to be curious about that they really do not need to be.

>
> > The Logical Replication page changes provide good content overall but I
> dislike going into detail about how to perform conflict resolution in the
> third paragraph and then summarize the various forms of conflict resolution
> in the newly added forth. Maybe re-work things like:
>
> Personally, I don't see much value in the split (especially giving
> context like "Prior to v15 ..) but specifying the circumstances where
> each of the options could be useful.
>

Yes, I've been reminded of the desire to avoid mentioning versions and
agree doing so here is correct. The added context is desired, the style
depends on the content.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shruthi Gowda 2022-01-22 07:17:35 Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Previous Message Michael Paquier 2022-01-22 05:46:51 Re: Refactoring of compression options in pg_basebackup