Re: Skipping logical replication transactions on subscriber side

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2022-01-22 16:21:24
Message-ID: CAKFQuwaLfgeForu2XYgaCuzRg5=8=5vPe13kKwxGbwu0LyeJgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 22, 2022 at 2:41 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Sat, Jan 22, 2022 at 12:41 PM David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> >
> > On Fri, Jan 21, 2022 at 10:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >>
> >> On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
> >> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> >> >
> >> > On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >> >>
>
> >
> > I agree with incorporating "reset" into the paragraph somehow - does not
> have to mention NONE, just that ALTER SUBSCRIPTION SKIP (not a family
> friendly abbreviation...) is what does it.
> >
>
> It is not clear to me what you have in mind here but to me in this
> context saying "Setting <literal>NONE</literal> resets the transaction
> ID." seems quite reasonable.
>

OK

>
> Yeah, we will error in that situation. The only invalid values are
> system reserved values (1,2).
>

So long as the ALTER command errors when asked to skip those IDs there
isn't any reason for an end-user, who likely doesn't know or care that 1
and 2 are special, to be concerned about them (the only two invalid values)
while reading the docs.

> > Or, why even force them to specify a number instead of just saying SKIP
> and if there is a current error we skip its transaction, otherwise we warn
> them that nothing happened because there is no last error.
> >
>
> The idea is that we might extend this feature to skip specific
> operations on relations or maybe by having other identifiers.

Again, you've already got syntax reserved that lets you add more features
to this command in the future; and removing warnings or errors because new
features make them moot is easy. Lets document and code what we are
willing to implement today. A single top-level transaction xid that is
presently blocking the worker from applying any more WAL.

One idea
> we discussed was to automatically fetch the last error xid but then
> decided it can be done as a later patch.
>

This seems backwards. The user-friendly approach is to not make them type
in anything at all. That said, this particular UX seems like it could use
some safety. Thus I would propose at this time that attempting to set the
skip_option to anything but THE active_error_xid for the named subscription
results in an error. Once you add new features the user can set the
skip_option to other things without provoking errors. Again, I consider
this a safety feature since the user now has to accurately match the xid to
the name in the SQL in order to perform a successful skip - and the to-be
affected transaction has to be one that is preventing replication from
moving forward. I'm not interested in providing a foot-gun where an
arbitrary future transaction can be scheduled to be skipped. Running the
command twice with the same values should provoke an error since the first
run should be allowed to finish (?). Also, we handle the situation where
the state of the worker changes between when the user saw the error and
wrote down the xid to skip and the actual execution of the alter command.
Maybe not highly anticipated scenarios but this is an easy win to deal with
them.

> > Additionally, the description for pg_stat_subscription_workers should
> describe what happens once the transaction represented by last_error_xid
> has either been successfully processed or skipped. Does this "last error"
> stick around until another error happens (which is hopefully very rare) or
> does it reset to blanks?
> >
>
> It will be reset only on subscription drop, otherwise, it will stick
> around until another error happens.

I really dislike the user experience this provides, and given it is new in
v15 (and right now this table seems to exist solely to support this
feature) changing this seems within the realm of possibility. I have to
imagine these workers have a sense of local state that would just be "no
errors, no need to touch pg_stat_subscription_workers at the end of this
transaction's commit". It would save a local state of the error_xid and if
a successfully committed transaction has that xid it would clear the
error. The skip code path would also check for and see the matching xid
value and clear the error. Even if the local state thing doesn't work, one
catalog lookup per transaction seems like potentially reasonable overhead
to incur here.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-01-22 16:47:14 Re: Skipping logical replication transactions on subscriber side
Previous Message Andrew Dunstan 2022-01-22 16:00:54 Re: fairywren is generating bogus BASE_BACKUP commands