Re: Add an option to skip loading missing publication to avoid logical replication failure

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add an option to skip loading missing publication to avoid logical replication failure
Date: 2025-05-06 10:03:40
Message-ID: CABPTF7V6LPkLFgQTeJ_ZS-96021bM7Y6zKRhUh7-9sKPwq4SHA@mail.gmail.com
Lists: pgsql-hackers

Hi,

A clear benefit of addressing this in code is that the user is guaranteed to
see the log message, which can be valuable for troubleshooting, even under
race conditions.

    ereport(WARNING,
            errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
            errmsg("skipped loading publication: %s", pubname),
            errdetail("The publication does not exist at this point in the WAL."),
            errhint("Create the publication if it does not exist."));

The performance impact appears low, assuming the
AcceptInvalidationMessages and maybe_reread_subscription calls are
introduced only in the code path that handles keepalive messages requiring
a reply.
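
To illustrate why the cost should stay small, here is a minimal standalone
sketch of the idea. It is not the patch and not the real apply-worker code:
handle_keepalive, the two *_stub functions, and the counters are hypothetical
stand-ins, with the stubs only counting how often the real
AcceptInvalidationMessages and maybe_reread_subscription would be reached.

```c
#include <stdbool.h>

/* Hypothetical counters: how often each (stubbed) routine was reached. */
static int invalidations_accepted = 0;
static int subscription_rereads = 0;

/* Stand-in for AcceptInvalidationMessages(); here it only counts calls. */
static void
accept_invalidation_messages_stub(void)
{
    invalidations_accepted++;
}

/* Stand-in for maybe_reread_subscription(); here it only counts calls. */
static void
maybe_reread_subscription_stub(void)
{
    subscription_rereads++;
}

/*
 * Hypothetical keepalive handler.  The extra work is done only when the
 * sender asked for a reply, so keepalives that need no reply pay nothing.
 */
static void
handle_keepalive(bool reply_requested)
{
    if (!reply_requested)
        return;

    /* Catch up on catalog changes before reporting a position back. */
    accept_invalidation_messages_stub();
    maybe_reread_subscription_stub();

    /* The real worker would send its feedback message at this point. */
}
```

Calling handle_keepalive(false) leaves both counters untouched, while
handle_keepalive(true) bumps each by one, so the added work scales with the
number of reply-requesting keepalives rather than with all keepalives.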

>
> > vignesh C <vignesh21(at)gmail(dot)com> writes:
> > > Due to the asynchronous nature of these processes, the ALTER
> > > SUBSCRIPTION command may not be immediately observed by the apply
> > > worker. Meanwhile, the walsender may process and decode an INSERT
> > > statement.
> > > If the insert targets a table (e.g., tab_3) that does not belong to
> > > the current publication (pub1), the walsender silently skips
> > > replicating the record and advances its decoding position. This
> > > position is sent in a keepalive message to the subscriber, and since
> > > there are no pending transactions to flush, the apply worker reports
> > > it as the latest received LSN.
> >
> > So this theory presumes that the apply worker receives and reacts to
> > the keepalive message, yet it has not observed a relevant
> > subscriber-side catalog update that surely committed before the
> > keepalive was generated. It's fairly hard to see how that is okay,
> > because it's at least adjacent to something that must be considered a
> > bug: applying transmitted data without having observed DDL updates to
> > the target table. Why is the processing of keepalives laxer than the
> > processing of data messages?
> >
>
> Valid question. As of now, we don't have a specific rule about
> ordering the processing of keepalives or invalidation messages. The
> effect of invalidation messages is realized by calling
> maybe_reread_subscription at three different times after accepting
> invalidation messages: (a) after starting a transaction in
> begin_replication_step, (b) in the commit message handling if no
> data modification happened in that transaction, and (c) when we
> don't get any transactions for a while.
>
> The (a) ensures we consume any target table change before applying a
> new transaction. The other two places ensure that we keep consuming
> invalidation messages from time to time.
>
> Now, we can consume invalidation messages during keepalive message
> handling and/or at some other places, to ensure that we never process
> any remote message before consuming an invalidation message. However,
> it is not clear to me if this is a must kind of thing. We can provide
> strict guarantees for ordering of messages from any one of the
> servers, but providing it across nodes doesn't sound to be a
> must-criterion.
>
> --
> With Regards,
> Amit Kapila.
>
