Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-06-01 04:01:33
Message-ID: CAA4eK1+2O68tkwdZsyfw3aZ7zB4YdejM4GzCciRtwcON6gBbTw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 1, 2021 at 12:55 AM Peter Eisentraut
<peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>
> On 27.05.21 12:04, Amit Kapila wrote:
> >>> Also, I am thinking that instead of a stat view, do we need
> >>> to consider having a system table (pg_replication_conflicts or
> >>> something like that) for this because what if stats information is
> >>> lost (say either due to crash or due to udp packet loss), can we rely
> >>> on stats view for this?
> >> Yeah, it seems better to use a catalog.
> >>
> > Okay.
>
> Could you store it in shared memory? You don't need it to be crash safe,
> since the subscription will just run into the same error again after
> restart. You just don't want it to be lost, like with the statistics
> collector.
>

But won't that be costly when the error occurs while processing a very
large transaction? The subscription has to process all the data before
it gets the error. I think we can even imagine this feature being
extended to use commitLSN as a skip candidate, in which case we can
avoid getting the data of that transaction from the publisher at all.
So if this information is persistent, the user can set the skip
identifier after the restart, before the publisher sends all the data.
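
The cost argument can be illustrated with a toy model. This is only a
sketch under stated assumptions, not PostgreSQL code: `Txn`,
`VolatileSkipStore`, `PersistentSkipStore`, and `apply_after_restart`
are hypothetical names, and a restart is simulated by calling
`restart()` on the store. It counts only the changes the subscriber
attempts to apply and ignores the cost of transferring the data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Txn:
    xid: int
    n_changes: int
    fails_at: int  # 0-based index of the change that raises the conflict


class SkipStore:
    """Hypothetical holder of the user-specified skip XID."""

    def __init__(self) -> None:
        self.skip_xid: Optional[int] = None

    def restart(self) -> None:
        raise NotImplementedError


class VolatileSkipStore(SkipStore):
    """Assumption: the setting lives only in memory and is lost on restart."""

    def restart(self) -> None:
        self.skip_xid = None


class PersistentSkipStore(SkipStore):
    """Assumption: the setting lives in a catalog and survives restart."""

    def restart(self) -> None:
        pass


def apply_after_restart(txn: Txn, store: SkipStore) -> Tuple[str, int]:
    """Return (outcome, changes attempted) for the first apply after a restart."""
    store.restart()
    if store.skip_xid == txn.xid:
        return ("skipped", 0)
    # The worker applies changes until it reaches the failing one.
    return ("error", txn.fails_at + 1)


big = Txn(xid=740, n_changes=1_000_000, fails_at=999_999)
for store in (VolatileSkipStore(), PersistentSkipStore()):
    store.skip_xid = big.xid  # the user set the skip XID before the restart
    print(type(store).__name__, apply_after_restart(big, store))
```

In this model the volatile store forces the whole transaction to be
processed again just to rediscover the error, while the persistent
store lets the worker skip it immediately.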

Also, I think we can't assume that we will get the same error after the
restart, because the user can perform some operations after the restart
and before we try to apply the same transaction again. It might also be
that the user wants to see all the errors before setting the skip
identifier (and/or method).

I think the XID (or another identifier such as commitLSN) that the user
specifies for skipping the transaction has to be stored in the catalog;
otherwise, after the restart we won't remember it and the user won't
know that they need to set it again. Now, say we have multiple skip
identifiers (XIDs, commitLSN, ..): isn't it better to store all
conflict-related information in a separate catalog such as
pg_subscription_conflict? I think that might also make it easier to
extend this later for auto conflict resolution, where the user can
specify auto conflict resolution info for a subscription. Is it better
to store all such information in pg_subscription or to have a separate
catalog? It is possible that even if we have a separate catalog for
conflict info, we might not want to store error info there.
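
To make the "multiple skip identifiers" idea concrete, here is a rough
sketch of what one row of such a separate catalog could carry and how
an apply worker might consult it. Everything here is an assumption for
illustration: the class `SubscriptionConflictRow`, its fields `subid`,
`skip_xid`, and `skip_commit_lsn`, and the function `should_skip` are
hypothetical and do not correspond to an existing catalog or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubscriptionConflictRow:
    """Hypothetical row of a pg_subscription_conflict-style catalog."""

    subid: int                             # subscription the row belongs to
    skip_xid: Optional[int] = None         # skip the transaction with this XID
    skip_commit_lsn: Optional[int] = None  # or the one committing at this LSN


def should_skip(row: SubscriptionConflictRow, xid: int, commit_lsn: int) -> bool:
    """Return True if the remote transaction matches any configured identifier."""
    if row.skip_xid is not None and row.skip_xid == xid:
        return True
    if row.skip_commit_lsn is not None and row.skip_commit_lsn == commit_lsn:
        return True
    return False


row = SubscriptionConflictRow(subid=16394, skip_commit_lsn=0x16B3748)
print(should_skip(row, xid=740, commit_lsn=0x16B3748))
print(should_skip(row, xid=741, commit_lsn=0x16B3800))
```

Keeping the identifiers as separate optional fields leaves room for
adding further skip methods, or conflict resolution settings, without
touching pg_subscription itself.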

--
With Regards,
Amit Kapila.
