Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-05-27 10:04:37
Message-ID: CAA4eK1K=9Z1qTRQ8FDUuoK1r4te9TgUHka97cG0Ua65SaTyEPg@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 27, 2021 at 1:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, May 27, 2021 at 2:48 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > Okay, that makes sense, but I am still not sure how you will identify
> > whether we need to reset the XID in case of a failure while doing that
> > in the previous attempt.
>
> It's just an idea, but could we record the failed transaction's XID
> as well as its commit LSN? The sequence I'm thinking of is:
>
> 1. the worker records the XID and commit LSN of the failed transaction
> to a catalog.
>

When will you record this information? I am not sure we can update
this when an error has occurred. We could use a try..catch in the
apply worker and record the information in the catch block on error,
but would that be advisable? One random thought that occurred to me is
that the apply worker could notify the launcher (or maybe another
process) of this information, and that process would log it.

> 2. the user specifies how to resolve that conflicting transaction
> (currently only 'skip' is supported) and writes it to the catalog.
> 3. the worker applies the resolution method recorded in the catalog. If
> the worker hasn't started applying those changes, it can skip the entire
> transaction. If it has, it rolls back the transaction and ignores the
> remaining changes.
>
> The worker needs neither to reset the information about the last failed
> transaction nor to mark the conflicting transaction as resolved. The
> worker will ignore that information when checking the catalog if the
> commit LSN has already been passed.
>
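A sketch of the decision the worker might make in step 3, assuming a hypothetical catalog row that holds the XID, the commit LSN, and whether 'skip' was requested. In the logical replication protocol the BEGIN message carries the transaction's XID and final LSN, so the worker can match an incoming transaction before applying any change. ConflictEntry and decide_apply_action() are made-up names, not existing PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Hypothetical catalog row describing one failed transaction. */
typedef struct ConflictEntry
{
    TransactionId xid;
    XLogRecPtr  commit_lsn;
    int         skip_requested;   /* user chose 'skip' for this transaction */
} ConflictEntry;

typedef enum ApplyDecision
{
    APPLY_NORMALLY,
    SKIP_WHOLE_TRANSACTION,       /* no changes applied yet */
    ROLLBACK_AND_SKIP_REST        /* some changes already applied */
} ApplyDecision;

static ApplyDecision
decide_apply_action(const ConflictEntry *entry, XLogRecPtr last_applied_lsn,
                    TransactionId xid, XLogRecPtr commit_lsn,
                    int changes_applied)
{
    if (entry == NULL || !entry->skip_requested)
        return APPLY_NORMALLY;

    /* Stale entry: we are already past its commit LSN, so just ignore it. */
    if (entry->commit_lsn <= last_applied_lsn)
        return APPLY_NORMALLY;

    /* Only the exact (XID, commit LSN) pair recorded for the failure matches. */
    if (entry->xid != xid || entry->commit_lsn != commit_lsn)
        return APPLY_NORMALLY;

    return changes_applied ? ROLLBACK_AND_SKIP_REST : SKIP_WHOLE_TRANSACTION;
}
```

Because a stale entry is ignored once last_applied_lsn reaches its commit LSN, no explicit reset or "resolved" marker is needed, which is the property described above.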

So won't this require us to check the required information in the
catalog before applying each transaction? If so, that might add
overhead; maybe we can build a cache of the highest commit LSN that can
be consulted instead of the catalog table. I think we also need to
decide when to remove rows for conflicts that have been resolved, as we
can't let that information grow indefinitely.
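One way such a cache could look, as a rough sketch: the worker keeps the unresolved entries in memory together with the highest commit LSN among them, so a transaction committing beyond that LSN needs no lookup at all, and entries whose commit LSN has been applied are pruned (the same condition could drive deleting the catalog rows). ConflictCache, cache_add(), cache_lookup() and cache_prune() are hypothetical; a fixed-size array stands in for whatever structure would really be used.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

#define MAX_CONFLICTS 16

typedef struct CachedConflict
{
    TransactionId xid;
    XLogRecPtr  commit_lsn;
} CachedConflict;

/* Hypothetical in-memory mirror of the unresolved catalog rows. */
typedef struct ConflictCache
{
    XLogRecPtr  max_commit_lsn;   /* highest commit LSN among the entries */
    int         nentries;
    CachedConflict entries[MAX_CONFLICTS];
} ConflictCache;

/* Add an entry (as if loaded from the catalog); -1 if the cache is full. */
static int
cache_add(ConflictCache *c, TransactionId xid, XLogRecPtr commit_lsn)
{
    if (c->nentries >= MAX_CONFLICTS)
        return -1;
    c->entries[c->nentries].xid = xid;
    c->entries[c->nentries].commit_lsn = commit_lsn;
    c->nentries++;
    if (commit_lsn > c->max_commit_lsn)
        c->max_commit_lsn = commit_lsn;
    return 0;
}

/* Returns 1 if (xid, commit_lsn) has an unresolved entry, else 0. */
static int
cache_lookup(const ConflictCache *c, TransactionId xid, XLogRecPtr commit_lsn)
{
    /* Fast path: nothing can match beyond the highest cached commit LSN. */
    if (commit_lsn > c->max_commit_lsn)
        return 0;
    for (int i = 0; i < c->nentries; i++)
        if (c->entries[i].xid == xid && c->entries[i].commit_lsn == commit_lsn)
            return 1;
    return 0;
}

/* Drop entries whose commit LSN has been applied; returns how many were removed. */
static int
cache_prune(ConflictCache *c, XLogRecPtr applied_lsn)
{
    int         kept = 0;
    XLogRecPtr  new_max = 0;

    for (int i = 0; i < c->nentries; i++)
    {
        CachedConflict entry = c->entries[i];

        if (entry.commit_lsn > applied_lsn)
        {
            c->entries[kept++] = entry;
            if (entry.commit_lsn > new_max)
                new_max = entry.commit_lsn;
        }
    }

    int         removed = c->nentries - kept;

    c->nentries = kept;
    c->max_commit_lsn = new_max;
    return removed;
}
```

Once every entry has been pruned, max_commit_lsn drops to 0 and every lookup takes the fast path.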

> > Also, I am thinking that instead of a stats view, do we need
> > to consider having a system table (pg_replication_conflicts or
> > something like that) for this? What if the stats information is
> > lost (say, either due to a crash or due to UDP packet loss)? Can we
> > rely on a stats view for this?
>
> Yeah, it seems better to use a catalog.
>

Okay.

--
With Regards,
Amit Kapila.
