Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-05-31 11:40:50
Message-ID: CAA4eK1KtWKpu3qEOff=2HLD3OiMMd9LKFMuWJzx9=MESaG4fJQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 31, 2021 at 12:39 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sat, May 29, 2021 at 3:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > > > 1. the worker records the XID and commit LSN of the failed transaction
> > > > > to a catalog.
> > > > >
> > > >
> > > > When will you record this info? I am not sure if we can try to update
> > > > this when an error has occurred. We can think of using try..catch in
> > > > apply worker and then record it in catch on error but would that be
> > > > advisable? One random thought that occurred to me is to that apply
> > > > worker notifies such information to the launcher (or maybe another
> > > > process) which will log this information.
> > >
> > > Yeah, I was concerned about that too and had the same idea. The
> > > information still could not be written if the server crashes before
> > > the launcher writes it. But I think that's acceptable.
> > >
> >
> > True, because even if the launcher restarts, the apply worker will
> > error out again and resend the information. I guess we can have an
> > error queue where apply workers can add their information and the
> > launcher will then process those. If we do that, then we need to
> > probably define what we want to do if the queue gets full, either
> > apply worker nudge launcher and wait or it can just throw an error and
> > continue. If you have any better ideas to share this information then
> > we can consider those as well.
>
> +1 for using error queue. Maybe we need to avoid queuing the same
> error more than once to avoid the catalog from being updated
> frequently?
>

Yes, I think it is important because, even after the error is logged,
the subscription may error out again unless the user does something to
skip the transaction or resolve the conflict. I guess you need to
check for the existence of the error in the system table and/or in the
queue before enqueuing it.
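To make the queue idea above concrete, here is a minimal standalone C sketch (all names such as ErrQueue and errq_push are hypothetical, not actual PostgreSQL APIs; the real thing would live in shared memory with locking). It shows the two points discussed: a duplicate (same subscription, same xid) is not queued twice, and a push into a full queue fails so the apply worker can either nudge the launcher and wait, or just error out and retry later:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-size error queue shared by apply workers and the
 * launcher (sketch only; no locking shown). */
#define ERRQ_SIZE 8

typedef struct ErrQueueEntry
{
    uint32_t subid; /* subscription OID */
    uint32_t xid;   /* xid of the failed transaction */
} ErrQueueEntry;

typedef struct ErrQueue
{
    ErrQueueEntry entries[ERRQ_SIZE];
    int head;  /* next slot the launcher consumes */
    int count; /* entries currently queued */
} ErrQueue;

/* True if this (subscription, xid) error is already queued. */
static bool
errq_contains(const ErrQueue *q, uint32_t subid, uint32_t xid)
{
    for (int i = 0; i < q->count; i++)
    {
        const ErrQueueEntry *e = &q->entries[(q->head + i) % ERRQ_SIZE];

        if (e->subid == subid && e->xid == xid)
            return true;
    }
    return false;
}

/* Returns false only when the queue is full, in which case the apply
 * worker would wait for the launcher to drain it or throw an error and
 * continue.  A duplicate report is silently absorbed. */
static bool
errq_push(ErrQueue *q, uint32_t subid, uint32_t xid)
{
    if (errq_contains(q, subid, xid))
        return true;            /* already queued; avoid repeated updates */
    if (q->count == ERRQ_SIZE)
        return false;           /* full: caller must wait or error out */
    q->entries[(q->head + q->count) % ERRQ_SIZE] =
        (ErrQueueEntry) {subid, xid};
    q->count++;
    return true;
}
```

The duplicate check keeps the catalog from being updated over and over while the same transaction keeps failing, which is the concern raised above.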

> >
> > >
> > > I guess we can update catalog tuples in place when another conflict
> > > happens next time. The catalog tuple should be fixed size. The
> > > already-resolved conflict will have the commit LSN older than its
> > > replication origin's LSN.
> > >
> >
> > Okay, but I have a slight concern that we will keep xid in the system
> > which might have been no longer valid. So, we will keep this info
> > about subscribers around till one performs drop subscription,
> > hopefully, that doesn't lead to too many rows. This will be okay as
> > per the current design but say tomorrow we decide to parallelize the
> > apply for a subscription then there could be multiple errors
> > corresponding to a subscription and in that case, such a design might
> > appear quite limiting. One possibility could be that when the launcher
> > is periodically checking for new error messages, it can clean up the
> > conflicts catalog as well, or maybe autovacuum does this periodically
> > as it does for stats (via pgstat_vacuum_stat).
>
> Yeah, it's better to have a way to cleanup no longer valid entries in
> the catalog in the case where the worker failed to remove it. I prefer
> the former idea so far,
>

Which idea do you refer to here as the former (cleaning up by the launcher)?
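For the cleanup pass itself, the staleness test quoted above (an already-resolved conflict has a commit LSN older than its replication origin's applied LSN) could be sketched like this; the names and the flat-array shape are purely illustrative, not the actual catalog layout:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;    /* an LSN, as in PostgreSQL */

/* A conflict entry is resolved once the origin has advanced past the
 * failed transaction's commit LSN (it was skipped or applied). */
static bool
conflict_entry_is_stale(XLogRecPtr entry_commit_lsn,
                        XLogRecPtr origin_remote_lsn)
{
    return entry_commit_lsn < origin_remote_lsn;
}

/* Hypothetical launcher-side GC over a subscription's entries: compact
 * the array, keeping only unresolved conflicts; returns how many remain. */
static int
launcher_gc(XLogRecPtr entries[], int n, XLogRecPtr origin_remote_lsn)
{
    int kept = 0;

    for (int i = 0; i < n; i++)
    {
        if (!conflict_entry_is_stale(entries[i], origin_remote_lsn))
            entries[kept++] = entries[i];
    }
    return kept;
}
```

Whether this runs in the launcher's periodic check or via autovacuum (as pgstat_vacuum_stat does for stats) only changes who calls it, not the predicate.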

> so I'll implement it in a PoC patch.
>

Okay.

--
With Regards,
Amit Kapila.
