Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-07-16 15:02:58
Message-ID: CAD21AoDoQ6pUdXN=wx2UoB5_uWR=24w0q+YwYDr4LEcEjeqxKA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jul 12, 2021 at 8:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >>
> > > > >> >
> > > > >> > Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?
> > > > >> >
> > > > >>
> > > > >> We can't proceed unless the first error is resolved, so there
> > > > >> shouldn't be multiple unresolved errors.
> > > > >
> > > > >
> > > > > Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).
> > > > >
> > > >
> > > > Yeah, that is possible but that covers under the second condition
> > > > mentioned by me and in such cases I think we should have separate rows
> > > > for each tablesync. Is that right, Sawada-san or do you have something
> > > > else in mind?
> > >
> > > Yeah, I agree to have separate rows for each table sync. The table
> > > should not be processed by both the table sync worker and the apply
> > > worker at a time so the pair of subscription OID and relation OID will
> > > be unique. I think that we have a boolean column in the view,
> > > indicating whether the error entry is reported by the table sync
> > > worker or the apply worker, or maybe we also can have the action
> > > column show "TABLE SYNC" if the error is reported by the table sync
> > > worker.
> > >
> >
> > Or similar to backend_type (text) in pg_stat_activity, we can have
> > something like error_source (text) which will display apply worker or
> > tablesync worker? I think if we have this column then even if there is
> > a chance that both apply and sync worker operates on the same
> > relation, we can identify it via this column.
>
> Sounds good. I'll incorporate this in the next version patch that I'm
> planning to submit this week.

Sorry, I could not make it this week. I'll submit the updated patch
early next week. While updating the patch, I realized that we need
more design discussion on two points about clearing error details
after the error is resolved:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker has skipped the transaction, we keep the error entry
itself but clear its fields, except for a few such as the failure
counts. But given that stats messages can be lost, how can we ensure
that those error details are actually cleared? For table sync worker
errors, we can have autovacuum workers periodically check the entries
of pg_subscription_rel and clear the error entry if the table sync
worker has completed table sync (i.e., by checking whether
srsubstate = 'r'). But there is no such information for apply workers
and subscriptions. In addition to sending the message that clears the
error details just after skipping the transaction, I thought we could
have apply workers periodically resend that message, but that does
not seem like a good approach.
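
To make the table sync half of point 1 concrete, here is a minimal
Python sketch of the periodic check. It is not PostgreSQL code: the
ErrorEntry fields, the dictionaries, and the function name are all
hypothetical, and the OIDs are made up. Only the srsubstate = 'r'
test on pg_subscription_rel comes from the discussion above. The
point is that the sweep re-derives the state on every pass, so a
lost "clear" message is simply picked up by the next pass.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical key: (subscription OID, relation OID).
Key = Tuple[int, int]


@dataclass
class ErrorEntry:
    # Hypothetical fields; the real stats entry layout is not settled here.
    failure_count: int
    last_error: Optional[str]


def sweep_tablesync_errors(error_entries: Dict[Key, ErrorEntry],
                           srsubstate: Dict[Key, str]) -> int:
    """Clear error details for relations whose table sync reached 'r'.

    srsubstate stands in for a scan of pg_subscription_rel.
    Returns how many entries were cleared in this pass.
    """
    cleared = 0
    for key, entry in error_entries.items():
        if srsubstate.get(key) == "r" and entry.last_error is not None:
            entry.last_error = None  # keep failure_count, drop the details
            cleared += 1
    return cleared


entries = {
    (16400, 16385): ErrorEntry(failure_count=3, last_error="duplicate key"),
    (16400, 16390): ErrorEntry(failure_count=1, last_error="duplicate key"),
}
states = {(16400, 16385): "r", (16400, 16390): "d"}

first_pass = sweep_tablesync_errors(entries, states)
second_pass = sweep_tablesync_errors(entries, states)  # idempotent re-run
```

The second pass clears nothing, which is what makes a periodic sweep
tolerant of lost messages. Nothing comparable exists for the apply
worker, since there is no per-subscription state to check against.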

2. Do we really want to keep the table sync worker's error entry even
after the error is resolved and the table sync completes? Unlike
apply worker errors, the number of table sync worker errors could be
very large, for example, if a subscriber subscribes to many tables. If
we leave those errors in the stats view, they use more memory and
could slow down writing and reading the stats file. If such leftover
table sync error entries are not helpful in practice, I think we can
remove them rather than just clearing some fields. What do you think?
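
The two options in point 2 can be compared with a small Python
sketch. This is only an illustration under assumed names: the
dictionary layout, clear_fields, remove_entry, and the figure of
1000 tables are all hypothetical and not taken from the patch.

```python
from typing import Dict, Tuple

# Hypothetical key: (subscription OID, relation OID).
Key = Tuple[int, int]


def clear_fields(entries: Dict[Key, dict], key: Key) -> None:
    """Option A: keep the entry and reset only the error details."""
    if key in entries:
        entries[key]["last_error"] = None


def remove_entry(entries: Dict[Key, dict], key: Key) -> None:
    """Option B: drop the whole entry once table sync has completed."""
    entries.pop(key, None)


def make_entries(ntables: int) -> Dict[Key, dict]:
    # One error entry per synced table; OIDs are made up.
    return {
        (16400, 20000 + i): {"failure_count": 1, "last_error": "some error"}
        for i in range(ntables)
    }


kept = make_entries(1000)
dropped = make_entries(1000)

for key in list(kept):
    clear_fields(kept, key)
for key in list(dropped):
    remove_entry(dropped, key)

remaining_after_clear = len(kept)      # entries still stored and written out
remaining_after_remove = len(dropped)  # nothing left to store
```

With option A every completed table still occupies an entry that has
to be kept in memory and written to the stats file; with option B the
entry count drops to zero once all tables are synced.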

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
