Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-07-19 06:39:30
Message-ID: CAD21AoA-6C22tkA8V9jnDaw2p7fufGMyKoTvXHeBQGL=EaPA2Q@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 17, 2021 at 12:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 12, 2021 at 8:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >>
> > > > > >> >
> > > > > >> > Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?
> > > > > >> >
> > > > > >>
> > > > > >> We can't proceed unless the first error is resolved, so there
> > > > > >> shouldn't be multiple unresolved errors.
> > > > > >
> > > > > >
> > > > > > Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).
> > > > > >
> > > > >
> > > > > > Yeah, that is possible but that is covered under the second condition
> > > > > mentioned by me and in such cases I think we should have separate rows
> > > > > for each tablesync. Is that right, Sawada-san or do you have something
> > > > > else in mind?
> > > >
> > > > Yeah, I agree to have separate rows for each table sync. The table
> > > > should not be processed by both the table sync worker and the apply
> > > > worker at a time so the pair of subscription OID and relation OID will
> > > > be unique. I think that we can have a boolean column in the view,
> > > > indicating whether the error entry is reported by the table sync
> > > > worker or the apply worker, or maybe we also can have the action
> > > > column show "TABLE SYNC" if the error is reported by the table sync
> > > > worker.
> > > >
> > >
> > > Or similar to backend_type (text) in pg_stat_activity, we can have
> > > something like error_source (text) which will display apply worker or
> > > tablesync worker? I think if we have this column then even if there is
> > > a chance that both apply and sync workers operate on the same
> > > relation, we can identify it via this column.
> >
> > Sounds good. I'll incorporate this in the next version patch that I'm
> > planning to submit this week.
>
> Sorry, I could not make it this week. I'll submit them early next week.
> While updating the patch I thought we need to have more design
> discussion on two points of clearing error details after the error is
> resolved:
>
> 1. How to clear apply worker errors. IIUC we've discussed that once
> the apply worker skipped the transaction we leave the error entry
> itself but clear its fields except for some fields such as failure
> counts. But given that the stats messages could be lost, how can we
> ensure that those error details are cleared? For table sync worker
> errors, we can have autovacuum workers periodically check the entries
> of pg_subscription_rel and clear the error entry if the table sync
> worker has completed table sync (i.e., checking if srsubstate = 'r').
> But there is no such information for the apply workers and
> subscriptions. In addition to sending the message clearing the error
> details just after skipping the transaction, I thought that we could
> have apply workers periodically send the message clearing the error
> details, but that does not seem good.

I think that the motivation behind the idea of leaving error entries
and clearing some of their fields is that users can check whether the
error has been successfully resolved and the worker is working fine.
But we can also check that in another way, for example, by checking
the pg_stat_subscription view. So is it worth considering leaving the
apply worker errors as they are?
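
To illustrate the kind of check I mean, here is a minimal Python
sketch. It is not taken from the patches. The pid and latest_end_time
fields mirror columns of pg_stat_subscription, but the
SubscriptionStat class, the error_looks_resolved() function, and the
last_failure_time argument are hypothetical names used only for
illustration, and comparing timestamps is just one possible heuristic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SubscriptionStat:
    """One row of pg_stat_subscription, reduced to two columns."""
    pid: Optional[int]                    # NULL when the worker is not running
    latest_end_time: Optional[datetime]   # last progress reported to the publisher


def error_looks_resolved(stat: SubscriptionStat,
                         last_failure_time: datetime) -> bool:
    """Heuristic (assumption): the error is treated as resolved if the
    worker is running and has reported progress after the last failure."""
    if stat.pid is None or stat.latest_end_time is None:
        return False
    return stat.latest_end_time > last_failure_time
```

With a check like this, the error entry itself can stay untouched and
the user decides from the worker's progress whether it is stale.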

>
> 2. Do we really want to leave the table sync worker error entry even
> after the error is resolved and the table sync completes? Unlike the
> apply worker error, the number of table sync worker errors could be
> very large, for example, if a subscriber subscribes to many tables. If
> we leave those errors in the stats view, they use more memory and
> could affect the performance of writing and reading the stats file. If
> such leftover table sync error entries are not helpful in practice, I
> think we can remove them rather than clear some fields. What do you
> think?
>
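
As a rough illustration of the removal option in point 2, here is a
minimal Python sketch. It is not what the attached patches implement.
The ErrorEntry class, the prune_tablesync_errors() function, and the
"apply" / "tablesync" values of error_source are hypothetical; only
srsubstate = 'r' in pg_subscription_rel comes from the discussion
above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

READY = "r"  # srsubstate of a table whose initial sync has completed


@dataclass
class ErrorEntry:
    subid: int          # subscription OID
    relid: int          # relation OID
    error_source: str   # "apply" or "tablesync" (hypothetical values)
    failure_count: int


def prune_tablesync_errors(
    entries: List[ErrorEntry],
    rel_states: Dict[Tuple[int, int], str],
) -> List[ErrorEntry]:
    """Drop table sync error entries whose table is already ready.

    rel_states maps (subid, relid) to srsubstate, as a periodic check
    of pg_subscription_rel would see it. Apply worker entries are
    always kept, since no equivalent state exists for them.
    """
    kept = []
    for entry in entries:
        state = rel_states.get((entry.subid, entry.relid))
        finished = entry.error_source == "tablesync" and state == READY
        if not finished:
            kept.append(entry)
    return kept
```

Here the number of retained entries is bounded by the tables still
being synchronized plus the apply worker entries, rather than by the
total number of subscribed tables.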

I've attached updated versions of the patches, which incorporate all
the comments I have received so far except for the part about clearing
error details that I mentioned above. Once we reach a consensus on
those points, I'll incorporate the agreed approach into the patches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment | Content-Type | Size
v2-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch | application/x-patch | 45.9 KB
v2-0001-Add-errcontext-to-errors-of-the-applying-logical-.patch | application/x-patch | 17.2 KB
v2-0002-Add-pg_stat_logical_replication_error-statistics-.patch | application/x-patch | 37.1 KB