Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-11-01 01:48:17
Message-ID: CAD21AoD_Fvj-rbEUNGmOJ=Usg0Q669md=bFXTznm1jivXXfvJQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Oct 29, 2021 at 10:54 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > I've attached updated patches.
> > > >
> > > > Thank you for the comments!
> > > >
> > > > >
> > > > > Few comments:
> > > > > ==============
> > > > > 1. Is the patch cleaning tablesync error entries except via vacuum? If
> > > > > not, can't we send a message to remove tablesync errors once tablesync
> > > > > is successful (say when we reset skip_xid or when tablesync is
> > > > > finished) or when we drop subscription? I think the same applies to
> > > > > apply worker. I think we may want to track it in some way whether an
> > > > > error has occurred before sending the message but relying completely
> > > > > on a vacuum might be a recipe for bloat. I think in the case of a
> > > > > drop subscription we can simply send the message as that is not a
> > > > > frequent operation. I might be missing something here because in the
> > > > > tests after drop subscription you are expecting the entries from the
> > > > > view to get cleared.
> > > >
> > > > Yes, I think we can have the tablesync worker send a message to drop
> > > > stats once tablesync is successful. But if we do that also when
> > > > dropping a subscription, I think we need to do that only when the
> > > > transaction is committed, since we can drop a subscription that
> > > > doesn't have a replication slot and roll back the transaction.
> > > > Probably we can send the message only when the subscription does
> > > > have a replication slot.
> > > >
> > >
> > > Right. And probably for apply worker after updating skip xid.
> >
> > I'm not sure it's better to drop apply worker stats after resetting
> > skip xid (i.e., after skipping the transaction). Since the view is a
> > cumulative view and has last_error_time, I thought we can have the
> > apply worker stats until the subscription gets dropped.
> >
>
> Fair enough. So statistics can be removed either by vacuum or drop
> subscription. Also, if we go by this logic then there is no harm in
> retaining the stat entries for tablesync errors. Why have different
> behavior for apply and tablesync workers?

My understanding is that a subscription worker statistics entry
corresponds to a logical worker (not a physical one, since the
physical process changes after a restart). So once a worker finishes
its job, there is no need to keep showing its errors, since no further
problems will occur after that. A table sync worker’s job finishes
when the table copy completes (unless table sync is performed again by
REFRESH PUBLICATION), whereas the apply worker’s job finishes when the
subscription is dropped. Also, I’m concerned about the situation where
a lot of table syncs have failed. In that case, if we don’t drop the
table sync worker statistics after the worker completes its job, we
end up with a lot of entries in the view until the subscription is
dropped.
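
To make the lifecycle argument concrete, here is a toy model in Python.
It is only a sketch of the policy discussed here, not the patch's code:
the class name, the method names, and the use of None as the relation
OID of the apply worker are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical key: (subscription OID, relation OID). The relation OID is
# None for the apply worker; this is an assumption of the sketch.
StatsKey = Tuple[int, Optional[int]]


@dataclass
class WorkerStatsTable:
    """Toy stand-in for the per-worker error statistics."""
    entries: Dict[StatsKey, str] = field(default_factory=dict)

    def report_error(self, subid: int, relid: Optional[int], message: str) -> None:
        # Record (or overwrite) the last error for this logical worker.
        self.entries[(subid, relid)] = message

    def tablesync_finished(self, subid: int, relid: int) -> None:
        # The table sync worker's job is done, so its entry is dropped.
        self.entries.pop((subid, relid), None)

    def drop_subscription(self, subid: int) -> None:
        # The apply worker's job ends here; drop everything that is left.
        for key in [k for k in self.entries if k[0] == subid]:
            del self.entries[key]


stats = WorkerStatsTable()
stats.report_error(16394, None, "apply failed")      # apply worker entry
for relid in (16385, 16390):
    stats.report_error(16394, relid, "copy failed")  # table sync entries

stats.tablesync_finished(16394, 16385)
count_after_sync = len(stats.entries)   # apply entry + one pending table sync

stats.drop_subscription(16394)
count_after_drop = len(stats.entries)   # nothing is left for the subscription
```

Without the tablesync_finished() step, every failed table sync would
leave an entry behind until the subscription is dropped, which is the
accumulation I'm concerned about.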

>
> I have another question in this regard. Currently, the reset function
> seems to be resetting only the first stat entry for a subscription.
> But can't we have multiple stat entries for a subscription considering
> the view's cumulative nature?

I might be missing your point, but I think that with the current
patch the view can have multiple entries for a subscription. That is,
there is one apply worker stats entry and multiple table sync worker
stats entries per subscription. And the pg_stat_reset_subscription()
function can reset any of them by specifying the subscription OID and
relation OID.
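
As a rough illustration of that keying, the following Python sketch
resets a single entry selected by subscription OID and relation OID.
It is not the patch's implementation: reset_worker_stats(), the
WorkerStats fields, and None standing for the apply worker's relation
OID are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class WorkerStats:
    """Hypothetical per-worker counters shown by the view."""
    error_count: int = 0
    last_error: Optional[str] = None


# Key: (subscription OID, relation OID); None marks the apply worker
# (an assumption of this sketch).
StatsView = Dict[Tuple[int, Optional[int]], WorkerStats]


def reset_worker_stats(view: StatsView, subid: int,
                       relid: Optional[int] = None) -> bool:
    """Reset exactly one entry; return False if no such entry exists."""
    entry = view.get((subid, relid))
    if entry is None:
        return False
    entry.error_count = 0
    entry.last_error = None
    return True


view: StatsView = {
    (16394, None): WorkerStats(3, "duplicate key"),   # apply worker
    (16394, 16385): WorkerStats(1, "copy failed"),    # table sync worker
    (16394, 16390): WorkerStats(2, "copy failed"),    # table sync worker
}

# Reset only the table sync entry for relation 16385.
reset_ok = reset_worker_stats(view, 16394, 16385)
```

The other two entries of the same subscription are left untouched, so
each entry can be reset independently of the rest.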

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
