Re: Design of pg_stat_subscription_workers vs pgstats

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Date: 2022-02-01 06:17:04
Message-ID: CAD21AoCWZDmW01BVpr_h=4uJERTBo3qFO6n_hzE0vynAH+=Gpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 28, 2022 at 2:59 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jan 28, 2022 at 1:49 AM David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 27, 2022 at 5:08 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Thu, Jan 27, 2022 at 11:16 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> >
> >> > On 2022-01-25 20:27:07 +0900, Masahiko Sawada wrote:
> >> >
> >> > > There will be some challenges in a case where updating pg_subscription_rel
> >> > > also failed too (what to report to the user, etc.). And moreover, we don't
> >> > > want to consume space for temporary information in the system catalog.
> >> >
> >> > You're consuming resources in a *WAY* worse way right now. The stats file gets
> >> > constantly written out, and quite often read back by backends. In contrast to
> >> > parts of pg_subscription_rel or such that data can't be removed from
> >> > shared_buffers under pressure.
> >> >
> >>
> >> I don't think pg_subscription_rel is the right place to store error
> >> info as the error can happen say while processing some message type
> >> like BEGIN where we can't map it to pg_subscription_rel entry. There
> >> could be other cases as well where we won't be able to map it to
> >> pg_subscription_rel like some error related to some other table while
> >> processing trigger function.
> >>
> >> In general, there doesn't appear to be much advantage in storing all
> >> the error info in system catalogs as we don't want it to be persistent
> >> (crash-safe). Also, this information is not about any system object
> >> that future operations can use, so won't map from that angle as well.
> >
> >
> > Repeating myself here to try and keep complaints regarding pg_stat_subscription_worker in one place.
> >
> > This is my specific email with respect to the pg_stat_scription_workers design.
> >
> > https://www.postgresql.org/message-id/CAKFQuwZbFuPSV1WLiNFuODst1sUZon2Qwbj8d9tT%3D38hMhJfvw%40mail.gmail.com
> >
> > Specifically,
> >
> > pg_stat_subscription_workers is defined as showing:
> > "will contain one row per subscription
> > worker on which errors have occurred, for workers applying logical
> > replication changes and workers handling the initial data copy of the
> > subscribed tables."
> >
> > The fact that these errors remain (last_error_*) even after they are no longer relevant is my main gripe regarding this feature. The information itself is generally useful though last_error_count is not. These fields should auto-clear and be named current_error_* as they exist for purposes of describing the current state of any error-encountering logical replication worker so that ALTER SUBSCRIPTION SKIP, or someone other manual intervention, can be done with that knowledge without having to scan the subscriber's server logs.
> >
>
> We can discuss names of columns but the main reason was that tomorrow
> say we want to account for total errors not only the current error
> then we have to introduce the field error_count or something like that
> which will then conflict with names like current_*. Similar for
> transaction abort_count. In the initial versions of the patch, we were
> not using last_* for column names but similar arguments led us to
> change names to last_ terminology and the same was being used in
> pg_stat_archiver. But, feel free to suggest better names. Yes, I agree
> with an auto-clear point as well and there seems to be an agreement
> for doing it after the next successful apply and or after we skipped
> the failed xact.
>
> > This is my email trying to understand reality better in order to figure out what exactly is causing the limitations that are negatively impacting the design of this feature.
> >
> > https://www.postgresql.org/message-id/CAKFQuwYJ7dsW%2BStsw5%2BZVoY3nwQ9j6pPt-7oYjGddH-h7uVb%2Bg%40mail.gmail.com
> >
> > In short, it was convenient to use the statistics collector here even if doing so resulted in a non-user friendly (IMO) design. Given all of the limitations to the statistics collection infrastructure, and the fact that this data is not statistical in the usual usage of the term, I find that to be less than satisfying.
> >
>
> I think the failures/conflicts are also important information for
> users to know, so having a view of those doesn't appear to be a bad
> idea. All this data is less suitable for system catalogs like
> pg_subscription_rel or others for the reasons quoted in my previous
> email [1].

I see that it's better to use a better IPC for ALTER SUBSCRIPTION SKIP
feature to pass error-XID or error-LSN information to the worker
whereas I'm also not sure of the advantages in storing all error
information in a system catalog. Since what we need to do for this
purpose is only error-XID/LSN, we can store only error-XID/LSN in the
catalog? That is, the worker stores error-XID/LSN in the catalog on an
error, and ALTER SUBSCRIPTION SKIP command enables the worker to skip
the transaction in question. The worker clears the error-XID/LSN after
successfully applying or skipping the first non-empty transaction.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-02-01 06:27:56 Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Previous Message Amit Kapila 2022-02-01 05:28:59 Re: row filtering for logical replication