Re: Design of pg_stat_subscription_workers vs pgstats

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Date: 2022-01-28 05:59:08
Message-ID: CAA4eK1K12FFF0wNV0k6UtLK9BP=S6PawWvepAAVbx7g+hb245Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 28, 2022 at 1:49 AM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Thu, Jan 27, 2022 at 5:08 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Thu, Jan 27, 2022 at 11:16 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> >
>> > On 2022-01-25 20:27:07 +0900, Masahiko Sawada wrote:
>> >
>> > > There will be some challenges in a case where updating pg_subscription_rel
>> > > also failed too (what to report to the user, etc.). And moreover, we don't
>> > > want to consume space for temporary information in the system catalog.
>> >
>> > You're consuming resources in a *WAY* worse way right now. The stats file gets
>> > constantly written out, and quite often read back by backends. In contrast to
>> > parts of pg_subscription_rel or such that data can't be removed from
>> > shared_buffers under pressure.
>> >
>>
>> I don't think pg_subscription_rel is the right place to store error
>> info as the error can happen say while processing some message type
>> like BEGIN where we can't map it to pg_subscription_rel entry. There
>> could be other cases as well where we won't be able to map it to
>> pg_subscription_rel like some error related to some other table while
>> processing trigger function.
>>
>> In general, there doesn't appear to be much advantage in storing all
>> the error info in system catalogs as we don't want it to be persistent
>> (crash-safe). Also, this information is not about any system object
>> that future operations can use, so won't map from that angle as well.
>
>
> Repeating myself here to try and keep complaints regarding pg_stat_subscription_worker in one place.
>
> This is my specific email with respect to the pg_stat_scription_workers design.
>
> https://www.postgresql.org/message-id/CAKFQuwZbFuPSV1WLiNFuODst1sUZon2Qwbj8d9tT%3D38hMhJfvw%40mail.gmail.com
>
> Specifically,
>
> pg_stat_subscription_workers is defined as showing:
> "will contain one row per subscription
> worker on which errors have occurred, for workers applying logical
> replication changes and workers handling the initial data copy of the
> subscribed tables."
>
> The fact that these errors remain (last_error_*) even after they are no longer relevant is my main gripe regarding this feature. The information itself is generally useful though last_error_count is not. These fields should auto-clear and be named current_error_* as they exist for purposes of describing the current state of any error-encountering logical replication worker so that ALTER SUBSCRIPTION SKIP, or someone other manual intervention, can be done with that knowledge without having to scan the subscriber's server logs.
>

We can discuss names of columns but the main reason was that tomorrow
say we want to account for total errors not only the current error
then we have to introduce the field error_count or something like that
which will then conflict with names like current_*. Similar for
transaction abort_count. In the initial versions of the patch, we were
not using last_* for column names but similar arguments led us to
change names to last_ terminology and the same was being used in
pg_stat_archiver. But, feel free to suggest better names. Yes, I agree
with an auto-clear point as well and there seems to be an agreement
for doing it after the next successful apply and or after we skipped
the failed xact.

> This is my email trying to understand reality better in order to figure out what exactly is causing the limitations that are negatively impacting the design of this feature.
>
> https://www.postgresql.org/message-id/CAKFQuwYJ7dsW%2BStsw5%2BZVoY3nwQ9j6pPt-7oYjGddH-h7uVb%2Bg%40mail.gmail.com
>
> In short, it was convenient to use the statistics collector here even if doing so resulted in a non-user friendly (IMO) design. Given all of the limitations to the statistics collection infrastructure, and the fact that this data is not statistical in the usual usage of the term, I find that to be less than satisfying.
>

I think the failures/conflicts are also important information for
users to know, so having a view of those doesn't appear to be a bad
idea. All this data is less suitable for system catalogs like
pg_subscription_rel or others for the reasons quoted in my previous
email [1]. You have already noted one view pg_stat_archiver and we do
have failure information like checksum_failures, deadlocks in some
other views. Then, we have some information like conflicts available
via pg_stat_database_conflicts. I think the error/conflict info about
apply failures is on similar lines.

> To the point that I'd be inclined to revert this feature and hold up the ALTER SUBSCRIPTION SET patch until a more user-friendly design can be done using proper IPC techniques.
>

If we find a different/better way and that is a conclusion then I will
do it. But, in my humble opinion, let's first discuss and see why this
is incorrect? IIUC, your main argument was to allow auto skip instead
of allowing users to fetch XID (from a view or server logs) and then
use it in the command. We are already trying to brainstorm that and
Sawada-San has already proposed a couple of ideas [2]. Also, the other
idea Andres has shared is to use LSN (of the corresponding failed
transaction) instead of XID which I find better.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BMDngbOQfMcAMsrf__s2a-MMMHaCR0zwde3GVeEi-bbQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAD21AoBdEcyXKMCMws7HjcYDbbPyq_KfUbCnTX84rDeP45Hbrw%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tanghy.fnst@fujitsu.com 2022-01-28 06:37:07 RE: Support tab completion for upper character inputs in psql
Previous Message Nathan Bossart 2022-01-28 05:46:20 Re: Add checkpoint and redo LSN to LogCheckpointEnd log message