Re: Design of pg_stat_subscription_workers vs pgstats

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Date: 2022-02-24 08:53:33
Message-ID: CAHut+Pv=VmXtHmPKp4fg8VDF+TQP6xWgL91Jn-hrqg5QObfCZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi. Below are my review comments for the v2 patch.

======

1. Commit message

This patch changes the pg_stat_subscription_workers view (introduced
by commit 8d74fc9) so that it stores only statistics counters:
apply_error_count and sync_error_count, and has one entry for
subscription.

SUGGESTION
"and has one entry for subscription." --> "and has one entry for each
subscription."

~~~

2. Commit message

After removing these error details, there are no longer relation
information, so the subscription statistics are now a cluster-wide
statistics.

SUGGESTION
"there are no longer relation information," --> "there is no longer
any relation information,"

~~~

3. doc/src/sgml/monitoring.sgml

- <para>
- The error message
+ Number of times the error occurred during the application of changes
</para></entry>
</row>

<row>
<entry role="catalog_table_entry"><para role="column_definition">
- <structfield>last_error_time</structfield> <type>timestamp
with time zone</type>
+ <structfield>sync_error_count</structfield> <type>bigint</type>
</para>
<para>
- Last time at which this error occurred
+ Number of times the error occurred during the initial table
+ synchronization
</para></entry>

SUGGESTION (both places)
"Number of times the error occurred ..." --> "Number of times an error
occurred ..."

~~~

4. doc/src/sgml/monitoring.sgml - missing column

(duplicate - also reported by [Tang-v2])

The PG docs for the new "stats_reset" column are missing.

~~~

5. src/backend/catalog/system_views.sql - whitespace

(duplicate - also reported by [Tang-v2])

- JOIN pg_subscription s ON (w.subid = s.oid);
+ a.apply_error_count,
+ a.sync_error_count,
+ a.stats_reset
+ FROM pg_subscription as s,
+ pg_stat_get_subscription_activity(oid) as a;

inconsistent tab/space indenting for 'a.stats_reset'.

~~~

6. src/backend/postmaster/pgstat.c - function name

+/* ----------
+ * pgstat_reset_subscription_counter() -
+ *
+ * Tell the statistics collector to reset a single subscription
+ * counter, or all subscription counters (when subid is InvalidOid).
+ *
+ * Permission checking for this function is managed through the normal
+ * GRANT system.
+ * ----------
+ */
+void
+pgstat_reset_subscription_counter(Oid subid)

SUGGESTION (function name)
"pgstat_reset_subscription_counter" -->
"pgstat_reset_subscription_counters" (plural)

~~

7. src/backend/postmaster/pgstat.c - pgstat_recv_resetsubcounter

@@ -5645,6 +5598,51 @@
pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
}
}

+/* ----------
+ * pgstat_recv_resetsubcounter() -
+ *
+ * Reset some subscription statistics of the cluster.
+ * ----------
+ */
+static void
+pgstat_recv_resetsubcounter(PgStat_MsgResetsubcounter *msg, int len)

"Reset some" seems a bit vague. Why not describe that it is all or
none according to the msg->m_subid?

~~~

8. src/backend/postmaster/pgstat.c - pgstat_recv_resetsubcounter

+ if (!OidIsValid(msg->m_subid))
+ {
+ HASH_SEQ_STATUS sstat;
+
+ /* Clear all subscription counters */
+ hash_seq_init(&sstat, subscriptionStatHash);
+ while ((subentry = (PgStat_StatSubEntry *) hash_seq_search(&sstat)) != NULL)
+ pgstat_reset_subscription(subentry, ts);
+ }
+ else
+ {
+ /* Get the subscription statistics to reset */
+ subentry = pgstat_get_subscription_entry(msg->m_subid, false);
+
+ /*
+ * Nothing to do if the given subscription entry is not found. This
+ * could happen when the subscription with the subid is removed and
+ * the corresponding statistics entry is also removed before receiving
+ * the reset message.
+ */
+ if (!subentry)
+ return;
+
+ /* Reset the stats for the requested replication slot */
+ pgstat_reset_subscription(subentry, ts);
+ }
+}

Why not reverse the if/else?

Checking OidIsValid(...) seems more natural than checking !OidIsValid(...)

~~~

9. src/backend/postmaster/pgstat.c - pgstat_recv_subscription_purge

static void
pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
{
/* Return if we don't have replication subscription statistics */
if (subscriptionStatHash == NULL)
return;

/* Remove from hashtable if present; we don't care if it's not */
(void) hash_search(subscriptionStatHash, (void *) &(msg->m_subid),
HASH_REMOVE, NULL);
}

SUGGESTION
Wouldn't the above code be simpler written like:

if (subscriptionStatHash)
{
/* Remove from hashtable if present; we don't care if it's not */
(void) hash_search(subscriptionStatHash, (void *) &(msg->m_subid),
HASH_REMOVE, NULL);
}
~~~

10. src/backend/replication/logical/worker.c

(from my previous [Peter-v1] #9)

>> + /* Report the worker failed during table synchronization */
>> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, false);
>>
>> and
>>
>> + /* Report the worker failed during the application of the change */
>> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, true);
>>
>>
>> Why don't these use MySubscription->oid instead of MyLogicalRepWorker->subid?

> It's just because we used to use MyLogicalRepWorker->subid, is there
> any particular reason why we should use MySubscription->oid here?

I felt MySubscription->oid is a more natural and more direct way of
expressing the same thing.

Consider: "the oid of the current subscription" versus "the oid of
the subscription of the current worker". IMO the first one is simpler.
YMMV.

Also, it is shorter :)

~~~

11. src/include/pgstat.h - enum order

(follow-on from my previous v1 review comment #10)

>I assume you're concerned about binary compatibility or something. I
>think it should not be a problem since both
>PGSTAT_MTYPE_SUBWORKERERROR and PGSTAT_MTYPE_SUBSCRIPTIONPURGE are
>introduced to PG15.

Yes, maybe it is OK for those ones. But now in v2 there is a new
PGSTAT_MTYPE_RESETSUBCOUNTER.

Shouldn't at least that one be put at the end for the same reason?

~~~

12. src/include/pgstat.h - PgStat_MsgResetsubcounter

Maybe a better name for this is "PgStat_MsgResetsubcounters" (plural)?

~~~

13. src/test/subscription/t/026_worker_stats.pl - missing test?

Shouldn't there also be some test to reset the counters to confirm
that they really do get reset to zero?

~~~

14. src/tools/pgindent/typedefs.list

PgStat_MsgResetsubcounter (from pgstat.h) is missing?

------

15. pg_stat_subscription_activity view name?

Has the view name already been decided or still under discussion - I
was not sure?

If is it already decided then fine, but if not then my vote would be
for something different like:
e.g.1 - pg_stat_subscription_errors
e.g.2 - pg_stat_subscription_counters
e.g.3 - pg_stat_subscription_metrics

Maybe "activity" was chosen to be deliberately vague in case some
future unknown stats columns get added? But it means now there is a
corresponding function "pg_stat_reset_subscription_activity", when in
practice you don't really reset activity - what you want to do is
reset some statistics *about* the activity... so it all seems a bit
odd to me.

------
[Tang-v2] https://www.postgresql.org/message-id/OS0PR01MB6113769B17E90ADC9ACA14B2FB3D9%40OS0PR01MB6113.jpnprd01.prod.outlook.com
[Peter-v1] https://www.postgresql.org/message-id/CAHut%2BPtH-uN5rbGRh-%3DkCd8xvQYDf_JCcjLcVjW3OXGz6T%2BxCw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2022-02-24 08:56:36 Re: Support logical replication of DDLs
Previous Message Kyotaro Horiguchi 2022-02-24 08:27:03 Re: [BUG] Panic due to incorrect missingContrecPtr after promotion