Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-09-24 11:01:00
Message-ID: CAA4eK1JKf2Jejg1abuuU-Gx8Dkw0tey4Jiz5ACyQJ4QzKDvHSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 21, 2021 at 10:23 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've attached the updated version patches. Please review them.
>

Review comments for v14-0001-Add-pg_stat_subscription_errors-statistics-view
==============================================================
1.
<entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>command</structfield> <type>text</type>
+ </para>
+ <para>
+ Name of command being applied when the error occurred. This
+ field is always NULL if the error is reported by the
+ <literal>tablesync</literal> worker.
+ </para></entry>
+ </row>
..
..
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>xid</structfield> <type>xid</type>
+ </para>
+ <para>
+ Transaction ID of the publisher node being applied when the error
+ occurred. This field is always NULL if the error is reported
+ by the <literal>tablesync</literal> worker.
+ </para></entry>

Shouldn't we display command and transaction id even for table sync
worker if it occurs during sync phase (syncing with apply worker
position)

2.
+ /*
+ * The number of not-ready relations can be high for example right
+ * after creating a subscription, so we load the list of
+ * SubscriptionRelState into the hash table for faster lookups.
+ */

I am not sure this optimization of converting to not-ready relations
list to hash table is worth it. Are we expecting thousands of
relations per subscription? I think that will be a rare case even if
it is there.

3.
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+ if (subscriptionHash == NULL)
+ return;
+
+ for (int i = 0; i < msg->m_nentries; i++)
+ {
+ PgStat_StatSubEntry *subent;
+
+ subent = pgstat_get_subscription_entry(msg->m_subids[i], false);
+
+ /*
+ * Nothing to do if the subscription entry is not found. This could
+ * happen when the subscription is dropped and the message for
+ * dropping subscription entry arrived before the message for
+ * reporting the error.
+ */
+ if (subent == NULL)

Is the above comment true even during the purge? I can think of this
during normal processing but not during the purge.

4.
+typedef struct PgStat_MsgSubscriptionErr
+{
+ PgStat_MsgHdr m_hdr;
+
+ /*
+ * m_subid and m_subrelid are used to determine the subscription and the
+ * reporter of this error. m_subrelid is InvalidOid if reported by the
+ * apply worker, otherwise by the table sync worker. In table sync worker
+ * case, m_subrelid must be the same as m_relid.
+ */
+ Oid m_subid;
+ Oid m_subrelid;
+
+ /* Error information */
+ Oid m_relid;

Is m_subrelid is used only to distinguish the type of worker? I think
it could be InvalidOid during the syncing phase in the table sync
worker.

5.
+/*
+ * Subscription error statistics kept in the stats collector, representing
+ * an error that occurred during application of logical replication or

The part of the message " ... application of logical replication ..."
sounds a little unclear. Shall we instead write: " ... application of
logical message ..."?

6.
+typedef struct PgStat_StatSubEntry
+{
+ Oid subid; /* hash table key */
+
+ /*
+ * Statistics of errors that occurred during logical replication. While
+ * having the hash table for table sync errors we have a separate
+ * statistics value for apply error (apply_error), because we can avoid
+ * building a nested hash table for table sync errors in the case where
+ * there is no table sync error, which is the common case in practice.
+ *

The above comment is not clear to me. Why do you need to have a
separate hash table for table sync errors? And what makes it avoid
building nested hash table?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2021-09-24 11:03:38 Re: Empty string in lexeme for tsvector
Previous Message houzj.fnst@fujitsu.com 2021-09-24 08:53:10 RE: Skipping logical replication transactions on subscriber side