Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-11-24 11:43:35
Message-ID: CAD21AoDFQn8EGkB=zL_HXR1mnBmfuEYvz8m6aSeqZfc1skef2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 17, 2021 at 8:14 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Right. I've fixed this issue and attached an updated patch.
> >
>
> Few comments/questions:
> =====================
> 1.
> + <para>
> + The <structname>pg_stat_subscription_workers</structname> view will contain
> + one row per subscription error reported by workers applying logical
> + replication changes and workers handling the initial data copy of the
> + subscribed tables. The statistics entry is removed when the subscription
> + the worker is running on is removed.
> + </para>
>
> The last line of this paragraph is not clear to me. First "the" before
> "worker" in the following part of the sentence seems unnecessary
> "..when the subscription the worker..". Then the part "running on is
> removed" is unclear because it could also mean that we remove the
> entry when a subscription is disabled. Can we rephrase it to: "The
> statistics entry is removed when the corresponding subscription is
> dropped"?

Agreed. Fixed.

>
> 2.
> Between v20 and v23 versions of patch the size of hash table
> PGSTAT_SUBWORKER_HASH_SIZE is increased from 32 to 256. I might have
> missed the comment which lead to this change, can you point me to the
> same or if you changed it for some other reason, can you let me know
> the same?

I'd missed reverting this change. I considered increasing this value
since the lifetime of subscription is long. But when it comes to
unshared hashtable can be expanded on-the-fly, it's better to start
with a small value. Reverted.

>
> 3.
> +
> + /*
> + * Repeat for subscription workers. Similarly, we needn't bother
> + * in the common case where no function stats are being collected.
> + */
>
> /function/subscription workers'

Fixed.

>
> 4.
> + <para>
> + Name of command being applied when the error occurred. This field
> + is always NULL if the error was reported during the initial data
> + copy.
> + </para></entry>
> + </row>
> +
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>xid</structfield> <type>xid</type>
> + </para>
> + <para>
> + Transaction ID of the publisher node being applied when the error
> + occurred. This field is always NULL if the error was reported
> + during the initial data copy.
> + </para></entry>
>
> Is it important to stress on 'always' in the above two descriptions?

No, removed.

>
> 5.
> The current description of first/last_error_time seems sliglthy
> misleading as one can interpret that these are about different errors.
> Let's slightly change the description of first/last_error_time as
> follows or something on those lines:
>
> </para>
> + <para>
> + Time at which the first error occurred
> + </para></entry>
> + </row>
>
> First time at which this error occurred
>
> <structfield>last_error_time</structfield> <type>timestamp with time zone</type>
> + </para>
> + <para>
> + Time at which the last error occurred
>
> Last time at which this error occurred. This will be the same as
> first_error_time except when the same error occurred more than once
> consecutively.

Changed. I've removed first_error_time as per discussion on the thread
for adding xact stats.

>
> 6.
> + </indexterm>
> + <function>pg_stat_reset_subscription_worker</function> (
> <parameter>subid</parameter> <type>oid</type>, <optional>
> <parameter>relid</parameter> <type>oid</type> </optional> )
> + <returnvalue>void</returnvalue>
> + </para>
> + <para>
> + Resets the statistics of a single subscription worker running on the
> + subscription with <parameter>subid</parameter> shown in the
> + <structname>pg_stat_subscription_worker</structname> view. If the
> + argument <parameter>relid</parameter> is not <literal>NULL</literal>,
> + resets statistics of the subscription worker handling the initial data
> + copy of the relation with <parameter>relid</parameter>. Otherwise,
> + resets the subscription worker statistics of the main apply worker.
> + If the argument <parameter>relid</parameter> is omitted, resets the
> + statistics of all subscription workers running on the subscription
> + with <parameter>subid</parameter>.
> + </para>
>
> The first line of this description seems to indicate that we can only
> reset the stats of a single worker but the later part indicates that
> we can reset stats of all subscription workers. Can we change the
> first line as: "Resets the statistics of subscription workers running
> on the subscription with <parameter>subid</parameter> shown in the
> <structname>pg_stat_subscription_worker</structname> view.".
>

Changed.

> 7.
> pgstat_vacuum_stat()
> {
> ..
> + pgstat_setheader(&spmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
> + spmsg.m_databaseid = MyDatabaseId;
> + spmsg.m_nentries = 0;
> ..
> }
>
> Do we really need to set the header here? It seems to be getting set
> in pgstat_send_subscription_purge() while sending this message.

Removed.

>
> 8.
> pgstat_vacuum_stat()
> {
> ..
> +
> + if (hash_search(htab, (void *) &(subwentry->key.subid), HASH_FIND, NULL)
> + != NULL)
> + continue;
> +
> + /* This subscription is dead, add the subid to the message */
> + spmsg.m_subids[spmsg.m_nentries++] = subwentry->key.subid;
> ..
> }
>
> I think it is better to use a separate variable here for subid as we
> are using for funcid and tableid. That will make this part of the code
> easier to follow and look consistent.

Agreed, and changed.

>
> 9.
> +/* ----------
> + * PgStat_MsgSubWorkerError Sent by the apply worker or the table
> sync worker to
> + * report the error occurred during logical replication.
> + * ----------
>
> In this comment "during logical replication" sounds too generic. Can
> we instead use "while processing changes." or something like that to
> make it a bit more specific?

"while processing changes" sounds good.

I've attached an updated version patch. Unless I miss something, all
comments I got so far have been incorporated into this patch. Please
review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
v24-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch application/octet-stream 52.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Ladhe 2021-11-24 11:54:16 Re: Teach pg_receivewal to use lz4 compression
Previous Message Michael Paquier 2021-11-24 11:36:27 Re: VS2022: Support Visual Studio 2022 on Windows