Re: Failed transaction statistics to measure the logical replication progress

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Failed transaction statistics to measure the logical replication progress
Date: 2021-12-06 14:27:01
Message-ID: CALDaNm1T3qnUnCiTytACUFiw=vktBPi9dvy8iNXCNAyPALOP_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 4, 2021 at 6:32 PM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Friday, December 3, 2021 3:12 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > Thanks for the updated patch.
> > Currently we are storing the commit count, error_count and abort_count for
> > each table of the table sync operation. If we have thousands of tables, we will
> > be storing the information for each of the tables.
> > Shouldn't we be storing the consolidated information in this case.
> > diff --git a/src/backend/replication/logical/tablesync.c
> > b/src/backend/replication/logical/tablesync.c
> > index f07983a..02e9486 100644
> > --- a/src/backend/replication/logical/tablesync.c
> > +++ b/src/backend/replication/logical/tablesync.c
> > @@ -1149,6 +1149,11 @@ copy_table_done:
> > MyLogicalRepWorker->relstate_lsn = *origin_startpos;
> > SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >
> > + /* Report the success of table sync. */
> > + pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
> > +
> > MyLogicalRepWorker->relid,
> > +
> > 0 /* no logical message type */ );
> Okay.
>
> I united all stats into that of apply worker.
> In line with this change, I fixed the TAP tests as well
> to cover the updates of stats done by table sync workers.
>
> Also, during my self-review, I noticed that
> I should call pgstat_report_subworker_xact_end() before
> process_syncing_tables() because it can lead to process
> exit, which results in missing one increment of the stats columns.
> I noted this point in a comment as well.

Thanks for the updated patch, few comments:
1) We can keep the documentation similar to mention the count includes
both table sync worker / main apply worker in case of
commit_count/error_count and abort_count to keep it consistent.
+ <structfield>commit_count</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of transactions successfully applied in this subscription.
+ COMMIT and COMMIT PREPARED increments this counter.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>error_count</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of transactions that failed to be applied by the table
+ sync worker or main apply worker in this subscription.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>abort_count</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of transactions aborted in this subscription.
+ ROLLBACK PREPARED increments this counter.
+ </para></entry>
+ </row>

2) Can this be changed:
+ /*
+ * If this is a new error reported by table sync worker,
consolidate this
+ * error count into the entry of apply worker.
+ */
+ if (OidIsValid(msg->m_subrelid))
+ {
+ /* Gain the apply worker stats */
+ subwentry = pgstat_get_subworker_entry(dbentry, msg->m_subid,
+
InvalidOid, true);
+ subwentry->error_count++;
+ }
+ else
+ subwentry->error_count++; /* increment the apply
worker's counter. */
To:
+ /*
+ * If this is a new error reported by table sync worker,
consolidate this
+ * error count into the entry of apply worker.
+ */
+ if (OidIsValid(msg->m_subrelid))
+ /* Gain the apply worker stats */
+ subwentry = pgstat_get_subworker_entry(dbentry, msg->m_subid,
+
InvalidOid, true);
+
+ subwentry->error_count++; /* increment the apply
worker's counter. */

3) Since both 026_worker_stats and 027_worker_xact_stats.pl are
testing pg_stat_subscription_workers, can we move the tests to
026_worker_stats.pl. If possible the error_count validation can be
combined with the existing tests.
diff --git a/src/test/subscription/t/027_worker_xact_stats.pl
b/src/test/subscription/t/027_worker_xact_stats.pl
new file mode 100644
index 0000000..31dbea1
--- /dev/null
+++ b/src/test/subscription/t/027_worker_xact_stats.pl
@@ -0,0 +1,162 @@
+
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Tests for subscription worker statistics during apply.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 1;
+
+# Create publisher node

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2021-12-06 14:34:29 Re: pg_get_publication_tables() output duplicate relid
Previous Message Masahiko Sawada 2021-12-06 14:24:09 Re: Make pg_waldump report replication origin ID, LSN, and timestamp.