RE: Failed transaction statistics to measure the logical replication progress

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Greg Nancarrow' <gregn4422(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Failed transaction statistics to measure the logical replication progress
Date: 2021-12-22 10:14:13
Message-ID: TYCPR01MB8373176545F7AE2CCDA23B81ED7D9@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, December 21, 2021 6:00 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> Some review comments on the v18 patches:
Thank you for your review !

> v18-0002
>
> doc/src/sgml/monitoring.sgml
> (1) tablesync worker stats?
>
> Shouldn't the comment below only mention the apply worker? (since we're no
> longer recording stats of the tablesync worker)
>
> + Number of transactions that failed to be applied by the table
> + sync worker or main apply worker in this subscription. This
> + counter is updated after confirming the error is not same as
> + the previous one.
> + </para></entry>
>
> Also, it should say "... the error is not the same as the previous one."
Fixed.

> src/backend/catalog/system_views.sql
> (2) pgstat_report_subworker_xact_end()
>
> Fix typo and some wording:
>
> BEFORE:
> + * This should be called before the call of process_syning_tables()
> + not to
> AFTER:
> + * This should be called before the call of
> process_syncing_tables(), so to not
Fixed.

> src/backend/postmaster/pgstat.c
> (3) pgstat_send_subworker_xact_stats()
>
> BEFORE:
> + * Send a subworker transaction stats to the collector.
> AFTER:
> + * Send a subworker's transaction stats to the collector.
Fixed.

> (4)
> Wouldn't it be best for:
>
> + if (!TimestampDifferenceExceeds(last_report, now,
> + PGSTAT_STAT_INTERVAL))
>
> to be:
>
> + if (last_report != 0 && !TimestampDifferenceExceeds(last_report,
> now, PGSTAT_STAT_INTERVAL))
>
> ?
I'm not sure which is better and
I never have strong objection to your idea but currently
I prefer the previous code because we don't need to
add one extra condition (last_report != 0) in the function called really frequently
and the optimization to avoid calling TimestampDifferenceExceeds works just once
with your change, I'd say.

We call pgstat_send_subworker_xact_stats() in the LogicalRepApplyLoop's loop.
For the apply worker, this should be the first call for normal operation,
before any call of the apply_dispatch(and subsequent commit-like functions which
calls pgstat_send_subworker_xact_stats() in the end).

In the first call, existing v18's codes (without your suggested change) just initializes
'last_report' variable because of the diff between 0 and 'now' and returns
because of no stats values in commit_count and abort_count in the function.
After this, 'last_report' will not be 0 for the apply worker.

On the other hand, in the case I add your change, in the first call of
pgstat_send_subworker_xact_stats(), similarly 'last_report' is initialized but
without one call of TimestampDifferenceExceeds(), which might be the optimization
effect and the function returns with no stats again. Here the 'last_report' will
not be 0 after this. But, then we'll have to check the condition in the apply worker
in the every loop. Besides, after the first setting of 'last_report',
every call of the pgstat_send_subworker_xact_stats() calculates the time subtraction.
This means the one skipped call of the function looks less worth in the case of frequent
calls of the function. So, I'm not sure it' a good idea to incorporate this change.

Kindly let me know if I miss something.
At present, I'll keep the code as it is.

> (5) pgstat_send_subworker_xact_stats()
>
> I think that the comment:
>
> + * Clear out the statistics buffer, so it can be re-used.
>
> should instead say:
>
> + * Clear out the supplied statistics.
>
> because the current comment infers that repWorker is pointed at the
> MyLogicalRepWorker buffer (which it might be but it shouldn't be relying on
> that) Also, I think that the function header should mention something like:
> "The supplied repWorker statistics are cleared upon return, to assist re-use by
> the caller."
Fixed.

Attached the new patch v19.

Best Regards,
Takamichi Osumi

Attachment Content-Type Size
v19-0002-Extend-pg_stat_subscription_workers-to-include-g.patch application/octet-stream 27.3 KB
v19-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch application/octet-stream 7.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-12-22 10:24:05 RE: Optionally automatically disable logical replication subscriptions on error
Previous Message Dilip Kumar 2021-12-22 09:14:37 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints