| From: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> | 
|---|---|
| To: | 'Greg Nancarrow' <gregn4422(at)gmail(dot)com> | 
| Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | RE: Failed transaction statistics to measure the logical replication progress | 
| Date: | 2021-12-22 10:14:13 | 
| Message-ID: | TYCPR01MB8373176545F7AE2CCDA23B81ED7D9@TYCPR01MB8373.jpnprd01.prod.outlook.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Tuesday, December 21, 2021 6:00 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> Some review comments on the v18 patches:
Thank you for your review !
> v18-0002
> 
> doc/src/sgml/monitoring.sgml
> (1) tablesync worker stats?
> 
> Shouldn't the comment below only mention the apply worker? (since we're no
> longer recording stats of the tablesync worker)
> 
> +       Number of transactions that failed to be applied by the table
> +       sync worker or main apply worker in this subscription. This
> +       counter is updated after confirming the error is not same as
> +       the previous one.
> +       </para></entry>
> 
> Also, it should say "... the error is not the same as the previous one."
Fixed.
 
> src/backend/catalog/system_views.sql
> (2) pgstat_report_subworker_xact_end()
> 
> Fix typo and some wording:
> 
> BEFORE:
> + *  This should be called before the call of process_syning_tables()
> + not to
> AFTER:
> + *  This should be called before the call of
> process_syncing_tables(), so to not
Fixed.
> src/backend/postmaster/pgstat.c
> (3) pgstat_send_subworker_xact_stats()
> 
> BEFORE:
> + * Send a subworker transaction stats to the collector.
> AFTER:
> + * Send a subworker's transaction stats to the collector.
Fixed.
> (4)
> Wouldn't it be best for:
> 
> +   if (!TimestampDifferenceExceeds(last_report, now,
> + PGSTAT_STAT_INTERVAL))
> 
> to be:
> 
> +   if (last_report != 0 && !TimestampDifferenceExceeds(last_report,
> now, PGSTAT_STAT_INTERVAL))
> 
> ?
I'm not sure which is better and
I never have strong objection to your idea but currently
I prefer the previous code because we don't need to
add one extra condition (last_report != 0) in the function called really frequently
and the optimization to avoid calling TimestampDifferenceExceeds works just once
with your change, I'd say.
We call pgstat_send_subworker_xact_stats() in the LogicalRepApplyLoop's loop.
For the apply worker, this should be the first call for normal operation,
before any call of the apply_dispatch(and subsequent commit-like functions which
calls pgstat_send_subworker_xact_stats() in the end).
In the first call, existing v18's codes (without your suggested change) just initializes
'last_report' variable because of the diff between 0 and 'now' and returns
because of no stats values in commit_count and abort_count in the function.
After this, 'last_report' will not be 0 for the apply worker.
On the other hand, in the case I add your change, in the first call of
pgstat_send_subworker_xact_stats(), similarly 'last_report' is initialized but
without one call of TimestampDifferenceExceeds(), which might be the optimization
effect and the function returns with no stats again. Here the 'last_report' will
not be 0 after this. But, then we'll have to check the condition in the apply worker
in the every loop. Besides, after the first setting of 'last_report',
every call of the pgstat_send_subworker_xact_stats() calculates the time subtraction.
This means the one skipped call of the function looks less worth in the case of frequent
calls of the function. So, I'm not sure it' a good idea to incorporate this change.
Kindly let me know if I miss something.
At present, I'll keep the code as it is.
> (5) pgstat_send_subworker_xact_stats()
> 
> I think that the comment:
> 
> +   * Clear out the statistics buffer, so it can be re-used.
> 
> should instead say:
> 
> +   * Clear out the supplied statistics.
> 
> because the current comment infers that repWorker is pointed at the
> MyLogicalRepWorker buffer (which it might be but it shouldn't be relying on
> that) Also, I think that the function header should mention something like:
> "The supplied repWorker statistics are cleared upon return, to assist re-use by
> the caller."
Fixed.
Attached the new patch v19.
Best Regards,
	Takamichi Osumi
| Attachment | Content-Type | Size | 
|---|---|---|
| v19-0002-Extend-pg_stat_subscription_workers-to-include-g.patch | application/octet-stream | 27.3 KB | 
| v19-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch | application/octet-stream | 7.4 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | osumi.takamichi@fujitsu.com | 2021-12-22 10:24:05 | RE: Optionally automatically disable logical replication subscriptions on error | 
| Previous Message | Dilip Kumar | 2021-12-22 09:14:37 | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |