Failed transaction statistics to measure the logical replication progress

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Failed transaction statistics to measure the logical replication progress
Date: 2021-07-08 06:54:45
Message-ID: OSBPR01MB48887CA8F40C8D984A6DC00CED199@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hello, hackers

With the current HEAD, when logical decoding fails, the failure
increments the txns counts in pg_stat_replication_slots [1] and adds
the transaction size to the corresponding bytes sums in the same view.
This happens repeatedly on the publisher until the problem is resolved.
A good example is a duplicate key error on the subscriber side, and
this applies to the streaming and spill cases as well.
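To make the inflation concrete, here is a minimal Python model (not
PostgreSQL code) of how one transaction that keeps failing on the
subscriber is counted each time the publisher decodes and resends it.
SlotStats, decode_attempt and replay_until_success are hypothetical
names; only total_txns/total_bytes come from pg_stat_replication_slots.
The 132-byte size is chosen to match the example output further down.

```python
from dataclasses import dataclass


@dataclass
class SlotStats:
    """Hypothetical stand-in for one row of pg_stat_replication_slots."""
    total_txns: int = 0
    total_bytes: int = 0


def decode_attempt(stats: SlotStats, txn_bytes: int) -> None:
    # Each decode-and-send of the transaction is counted again,
    # whether or not the subscriber later fails to apply it.
    stats.total_txns += 1
    stats.total_bytes += txn_bytes


def replay_until_success(stats: SlotStats, txn_bytes: int, failures: int) -> None:
    # The subscriber errors out `failures` times (e.g. on a duplicate key),
    # decoding restarts from the same point, and the last attempt succeeds
    # once the conflict has been resolved.
    for _ in range(failures + 1):
        decode_attempt(stats, txn_bytes)


stats = SlotStats()
replay_until_success(stats, txn_bytes=132, failures=3)
print(stats)  # SlotStats(total_txns=4, total_bytes=528)
```

One logical transaction thus shows up as four, and the byte count is
four times its real size.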

This behavior prevents users from knowing the exact number and size of
successful and unsuccessful transactions. Accordingly, we need new
columns for failed transactions that differentiate the two for every
type of transaction: spill, streaming and normal. This will help users
measure the exact status of logical replication.

The attached file is a POC patch for this.
The current design saves the failed-transaction stats in the ReplicationSlot
struct, because after the error I can no longer access the ReorderBuffer
object. So I chose an object I can still interact with at
ReplicationSlotRelease() time.
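A rough Python model of that idea, under my own assumptions rather than
the patch's actual code: the size of the transaction currently being
sent is parked in a per-slot object that outlives the ReorderBuffer,
cleared when the transaction completes, and moved into failed counters
if the slot is released while it is still pending (as during error
cleanup). SlotFailedStats, begin_txn, finish_txn and release are
hypothetical names, and the model ignores how the publisher actually
learns about a subscriber-side failure.

```python
from dataclasses import dataclass


@dataclass
class SlotFailedStats:
    """Hypothetical holder standing in for new fields in ReplicationSlot."""
    pending_bytes: int = 0      # size of the transaction being sent
    in_progress: bool = False
    failed_total_txns: int = 0
    failed_total_bytes: int = 0

    def begin_txn(self, txn_bytes: int) -> None:
        # Keep the size somewhere that survives loss of the ReorderBuffer.
        self.in_progress = True
        self.pending_bytes = txn_bytes

    def finish_txn(self) -> None:
        # The transaction completed: nothing is pending any more.
        self.in_progress = False
        self.pending_bytes = 0

    def release(self) -> None:
        # Models the ReplicationSlotRelease() timing: a still-pending
        # transaction did not complete, so count it as failed.
        if self.in_progress:
            self.failed_total_txns += 1
            self.failed_total_bytes += self.pending_bytes
            self.finish_txn()


slot = SlotFailedStats()
for _ in range(3):          # three attempts that hit an error
    slot.begin_txn(132)
    slot.release()          # error cleanup releases the slot
slot.begin_txn(132)         # the final attempt succeeds
slot.finish_txn()
slot.release()
print(slot.failed_total_txns, slot.failed_total_bytes)  # 3 396
```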

Below is an example of the output on the publisher after a duplicate key
error on the subscriber, caused by an INSERT, was resolved.

postgres=# select * from pg_stat_replication_slots;
-[ RECORD 1 ]-------+------
slot_name           | mysub
spill_txns          | 0
spill_count         | 0
spill_bytes         | 0
failed_spill_txns   | 0
failed_spill_bytes  | 0
stream_txns         | 0
stream_count        | 0
stream_bytes        | 0
failed_stream_txns  | 0
failed_stream_bytes | 0
total_txns          | 4
total_bytes         | 528
failed_total_txns   | 3
failed_total_bytes  | 396
stats_reset         |
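Reading this output, the successful figures are simply total minus
failed. A small Python helper (hypothetical; it assumes the failed_*
columns proposed in this POC, which are not part of the existing view)
makes the arithmetic explicit for the row above.

```python
from typing import Dict, Tuple


def successful(row: Dict[str, int], kind: str) -> Tuple[int, int]:
    """Return (txns, bytes) that succeeded, kind in {"spill", "stream", "total"}."""
    txns = row[f"{kind}_txns"] - row[f"failed_{kind}_txns"]
    nbytes = row[f"{kind}_bytes"] - row[f"failed_{kind}_bytes"]
    return txns, nbytes


# Values copied from the pg_stat_replication_slots output above.
row = {
    "spill_txns": 0, "spill_bytes": 0,
    "failed_spill_txns": 0, "failed_spill_bytes": 0,
    "stream_txns": 0, "stream_bytes": 0,
    "failed_stream_txns": 0, "failed_stream_bytes": 0,
    "total_txns": 4, "total_bytes": 528,
    "failed_total_txns": 3, "failed_total_bytes": 396,
}

print(successful(row, "total"))  # (1, 132)
```

So one 132-byte transaction was applied, and the three failed attempts
account for 3 x 132 = 396 of the 528 bytes.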

Any ideas and comments are welcome.

[1] - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW

Best Regards,
Takamichi Osumi

Attachment: failed_transaction_stats_POC_v01.patch (application/octet-stream, 17.1 KB)
