RE: Failed transaction statistics to measure the logical replication progress

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'vignesh C' <vignesh21(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Failed transaction statistics to measure the logical replication progress
Date: 2021-07-13 06:59:56
Message-ID: OSBPR01MB48881C48F510320E4B5F7DBDED149@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, July 13, 2021 2:50 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > When the current HEAD fails during logical decoding, the failure
> > increments txns count in pg_stat_replication_slots - [1] and adds the
> > transaction size to the sum of bytes in the same repeatedly on the
> > publisher, until the problem is solved.
> > One of the good examples is duplication error on the subscriber side
> > and this applies to both streaming and spill cases as well.
> >
> > This update prevents users from grasping the exact number and size of
> > successful and unsuccessful transactions. Accordingly, we need to have
> > new columns of failed transactions that will work to differentiate
> > both of them for all types, which means spill, streaming and normal
> > transactions. This will help users to measure the exact status of
> > logical replication.
> >
> > Attached file is the POC patch for this.
> > Current design is to save failed stats data in the ReplicationSlot struct.
> > This is because after the error, I'm not able to access the ReorderBuffer
> > object. Thus, I chose the object where I can interact with at the
> > ReplicationSlotRelease timing.
> > Any ideas and comments are welcome.
...
> +1 for having logical replication failed statistics. Currently if
> there is any transaction failure in the subscriber after sending the decoded
> data to the subscriber like constraint violation, object not exist, the statistics
> will include the failed decoded transaction info and there is no way to identify
> the actual successful transaction data. This patch will help in measuring the
> actual decoded transaction data.
Yes, the same improvement applies to those other error cases as well.
Thank you for sharing ideas to make this enhancement more persuasive.
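
To illustrate the counting problem discussed in this thread, here is a toy
model in Python. It is only a sketch and not the POC patch: the SlotStats
class, the replay() helper and the failed_txns/failed_bytes fields are
hypothetical names made up for this example, and total_txns/total_bytes only
loosely mirror the columns of pg_stat_replication_slots. It assumes that a
transaction which fails on the subscriber is decoded again on every retry.

```python
from dataclasses import dataclass


@dataclass
class SlotStats:
    """Toy per-slot counters; all field names are illustrative only."""
    total_txns: int = 0    # every decoded transaction, retries included
    total_bytes: int = 0   # bytes of every decoded transaction, retries included
    failed_txns: int = 0   # hypothetical: decoding attempts that ended in an error
    failed_bytes: int = 0  # hypothetical: bytes belonging to those attempts


def replay(stats: SlotStats, txn_bytes: int, failures_before_success: int) -> None:
    """Decode one transaction repeatedly until the subscriber applies it."""
    for attempt in range(failures_before_success + 1):
        # Each attempt is counted again, whether or not it succeeds.
        stats.total_txns += 1
        stats.total_bytes += txn_bytes
        if attempt < failures_before_success:
            # Assumed behaviour of the proposed counters: track failures separately.
            stats.failed_txns += 1
            stats.failed_bytes += txn_bytes


stats = SlotStats()
# One 100-byte transaction that hits an error three times before it is applied.
replay(stats, txn_bytes=100, failures_before_success=3)

# With separate failure counters, the successful share can be derived.
successful_txns = stats.total_txns - stats.failed_txns
successful_bytes = stats.total_bytes - stats.failed_bytes
print(stats, successful_txns, successful_bytes)
```

In this model the totals alone report four transactions and 400 bytes for what
is really a single 100-byte transaction, which is why separate failure counters
would help to tell the two apart.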

Best Regards,
Takamichi Osumi
