RE: Failed transaction statistics to measure the logical replication progress

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Dilip Kumar' <dilipbalaut(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Failed transaction statistics to measure the logical replication progress
Date: 2021-11-10 09:12:34
Message-ID: OSZPR01MB83698EDBABF6533576F2FF38ED939@OSZPR01MB8369.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, November 10, 2021 3:43 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Tue, Nov 9, 2021 at 5:05 PM osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > On Tuesday, November 9, 2021 12:08 PM Greg Nancarrow
> <gregn4422(at)gmail(dot)com> wrote:
> > > On Fri, Nov 5, 2021 at 7:11 PM osumi(dot)takamichi(at)fujitsu(dot)com
> > > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > > >
> > >
> > > I did a quick scan through the latest v8 patch and noticed the following
> things:
> > I appreciate your review !
> I have reviewed some part of the patch and I have a few comments
I really appreciate your attention and review.

> 1.
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>error_count</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of transactions that failed to be applied by the table
> + sync worker or main apply worker in this subscription.
> + </para></entry>
> + </row>
>
> The error_count, should be number of transaction failed to applied? or it should
> be number of error?
I thought those were same and currently it gets incremented when an error of apply occurs.
Then, it equals to the number of total error. May I have the case when we
get different values between those two ? I can be missing something.

> 2.
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>error_bytes</structfield> <type>bigint</type>
> + </para>
>
> How different is error_bytes from the abort_bytes?
By the error_bytes, you can see the consumed resources that
were acquired during apply but the applying processing stopped by some error.
On the other hand, abort_bytes displays bytes used for ROLLBACK PREPARED
and stream_abort processing. That's what I intended.

> 3.
> + {
> + size += *extra_data->stream_write_len;
> + add_apply_error_context_xact_size(size);
> + return;
> + }
>
> From apply_handle_insert(), we are calling update_apply_change_size(), and
> inside this function we are dereferencing *extra_data->stream_write_len.
> Basically, stream_write_len is in integer pointer and the caller hasn't allocated
> memory for that and inside update_apply_change_size, we are directly
> dereferencing the pointer, how this can be correct.
I'm so sorry to make you confused.

I'll just delete the top part that handles streaming
bytes calculation in the update_apply_change_size().
It's because now that there is a specific structure to recognize each streaming xid
and save transaction size there, which makes the top part in question useless.

> And I also see that in the
> whole patch stream_write_len, is never used as lvalue so without storing
> anything into this why are we trying to use this as rvalue here? This is clearly
> an issue.
As described above, I'll fix this part and
related codes mainly streaming related codes in the next version.

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-11-10 09:14:00 Re: Parallel vacuum workers prevent the oldest xmin from advancing
Previous Message Daniel Gustafsson 2021-11-10 09:11:56 Re: On login trigger: take three