Re: Failed transaction statistics to measure the logical replication progress

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Failed transaction statistics to measure the logical replication progress
Date: 2021-11-19 14:10:57
Message-ID: CAD21AoC_p=gQKhhGwhDLzEU9rGwCWSzTk+a89XGG1JQKn-PiKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 18, 2021 at 12:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Nov 17, 2021 at 7:12 PM osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, November 17, 2021 10:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > Can you please tell us why you think the names in your proposed patch are
> > > better than the existing names proposed in Sawada-San's patch? Is it because
> > > those fields always contain the information of the last or latest error that
> > > occurred in the corresponding subscription worker?
> > This is one reason.
> >
> > Another big reason comes from the final alignment when we list up all columns of both patches.
> > The patches in this thread is trying to introduce a column that indicates
> > cumulative count of error to show all error counts that the worker got in the past.
> >
>
> Okay, I see your point and it makes sense to rename columns after
> these other stats. I am not able to come up with any better names than
> what is being used here. Sawada-San, do you agree with this, or do let
> us know if you have any better ideas?
>

I'm concerned that these new names will introduce confusion; if we
have last_error_relid, last_error_command, last_error_message,
last_error_time, and last_error_xid, I think users might think that
first_error_time is the timestamp at which an error occurred for the
first time in the subscription worker. Also, I'm not sure
last_error_count is not clear to me (it looks like showing something
count but the only "last" one?). An alternative idea would be to add
total_error_count by this patch, resulting in having both error_count
and total_error_count. Regarding commit_count and abort_count, I
personally think xact_commit and xact_rollback would be better since
they’re more consistent with pg_stat_database view, although we might
already have discussed that.

Besides that, I’m not sure how useful commit_bytes, abort_bytes, and
error_bytes are. I originally thought these statistics track the size
of received data, i.g., how much data is transferred from the
publisher and processed on the subscriber. But what the view currently
has is how much memory is used in the subscription worker. The
subscription worker emulates ReorderBufferChangeSize() on the
subscriber side but, as the comment of update_apply_change_size()
mentions, the size in the view is not accurate:

+ * The byte size of transaction on the publisher is calculated by
+ * ReorderBufferChangeSize() based on the ReorderBufferChange structure.
+ * But on the subscriber, consumed resources are not same as the
+ * publisher's decoding processsing and required to be computed in
+ * different way. Therefore, the exact same byte size is not restored on
+ * the subscriber usually.

Also, it seems to take into account the size of FlushPosition that is
not taken into account by ReorderBufferChangeSize().

I guess that the purpose of these values is to compare them to
total_bytes, stream_byte, and spill_bytes but if the calculation is
not accurate, does it mean that the more stats are updated, the more
the stats will be getting inaccurate?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gilles Darold 2021-11-19 14:49:37 Pasword expiration warning
Previous Message Amit Langote 2021-11-19 13:56:59 Re: simplifying foreign key/RI checks