Re: Skipping logical replication transactions on subscriber side

From: Alexey Lesovsky <lesovsky(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-07-06 10:13:32
Message-ID: CAGnetYfRbP3F_AMpXDbUEqR6xaM-bKiSoK5PCK+zvFPoeE6QDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 6, 2021 at 10:58 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> > Also, I'd like to suggest thinking twice about the view name (and
> function used in view DDL) - "pg_stat_logical_replication_error" contains
> very common "logical replication" words, but the view contains errors
> related to subscriptions only. In the future there could be other kinds of
> errors related to logical replication, but not related to subscriptions -
> what will you do?
>

> Is pg_stat_subscription_errors or
> pg_stat_logical_replication_apply_errors better?
>

It seems to me 'pg_stat_subscription_conflicts' proposed by Amit Kapila is
the most suitable, because it directly says about conflicts occurring on
the subscription side. The name 'pg_stat_subscription_errors' is also good,
especially in case of further extension if some kind of similar errors will
be tracked.

> > 3. Add a counter field with total number of errors - it helps to
> calculate errors rates and aggregations (sum), and don't lose information
> about errors between view checks.
>
> Do you mean to increment the error count if the error (command, xid,
> and relid) is the same as the previous one? or to have the total
> number of errors per subscription? And what can we infer from the
> error rates and aggregations?
>

To be honest, I hurried up when I wrote the first email, and read only
about stats view. Later, I read the starting email about the patch and
rethought this note.

As I understand, when the conflict occurs, replication stops (until
conflict is resolved), an error appears in the stats view. Now, no new
errors can occur in the blocked subscription. Hence, there are impossible
situations when many errors (like spikes) have occurred and a user didn't
see that. If I am correct in my assumption, there is no need for counters.
They are necessary only when errors might occur too frequently (like
pg_stat_database.deadlocks). But if this is possible, I would prefer the
total number of errors per subscription, as also proposed by Amit.

Under "error rates and aggregations" I also mean in the context of when a
high number of errors occured in a short period of time. If a user can
read the "total errors" counter and keep this metric in his monitoring
system, he will be able to calculate rates over time using functions in the
monitoring system. This is extremely useful.

I also would like to clarify, when conflict is resolved - the error record
is cleared or kept in the view? If it is cleared, the error counter is
required (because we don't want to lose all history of errors). If it is
kept - the flag telling about the error is resolved is needed (or set xid
to NULL). I mean when the user is watching the view, he should be able to
identify if the error has already been resolved or not.

--
Regards, Alexey

On Tue, Jul 6, 2021 at 10:58 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Mon, Jul 5, 2021 at 7:33 PM Alexey Lesovsky <lesovsky(at)gmail(dot)com> wrote:
> >
> > Hi,
> > Have a few notes about pg_stat_logical_replication_error from the DBA
> point of view (which will use this view in the future).
>
> Thank you for the comments!
>
> > 1. As I understand it, this view might contain many errors related to
> different subscriptions. It is better to name
> "pg_stat_logical_replication_errors" using the plural form (like this done
> for stat views for tables, indexes, functions).
>
> Agreed.
>
> > Also, I'd like to suggest thinking twice about the view name (and
> function used in view DDL) - "pg_stat_logical_replication_error" contains
> very common "logical replication" words, but the view contains errors
> related to subscriptions only. In the future there could be other kinds of
> errors related to logical replication, but not related to subscriptions -
> what will you do?
>
> Is pg_stat_subscription_errors or
> pg_stat_logical_replication_apply_errors better?
>
> > 2. Add a field with database name or id - it helps to quickly understand
> to which database the subscription belongs.
>
> Agreed.
>
> > 3. Add a counter field with total number of errors - it helps to
> calculate errors rates and aggregations (sum), and don't lose information
> about errors between view checks.
>
> Do you mean to increment the error count if the error (command, xid,
> and relid) is the same as the previous one? or to have the total
> number of errors per subscription? And what can we infer from the
> error rates and aggregations?
>
> > 4. Add text of last error (if it will not be too expensive).
>
> Agreed.
>
> > 5. Rename the "action" field to "command", as I know this is right from
> terminology point of view.
>
> Okay.
>
> Regards,
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
>

--
С уважением Алексей В. Лесовский

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-07-06 10:24:07 Re: Can a child process detect postmaster death when in pg_usleep?
Previous Message zwj 2021-07-06 09:58:15 Why is XLOG_FPI_FOR_HINT always need backups?