Re: Skipping logical replication transactions on subscriber side

From: Alexey Lesovsky <lesovsky(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-07-09 03:32:19
Message-ID: CAGnetYcAEYZueZ9TvL+=DbVPDHE5wZcpUzDK2Ob5YtPnOAmVFA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 9, 2021 at 5:43 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Tue, Jul 6, 2021 at 7:13 PM Alexey Lesovsky <lesovsky(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 6, 2021 at 10:58 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> >>
> >> > Also, I'd like to suggest thinking twice about the view name (and
> function used in view DDL) - "pg_stat_logical_replication_error" contains
> very common "logical replication" words, but the view contains errors
> related to subscriptions only. In the future there could be other kinds of
> errors related to logical replication, but not related to subscriptions -
> what will you do?
> >>
> >>
> >> Is pg_stat_subscription_errors or
> >> pg_stat_logical_replication_apply_errors better?
> >
> >
> > It seems to me 'pg_stat_subscription_conflicts' proposed by Amit Kapila
> is the most suitable, because it refers directly to conflicts occurring on
> the subscription side. The name 'pg_stat_subscription_errors' is also good,
> especially if the view is later extended to track other, similar kinds of
> errors.
>
> I personally prefer pg_stat_subscription_errors since
> pg_stat_subscription_conflicts could be used for conflict resolution
> features in the future. This stats view I'm proposing is meant to
> focus on errors that happened during applying logical changes. So
> using the term 'errors' seems to make sense to me.
>

Agreed

> >
> >>
> >> > 3. Add a counter field with the total number of errors - it helps to
> calculate error rates and aggregations (sum), and avoids losing information
> about errors between view checks.
> >>
> >> Do you mean to increment the error count if the error (command, xid,
> >> and relid) is the same as the previous one? or to have the total
> >> number of errors per subscription? And what can we infer from the
> >> error rates and aggregations?
> >
> >
> > To be honest, I was in a hurry when I wrote the first email, and had read
> only about the stats view. Later, I read the starting email about the patch
> and rethought this note.
> >
> > As I understand it, when a conflict occurs, replication stops (until the
> conflict is resolved) and an error appears in the stats view. No new
> errors can then occur in the blocked subscription. Hence, it is impossible
> for many errors (like spikes) to occur without the user seeing them.
> If I am correct in my assumption, there is no need for counters.
> They are necessary only when errors might occur too frequently (like
> pg_stat_database.deadlocks). But if this is possible, I would prefer the
> total number of errors per subscription, as also proposed by Amit.
>
> Yeah, the total number of errors seems better.
>

Agreed

> >
> > Under "error rates and aggregations" I also mean in the context of when
> a high number of errors occurred in a short period of time. If a user can
> read the "total errors" counter and keep this metric in his monitoring
> system, he will be able to calculate rates over time using functions in the
> monitoring system. This is extremely useful.
>
> Thanks for your explanation. Agreed. But the rate depends on
> wal_retrieve_retry_interval so is not likely to be high in practice.
>

Agreed
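
To illustrate the rate calculation discussed above, here is a minimal sketch of how a monitoring system could derive an error rate from two readings of a cumulative counter. The names Sample, total_errors and error_rate are hypothetical and are not part of the patch; the reset handling is an assumption about how such a counter might behave after a stats reset.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ts: float          # time of the reading, in seconds
    total_errors: int  # hypothetical cumulative error counter for one subscription


def error_rate(prev: Sample, curr: Sample) -> float:
    """Return errors per second between two readings of the counter."""
    elapsed = curr.ts - prev.ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.total_errors - prev.total_errors
    if delta < 0:
        # Assumption: a smaller value means the counter was reset,
        # so everything counted since the reset is the delta.
        delta = curr.total_errors
    return delta / elapsed
```

With only the latest error record and no counter, this kind of calculation is not possible, because errors that happened between two checks of the view leave no trace.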

> > I would also like to clarify: when the conflict is resolved, is the error
> record cleared or kept in the view? If it is cleared, the error counter
> is required (because we don't want to lose all history of errors). If it is
> kept, a flag indicating that the error has been resolved is needed (or xid
> set to NULL). I mean, when the user is watching the view, he should be able to
> identify whether the error has already been resolved.
>
> With the current patch, once the conflict is resolved by skipping the
> transaction in question, its entry on the stats view is cleared. As
> you suggested, if we have the total error counts in that view, it
> would be good to keep the count and clear other fields such as xid,
> last_failure, and command etc.
>

OK, looks good. But I am curious how this will work when there are two (or
more) errors in the same subscription but in different relations. After
resolution, will all these records be kept, or will they be merged into a
single record (because the subscription was the same for all errors)?
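
To make the question concrete, here is a rough model of the two options. It is only a sketch and does not reflect how the patch stores its data: ErrorEntry, record_error, resolve and merge_per_subscription are hypothetical names, and keying the entries by (subid, relid) is my assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ErrorEntry:
    total_errors: int = 0          # kept after resolution
    xid: Optional[int] = None      # cleared after resolution
    command: Optional[str] = None  # cleared after resolution


Key = Tuple[int, int]  # (subid, relid), an assumed key for illustration


def record_error(entries: Dict[Key, ErrorEntry], subid: int, relid: int,
                 xid: int, command: str) -> None:
    """Count the error and remember its details."""
    entry = entries.setdefault((subid, relid), ErrorEntry())
    entry.total_errors += 1
    entry.xid = xid
    entry.command = command


def resolve(entries: Dict[Key, ErrorEntry], subid: int, relid: int) -> None:
    """Option 1: keep the record and its count, clear only the details."""
    entry = entries[(subid, relid)]
    entry.xid = None
    entry.command = None


def merge_per_subscription(entries: Dict[Key, ErrorEntry]) -> Dict[int, int]:
    """Option 2: collapse all records of a subscription into one total."""
    totals: Dict[int, int] = {}
    for (subid, _relid), entry in entries.items():
        totals[subid] = totals.get(subid, 0) + entry.total_errors
    return totals
```

In the first option a user can still see which relation each error came from and whether it has been resolved (xid is empty); in the second only the per-subscription total survives.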

> Regards,
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
>

--
Regards, Alexey Lesovsky
