Re: Design of pg_stat_subscription_workers vs pgstats

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Date: 2022-02-15 18:17:42
Message-ID: 20220215181742.372brts5t7q3gkpr@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-02-04 09:23:06 +0530, Amit Kapila wrote:
> On Thu, Feb 3, 2022 at 3:25 PM Peter Eisentraut
> <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
> >
> > On 02.02.22 07:54, Amit Kapila wrote:
> >
> > > Sure, but is this the reason you want to store all the error info in
> > > the system catalog? I agree that providing more error info could be
> > > useful and also possibly the previously failed (apply) xacts info as
> > > well but I am not able to see why you want to have that sort of info
> > > in the catalog. I could see storing info like err_lsn/err_xid that can
> > > allow to proceed to apply worker automatically or to slow down the
> > > launch of errored apply worker but not all sort of other error info
> > > (like err_cnt, err_code, err_message, err_time, etc.). I want to know
> > > why you are insisting to make all the error info persistent via the
> > > system catalog?
> >
> > Let's flip this around and ask, why not?
> >
>
> Because we don't necessarily need all this information after the crash
> and neither is this information about any system object which we
> require for performing operations on objects.

I find this not particularly convincing. IMO data that leads the user to
compromise "replication integrity" is pretty crucial.

And skipped data needs to be logged somewhere persistent, so that there's a
chance to analyze / recover.

We should also use more detailed knowledge about errors to decide the interval
at which replication is retried. Serialization error: retry soon. Other
errors: retry with increasing backoff.
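
To make the retry idea concrete, here is a rough sketch of how the delay could depend on the error class. This is only an illustration, not actual apply worker code: the function name, the delay constants, and the choice of exponential backoff are all assumptions. The only thing taken from PostgreSQL is SQLSTATE 40001 (serialization_failure).

```python
# Hypothetical sketch: choose the next retry delay from the error class.
# Names and constants are illustrative assumptions, not PostgreSQL code.

SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization_failure


def next_retry_delay_ms(sqlstate, consecutive_failures,
                        soon_ms=100, base_ms=1000, max_ms=60000):
    """Return how long to wait before restarting the apply worker.

    sqlstate: five-character SQLSTATE of the last error.
    consecutive_failures: failures in a row, counting the last one (>= 1).
    """
    # Serialization errors are expected to be transient: retry soon,
    # regardless of how many times we have failed so far.
    if sqlstate == SERIALIZATION_FAILURE:
        return soon_ms

    # Any other error: double the delay per consecutive failure, capped.
    exponent = max(consecutive_failures - 1, 0)
    return min(base_ms * 2 ** exponent, max_ms)
```

With these assumed defaults, a unique violation (SQLSTATE 23505) would be retried after 1s, 2s, 4s and so on up to the 60s cap, while a serialization failure would always be retried after 100ms.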

> In walreceiver (for standby), we don't store the errors/conflicts in any
> table, they are either reported in logs or shared via stats.

That's IMO quite different - they're fundamentally time-limited problems. And
they don't lead the user / DBA to skip transactions etc.

Greetings,

Andres Freund
