Re: Add last failed connection error message to pg_stat_wal_receiver

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Cary Huang <cary(dot)huang(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add last failed connection error message to pg_stat_wal_receiver
Date: 2022-10-06 06:06:11
Message-ID: CALj2ACWd8RH5wNZScT5Wfyre918+GU7h-bVZF1XHm9NmgqnzBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 18, 2022 at 9:01 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> PANIC is not something you'd care about as the system would go down as
> and shared memory would be reset (right?) even if restart_on_crash is
> enabled. Perhaps it would help here to use something like a macro to
> catch and save the error, in a style similar to what's in hba.c for
> example, which is the closest example I can think of, even if on ERROR
> we don't really care about the error string anyway as there is nothing
> to report back to the SQL views used for the HBA/ident files.
>
> FATAL may prove to be tricky though, because I'd expect the error to
> be saved in shared memory in this case. This is particularly critical
> as this takes the WAL receiver process down, actually.

Hm, we can use error callbacks or pg try/catch blocks to save the
error message into walreceiver shared memory.

> Anyway, outside the potential scope of the proposal, there are more
> things that I find strange with the code:
> - Why isn't the string reset when the WAL receiver is starting up?
> That surely is not OK to keep a past state not referring to what
> actually happens with a receiver currently running.

I agree that it's not a good way to show some past failure state when
things are fine currently. Would naming the column name as
last_connectivity_error or something better and describing it in the
docs clearly help here?

Otherwise, we can have another simple function that just returns the
last connection failure of walreceiver and if required PID.

> - pg_stat_wal_receiver (system view) reports no rows if pid is NULL,
> which would be the state stored in shared memory after a connection.
> This means that one would never be able to see last_conn_error except
> when calling directly the SQL function pg_stat_get_wal_receiver().
>
> One could say that we should report a row for this view all the time,
> but this creates a compatibility breakage: existing application
> assuming something like (one row <=> WAL receiver running) could
> break.

-1.

We can think of having a separate infrastructure for reporting all
backend or process specific errors similar to pg_stat_activity and
pg_stat_get_activity, but that needs some shared memory and all of
that - IMO, it's an overkill.

I'm fine to withdraw this thread, if none of the above thoughts is
sensible enough to pursue further.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-10-06 06:39:41 Re: Query Jumbling for CALL and SET utility statements
Previous Message Andrey Lepikhov 2022-10-06 06:02:07 Re: document the need to analyze partitioned tables