Re: Add last failed connection error message to pg_stat_wal_receiver

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Cary Huang <cary(dot)huang(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add last failed connection error message to pg_stat_wal_receiver
Date: 2022-08-18 03:31:00
Message-ID: Yv2ydB/7bLUEY/tr@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 04, 2022 at 03:27:11PM +0530, Bharath Rupireddy wrote:
> Good point. The walreceiver can exit for any reason. We can either 1)
> store for all the error messages or 2) think of using sigsetjmp but
> that only catches the ERROR kinds, leaving FATAL and PANIC messages.
> The option (1) is simple but there are problems - we may miss storing
> future error messages, good commenting and reviewing may help here and
> all the error messages now need to be stored in string, which is
> complex. The option (2) seems reasonable but we will miss FATAL and
> PANIC messages (we have many ERRORs, 2 FATALs, 3 PANICs). Maybe a
> combination of option (1) for FATALs and PANICs, and option (2) for
> ERRORs helps.
>
> Thoughts?

PANIC is not something you'd care about as the system would go down as
and shared memory would be reset (right?) even if restart_on_crash is
enabled. Perhaps it would help here to use something like a macro to
catch and save the error, in a style similar to what's in hba.c for
example, which is the closest example I can think of, even if on ERROR
we don't really care about the error string anyway as there is nothing
to report back to the SQL views used for the HBA/ident files.

FATAL may prove to be tricky though, because I'd expect the error to
be saved in shared memory in this case. This is particularly critical
as this takes the WAL receiver process down, actually.

Anyway, outside the potential scope of the proposal, there are more
things that I find strange with the code:
- Why isn't the string reset when the WAL receiver is starting up?
That surely is not OK to keep a past state not referring to what
actually happens with a receiver currently running.
- pg_stat_wal_receiver (system view) reports no rows if pid is NULL,
which would be the state stored in shared memory after a connection.
This means that one would never be able to see last_conn_error except
when calling directly the SQL function pg_stat_get_wal_receiver().

One could say that we should report a row for this view all the time,
but this creates a compatibility breakage: existing application
assuming something like (one row <=> WAL receiver running) could
break.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2022-08-18 03:35:15 Re: MultiXact\SLRU buffers configuration
Previous Message David Rowley 2022-08-18 03:17:33 Re: shadow variables - pg15 edition