Re: Fetching timeline during recovery

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, sfrost(at)snowman(dot)net, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fetching timeline during recovery
Date: 2020-02-11 18:51:10
Message-ID: 20200211195110.0751aa6b@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 31 Jan 2020 15:12:30 +0900
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Jan 23, 2020 at 05:54:08PM +0100, Jehan-Guillaume de Rorthais wrote:
> > Please find the new version of the patch in attachment.
>
> To be honest, I find the concept of this patch confusing.
> pg_stat_wal_receiver is just a one-one mapping with the shared memory
> state of the WAL receiver itself and show data *if and only if* a WAL
> receiver is running and iff it is ready to display any data, so I'd
> rather not change its nature

If you are talking about the pg_stat_wal_receiver view, I don't have a strong
opinion on this anyway as I vote 0 when discussing it. My current patch
doesn't alter its nature.

> and it has nothing to do with the state of WAL being applied by the startup
> process.

Indeed, I was feeling this was a bad design to add these columns, as stated in
my last mail. So I withdraw this.

> So this gets a -1 from me.

OK.

[...]
> Isn't what you are looking for here a different system view which maps
> directly to XLogCtl so as you can retrieve the status of the applied
> WAL at recovery anytime

My main objective is received LSN/TLI. This is kept by WalRcv for streaming.
That's why pg_stat_wal_receiver was the very good place for my need. But again,
you are right, I shouldn't have add the replied bits to it.

> say pg_stat_recovery?

I finally dig this path. I was in the hope we could find something
simpler and lighter, but other solutions we studied so far (thanks all for your
time) were all discarded [1].

A new pg_stat_get_recovery() view might be useful for various monitoring
purpose. After poking around in the code, it seems the patch would be bigger
than previous solutions, so I prefer discussing the specs first.

At a first glance, I would imagine the following columns as a minimal patch:

* source: stream, archive or pg_wal
* write/flush/replayed LSN
* write/flush/replayed TLI

This has already some heavy impact in the code. Source might be taken from
xlog.c:currentSource, so it should probably be included in XLogCtl to be
accessible from any backend.

As replayed LSN/TLI comes from XLogCtl too, we might probably need a new
dedicated function to gather these fields plus currentSource under the same
info_lck.

Next, write lsn/tli is not accessible from WalRcv, only flush. So either we do
not include it, or we would probably need to replace WalRcv->receivedUpto with
existing LogstreamResult.

Next, there's no stats about wal shipping recovery. Restoring a WAL from
archive do not increment anything about write/flush LSN/TLI. I wonder if both
wal_receiver stats and WAL shipping stats might be merged together in the same
refactored structure in shmem, as they might share a fair number of field
together? This would be pretty invasive in the code, but I feel it's heavier to
add another new struct in shmem just to track WAL shipping stats whereas WalRcv
already exists there.

Now, I think the following additional field might be useful for monitoring. But
as this is out my my original scope, I prefer discussing how useful this might
be:

* start_time: start time of the current source
* restored_count: total number of WAL restored. We might want to split this
counter to track each method individually.
* last_received_time: last time we received something from the current source
* last_fail_time: last failure time, whatever the source

Thanks for reading up to here!

Regards,

[1] even if I still hope the pg_stat_get_wal_receiver might still gather some
more positive vote :)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-02-11 19:09:29 Re: Portal->commandTag as an enum
Previous Message marcelo zen 2020-02-11 17:04:59 Re: Just for fun: Postgres 20?