Re: Fetching timeline during recovery

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: Fetching timeline during recovery
Date: 2019-12-11 05:20:02
Message-ID: 20191211052002.GK72921@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 26, 2019 at 07:20:46PM +0200, Jehan-Guillaume de Rorthais wrote:
> If this solution is accepted, some other function of the same family might be
> good candidates as well, for the sake of homogeneity:
>
> * pg_current_wal_insert_lsn
> * pg_current_wal_flush_lsn
> * pg_last_wal_replay_lsn
>
> However, I'm not sure how useful this would be.
>
> Thanks again for your time, suggestions and review!

+{ oid => '3435', descr => 'current wal flush location',
+ proname => 'pg_last_wal_receive_lsn', provolatile => 'v',
proisstrict => 'f',
This description is incorrect.

And please use OIDs in the range of 8000~9999 for patches in
development. You could just use src/include/catalog/unused_oids which
would point out a random range.

+ if (recptr == 0) {
+ nulls[0] = 1;
+ nulls[1] = 1;
+ }
The indendation of the code is incorrect, these should use actual
booleans and recptr should be InvalidXLogRecPtr (note also the
existence of the macro XLogRecPtrIsInvalid). Just for the style.

As said in the last emails exchanged on this thread, I don't see how
you cannot use multiple functions which have different meaning
depending on if the cluster is a primary or a standby knowing that we
have two different concepts of WAL when at recovery: the received
LSN and the replayed LSN, and three concepts for primaries (insert,
current, flush). I agree as well with the point of Fujii-san about
not returning the TLI and the LSN across different functions as this
opens the door for a risk of inconsistency for the data received by
the client.

+ * When the first parameter (variable 'with_tli') is true, returns the current
+ * timeline as second field. If false, second field is null.
I don't see much the point of having this input parameter which
determines the NULL-ness of one of the result columns, and I think
that you had better use a completely different function name for each
one of them instead of enforcing the functions. Let's remember that a
lot of tools use the existing functions directly in the SELECT clause
for LSN calculations, which is just a 64-bit integer *without* a
timeline assigned to it. However your patch mixes both concepts by
using pg_current_wal_lsn.

So we could do more with the introduction of five new functions which
allow to grab the LSN and the TLI in use for replay, received, insert,
write and flush positions:
- pg_current_wal_flush_info
- pg_current_wal_insert_info
- pg_current_wal_info
- pg_last_wal_receive_info
- pg_last_wal_replay_info

I would be actually tempted to do the following: one single SRF
function, say pg_wal_info which takes a text argument in input with
the following values: flush, write, insert, receive, replay. Thinking
more about it that would be rather neat, and more extensible than the
rest discussed until now. See for example PostgresNode::lsn.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-12-11 05:30:21 Re: Wrong assert in TransactionGroupUpdateXidStatus
Previous Message Kyotaro Horiguchi 2019-12-11 04:26:22 Re: Session WAL activity