Re: Fetching timeline during recovery

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: Fetching timeline during recovery
Date: 2019-12-19 22:41:36
Message-ID: 20191219234136.160113ac@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 11 Dec 2019 14:20:02 +0900
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Sep 26, 2019 at 07:20:46PM +0200, Jehan-Guillaume de Rorthais wrote:
> > If this solution is accepted, some other function of the same family might
> > be good candidates as well, for the sake of homogeneity:
> >
> > * pg_current_wal_insert_lsn
> > * pg_current_wal_flush_lsn
> > * pg_last_wal_replay_lsn
> >
> > However, I'm not sure how useful this would be.
> >
> > Thanks again for your time, suggestions and review!
>
> +{ oid => '3435', descr => 'current wal flush location',
> + proname => 'pg_last_wal_receive_lsn', provolatile => 'v',
> proisstrict => 'f',
> This description is incorrect.

Indeed. And the one for pg_current_wal_lsn(bool) as well.

> And please use OIDs in the range of 8000~9999 for patches in
> development. You could just use src/include/catalog/unused_oids which
> would point out a random range.

Thank you for this information, I wasn't aware.

> + if (recptr == 0) {
> + nulls[0] = 1;
> + nulls[1] = 1;
> + }
> The indendation of the code is incorrect, these should use actual
> booleans and recptr should be InvalidXLogRecPtr (note also the
> existence of the macro XLogRecPtrIsInvalid). Just for the style.

Fixed on my side. Thanks.

> As said in the last emails exchanged on this thread, I don't see how
> you cannot use multiple functions which have different meaning
> depending on if the cluster is a primary or a standby knowing that we
> have two different concepts of WAL when at recovery: the received
> LSN and the replayed LSN, and three concepts for primaries (insert,
> current, flush).

As I wrote in my previous email, existing functions could be overloaded
as well for the sake of homogeneity. So the five of them would have similar
behavior/API.

> I agree as well with the point of Fujii-san about
> not returning the TLI and the LSN across different functions as this
> opens the door for a risk of inconsistency for the data received by
> the client.

My last patch fixed that, indeed.

> + * When the first parameter (variable 'with_tli') is true, returns the
> current
> + * timeline as second field. If false, second field is null.
> I don't see much the point of having this input parameter which
> determines the NULL-ness of one of the result columns, and I think
> that you had better use a completely different function name for each
> one of them instead of enforcing the functions. Let's remember that a
> lot of tools use the existing functions directly in the SELECT clause
> for LSN calculations, which is just a 64-bit integer *without* a
> timeline assigned to it. However your patch mixes both concepts by
> using pg_current_wal_lsn.

Sorry, I realize I was not clear enough about implementation details.
My latest patch does **not** introduce regression for existing tools. If you do
not pass any parameter, the behavior is the same, only one column:

# primary
$ cat <<EOQ|psql -XAtp 5432
select * from pg_current_wal_lsn();
select * from pg_current_wal_lsn(NULL);
select * from pg_current_wal_lsn(true);
EOQ
0/15D5BA0
0/15D5BA0|
0/15D5BA0|1

# secondary
$ cat <<EOQ|psql -XAtp 5433
select * from pg_last_wal_receive_lsn();
select * from pg_last_wal_receive_lsn(NULL);
select * from pg_last_wal_receive_lsn(true);
EOQ
0/15D5BA0
0/15D5BA0|
0/15D5BA0|1

It's kind of the same approach than when parameters has been added to
eg. pg_stat_backup() to change its behavior between exclusive and
non-exclusive backups. But I admit I know no function changing its return type
based on the given parameter. I understand your concern.

> So we could do more with the introduction of five new functions which
> allow to grab the LSN and the TLI in use for replay, received, insert,
> write and flush positions:
> - pg_current_wal_flush_info
> - pg_current_wal_insert_info
> - pg_current_wal_info
> - pg_last_wal_receive_info
> - pg_last_wal_replay_info

I could go this way if you prefer, maybe using _tli as suffix instead of _info
as this is the only new info added. I think it feels redundant with original
funcs, but it might be the simplest solution.

> I would be actually tempted to do the following: one single SRF
> function, say pg_wal_info which takes a text argument in input with
> the following values: flush, write, insert, receive, replay. Thinking
> more about it that would be rather neat, and more extensible than the
> rest discussed until now. See for example PostgresNode::lsn.

I'll answer in your other mail that summary other possibilities.

Thanks!

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2019-12-19 22:48:39 Re: Add support for automatically updating Unicode derived files
Previous Message Magnus Hagander 2019-12-19 20:31:50 Re: global / super barriers (for checksums)