Re: Fetching timeline during recovery

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Fetching timeline during recovery
Date: 2019-09-09 10:44:10
Message-ID: CAHGQGwGSSVsOSpYDgY6pOAMuNbcpfrGM42CxGx6R=onJreDO-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 7, 2019 at 12:06 AM Jehan-Guillaume de Rorthais
<jgdr(at)dalibo(dot)com> wrote:
>
> On Wed, 4 Sep 2019 00:32:03 +0900
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> > On Mon, Jul 29, 2019 at 7:26 PM Jehan-Guillaume de Rorthais
> > <jgdr(at)dalibo(dot)com> wrote:
> > >
> > > On Fri, 26 Jul 2019 18:22:25 +0200
> > > Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> > >
> > > > On Fri, 26 Jul 2019 10:02:58 +0200
> > > > Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> [...]
> > > > Please, find in attachment a new version of the patch. It now creates two
> > > > new fonctions:
> > > >
> > > > pg_current_wal_tl()
> > > > pg_last_wal_received_tl()
> > >
> > > I just found I forgot to use PG_RETURN_INT32 in pg_last_wal_received_tl().
> > > Please find the corrected patch in attachment:
> > > 0001-v3-Add-functions-to-get-timeline.patch
> >
> > Thanks for the patch! Here are some comments from me.
>
> Thank you for your review!
>
> Please, find in attachment the v4 of the patch:
> 0001-v4-Add-functions-to-get-timeline.patch

Thanks for updating the patch!

Should we add regression tests for these functions? For example,
what about using these functions to check the timeline switch case,
in src/test/recovery/t/004_timeline_switch.pl?

>
> Answers bellow.
>
> > You need to write the documentation explaining the functions
> > that you're thinking to add.
>
> Done.

Thanks!

+ <entry>Get current write-ahead log timeline</entry>

I'm not sure if "current write-ahead log timeline" is proper word.
"timeline ID of current write-ahead log" is more appropriate?

+ <entry><type>int</type></entry>
+ <entry>Get last write-ahead log timeline received and sync to disk by
+ streaming replication.

Same as above. I think that "timeline ID of last write-ahead log received
and sync to disk ..." is better here.

Like pg_last_wal_receive_lsn(), something like "If recovery has
completed this will remain static at the value of the last WAL
record received and synced to disk during recovery.
If streaming replication is disabled, or if it has not yet started,
the function returns NULL." should be in this description?

>
> > +/*
> > + * Returns the current timeline on a production cluster
> > + */
> > +Datum
> > +pg_current_wal_tl(PG_FUNCTION_ARGS)

I think that "tl" in the function name should be "tli". "tli" is used
used for other functions and views related to timeline, e.g.,
pg_stat_wal_receiver.received_tli. Thought?

> >
> > The timeline ID that this function returns seems almost
> > the same as pg_control_checkpoint().timeline_id,
> > when the server is in production. So I'm not sure
> > if it's worth adding that new function.
>
> pg_control_checkpoint().timeline_id is read from the controldata file on disk
> which is asynchronously updated with the real status of the local cluster.
> Right after a promotion, fetching the TL from pg_control_checkpoint() is wrong
> and can cause race conditions on client side.

Understood.

> > The timeline ID that this function returns is the same as
> > pg_stat_wal_receiver.received_tli while walreceiver is running.
> > But when walreceiver is not running, pg_stat_wal_receiver returns
> > no record, and pg_last_wal_received_tl() would be useful to
> > get the timeline only in this case. Is this my understanding right?
>
> Exactly.

I'm just imaging that some users want to use pg_last_wal_receive_lsn() and
pg_last_wal_receive_tli() together to, e.g., get the name of WAL file received
last. But there can be a corner case where the return values of
pg_last_wal_receive_lsn() and of pg_last_wal_receive_tli() are inconsistent.
This can happen because those values are NOT gotten within single lock.
That is, each function takes each lock to get each value.

So, to avoid that corner case and get consistent WAL file name,
we might want to have the function that gets both LSN and
timeline ID of the last received WAL record within single lock
(i.e., just uses GetWalRcvWriteRecPtr()) and returns them.
Thought?

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2019-09-09 11:08:15 Re: block-level incremental backup
Previous Message Amit Khandekar 2019-09-09 10:36:58 Re: Minimal logical decoding on standbys