Re: Fetching timeline during recovery

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Fetching timeline during recovery
Date: 2019-09-06 15:06:34
Message-ID: 20190906170634.089d1dce@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 4 Sep 2019 00:32:03 +0900
Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Mon, Jul 29, 2019 at 7:26 PM Jehan-Guillaume de Rorthais
> <jgdr(at)dalibo(dot)com> wrote:
> >
> > On Fri, 26 Jul 2019 18:22:25 +0200
> > Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> >
> > > On Fri, 26 Jul 2019 10:02:58 +0200
> > > Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
[...]
> > > Please, find in attachment a new version of the patch. It now creates two
> > > new fonctions:
> > >
> > > pg_current_wal_tl()
> > > pg_last_wal_received_tl()
> >
> > I just found I forgot to use PG_RETURN_INT32 in pg_last_wal_received_tl().
> > Please find the corrected patch in attachment:
> > 0001-v3-Add-functions-to-get-timeline.patch
>
> Thanks for the patch! Here are some comments from me.

Thank you for your review!

Please, find in attachment the v4 of the patch:
0001-v4-Add-functions-to-get-timeline.patch

Answers bellow.

> You need to write the documentation explaining the functions
> that you're thinking to add.

Done.

> +/*
> + * Returns the current timeline on a production cluster
> + */
> +Datum
> +pg_current_wal_tl(PG_FUNCTION_ARGS)
>
> The timeline ID that this function returns seems almost
> the same as pg_control_checkpoint().timeline_id,
> when the server is in production. So I'm not sure
> if it's worth adding that new function.

pg_control_checkpoint().timeline_id is read from the controldata file on disk
which is asynchronously updated with the real status of the local cluster.
Right after a promotion, fetching the TL from pg_control_checkpoint() is wrong
and can cause race conditions on client side.

This is the main reason I am working on this patch.

> + currentTL = GetCurrentTimeLine();
> +
> + PG_RETURN_INT32(currentTL);
>
> Is GetCurrentTimeLine() really necessary? Seems ThisTimeLineID can be
> returned directly since it indicates the current timeline ID in production.

Indeed. I might have over-focused on memory state. ThisTimeLineID seems to be
updated soon enough during the promotion, in fact, even before
XLogCtl->ThisTimeLineID:

if (ArchiveRecoveryRequested)
{
[...]
ThisTimeLineID = findNewestTimeLine(recoveryTargetTLI) + 1;
[...]
}

/* Save the selected TimeLineID in shared memory, too */
XLogCtl->ThisTimeLineID = ThisTimeLineID;

> +pg_last_wal_received_tl(PG_FUNCTION_ARGS)
> +{
> + TimeLineID lastReceivedTL;
> + WalRcvData *walrcv = WalRcv;
> +
> + SpinLockAcquire(&walrcv->mutex);
> + lastReceivedTL = walrcv->receivedTLI;
> + SpinLockRelease(&walrcv->mutex);
>
> I think that it's smarter to use GetWalRcvWriteRecPtr() to
> get the last received TLI, like pg_last_wal_receive_lsn() does.

I has been hesitant between the current implementation and using
GetWalRcvWriteRecPtr(). I choose the current implementation to avoid unnecessary
operations during the spinlock and make it as fast as possible.

However, maybe I'm scratching nothing or just dust here in comparison to
calling GetWalRcvWriteRecPtr() and avoiding minor code duplication.

Being hesitant, v4 of the patch use GetWalRcvWriteRecPtr() as suggested.

> The timeline ID that this function returns is the same as
> pg_stat_wal_receiver.received_tli while walreceiver is running.
> But when walreceiver is not running, pg_stat_wal_receiver returns
> no record, and pg_last_wal_received_tl() would be useful to
> get the timeline only in this case. Is this my understanding right?

Exactly.

> > Also, TimeLineID is declared as a uint32. So why do we use
> > PG_RETURN_INT32/Int32GetDatum to return a timeline and not PG_RETURN_UINT32?
> > See eg. in pg_stat_get_wal_receiver().
>
> pg_stat_wal_receiver.received_tli is declared as integer.

Oh, right. Thank you.

Thanks,

Attachment Content-Type Size
0001-v4-Add-functions-to-get-timeline.patch text/x-patch 5.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2019-09-06 15:09:02 Re: pgsql: Use data directory inode number, not port, to select SysV resour
Previous Message Alvaro Herrera from 2ndQuadrant 2019-09-06 15:03:38 Re: Unix-domain socket support on Windows