Re: Switching XLog source from archive to streaming when primary available

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Switching XLog source from archive to streaming when primary available
Date: 2024-03-18 06:08:02
Message-ID: ZffaQt7UbM2Q9kYh@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote:
> Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a
> conflict in meson.build. Please see the attached v23 patch.

I've been reading this patch, and this is a very tricky one. Please
be *very* cautious.

+#streaming_replication_retry_interval = 0 # time after which standby
+ # attempts to switch WAL source from archive to
+ # streaming replication in seconds; 0 disables

This stuff allows a minimal retry interval of 1s. Could it be useful
to have more responsiveness here and allow lower values than that?
Why not switching the units to be milliseconds?

+ if (streaming_replication_retry_interval <= 0 ||
+ !StandbyMode ||
+ currentSource != XLOG_FROM_ARCHIVE)
+ return SWITCH_TO_STREAMING_NONE;

Hmm. Perhaps this should mention why we don't care about the
consistent point.

+ /* See if we can switch WAL source to streaming */
+ if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE)
+ wal_source_switch_state = SwitchWALSourceToPrimary();

Rather than a routine that returns as result the value to use for the
GUC, I'd suggest to let this routine set the GUC as there is only one
caller of SwitchWALSourceToPrimary(). This can also include a check
on SWITCH_TO_STREAMING_NONE, based on what I'm reading that.

- if (lastSourceFailed)
+ if (lastSourceFailed ||
+ wal_source_switch_state == SWITCH_TO_STREAMING)

Hmm. This one may be tricky. I'd recommend a separation between the
failure in reading from a source and the switch to a new "forced"
source.

+ if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING)
+ readFrom = XLOG_FROM_PG_WAL;
+ else
+ readFrom = currentSource == XLOG_FROM_ARCHIVE ?
+ XLOG_FROM_ANY : currentSource;

WALSourceSwitchState looks confusing here, and are you sure that this
is actualy correct? Shouldn't we still try a READ_FROM_ANY or a read
from the archives even with a streaming pending.

By the way, I am not convinced that what you have is the best
interface ever. This assumes that we'd always want to switch to
streaming more aggressively. Could there be a point in also
controlling if we should switch to pg_wal/ or just to archiving more
aggressively as well, aka be able to do the opposite switch of WAL
source? This design looks somewhat limited to me. The origin of the
issue is that we don't have a way to control the order of the sources
consumed by WAL replay. Perhaps something like a replay_source_order
that uses a list would be better, with elements settable to archive,
pg_wal and streaming?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-03-18 06:13:41 Re: Weird test mixup
Previous Message Andrey M. Borodin 2024-03-18 05:50:25 Re: Weird test mixup