Re: Switching XLog source from archive to streaming when primary available

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Japin Li <japinli(at)hotmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Switching XLog source from archive to streaming when primary available
Date: 2024-03-05 18:08:37
Message-ID: CALj2ACVYVgq+QxeLmhY5DO+D6+7PwerdsVTeBxPuSeFyrgOPtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> cfbot claims that this one needs another rebase.

Yeah, the conflict was with the new TAP test file name in
src/test/recovery/meson.build.

> I've spent some time thinking about this one. I'll admit I'm a bit worried
> about adding more complexity to this state machine, but I also haven't
> thought of any other viable approaches,

Right. I understand that the WaitForWALToBecomeAvailable()'s state
machine is a complex piece.

> and this still seems like a useful
> feature. So, for now, I think we should continue with the current
> approach.

Yes, the feature is useful like mentioned in the docs as below:

+ Reading WAL from archive may not always be as efficient and fast as
+ reading from primary. This can be due to the differences in disk types,
+ IO costs, network latencies etc. All of these can impact the recovery
+ performance on standby, and can increase the replication lag on
+ primary. In addition, the primary keeps accumulating WAL needed for the
+ standby while the standby reads WAL from archive, because the standby
+ replication slot stays inactive. To avoid these problems, one can use
+ this parameter to make standby switch to stream mode sooner.

> + fails to switch to stream mode, it falls back to archive mode. If this
> + parameter value is specified without units, it is taken as
> + milliseconds. Default is <literal>5min</literal>. With a lower value
>
> Does this really need to be milliseconds? I would think that any
> reasonable setting would at least on the order of seconds.

Agreed. Done that way.

> + attempts. To avoid this, it is recommended to set a reasonable value.
>
> I think we might want to suggest what a "reasonable value" is.

It really depends on the WAL generation rate on the primary. If the
WAL files grow faster, the disk runs out of space sooner, so setting a
value to make frequent WAL source switch attempts can help. It's hard
to suggest a one-size-fits-all value. Therefore, I've tweaked the docs
a bit to reflect the fact that it depends on the WAL generation rate.

> + static bool canSwitchSource = false;
> + bool switchSource = false;
>
> IIUC "canSwitchSource" indicates that we are trying to force a switch to
> streaming, but we are currently exhausting anything that's present in the
> pg_wal directory,

Right.

> while "switchSource" indicates that we should force a
> switch to streaming right now.

It's not indicating force switch, it says "previously I was asked to
switch source via canSwitchSource, now that I've exhausted all the WAL
from the pg_wal directory, I'll make a source switch attempt now".

> Furthermore, "canSwitchSource" is static
> while "switchSource" is not.

This is because the WaitForWALToBecomeAvailable() has to remember the
decision (that streaming_replication_retry_interval has occurred)
across the calls. And, switchSource is decided within
WaitForWALToBecomeAvailable() for every function call.

> Is there any way to simplify this? For
> example, would it be possible to make an enum that tracks the
> streaming_replication_retry_interval state?

I guess the way it is right now looks simple IMHO. If the suggestion
is to have an enum like below; it looks overkill for just two states.

typedef enum
{
CAN_SWITCH_SOURCE,
SWITCH_SOURCE
} XLogSourceSwitchState;

> /*
> * Don't allow any retry loops to occur during nonblocking
> - * readahead. Let the caller process everything that has been
> - * decoded already first.
> + * readahead if we failed to read from the current source. Let the
> + * caller process everything that has been decoded already first.
> */
> - if (nonblocking)
> + if (nonblocking && lastSourceFailed)
> return XLREAD_WOULDBLOCK;
>
> Why do we skip this when "switchSource" is set?

It was leftover from the initial version of the patch - I was then
encountering some issue and had that piece there. Removed it now.

> + /* Reset the WAL source switch state */
> + if (switchSource)
> + {
> + Assert(canSwitchSource);
> + Assert(currentSource == XLOG_FROM_STREAM);
> + Assert(oldSource == XLOG_FROM_ARCHIVE);
> + switchSource = false;
> + canSwitchSource = false;
> + }
>
> How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE? Is
> there no way it could be XLOG_FROM_PG_WAL?

No. switchSource is set to true only when canSwitchSource is set to
true, which happens only when currentSource is XLOG_FROM_ARCHIVE (see
SwitchWALSourceToPrimary()).

> +#streaming_replication_retry_interval = 5min # time after which standby
> + # attempts to switch WAL source from archive to
> + # streaming replication
> + # in milliseconds; 0 disables
>
> I think we might want to turn this feature off by default, at least for the
> first release.

Agreed. Done that way.

Please see the attached v21 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v21-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch application/x-patch 19.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2024-03-05 18:09:47 Re: [PATCH] updates to docs about HOT updates for BRIN
Previous Message Jeff Davis 2024-03-05 18:02:00 Re: pgsql: Fix search_path to a safe value during maintenance operations.