Re: Switching XLog source from archive to streaming when primary available

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Japin Li <japinli(at)hotmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Switching XLog source from archive to streaming when primary available
Date: 2024-03-05 02:04:52
Message-ID: 20240305020452.GA3373526@nathanxps13
Lists: pgsql-hackers

cfbot claims that this one needs another rebase.

I've spent some time thinking about this one. I'll admit I'm a bit worried
about adding more complexity to this state machine, but I also haven't
thought of any other viable approaches, and this still seems like a useful
feature. So, for now, I think we should continue with the current
approach.

+ fails to switch to stream mode, it falls back to archive mode. If this
+ parameter value is specified without units, it is taken as
+ milliseconds. Default is <literal>5min</literal>. With a lower value

Does this really need to be milliseconds? I would think that any
reasonable setting would be at least on the order of seconds.

+ attempts. To avoid this, it is recommended to set a reasonable value.

I think we might want to suggest what a "reasonable value" is.

+ static bool canSwitchSource = false;
+ bool switchSource = false;

IIUC "canSwitchSource" indicates that we are trying to force a switch to
streaming, but we are currently exhausting anything that's present in the
pg_wal directory, while "switchSource" indicates that we should force a
switch to streaming right now. Furthermore, "canSwitchSource" is static
while "switchSource" is not. Is there any way to simplify this? For
example, would it be possible to make an enum that tracks the
streaming_replication_retry_interval state?
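
To make the enum idea a bit more concrete, here is a rough standalone
sketch. Every name in it (WalSourceSwitchState, the SWITCH_* values,
next_switch_state, and its two boolean inputs) is hypothetical and does not
appear in the patch, and it assumes my reading of the two flags above is
correct. It only models the state transitions, not the real
WaitForWALToBecomeAvailable() logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical replacement for the canSwitchSource/switchSource pair.
 * None of these names exist in the patch.
 */
typedef enum WalSourceSwitchState
{
	SWITCH_NONE,			/* no forced switch in progress */
	SWITCH_DRAIN_PG_WAL,	/* interval elapsed, still reading pg_wal */
	SWITCH_NOW				/* pg_wal exhausted, switch to streaming now */
} WalSourceSwitchState;

/*
 * Compute the next state from the current one.  Both inputs are assumed to
 * be supplied by the caller: whether the retry interval has elapsed, and
 * whether everything already present in pg_wal has been consumed.
 */
WalSourceSwitchState
next_switch_state(WalSourceSwitchState cur,
				  bool interval_elapsed,
				  bool pg_wal_exhausted)
{
	switch (cur)
	{
		case SWITCH_NONE:
			return interval_elapsed ? SWITCH_DRAIN_PG_WAL : SWITCH_NONE;
		case SWITCH_DRAIN_PG_WAL:
			return pg_wal_exhausted ? SWITCH_NOW : SWITCH_DRAIN_PG_WAL;
		case SWITCH_NOW:
			/* stays put until the caller resets it after switching */
			return SWITCH_NOW;
	}
	return cur;					/* not reached; keeps compilers quiet */
}

/* Reset once the switch to streaming has actually happened. */
WalSourceSwitchState
reset_switch_state(WalSourceSwitchState cur)
{
	assert(cur == SWITCH_NOW);
	return SWITCH_NONE;
}
```

In this sketch, SWITCH_DRAIN_PG_WAL corresponds to canSwitchSource being
set on its own, and SWITCH_NOW corresponds to both flags being set. A single
static variable of this type would also avoid having one static flag and
one non-static flag.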

/*
* Don't allow any retry loops to occur during nonblocking
- * readahead. Let the caller process everything that has been
- * decoded already first.
+ * readahead if we failed to read from the current source. Let the
+ * caller process everything that has been decoded already first.
*/
- if (nonblocking)
+ if (nonblocking && lastSourceFailed)
return XLREAD_WOULDBLOCK;

Why do we skip this when "switchSource" is set?

+ /* Reset the WAL source switch state */
+ if (switchSource)
+ {
+ Assert(canSwitchSource);
+ Assert(currentSource == XLOG_FROM_STREAM);
+ Assert(oldSource == XLOG_FROM_ARCHIVE);
+ switchSource = false;
+ canSwitchSource = false;
+ }

How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE? Is
there no way it could be XLOG_FROM_PG_WAL?

+#streaming_replication_retry_interval = 5min # time after which standby
+ # attempts to switch WAL source from archive to
+ # streaming replication
+ # in milliseconds; 0 disables

I think we might want to turn this feature off by default, at least for the
first release.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
