Re: Switching XLog source from archive to streaming when primary available

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, cary(dot)huang(at)highgo(dot)ca, pgsql-hackers(at)lists(dot)postgresql(dot)org, satyanarlapuram(at)gmail(dot)com
Subject: Re: Switching XLog source from archive to streaming when primary available
Date: 2022-10-09 09:09:47
Message-ID: CALj2ACUaELLztrsdD_YOA3YEqJt7RagBL4pffzK5rc0CWKmLxQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 9, 2022 at 3:22 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> As I mentioned upthread [0], I'm still a little concerned that this patch
> will cause the state machine to go straight from archive recovery to
> streaming replication, skipping recovery from pg_wal.
>
> [0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13

Yes, with the patch as reviewed, the state machine goes straight from
archive recovery to streaming replication, skipping recovery from pg_wal.

> I wonder if this
> could be resolved by moving the standby to the pg_wal phase instead.
> Concretely, this line
>
> + if (switchSource)
> + break;
>
> would instead change currentSource from XLOG_FROM_ARCHIVE to
> XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI(). I suspect the
> behavior would be basically the same, but it would maintain the existing
> ordering.

We can give the standby a chance to restore from pg_wal before switching
to streaming, so that the behaviour of the state machine doesn't change.
But definitely not by setting currentSource to XLOG_FROM_PG_WAL: we
basically never set currentSource to XLOG_FROM_PG_WAL explicitly, other
than when not in archive recovery, i.e. when InArchiveRecovery is false.
Also, see the comment [1].

Instead, the simplest approach would be to just pass XLOG_FROM_PG_WAL to
XLogFileReadAnyTLI() when we're about to switch the source to streaming
mode. This doesn't change the existing behaviour.
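
To illustrate the idea, here is a minimal standalone sketch. It is not
code from the patch. The enum mirrors the XLogSource member names in
xlogrecovery.c, but SegmentAvailability, sketch_read_segment() and
sketch_archive_attempt() are hypothetical names made up for this
example, and the rule that the attempt is followed by a move to
streaming is my simplified reading of the proposal, not the actual
state machine.

```c
#include <stdbool.h>

/* Same member names as XLogSource in xlogrecovery.c. */
typedef enum
{
    XLOG_FROM_ANY = 0,          /* request any source; also "nothing read" here */
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
} XLogSource;

/* Hypothetical: where the wanted segment can currently be found. */
typedef struct
{
    bool        in_archive;
    bool        in_pg_wal;
} SegmentAvailability;

/*
 * Hypothetical stand-in for XLogFileReadAnyTLI(). Returns the source the
 * segment was read from, or XLOG_FROM_ANY if it was not found. Archive is
 * preferred over pg_wal when the caller asks for any source.
 */
XLogSource
sketch_read_segment(XLogSource source, SegmentAvailability avail)
{
    if ((source == XLOG_FROM_ANY || source == XLOG_FROM_ARCHIVE) &&
        avail.in_archive)
        return XLOG_FROM_ARCHIVE;
    if ((source == XLOG_FROM_ANY || source == XLOG_FROM_PG_WAL) &&
        avail.in_pg_wal)
        return XLOG_FROM_PG_WAL;
    return XLOG_FROM_ANY;
}

/*
 * One read attempt while currentSource is XLOG_FROM_ARCHIVE (assumed).
 * When switch_source is set, only the *argument* to the read changes to
 * XLOG_FROM_PG_WAL; currentSource itself is never assigned that value.
 * *next_source receives the source to use on the following attempt.
 */
XLogSource
sketch_archive_attempt(bool switch_source, SegmentAvailability avail,
                       XLogSource *next_source)
{
    XLogSource  read_from = switch_source ? XLOG_FROM_PG_WAL : XLOG_FROM_ANY;
    XLogSource  got = sketch_read_segment(read_from, avail);

    /* Assumption: a pending switch or a failed read leads to streaming. */
    if (switch_source || got == XLOG_FROM_ANY)
        *next_source = XLOG_FROM_STREAM;
    else
        *next_source = XLOG_FROM_ARCHIVE;

    return got;
}
```

With switch_source set, a segment that sits in both the archive and
pg_wal is read from pg_wal, and the next attempt uses streaming; with
it unset, the archive copy is preferred as before.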

> However, I do see the following note elsewhere in xlogrecovery.c:
>
> * The segment can be fetched via restore_command, or via walreceiver having
> * streamed the record, or it can already be present in pg_wal. Checking
> * pg_wal is mainly for crash recovery, but it will be polled in standby mode
> * too, in case someone copies a new segment directly to pg_wal. That is not
> * documented or recommended, though.
>
> Given this information, the present behavior might not be too important,
> but I don't see a point in changing it without good reason.

Yeah, with the attached patch we don't skip pg_wal before switching to
streaming mode.

I've also added a note in the 'Standby Server Operation' section about
the new feature.

Please review the v8 patch further.

Unrelated to this patch: the claim that the standby polling pg_wal is
not documented or recommended is not true; it is actually documented
[2]. Whether or not we change the docs to something like [3] is a
separate discussion.

[1]
/*
* We just successfully read a file in pg_wal. We prefer files in
* the archive over ones in pg_wal, so try the next file again
* from the archive first.
*/

[2] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION
The standby server will also attempt to restore any WAL found in the
standby cluster's pg_wal directory. That typically happens after a
server restart, when the standby replays again WAL that was streamed
from the primary before the restart, but you can also manually copy
files to pg_wal at any time to have them replayed.

[3]
The standby server will also attempt to restore any WAL found in the
standby cluster's pg_wal directory. That typically happens after a
server restart, when the standby replays again WAL that was streamed
from the primary before the restart, but you can also manually copy
files to pg_wal at any time to have them replayed. However, copying of
WAL files manually is not recommended.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment: v8-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch (application/x-patch, 19.1 KB)