Re: Switching XLog source from archive to streaming when primary available

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Subject: Re: Switching XLog source from archive to streaming when primary available
Date: 2022-10-18 02:01:33
Message-ID: CALj2ACUq_JebY-TSK=Z6xxj8u5oGDfO+5WE3rYYeaZRL_PbcaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote:
> > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >> I wonder if it would be better to simply remove this extra polling of
> >> pg_wal as a prerequisite to your patch. The existing commentary leads me
> >> to think there might not be a strong reason for this behavior, so it could
> >> be a nice way to simplify your patch.
> >
> > I don't think it's a good idea to remove that completely. As said
> > above, it might help someone, we never know.
>
> It would be great to hear whether anyone is using this functionality. If
> no one is aware of existing usage and there is no interest in keeping it
> around, I don't think it would be unreasonable to remove it in v16.

It seems like exhausting all the WAL in pg_wal before switching to
streaming after failing to fetch from archive is unremovable. I found
this after experimenting with it, here are my findings:
1. The standby has to recover initial WAL files in the pg_wal
directory even for the normal post-restart/first-time-start case, I
mean, in non-crash recovery case.
2. The standby received WAL files from primary (walreceiver just
writes and flushes the received WAL to WAL files under pg_wal)
pretty-fast and/or standby recovery is slow, say both the standby
connection to primary and archive connection are broken for whatever
reasons, then it has WAL files to recover in pg_wal directory.

I think the fundamental behaviour for the standy is that it has to
fully recover to the end of WAL under pg_wal no matter who copies WAL
files there. I fully understand the consequences of manually copying
WAL files into pg_wal, for that matter, manually copying/tinkering any
other files into/under the data directory is something we don't
recommend and encourage.

In summary, the standby state machine in WaitForWALToBecomeAvailable()
exhausts all the WAL in pg_wal before switching to streaming after
failing to fetch from archive. The v8 patch proposed upthread deviates
from this behaviour. Hence, attaching v9 patch that keeps the
behaviour as-is, that means, the standby exhausts all the WAL in
pg_wal before switching to streaming after fetching WAL from archive
for at least streaming_replication_retry_interval milliseconds.

Please review the v9 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v9-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch application/octet-stream 20.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-10-18 02:35:47 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message Michael Paquier 2022-10-18 01:59:49 Re: New "single-call SRF" APIs are very confusingly named