Re: Switching XLog source from archive to streaming when primary available

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Subject: Re: Switching XLog source from archive to streaming when primary available
Date: 2023-01-19 00:50:14
Message-ID: 20230119005014.GA3838170@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> With your patch, we might replay one of these "old" files in pg_wal instead
>> of the complete version of the file from the archives,
>
> That's true even today, without the patch, no? We're not changing the
> existing behaviour of the state machine. Can you explain how it
> happens with the patch?

My point is that on HEAD, we will always prefer a complete archive file.
With your patch, we might instead choose to replay an old file in pg_wal
because we are artificially advancing the state machine. IOW even if
there's a complete archive available, we might not use it. This is a
behavior change, but I think it is okay.

>> Would you mind testing this scenario?
>
> How about something like below for testing the above scenario? If it
> looks okay, I can add it as a new TAP test file.
>
> 1. Generate WAL files f1 and f2 and archive them.
> 2. Check the replay lsn and WAL file name on the standby, when it
> replays upto f2, stop the standby.
> 3. Set recovery to fail on the standby, and stop the standby.
> 4. Generate f3, f4 (partially filled) on the primary.
> 5. Manually copy f3, f4 to the standby's pg_wal.
> 6. Start the standby, since recovery is set to fail, and there're new
> WAL files (f3, f4) under its pg_wal, it must replay those WAL files
> (check the replay lsn and WAL file name, it must be f4) before
> switching to streaming.
> 7. Generate f5 on the primary.
> 8. The standby should receive f5 and replay it (check the replay lsn
> and WAL file name, it must be f5).
> 9. Set streaming to fail on the standby and set recovery to succeed.
> 10. Generate f6 on the primary.
> 11. The standby should receive f6 via archive and replay it (check the
> replay lsn and WAL file name, it must be f6).

I meant testing the scenario where there's an old file in pg_wal, a
complete file in the archives, and your new GUC forces replay of the
former. This might be difficult to do in a TAP test. Ultimately, I just
want to validate the assumptions discussed above.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-01-19 01:00:48 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Previous Message Andres Freund 2023-01-19 00:37:18 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation