Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()
Date: 2023-03-03 08:08:38
Message-ID: CALj2ACXLFcAGap=wVUehfM9pwKaZWLhsmdbq2taK=G4OBQR3aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 1, 2023 at 1:46 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Jan 03, 2023 at 02:53:10PM +0530, Bharath Rupireddy wrote:
> > The proposed patch makes the inherent state change to pg_wal after
> > failure to read from archive in XLogFileReadAnyTLI() to explicit by
> > setting currentSource to XLOG_FROM_PG_WAL in the state machine. I
> > think it doesn't alter the existing state machine or add any new extra
> > lookups in pg_wal.
>
> Well, did you notice 4d894b41? It introduced this change:
>
> - readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
> + readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
> + currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
> + currentSource);
>
> And this patch basically undoes that, meaning that we would basically
> look at the archives first for all the expected TLIs, but only if no
> files were found in pg_wal/.
>
> The change is subtle, see XLogFileReadAnyTLI(). On HEAD we go through
> each timeline listed and check both archives and then pg_wal/ after
> the last source that failed was the archives. The patch does
> something different: it checks all the timelines for the archives,
> then all the timelines in pg_wal/ with two separate calls to
> XLogFileReadAnyTLI().

Thanks. Yeah, the patch proposed here just reverts that commit [1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4d894b41cd12179b710526eba9dc62c2b99abc4d.

That commit fixes an issue - "If there is a WAL segment with same ID
but different TLI present in both the WAL archive and pg_xlog, prefer
the one with higher TLI.".

I will withdraw the patch proposed in this thread.

[1]
commit 4d894b41cd12179b710526eba9dc62c2b99abc4d
Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Date: Fri Feb 14 15:15:09 2014 +0200

Change the order that pg_xlog and WAL archive are polled for WAL segments.

If there is a WAL segment with same ID but different TLI present in both
the WAL archive and pg_xlog, prefer the one with higher TLI. Before this
patch, the archive was polled first, for all expected TLIs, and only if no
file was found was pg_xlog scanned. This was a change in behavior from 9.3,
which first scanned archive and pg_xlog for the highest TLI, then archive
and pg_xlog for the next highest TLI and so forth. This patch reverts the
behavior back to what it was in 9.2.

The reason for this is that if for example you try to do archive recovery
to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is
not archived yet, we would replay past the timeline switch point on
timeline 1 using the archived files, before even looking timeline 2's files
in pg_xlog

Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior
was changed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-03-03 08:46:20 RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Previous Message Anthonin Bonnefoy 2023-03-03 08:06:23 Re: Flush SLRU counters in checkpointer process