last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)
Date: 2022-02-15 21:20:50
Message-ID: 831b6ab9-45d6-7ab0-ea9c-6efd4383afb8@iki.fi
Lists: pgsql-committers pgsql-hackers

On 14/02/2022 22:43, Heikki Linnakangas wrote:
> On 14/02/2022 16:41, Tom Lane wrote:
>> Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> writes:
>>> Add test case for an archive recovery corner case.
>>
>> hoverfly seems not to like this:
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2022-02-14%2012%3A36%3A12
>
> Hmm, only hoverfly - and even that succeeded on next run. Some kind of a
> flakyness I guess. I'll try to run the test in a loop and see if I can
> reproduce it.

That was interesting: the order in which WAL segments are archived when a
standby is promoted is not fully deterministic. The test polled
pg_stat_archiver.last_archived_wal to wait until a particular WAL
segment was archived, but it could happen that a lower-numbered WAL
segment was archived *after* the waited-for segment, and
pg_stat_archiver.last_archived_wal therefore displayed the
lower-numbered WAL segment. So the test incorrectly thought that the
higher-numbered segment that it waits for hadn't been archived yet.
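
A minimal Python sketch of the race follows. It uses a deliberately simplified, hypothetical model of the archiver (the class and helper names are illustrative, not PostgreSQL code). last_archived_wal records whichever segment was archived most recently, so a poll that waits for it to equal the target misses the target once a lower-numbered segment is archived afterwards. The "robust" check, which asks whether the target was archived at all, is an assumed alternative and not what the patch does. The patch simply drops the wait.

```python
# Hypothetical model: last_archived_wal tracks the segment archived most
# recently, not the highest-numbered segment archived so far.
class ArchiverStats:
    def __init__(self):
        self.last_archived_wal = None
        self.archived = set()

    def archive(self, segment):
        self.archived.add(segment)
        self.last_archived_wal = segment


def naive_wait_done(stats, target):
    # Mirrors the flaky test: done only while last_archived_wal shows target.
    return stats.last_archived_wal == target


def robust_wait_done(stats, target):
    # Assumed alternative: done once target has been archived at all,
    # regardless of what was archived after it.
    return target in stats.archived


stats = ArchiverStats()
target = "000000010000000000000004"  # illustrative segment name
# Out-of-order archiving: a lower-numbered segment is archived after
# the one the test is waiting for.
for seg in ["000000010000000000000004", "000000010000000000000003"]:
    stats.archive(seg)

print(stats.last_archived_wal)         # 000000010000000000000003
print(naive_wait_done(stats, target))  # False
print(robust_wait_done(stats, target))  # True
```

In this model the naive poll never finishes once the later, lower-numbered segment has overwritten last_archived_wal. That is the flakiness seen on hoverfly.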

I realized that this test doesn't really need to wait for the segment to
be archived, because it will stop the standby server immediately after
that, and stopping a server implicitly waits for all the WAL to be
archived before the archiver process exits. So I just removed it.

During normal operation, WAL segments are archived in order. But I'm
not sure whether there are other corner cases, aside from promoting a
standby server, where this could happen. After a crash restart, maybe, if
some .ready/.done files are lost.

I find it a bit surprising that pg_stat_archiver.last_archived_wal is
not necessarily the highest-numbered segment that was archived. I
propose that we mention that in the docs, as in the attached patch.
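
Because last_archived_wal is not a high-water mark, a tool that needs the highest-numbered archived segment has to compute it from the archived file names. A sketch, assuming default 16 MB segments and plain 24-hex-digit segment names (no .history, .partial or .backup files). The helpers are hypothetical. Each name encodes a timeline ID, a "log" number and a segment number, each as 8 hex digits. The ordering used here (timeline first, then position in the WAL stream) matches comparing the fixed-width names as strings.

```python
# Assumes the default 16 MB wal_segment_size; adjust for other sizes.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16 MB


def parse_wal_name(name):
    """Split a WAL segment file name into (timeline, absolute segment number)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    tli = int(name[0:8], 16)   # timeline ID
    log = int(name[8:16], 16)  # high part of the segment position
    seg = int(name[16:24], 16)  # segment within that "log"
    return tli, log * SEGMENTS_PER_XLOGID + seg


def highest_archived(names):
    # Hypothetical helper: order by timeline, then by WAL position.
    return max(names, key=parse_wal_name)


archived = [
    "000000010000000000000004",
    "000000010000000100000002",
    "000000010000000000000003",  # archived last, i.e. last_archived_wal
]
print(highest_archived(archived))  # 000000010000000100000002
```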

I'll commit this soon, to silence the occasional failures on the
buildfarm, but let me know if you have any better suggestions or thoughts.

- Heikki

Attachment: 0001-Fix-race-condition-in-028_pitr_timelines.pl-test-add.patch (text/x-patch, 4.6 KB)