Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-08-16 06:17:58
Message-ID: ZNxqFgo243DOar4L@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Aug 16, 2023 at 03:29:49PM +1200, Thomas Munro wrote:
> I hacked on this idea for quite a long time yesterday and today and
> came up with a set of tests for the main end-of-WAL conditions:
>
> ▶ 1/1 - xl_tot_len zero OK
> ▶ 1/1 - xl_tot_len short OK
> ▶ 1/1 - xl_prev bad OK
> ▶ 1/1 - xl_crc bad OK
> ▶ 1/1 - xlp_magic zero OK
> ▶ 1/1 - xlp_magic bad OK
> ▶ 1/1 - xlp_pageaddr bad OK
> ▶ 1/1 - xlp_info bad OK
> ▶ 1/1 - xlp_info lacks XLP_FIRST_IS_CONTRECORD OK
> ▶ 1/1 - xlp_rem_len bad OK
> ▶ 1/1 - xlp_magic zero (split record header) OK
> ▶ 1/1 - xlp_pageaddr bad (split record header) OK
> ▶ 1/1 - xlp_rem_len bad (split record header) OK
> 1/1 postgresql:recovery / recovery/038_end_of_wal OK
> 5.79s 13 subtests passed

Nice.

> It took me a while to come up with a workable way to get into the
> record-header-splitting zone. Based on some of your clues about
> flushing, I eventually realised I needed transactional messages, and I
> built a kind of self-calibrating Rube Goldberg function around that.
> It's terrible, and I'm sure we can do better.

So that's advance_to_record_splitting_zone(), with the idea to loop
across multiple records. The use of transactional messages ensures
the flush and the stability of the test when the server is stopped,
but it makes the records a bit longer than the non-transactional
messages.

> I wonder what people think about putting internal details of the WAL
> format into a Perl test like this. Obviously it requires maintenance,
> since it knows the size and layout of a few things. I guess it'd be
> allowed to fish a couple of those numbers out of the source.

The format of the WAL header does not change a lot across the years,
so I'd be OK with that. If that's an issue, the scope of the test
could always be restricted a bit.

> Work in progress... I'm sure more useful checks could be added. One
> thing that occurred to me while thinking about all this it that the
> 'treat malloc failure as end of WAL' thing you highlighted in another
> thread is indeed completely bananas -- I didn't go digging, but
> perhaps it was an earlier solution to the very same garbage xl_tot_len
> problem, before 70b4f82a4b5 and now this xl_rem_len-based solution?

It is bananas. Still, the problem of the other thread is similar to
what's here, and different at the same time because it can happen with
a valid record if the host is under memory pressure, and we should
still allow a standby to loop and continue in this case. Perhaps also
a startup process doing only crash recovery. This thread's problem is
dependent on this one, which is why this one needs to happen first: we
need to clarify the boundary between a real OOM and garbage indicating
the end of WAL.

+my $RECORD_HEADER_SIZE = 24;
+my $XLP_PAGE_MAGIC = 0xd113;
+my $XLP_FIRST_IS_CONTRECORD = 0x1;

This could use something like check_pg_config() to find the header
contents for the last two ones, but there's no easy way to extract the
size of the header struct, is there? Or perhaps perl has a tool for
that.. Anything I can read on the matter tells to use pack() to make
the size estimations predictible.

+ my $page_offset = $end_lsn % $WAL_BLOCK_SIZE;
+ while ($page_offset >= $WAL_BLOCK_SIZE - 2000)
+ {
+ $end_lsn = emit_message($node, 2000)

The hardcoded 2000 feels magic here, but I get that you just want to
get away from the page border, as well. I'd be OK with more comments.

+sub lsn_to_segment_and_offset
+{
+ my $lsn = shift;
+ return ($lsn / $WAL_SEGMENT_SIZE, $lsn % $WAL_SEGMENT_SIZE);
+}

Now the backend also has pg_walfile_name_offset(). I tend to rely on
the backend logic as much as I can for tests.

+sub start_of_page

Not sure this one is needed, only being called in
start_of_next_page().

+sub get_insert_lsn

Name is confusing with Cluster::lsn('insert') available, perhaps this
should be named get_insert_lsn_in_bytes.

As a whole, I find that pretty cool. Not sure if I would backpatch
the test, so my opinion would be to use that on HEAD and let it
stabilize there.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2023-08-16 06:32:40 BUG #18058: Http Extension Postgres
Previous Message Thomas Munro 2023-08-16 03:29:49 Re: BUG #17928: Standby fails to decode WAL on termination of primary