Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-09-20 00:51:39
Message-ID: ZQpCG5QanJQXNVH-@paquier.xyz
Lists: pgsql-bugs

On Wed, Sep 20, 2023 at 10:51:12AM +1200, Thomas Munro wrote:
> Ahh, yeah, there was a second problem screwing up the LSN accounting.
> I was trying to write correct xl_prev links, but also emitting
> optional padding records to get into the right position (depending on
> initial conditions varying between branches etc), and I forgot about
> the extra COMMIT records that emit_message() generates. Which, I
> guess, comes back to Michael's observation that this would all be a
> bit easier if we had a way to emit and flush a single record...

Unfortunately, based on the current consensus, this part is not going
to be backpatched, so this is going to need an alternate and/or fluffy
solution to force the flushes at given points:
https://www.postgresql.org/message-id/20230816003333.7hn2rx5m2l7una3d@awork3.anarazel.de

The test could be largely simplified on HEAD once the flush option is
available. I'm just waiting a bit more on the other thread, keeping
an eye on the temperature, but it looks like nobody would object to
making that optional, with the default being what we did previously,
at least.

> The solution in this version is to call get_insert_len() instead of
> using the result of emit_message() for the values returned by the
> advance_XXX() functions. The result of emit_message() is actually the
> LSN of the following COMMIT record so can't be used directly for
> building xl_prev chains.

And by doing so, the test paths once again enter the inner loops
that generate the records, so we only get out once we know that we
are at the wanted boundary, which is enough to stabilize the tests.
Smart.
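
To illustrate the LSN accounting discussed above, here is a toy model
in Python. It is only a sketch and not the actual TAP test (which is
written in Perl): the ToyWal class, the advance_to() helper, the record
sizes and the method names are all hypothetical stand-ins. It assumes
that each emitted message is followed by a COMMIT record, so the value
returned by emit_message() points at that COMMIT and not at the
position where the next record would be inserted.

```python
from dataclasses import dataclass, field

# Illustrative sizes only; real WAL record sizes differ.
HEADER_LEN = 24
COMMIT_LEN = 34


@dataclass
class ToyWal:
    """Toy append-only log tracking (start_lsn, xl_prev, kind) per record."""
    insert_lsn: int = 0
    records: list = field(default_factory=list)

    def _append(self, kind, length):
        start = self.insert_lsn
        # xl_prev is the start of the previous record (None for the first).
        prev = self.records[-1][0] if self.records else None
        self.records.append((start, prev, kind))
        self.insert_lsn = start + length
        return self.insert_lsn  # end of the record just written

    def emit_message(self, payload_len):
        """Write a message record plus its COMMIT (assumed behavior).

        Returns the end of the message record, which is the start of
        the COMMIT record, not the current insert position.
        """
        end_of_message = self._append("MESSAGE", HEADER_LEN + payload_len)
        self._append("COMMIT", COMMIT_LEN)
        return end_of_message

    def get_insert_lsn(self):
        """Position where the next record would start."""
        return self.insert_lsn


def advance_to(wal, target, step=16):
    """Emit padding messages until the insert position reaches target.

    The loop re-checks the real insert position each time, so the extra
    COMMIT records are accounted for before returning.
    """
    while wal.get_insert_lsn() < target:
        wal.emit_message(step)
    return wal.get_insert_lsn()


wal = ToyWal()
returned = wal.emit_message(10)
# The returned value lags the insert position by one COMMIT record.
gap = wal.get_insert_lsn() - returned
reached = advance_to(wal, 200)
```

In this model the gap between the two values is exactly one COMMIT
record, which is why building an xl_prev chain from the emit_message()
result alone would point at the wrong place.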
--
Michael
