Quick Links

Re: pgsql: Improve runtime and output of tests for replication slots checkp

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject:	Re: pgsql: Improve runtime and output of tests for replication slots checkp
Date:	2025-06-23 10:24:30
Message-ID:	CAPpHfdsO9s5he3xHWNFtwvXtvsftu3nNz=PV9fdN32UOh-Z3tA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-committers

On Mon, Jun 23, 2025 at 3:29 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > Yeah, that's what I think too. The unintentional omission of a
> > pre-shutdown delay in the 046 test has exposed some pre-existing
> > fragility in pg_logical_slot_get_changes(). So I'm not in favor
> > of changing 046 till we understand this better.
>
> Yes, better to understand what's going on before changing the test,
> and perhaps change 046 so as it provide stable coverage for this case,
> even if discovered accidentally.
>
> So, is it only pg_logical_slot_get_changes_guts() that's messed up in
> this context, because XLogDecodeNextRecord() is trying to read pages
> as it has a logic too permissive? How did any of you reproduce the
> failure? Just by running the test in a loop? It is one of these
> patterns where a hardcoded sleep should do the trick and help with a
> bisect.

Personally I just run the test in the loop. It takes about ~30 times to
reproduce. Works both with -O0 and -O2.

cd src/test/recovery
while PROVE_TESTS="t/046_checkpoint_logical_slot.pl t/
047_checkpoint_physical_slot.pl" make check; do :; done

> By the way, At the bottom of test 046_checkpoint_logical_slot.pl, if
> you expect the checkpoint to complete before sending the immediate
> shutdown request, I would suggest to use a wait_for_log() based on
> "checkpoint complete" or an equivalent loop. What you are doing in
> the test is unstable as written.

Exactly. I've proposed the fix with wait_for_log() in [1]. Nevertheless,
both cases (immediate stop before checkpoint completion, and immediate stop
after checkpoint completion) must work without hang.

Links.
1.
https://www.postgresql.org/message-id/CAPpHfdurV-j_e0pb%3DUFENAy3tyzxfF%2ByHveNDNQk2gM82WBU5A%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

In response to

Re: pgsql: Improve runtime and output of tests for replication slots checkp at 2025-06-23 00:28:53 from Michael Paquier

Responses

Re: pgsql: Improve runtime and output of tests for replication slots checkp at 2025-06-23 11:59:47 from Jelte Fennema-Nio

Browse pgsql-committers by date

	From	Date	Subject
Next Message	John Naylor	2025-06-23 11:05:59	pgsql: Properly fix AVX-512 CRC calculation bug
Previous Message	Michael Paquier	2025-06-23 00:28:53	Re: pgsql: Improve runtime and output of tests for replication slots checkp