Re: Implement waiting for wal lsn replay: reloaded

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2025-11-13 20:32:31
Message-ID: b72097b0-0839-4191-95dc-5e4038e33de3@vondra.me
Lists: pgsql-hackers

On 11/5/25 10:51, Alexander Korotkov wrote:
> Hi!
>
> On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
>>> On 2025-Nov-03, Alexander Korotkov wrote:
>>>
>>>> I'd like to give this subject another chance for pg19. I'm going to
>>>> push this if no objections.
>>>
>>> Sure. I don't understand why patches 0002 and 0003 are separate though.
>>
>> FWIW, I appreciate such splits. Even if the functionality isn't usable
>> independently, it's still different type of code that's affected. And the
>> patches are each big enough to make that worthwhile for easier review.
>
> Thank you for the feedback, pushed.
>

Hi,

The new TAP test 049_wait_for_lsn.pl introduced by this commit takes a
long time - about 65 seconds on my laptop. That's about 25% of the whole
src/test/recovery, more than any other test.

And most of the time there's nothing happening - these are the two log
messages showing the 60-second wait:

2025-11-13 21:12:39.949 CET checkpointer[562597] LOG: checkpoint
complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s)
added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907
s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB,
estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060

2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl
ERROR: recovery is not in progress

So there's a checkpoint, 60 seconds of nothing, and then a failure. I
haven't looked into why it waits for 1 minute exactly, but adding 60
seconds to check-world is somewhat annoying.
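
For reference, the idle gap can be read straight off the two timestamps quoted above. A minimal Python sketch, assuming nothing beyond those two log lines (the variable names are only illustrative, and the time zone is dropped because both lines are in CET):

```python
from datetime import datetime

# Timestamps copied from the two log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
checkpoint_done = datetime.strptime("2025-11-13 21:12:39.949", FMT)
wait_failed = datetime.strptime("2025-11-13 21:13:38.994", FMT)

# Time during which the log shows no activity at all.
idle = (wait_failed - checkpoint_done).total_seconds()
print(f"idle gap: {idle:.3f} s")  # prints "idle gap: 59.045 s"
```

That is roughly 59 of the ~65 seconds the test takes on my laptop.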

While at it, I noticed a couple of comments refer to WaitForLSNReplay,
but I think that got renamed simply to WaitForLSN.

regards

--
Tomas Vondra
