Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tomas(at)vondra(dot)me" <tomas(at)vondra(dot)me>
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date: 2025-06-18 10:03:17
Message-ID: CALDaNm21=mTJLrXxKYZ_07S9tAWMGEfxSnRXu_+t-k7jb5Kcyw@mail.gmail.com
Lists: pgsql-hackers

On Wed, 18 Jun 2025 at 14:35, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
>
> Dear Hayato,
>
> > To confirm, can you tell me the theory why the walsender received old LSN?
> > It is sent by the walreceiver, so is there a case that LogstreamResult.Flush can go backward?
> > Not sure we can accept the situation.
>
> I can't say anything about the origin of the issue, but it can be easily reproduced
> on the master branch:
>
> 1. Add an assert in PhysicalConfirmReceivedLocation (apply the attached patch)
> 2. Compile & install with tap tests and assertions enabled
> 3. cd src/bin/pg_basebackup/
> 4. PROVE_TESTS=t/020_pg_receivewal.pl gmake check

Thanks for the steps. I was able to reproduce the issue by following
them.

> The test will fail because of the assertion. I plan to investigate the issue
> but I need some more time for it. Once, it happens on the original master
> branch, I think, this problem already exists. The proposed patch seems
> to be not guilty.

This issue occurs even prior to this commit: I was able to reproduce
it on a version just before it. I'll also look into the root cause
further.

> It may be the same problem as discussed in:
> https://www.postgresql.org/message-id/CALDaNm2uQbhEVJzvnja6rw7Q9AYu9FpVmET%3DTbwLjV3DcPRhLw%40mail.gmail.com

The issue discussed in that thread was related to confirmed_flush and
was addressed in commit d1ffcc7fa3c54de8b2a677a3e503fc808c7b419c. It is
not related to restart_lsn.

Regards,
Vignesh
