Re: Fix possible 'unexpected data beyond EOF' on replica restart

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix possible 'unexpected data beyond EOF' on replica restart
Date: 2025-12-17 07:26:15
Message-ID: CAAJ_b94FEArRqXd4q+drOh=yS4_2sPrdHTGXTVH+wHRP0EPuCw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 16, 2025 at 6:38 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
>
> [...]
> The patch fixes the issue by moving smgr_cached_nblocks updates in mdtruncate and only updating the cached value if truncate was successful.
>

Thanks for detailed reproducible steps, I can see the reported issue
and proposed patch fixes the same. Patch looks good to me except
following changes in smgrtruncate():

- /* Make the cached size is invalid if we encounter an error. */
- reln->smgr_cached_nblocks[forknum[i]] = InvalidBlockNumber;
-
smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i],
old_nblocks[i], nblocks[i]);

The deleted code you moved to mdtruncate() should be kept where it
was. We cannot ensure that every extension using this interface will
adhere to that requirement what the comment describes. Furthermore,
an extension's routine might miss updating smgr_cached_nblocks.
To ensure that updates smgr_cached_nblocks properly, we should also
add an assertion after the call, like this:

/*
* Ensure that the local smgr_cached_nblocks value is updated.
*/
Assert(reln->smgr_cached_nblocks[forknum[i]] != InvalidBlockNumber);

Regards,
Amul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2025-12-17 07:30:35 Re: relfilenode statistics
Previous Message Nazir Bilal Yavuz 2025-12-17 07:13:15 Re: meson and check-tests