| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, Amul Sul <sulamul(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Fix possible 'unexpected data beyond EOF' on replica restart |
| Date: | 2025-12-18 14:18:46 |
| Message-ID: | 02e720a8-761b-4309-b26f-f41c288f7990@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 17/12/2025 10:40, Anthonin Bonnefoy wrote:
> On Wed, Dec 17, 2025 at 8:26 AM Amul Sul <sulamul(at)gmail(dot)com
> <mailto:sulamul(at)gmail(dot)com>> wrote:
>
>> The deleted code you moved to mdtruncate() should be kept where it
>> was. We cannot ensure that every extension using this interface will
>> adhere to that requirement what the comment describes.
>
> Yeah, I've overlooked the case of extensions. I've moved back the
> InvalidBlockNumber assignment.
The smgr interface isn't really an extension point, so we don't need to
worry about that. (I wish it was, but that's a different story [1])
I don't think mdtruncate() should modify smgr_cached_nblocks, we should
keep that in smgrtruncate(). smgr.c is responsible for all other updates
of smgr_cached_nblocks, so doing it in mdtruncate() would be a layering
violation.
I'm thinking that we should do the attached. Untested, and we should
also add a comment to smgrtruncate() and mdtruncate() to explain how
they behave if nblocks > curnblk.
I wonder if we should move the whole "if (nblocks > curnblk)" check and
ereport() from mdtruncate() to smgrtruncate(). That logic doesn't really
depend on anything specific to md.c. If you'd imagine a different smgr
implementation, it'd need to just copy-paste that check. It's the
caller's mistake if it passes nblocks > curnblk, when not in recovery.
Then again, we do have other places in md.c too that behave differently
when InRecovery.
What do you think?
> I wonder how critical it is to have an up to date value of
> smgr_cached_nblocks after smgr_truncate. Leaving InvalidBlockNumber was
> also an option as the next user will ask the kernel for the real size.
> There are some functions like DropRelationBuffers which rely only on the
> cached value, so it's probably safer to keep the same behaviour.
Leaving it invalid should work. But as the comment says, we might as
well update the cached value since we have the value at hand. It's just
that we were doing it wrong.
- Heikki
| Attachment | Content-Type | Size |
|---|---|---|
| v3-Fix-unexpected-data-beyond-EOF-on-replica-restart.patch | text/x-patch | 562 bytes |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2025-12-18 14:26:31 | Re: [BUG] [PATCH] pg_basebackup produces wrong incremental files after relation truncation in segmented tables |
| Previous Message | Konstantin Knizhnik | 2025-12-18 13:57:40 | Re: index prefetching |