Re: FSM Corruption (was: Could not read block at end of the relation)

From: Noah Misch <noah(at)leadboat(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: FSM Corruption (was: Could not read block at end of the relation)
Date: 2024-04-13 17:15:28
Message-ID: 20240413171528.59.nmisch@google.com
Lists: pgsql-bugs

On Thu, Apr 11, 2024 at 08:38:43AM -0700, Noah Misch wrote:
> On Thu, Apr 11, 2024 at 09:36:50AM +0200, Ronan Dunklau wrote:
> > On Sunday, 7 April 2024, 00:30:37 CEST, Noah Misch wrote:
> > > Your v3 has the right functionality. As further confirmation of the fix, I
> > > tried reverting the non-test parts of commit 917dc7d "Fix WAL-logging of FSM
> > > and VM truncation". That commit's 008_fsm_truncation.pl fails with 917dc7d
> > > reverted from master, and adding this patch makes it pass again. I ran
> > > pgindent and edited comments. I think the attached version is ready to go.
> >
> > Thank you Noah, the updated comments are much better. I think it should be
> > backported at least to 16 since the chances of tripping on that behaviour are
> > quite high here, but what about previous versions?
>
> It should be reachable in all branches; before v16, reaching it just requires
> concurrent extension lock waiters. Hence, my plan is to back-patch it all the
> way. It applies with negligible conflicts back to v12.

While it applied, it doesn't build in v12 or v13, due to smgr_cached_nblocks
first appearing in c5315f4. Options:

1. Back-patch the addition of smgr_cached_nblocks or equivalent.
2. Stop the back-patch of $SUBJECT at v14.
3. Incur more lseek() in v13 and v12.

Given the lack of reports before v16, (3) seems too likely to be a cure worse
than the disease. I'm picking (2) for today. We could do (1) tomorrow, but I
lean toward (2) until someone reports the problem on v13 or v12. The
problem's impact is limited to DML giving ERROR when it should have succeeded,
and I expect VACUUM FULL is a workaround. Without those mitigating factors, I
would choose (1).

Pushed that way, as 9358297.
