Re: FSM Corruption (was: Could not read block at end of the relation)

From: Noah Misch <noah(at)leadboat(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: FSM Corruption (was: Could not read block at end of the relation)
Date: 2024-03-13 16:55:23
Message-ID: 20240313165523.6a@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Mar 13, 2024 at 10:37:21AM +0100, Ronan Dunklau wrote:
> Le mardi 12 mars 2024, 23:08:07 CET Noah Misch a écrit :
> > > I would argue this is fine, as corruptions don't happen often, and FSM is
> > > not an exact thing anyway. If we record a block as full, it will just be
> > > unused until the next vacuum adds it back to the FSM with a correct
> > > value.
> > If this happened only under FSM corruption, it would be fine. I was
> > thinking about normal extension, with an event sequence like this:
> >
> > - RelationExtensionLockWaiterCount() returns 10.
> > - Backend B1 extends by 10 pages.
> > - B1 records 9 pages in the FSM.
> > - B2, B3, ... B11 wake up and fetch from the FSM. In each of those
> > backends, fsm_does_block_exist() returns false, because the cached relation
> > size is stale. Those pages remain unused until VACUUM re-records them in
> > the FSM.
> >
> > PostgreSQL added the multiple-block extension logic (commit 719c84c) from
> > hard-won experience bottlenecking there. If fixing $SUBJECT will lose 1% of
> > that change's benefit, that's probably okay. If it loses 20%, we should
> > work to do better. While we could rerun the 2016 benchmark to see how the
> > patch regresses it, there might be a simple fix. fsm_does_block_exist()
> > could skip the cache when blknumber>=cached, like this:
> >
> > return((BlockNumberIsValid(smgr->smgr_cached_nblocks[MAIN_FORKNUM])
> &&
> > blknumber < smgr-
> >smgr_cached_nblocks[MAIN_FORKNUM]) ||
> > blknumber < RelationGetNumberOfBlocks(rel));
> >
> > How do you see it? That still leaves us with an avoidable lseek in B2, B3,
> > ... B11, but that feels tolerable. I don't know how to do better without
> > switching to the ReadBufferMode approach.

> That makes sense. I'm a bit worried about triggering additional lseeks when we
> in fact have other free pages in the map, that would pass the test. I'm not
> sure how relevant it is given the way we search the FSM with fp_next_slot
> though...

That's a reasonable thing to worry about. We could do wrong by trying too
hard to use an FSM slot, and we could do wrong by not trying hard enough.

> To address that, I've given a bit of thought about enabling / disabling the
> auto-repair behaviour with a flag in GetPageWithFreeSpace to distinguish the
> cases where we know we have a somewhat up-to-date value compared to the case
> where we don't (as in, for heap, try without repair, then get an uptodate
> value to try the last block, and if we need to look at the FSM again then ask
> for it to be repaired) but it brings way too much complexity and would need
> careful thought for each AM.
>
> So please find attached a patch with the change you propose.
>
> Do you have a link to the benchmark you mention though to evaluate the patch
> against it ?

Here is one from the thread that created commit 719c84c:
https://www.postgresql.org/message-id/flat/CA%2BTgmob7xED4AhoqLspSOF0wCMYEomgHfuVdzNJnwWVoE_c60g%40mail.gmail.com

There may be other benchmarks earlier in that thread.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2024-03-13 17:08:09 Re: BUG #18387: Erroneous permission checks and/or misleading error messages with refresh materialized view
Previous Message Tom Lane 2024-03-13 14:12:10 Re: BUG #18390: exponentiation produces float datatype, but nth-root produces integer