Re: The ability of postgres to determine loss of files of the main fork

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Michael Banck <mbanck(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)tigerdata(dot)com>, Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: The ability of postgres to determine loss of files of the main fork
Date: 2025-10-01 11:20:48
Message-ID: CAKZiRmwHKY=KTjBEL3S2cVQpo1OjHyky4BdgJm4Hkv1-ig9PfQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 1, 2025 at 9:02 AM Michael Banck <mbanck(at)gmx(dot)net> wrote:
>
> Hi,
>
> wow, this is one of the most terrifying threads I've ever seen...

Same.

> On Tue, Sep 30, 2025 at 12:41:29PM -0400, Tom Lane wrote:
> > Aleksander Alekseev <aleksander(at)tigerdata(dot)com> writes:
> > >> Therefore, I would like to request an enhancement: add an option to
> > >> verify_heapam() that causes the primary key index to be scanned and makes
> > >> sure that all line pointers in the index point to existing tuples.
> >
> > > ... IMO there is little value in adding a check for the existence of
> > > the segments for a single table. And the *real* check will not differ
> > > much from something like SELECT * FROM my_table, or from making a
> > > complete backup of the database.
> >
> > As Frits mentioned, neither of those actions will really notice if a
> > table has been truncated via loss of a segment.
>
> Is there a valid case for a missing segment? If not, couldn't this be
> caught somewhere in the storage manager?
>

I took a look at PG17: in _mdfd_openseg() there is an "if fd < 0
return NULL" after open(), but of its callers only _mdfd_getseg()
seems to alert on that NULL. To me this seems like a bug, because
I've seen people and software delete files at random far too many
times. Even simple crashes (with e2fsck, xfs_repair) could put
orphaned inodes into /lost+found. IMHO all files should be opened at
least on startup to check integrity, because the failed openat(2)
(during such a SELECT) seems to be coming out of
RelationGetNumberOfBlocksInFork()->table_block_relation_size()->smgrnblocks()->mdnblocks()->_mdfd_openseg().
Now, if the first segment file were missing, we would complain in
mdopenfork(). mdnblocks() even says "all active segments of the
relation are opened...", but apparently even that is not true.
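
To make the failure mode concrete, here is a rough model of the walk
described above. It is a simplified sketch, not the md.c code: the
function names are hypothetical, a relation is represented as a dict
mapping segment number to block count, and RELSEG_SIZE assumes the
default build (1 GiB segments of 8 kB blocks).

```python
# Simplified model of how the relation size is derived from segment files.
# Assumption: default build options, i.e. 131072 blocks of 8 kB per segment.
RELSEG_SIZE = 131072


def nblocks_like_mdnblocks(seg_blocks):
    """Return the block count a segment-by-segment walk would report.

    seg_blocks maps segment number -> number of blocks, and contains only
    the segment files that actually exist on disk.
    """
    if 0 not in seg_blocks:
        # A missing first segment is the one case that is reported loudly.
        raise FileNotFoundError("first segment is missing")
    segno = 0
    while True:
        n = seg_blocks[segno]
        if n < RELSEG_SIZE:
            # A short segment is taken to be the last one.
            return segno * RELSEG_SIZE + n
        segno += 1
        if segno not in seg_blocks:
            # Opening the next segment failed: silently treated as the end
            # of the relation, whether or not later segments exist.
            return segno * RELSEG_SIZE


def orphaned_segments(seg_blocks):
    """List segments that exist on disk but lie beyond the reported size."""
    reported = nblocks_like_mdnblocks(seg_blocks)
    return sorted(s for s in seg_blocks if s * RELSEG_SIZE > reported)


if __name__ == "__main__":
    # Segment 1 has been deleted; segment 2 is still on disk.
    damaged = {0: RELSEG_SIZE, 2: 100}
    print(nblocks_like_mdnblocks(damaged))
    print(orphaned_segments(damaged))
```

With segment 1 gone, the model reports a relation of exactly one
segment and never looks at segment 2. A check like orphaned_segments()
can only notice a hole in the middle; if the final segment is the one
that disappears, nothing on disk distinguishes that from a shorter
relation.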

The bigger context seems to be that 049469e7e7cfe0c69 (2015) could be
the culprit here as well: it states that mdnblocks() could previously
create zero-length files, and it removed the ereport(ERROR) raised
when that file could not be accessed.

Another idea (other than treating this as a bug) is that Thomas had a
large relation patchset in [1], but I wouldn't be a fan of us
operating on 31-32TB files ;)

-J.

[1] - https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA%40mail.gmail.com
