Re: The ability of postgres to determine loss of files of the main fork

From: Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Michael Banck <mbanck(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)tigerdata(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: The ability of postgres to determine loss of files of the main fork
Date: 2025-10-01 12:15:06
Message-ID: A2681026-3AE4-4F3E-9BD7-5F78DF2D1E34@gmail.com
Lists: pgsql-hackers

Thank you for looking into this, Jakub, Aleksander, Michael, Tom,

There are more scenarios where files might be missing: how about a backup that somehow misses files?
But also deliberate sabotage: you can hide data this way, and it would be hard to detect, and even
harder to understand what happened, why it happened and how it happened. I dare say it is quite ideal
if you want a name removed from a database in a covert way, and you know something about the
database and have access, or can somehow manipulate the data directory.

Without having the exact source code files and functions handy, the way I think segments work in
postgres is that the database, in the absence of any indicator of size, will scan the relfilenode, and
when it reaches the segment size limit (1GB), it will try to open the next segment. If that
segment doesn't exist, the database assumes it has reached the end of the relation. That is all. That is why
there is no error: the situation looks exactly like the case where there truly is no more data, and there
is no metadata from which to understand that there is supposed to be more data.

That is why I brought up the primary key: it contains "indirect" data that can be used to detect
whether a table is missing a segment. But I think what is really needed is a maximum page number
that records the known size.
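
To illustrate what such a known size would buy us, here is a sketch building on the one above.
expected_blocks is hypothetical metadata (postgres does not record anything like it today), and
guess_relation_blocks() is the probing helper from the previous sketch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* probing helper from the sketch above */
extern uint64_t guess_relation_blocks(const char *path_base);

/*
 * Compare the blocks actually present on disk against a recorded size.
 * With that single number stored somewhere trustworthy, a missing or
 * truncated segment becomes a detectable error instead of a silently
 * shorter table.
 */
bool
verify_relation_size(const char *path_base, uint64_t expected_blocks)
{
    uint64_t    actual = guess_relation_blocks(path_base);

    if (actual < expected_blocks)
    {
        fprintf(stderr,
                "%s: expected at least %llu blocks, found only %llu "
                "(missing or truncated segment?)\n",
                path_base,
                (unsigned long long) expected_blocks,
                (unsigned long long) actual);
        return false;
    }
    return true;
}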

Frits Hoogland

> On 1 Oct 2025, at 13:20, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> On Wed, Oct 1, 2025 at 9:02 AM Michael Banck <mbanck(at)gmx(dot)net> wrote:
>>
>> Hi,
>>
>> wow, this is one of the most terrifying threads I've ever seen...
>
> Same.
>
>> On Tue, Sep 30, 2025 at 12:41:29PM -0400, Tom Lane wrote:
>>> Aleksander Alekseev <aleksander(at)tigerdata(dot)com> writes:
>>>>> Therefore, I would like to request an enhancement: add an option to
>>>>> verify_heapam() that causes the primary key index to be scanned and makes
>>>>> sure that all line pointers in the index point to existing tuples.
>>>
>>>> ... IMO there is little value in adding a check for the existence of
>>>> the segments for a single table. And the *real* check will not differ
>>>> much from something like SELECT * FROM my_table, or from making a
>>>> complete backup of the database.
>>>
>>> As Frits mentioned, neither of those actions will really notice if a
>>> table has been truncated via loss of a segment.
>>
>> Is there a valid case for a missing segment? If not, couldn't this be
>> caught somewhere in the storage manager?
>>
>
> I've taken a look at PG17 and in _mdfd_openseg() there's "if fd < 0
> return NULL" after open(), but of its callers only _mdfd_getseg()
> seems to be alerting on that NULL. To me this seems like a bug,
> because I've seen way too many times people and software deleting
> files randomly. Even simple crashes (with e2fsck, xfs_repair) could
> put orphaned inodes into /lost+found. IMHO all files should be opened
> at least on startup to check integrity, because the non-zero return
> code (during such a SELECT) for openat(2) seems to be coming out of
> RelationGetNumberOfBlocksInFork()->table_block_relation_size()->smgrnblocks()->mdnblocks()->_mdfd_openseg().
> Now if the 1st seg file were missing we would complain in
> mdopenfork(). mdnblocks() even says "all active segments of the
> relation are opened...", but even that apparently is not true.
>
> The bigger context seems to be that 049469e7e7cfe0c69 (2015) could be
> the culprit here as well: it is stated there that mdnblocks() could
> earlier create zero-length files back in the day, and that commit
> removed the ereport(ERROR) raised when a file could not be accessed.
>
> Another idea (other than this being a bug) is that Thomas had a large
> relation patchset back in [1], but I wouldn't be a fan of us operating
> on 31-32TB files ;)
>
> -J.
>
> [1] - https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA%40mail.gmail.com
