From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Aleksander Alekseev <aleksander(at)tigerdata(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: The ability of postgres to determine loss of files of the main fork |
Date: | 2025-10-08 13:04:46 |
Message-ID: | CA+TgmoaB-rQzb6Hx94hWVVE4+HUw=p0qqUm_68bOiqW8t6fzGg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Oct 1, 2025 at 11:25 AM Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com> wrote:
> What would be an achievable way of making postgres understand the relation size?
> How about a field in pg_class that keeps the final data page, so that the catalog
> keeps the size, which then allows utilities and the database itself to understand how
> many segments should exist?
I think that would definitely be impractical. Your idea of having an
option for amcheck that is the reverse of heapallindexed
(indexallheaped?) seems perfectly reasonable as a debugging tool and
probably not that hard to implement, but actually noticing missing files
organically, during normal operation, would be tricky, both in terms of
code complexity and in terms of performance.
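Here is a minimal sketch of what such a reverse check might look like. It uses a toy model, not amcheck's real interfaces: `Tid`, `index_all_heaped`, and the `heap_tuple_exists` callback are hypothetical names. The check flags index entries whose heap block lies beyond the heap's apparent end, which is how a lost segment would surface, and entries whose tuple cannot be found. It ignores HOT chains and dead-but-not-yet-vacuumed entries, which a real implementation would have to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tid:
    """Hypothetical heap pointer from an index entry: (block, line pointer)."""
    block: int
    offset: int


def index_all_heaped(index_tids, heap_nblocks, heap_tuple_exists):
    """Return index TIDs that do not point at an existing heap tuple.

    heap_nblocks: the block count the heap currently reports, e.g. as derived
        from whichever segment files are actually present on disk.
    heap_tuple_exists: callable(Tid) -> bool, consulted only for in-range blocks.
    """
    dangling = []
    for tid in index_tids:
        if tid.block >= heap_nblocks:
            # Past the heap's apparent end: a missing segment file shows up
            # as index entries pointing into blocks that "don't exist".
            dangling.append(tid)
        elif not heap_tuple_exists(tid):
            dangling.append(tid)
    return dangling


RELSEG_SIZE = 131072  # assumed default blocks per 1 GB segment

# The heap appears to end after its first segment, but the index still
# references a block in the second one.
tids = [Tid(10, 1), Tid(RELSEG_SIZE + 5, 3)]
print(index_all_heaped(tids, RELSEG_SIZE, lambda t: True))
# [Tid(block=131077, offset=3)]
```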
Updating pg_class every time we extend any relation in the system by a
block is definitely going to be painfully slow -- and there's also the
problem that you can't very well track the length of pg_class itself
by updating pg_class, because you might not be able to update pg_class
without extending it. What seems more practical is to store metadata
in a metapage within each relation or in some separate storage.
However, even that is far from problem-free. Even in the best case
where there are no other problems, you're talking about emitting WAL
records upon relation extension, which I suspect would have quite a
noticeable performance impact if you did it for every block.
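To put rough numbers on the per-block approach, here is a back-of-the-envelope cost model. It assumes the default 8 kB block size and one hypothetical metadata WAL record per block added, and it ignores any batching of relation extension.

```python
BLCKSZ = 8192  # assumed default PostgreSQL block size in bytes


def per_block_metadata_records(bytes_added: int) -> int:
    """Hypothetical cost model: one extra metadata WAL record per block added."""
    # Ceiling division: a partial block still requires a whole new block.
    return -(-bytes_added // BLCKSZ)


GIB = 1 << 30
for size_gib in (1, 100):
    records = per_block_metadata_records(size_gib * GIB)
    print(f"{size_gib} GiB -> {records:,} extra records")
# 1 GiB -> 131,072 extra records
# 100 GiB -> 13,107,200 extra records
```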
An idea that I had was to keep track of the number of segments rather
than the entire length of the relation. That's not as good, because
then you can't detect truncation of the last file, but it would be
good enough to detect the disappearance of entire files, and it would
mean that the metadata only needs to be updated once per GB of the
relation rather than every time you extend.
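Here is a sketch of the segment-count variant. It assumes default 1 GB segments (131072 blocks of 8 kB) and the usual on-disk naming, where segment 0 is the bare relfilenode path and later segments carry a `.1`, `.2`, ... suffix. `segments_in_use`, `count_needs_update`, and `missing_segments` are hypothetical helpers. The stored count changes only when extension crosses into a new segment. As noted above, a truncated last file still exists, so it passes the check.

```python
import os
import tempfile

RELSEG_SIZE = 131072  # assumed default: blocks per 1 GB segment with 8 kB blocks


def segment_path(base_path: str, segno: int) -> str:
    """Segment 0 is the bare relfilenode path; later segments get a .N suffix."""
    return base_path if segno == 0 else f"{base_path}.{segno}"


def segments_in_use(nblocks: int) -> int:
    """Number of segments holding at least one block (ceiling division)."""
    return -(-nblocks // RELSEG_SIZE)


def count_needs_update(old_nblocks: int, new_nblocks: int) -> bool:
    """Hypothetical: the stored count (and its WAL record) changes only when
    extension crosses into a new segment, not on every block."""
    return segments_in_use(old_nblocks) != segments_in_use(new_nblocks)


def missing_segments(base_path: str, expected_segments: int) -> list[int]:
    """Segment numbers the stored count says should exist but are absent.

    Whole missing files are caught; a truncated final file still exists,
    so it is not reported here.
    """
    return [segno for segno in range(expected_segments)
            if not os.path.exists(segment_path(base_path, segno))]


# Example: a relation that should have 3 segments, with segment 1 lost.
with tempfile.TemporaryDirectory() as d:
    base = os.path.join(d, "16384")  # hypothetical relfilenode
    for segno in (0, 2):
        open(segment_path(base, segno), "w").close()
    print(missing_segments(base, expected_segments=3))  # [1]
```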
But even this has a lot of engineering challenges. To really be able
to do the cross-checks in a meaningful way, you'd want the md*
functions to have access to the information -- and I'm having some
difficulty imagining how we would arrange for that. For instance, if
mdread() is asked for a block and first needs to know whether that
block (or the containing segment) should exist, it's not going to have
access to the relcache to check some cached data. We could possibly
cache something in the SMgrRelation, but if the cache is not
populated, then we'd have to read the data from the original source.
But surely we can't have mdread() calling ReadBuffer(); that would be
a huge layering violation and would likely cause some very unpleasant
problems.
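One way to picture the layering problem is a toy model in which the expected segment count is cached on an SMgrRelation-like object and pushed down by a higher layer. `SmgrRelationModel`, `check_block_readable`, and `SegmentMissing` are hypothetical names, not md.c code. The interesting branch is the unpopulated cache: the storage layer cannot fetch the count itself through the buffer manager, so in that case the check simply does not run.

```python
from dataclasses import dataclass
from typing import Optional

RELSEG_SIZE = 131072  # assumed default blocks per segment


class SegmentMissing(Exception):
    """Raised when a block's segment should exist but was not found."""


@dataclass
class SmgrRelationModel:
    """Toy stand-in for SMgrRelation; not PostgreSQL's actual struct."""
    # Pushed down by a higher layer; None means "not known yet".
    expected_segments: Optional[int] = None
    # Segment numbers whose files were actually found on disk.
    present_segments: frozenset = frozenset()


def check_block_readable(reln: SmgrRelationModel, blkno: int) -> bool:
    """Hypothetical pre-read check in the spirit of mdread().

    Returns False if the check could not run (cache unpopulated), True if it
    ran and passed. Raises SegmentMissing if the block's segment should exist
    but does not. Blocks past the expected end are left to ordinary
    beyond-EOF handling and are not judged here.
    """
    if reln.expected_segments is None:
        return False  # can't call back into the buffer manager to find out
    segno = blkno // RELSEG_SIZE
    if segno < reln.expected_segments and segno not in reln.present_segments:
        raise SegmentMissing(f"segment {segno} for block {blkno} is missing")
    return True


reln = SmgrRelationModel(expected_segments=3, present_segments=frozenset({0, 2}))
try:
    check_block_readable(reln, RELSEG_SIZE + 10)
except SegmentMissing as e:
    print(e)  # segment 1 for block 131082 is missing
```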
I expect there is some way to rejigger things so that the higher layers
tell the md.c layer how many segments can exist, plus some way to
bootstrap that information, but it's probably all quite complicated, so
I am definitely not volunteering to be the one to do the work...
--
Robert Haas
EDB: http://www.enterprisedb.com