From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: better page-level checksums
Date: 2022-06-14 16:10:03
Message-ID: CAH2-Wznckc7EMLVZVR5eRWQVhP0VG-EGxG4UrBcPXAG17SuBeA@mail.gmail.com
Lists: pgsql-hackers
On Tue, Jun 14, 2022 at 8:48 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jun 13, 2022 at 6:26 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Anyway, I can see how it would be useful to be able to know the offset
> > of a nonce or of a hash digest on any given page, without access to a
> > running server. But why shouldn't that be possible with other designs,
> > including designs closer to what I've outlined?
>
> I don't know what you mean by this. As far as I'm aware, the only
> design you've outlined is one where the space wasn't at the same
> offset on every page.
I am skeptical of that particular aspect, yes. Though I would define
it the other way around (now the true special area struct isn't
necessarily at the same offset for a given AM, at least across data
directories).
My main concern is preserving the ability to interpret much about the
contents of a page without context, and not making it any harder to
grow the special area dynamically -- which is a broader concern.
Your patch isn't going to be the last one that wants to do something
with the special area. This needs to be carefully considered.
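To make the "without context" point concrete: today the special area is
located from the page header alone, with no help from the control file.
A simplified sketch (field layout trimmed for illustration; the real
definitions of PageHeaderData and PageGetSpecialPointer() live in
src/include/storage/bufpage.h):

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192            /* default page size; configurable */

/* Trimmed view of the page header: pd_lsn, pd_checksum, pd_flags,
 * and the version/size word are omitted here. */
typedef struct PageHeaderData
{
    uint16_t    pd_lower;      /* offset to start of free space */
    uint16_t    pd_upper;      /* offset to end of free space */
    uint16_t    pd_special;    /* offset to start of special space */
} PageHeaderData;

/* The special area is found purely from on-page state: read
 * pd_special from the page itself and add it to the base pointer. */
static char *
page_get_special(char *page)
{
    PageHeaderData *hdr = (PageHeaderData *) page;

    assert(hdr->pd_special <= BLCKSZ);
    return page + hdr->pd_special;
}
```

That self-describing property is what a tool like pg_filedump leans on,
and it is what a design keying the reserved-space size off the control
file would give up.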
I see a huge amount of potential for adding new optimizations that use
subsidiary space on the page, presumably implemented via a special
area that can grow dynamically. For example, an ad-hoc compression
technique for heap pages that temporarily "absorbs" some extra
versions in the event of opportunistic pruning running and failing to
free enough space. Such a design would operate on similar principles
to deduplication in unique indexes, where the goal is to buy time
rather than buy space. When we fail to keep the contents of a heap
page together today, we often barely fail, so I expect something like
this to have an outsized impact on some workloads.
> In general, I was imagining that you'd need to look at the control
> file to understand how much space had been reserved per page in this
> particular cluster. I agree that's a bit awkward, especially for
> pg_filedump. However, pg_filedump and I think also some code internal
> to PostgreSQL try to figure out what kind of page we've got by looking
> at the *size* of the special space. It's only good luck that we
> haven't had a collision there yet, and continuing to rely on that
> seems like a dead end. Perhaps we should start including a per-AM
> magic number at the beginning of the special space.
It's true that that approach is just a hack -- we probably can do
better. I don't think that it's okay to break it, though. At least not
without providing a comparable alternative that doesn't rely on
context from the control file.
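For reference, the hack in question amounts to something like the
following. The sizes here are assumptions for illustration (16 bytes
matches MAXALIGN(sizeof(BTPageOpaqueData)) on common platforms, but
exact values depend on the AM structs and alignment); note also that
hash, GiST, and SP-GiST already store a distinguishing page-id word at
the end of their special space, which is a partial precedent for a
per-AM magic number:

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192

enum page_kind { PAGE_HEAP, PAGE_BTREE, PAGE_UNKNOWN };

/* Guess the AM from the size of the special space alone -- the
 * heuristic pg_filedump-style tools rely on today. Two AMs with
 * equal-sized special areas would collide; that's the weakness. */
static enum page_kind
guess_am_by_special_size(uint16_t pd_special)
{
    uint16_t    special_size = BLCKSZ - pd_special;

    if (special_size == 0)
        return PAGE_HEAP;      /* heap pages have no special space */
    if (special_size == 16)
        return PAGE_BTREE;     /* same size as a btree opaque area */
    return PAGE_UNKNOWN;
}
```

Any replacement scheme would need to keep an equivalent check working
from the page bytes alone.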
--
Peter Geoghegan