Re: better page-level checksums

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: better page-level checksums
Date: 2022-06-14 16:10:03
Message-ID: CAH2-Wznckc7EMLVZVR5eRWQVhP0VG-EGxG4UrBcPXAG17SuBeA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 14, 2022 at 8:48 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jun 13, 2022 at 6:26 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Anyway, I can see how it would be useful to be able to know the offset
> > of a nonce or of a hash digest on any given page, without access to a
> > running server. But why shouldn't that be possible with other designs,
> > including designs closer to what I've outlined?
>
> I don't know what you mean by this. As far as I'm aware, the only
> design you've outlined is one where the space wasn't at the same
> offset on every page.

I am skeptical of that particular aspect, yes, though I would frame it
the other way around: under your design, the true special area struct
is no longer necessarily at the same offset for a given AM, at least
not across data directories.

My main concern is preserving the ability to interpret much about the
contents of a page without context, and not making it any harder to
grow the special area dynamically -- which is a broader concern.
Your patch isn't going to be the last one that wants to do something
with the special area. This needs to be carefully considered.

I see a huge amount of potential for adding new optimizations that use
subsidiary space on the page, presumably implemented via a special
area that can grow dynamically. For example, an ad-hoc compression
technique for heap pages that temporarily "absorbs" some extra
versions when opportunistic pruning runs but fails to free enough
space. Such a design would operate on similar principles to
deduplication in unique indexes, where the goal is to buy time rather
than buy space. When we fail to keep the contents of a heap page
together today, we often only barely fail, so I expect something like
this to have an outsized impact on some workloads.

> In general, I was imagining that you'd need to look at the control
> file to understand how much space had been reserved per page in this
> particular cluster. I agree that's a bit awkward, especially for
> pg_filedump. However, pg_filedump and I think also some code internal
> to PostgreSQL try to figure out what kind of page we've got by looking
> at the *size* of the special space. It's only good luck that we
> haven't had a collision there yet, and continuing to rely on that
> seems like a dead end. Perhaps we should start including a per-AM
> magic number at the beginning of the special space.

It's true that that approach is just a hack -- we can probably do
better. I don't think that it's okay to break it, though, at least not
without providing a comparable alternative that doesn't rely on
context from the control file.

--
Peter Geoghegan
