Re: better page-level checksums

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: better page-level checksums
Date: 2022-06-09 21:33:55
Message-ID: CAH2-Wz=4r2k4-yJvDSh2tYu_2_=0LwNTr9m4ukfYY27=MgT0AA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 9, 2022 at 2:13 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm interested in assessing the feasibility of a "better page-level
> checksums" feature. I have a few questions, and a few observations.
> One of my questions is what algorithm(s) we'd want to support. I did a
> quick Google search and found that btrfs supports CRC-32C, XXHASH,
> SHA256, and BLAKE2B. I don't know that we want to support that many
> options (but maybe we do) and I don't think CRC-32C makes any sense
> here, for two reasons. First, we've already got a 16-bit checksum, and
> a 32-bit checksum doesn't seem like it's gaining enough to be worth
> the implementation complexity.

Why not? The only problems that a 32-bit checksum won't solve are all
related to crypto, which is perfectly fine. But it seems like there is
a terminology issue here. ISTM that you're really talking about adding
a cryptographic hash function, not a checksum. These are rather
different things.

> Even if we only offer one new kind of checksum, making space for a
> wider checksum makes the page format variable in a way that it
> currently isn't.

I believe that the page special area was designed to be
variable-sized, and even anticipates dynamic resizing of the special
area. At least in index AMs, where it's not that hard to make extra
space in the special area by shifting the tuples back, and then fixing
line pointers to point to the new offsets. So you have a dynamic
variable-sized array that's a little like a second line pointer array
(though probably not added to all that often).
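
To make the mechanics concrete, here is a minimal sketch in C of growing
the special area by shifting tuples back and fixing up line pointers. It
uses a toy page model, not PostgreSQL's real PageHeaderData or bufpage.h
API: ToyPage, ToyItemId, and the toy_page_* functions are hypothetical
names invented for illustration, and the 24-byte header and 4-byte line
pointer sizes are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   8192
#define TOY_HEADER_SIZE 24   /* assumed header size, for free-space math only */
#define TOY_ITEMID_SIZE 4    /* assumed on-page size of one line pointer */
#define TOY_MAX_ITEMS   64

/* Toy line pointer: byte offset and length of one tuple on the page. */
typedef struct ToyItemId {
    uint16_t off;
    uint16_t len;
} ToyItemId;

/* Toy page (hypothetical): free space is [lower, upper), tuples live in
 * [upper, special), and the special area is [special, TOY_PAGE_SIZE). */
typedef struct ToyPage {
    uint16_t nitems;
    uint16_t lower;
    uint16_t upper;
    uint16_t special;
    ToyItemId items[TOY_MAX_ITEMS];
    uint8_t data[TOY_PAGE_SIZE];
} ToyPage;

static void toy_page_init(ToyPage *page, uint16_t special_size)
{
    assert(special_size <= TOY_PAGE_SIZE - TOY_HEADER_SIZE);
    memset(page, 0, sizeof(*page));
    page->lower = TOY_HEADER_SIZE;
    page->special = (uint16_t) (TOY_PAGE_SIZE - special_size);
    page->upper = page->special;
}

/* Add a tuple at the end of free space; returns its item index, or -1. */
static int toy_page_add_tuple(ToyPage *page, const void *buf, uint16_t len)
{
    int free_space = (int) page->upper - (int) page->lower;

    if (page->nitems >= TOY_MAX_ITEMS || free_space < len + TOY_ITEMID_SIZE)
        return -1;
    page->upper = (uint16_t) (page->upper - len);
    memcpy(page->data + page->upper, buf, len);
    page->items[page->nitems].off = page->upper;
    page->items[page->nitems].len = len;
    page->lower = (uint16_t) (page->lower + TOY_ITEMID_SIZE);
    return page->nitems++;
}

/* Grow the special area by `extra` bytes. The new bytes end up at the
 * start of the special area, before its existing contents. */
static bool toy_page_grow_special(ToyPage *page, uint16_t extra)
{
    if ((int) page->upper - (int) page->lower < extra)
        return false;           /* not enough free space on the page */

    /* Shift all tuple data back (toward the page header) by `extra`. */
    memmove(page->data + page->upper - extra,
            page->data + page->upper,
            (size_t) (page->special - page->upper));

    /* Fix line pointers to point to the new offsets. */
    for (int i = 0; i < page->nitems; i++)
        page->items[i].off = (uint16_t) (page->items[i].off - extra);

    page->upper = (uint16_t) (page->upper - extra);
    page->special = (uint16_t) (page->special - extra);

    /* Zero the newly claimed bytes; the old special contents are untouched. */
    memset(page->data + page->special, 0, extra);
    return true;
}
```

Note that in this sketch the old special-area bytes never move; only the
tuples and their line pointers do, which is why an existing opaque
struct at the end of the page would not need to change.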

My preference is for an approach that builds on that, or at least
doesn't significantly complicate it. So a cryptographic hash or nonce
can go in the special area (structs like BTPageOpaqueData don't need
any changes), but at a page offset before the existing AM-specific
struct -- not after.

What disadvantages does that approach have, if any, from your point of view?

--
Peter Geoghegan
