Re: better page-level checksums

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: better page-level checksums
Date: 2022-06-10 14:23:04
Message-ID: CA+TgmoabE38p2wPqhQ4Q_r-n6KYz-RxNBnPUtCcY3B7C89j_iQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 10, 2022 at 9:36 AM Peter Eisentraut
<peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
> I think there ought to be a bit more principled analysis here than just
> "let's add a lot more bits". There is probably some kind of information
> to be had about how many CRC bits are useful for a given block size, say.
>
> And then there is the question of performance. When data checksums were
> first added, there was a lot of concern about that. CRC is usually
> baked directly into hardware, so it's about as cheap as we can hope for.
> SHA not so much.
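The cost gap described above is easy to see even in a rough sketch, e.g. comparing CRC-32 against SHA-256 over an 8kB page-sized buffer. (This is purely illustrative: it uses Python's zlib/hashlib rather than anything from PostgreSQL, whose page checksum is actually an FNV-derived algorithm, and the buffer contents are made up.)

```python
import hashlib
import timeit
import zlib

BLOCK_SIZE = 8192  # PostgreSQL's default page size
page = bytes(range(256)) * (BLOCK_SIZE // 256)  # arbitrary test data

# CRC-32: a single cheap pass, often hardware-accelerated
crc = zlib.crc32(page)

# SHA-256: a cryptographic hash, considerably more work per byte
sha = hashlib.sha256(page).digest()

crc_time = timeit.timeit(lambda: zlib.crc32(page), number=10_000)
sha_time = timeit.timeit(lambda: hashlib.sha256(page).digest(), number=10_000)
print(f"CRC-32:  {crc_time:.3f}s for 10k pages, 32-bit checksum")
print(f"SHA-256: {sha_time:.3f}s for 10k pages, 256-bit checksum")
```

On typical hardware the CRC pass is several times faster per page, which is the performance concern being raised.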

That's all pretty fair. I have to admit that SHA checksums sound quite
expensive, and also that I'm no expert on what kinds of checksums
would be best for this sort of application. Based on the earlier
discussions around TDE, I do think that people want tamper-resistant
checksums here too -- like maybe something where you can't recompute
the checksum without access to some secret. I could propose naive ways
to do that, like prepending a fixed chunk of secret bytes to the
beginning of every block and then running SHA512 or something over the
result, but I'm sure that people with actual knowledge of cryptography
have developed much better and more robust ways of doing this sort of
thing.
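One well-known weakness of the naive secret-prefix scheme (hash(secret || page) with a Merkle-Damgard hash like SHA-512) is the length-extension attack; the standard robust construction for this is an HMAC. A minimal sketch of what a keyed page checksum could look like, using HMAC-SHA-256 (the function names, key handling, and 16-byte truncation here are illustrative assumptions, not a proposal):

```python
import hashlib
import hmac

BLOCK_SIZE = 8192  # typical PostgreSQL page size


def keyed_page_checksum(key: bytes, page: bytes, width: int = 16) -> bytes:
    """Tamper-resistant checksum: HMAC-SHA-256 truncated to `width` bytes.

    Unlike hash(secret || page), HMAC is not subject to length-extension
    attacks, and truncating its output is standard, analyzed practice.
    """
    if len(page) != BLOCK_SIZE:
        raise ValueError("expected a full page")
    return hmac.new(key, page, hashlib.sha256).digest()[:width]


def verify_page(key: bytes, page: bytes, tag: bytes) -> bool:
    # constant-time comparison to avoid leaking match position via timing
    return hmac.compare_digest(keyed_page_checksum(key, page, len(tag)), tag)
```

An attacker who can rewrite a page but does not hold the key cannot recompute a valid tag, which seems to be the property the TDE discussions are after.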

I've really been devoting most of my mental energy here to
understanding what problems there are at the PostgreSQL level - i.e.
when we carve out bytes for a wider checksum, what breaks? The only
research that I did to try to understand what algorithms might make
sense was a quick Google search, which led me to the list of
algorithms that btrfs uses. I figured that was a good starting point
because, like a filesystem, we're checksumming fixed-size blocks of
data. However, I didn't intend to present the results of that quick
look as the definitive answer to the question of what might make sense
for PostgreSQL, and would be interested in hearing what you or anyone
else thinks about that.

--
Robert Haas
EDB: http://www.enterprisedb.com
