Re: better page-level checksums

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: better page-level checksums
Date: 2022-06-14 17:42:55
Message-ID: CA+Tgmobu2sUKDCiYKtgs-6XeGzaXaQR3DXgf1AB=suZpGCHnNQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 14, 2022 at 11:08 AM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> I agree with the premise of one only needing one such blob on the
> page, yet I don't think that putting it on the exact end of the page
> is the best option.
>
> PageGetSpecialPointer is much simpler when you can rely on the
> location of the special area. As special areas can be accessed N times
> each time a buffer is loaded from disk, and yet the 'storage system
> extra blob' only twice (once read, once write), I think the special
> area should have priority when handing out page space.

Hmm, but on the other hand, if you imagine a scenario in which the
"storage system extra blob" is actually a nonce for TDE, you need to
be able to find it before you've decrypted the rest of the page. If
pd_checksum gives you the offset of that data, you need to exclude it
from what gets encrypted, which means that you need to encrypt three
separate non-contiguous areas of the page whose combined size is
unlikely to be a multiple of the encryption algorithm's block size.
That kind of sucks (and putting it at the end of the page makes it way
better).

That said, I certainly agree that finding the special space needs to
be fast. The question in my mind is HOW fast it needs to be, and what
techniques we might be able to use to dodge the problem. For instance,
suppose that, during the startup sequence, we look at the control
file, figure out the size of the 'storage system extra blob', and
based on that each AM figures out the byte-offset of its special space
and caches that in a global variable. Then, instead of
PageGetSpecialPointer(page) it does PageGetBtreeSpecialSpace(page) or
whatever, where the implementation is ((char *) page) +
the_aforementioned_global_variable. Is that going to be too slow?
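
To make that concrete, here is a minimal sketch of the idea, not a
patch. It assumes the extra blob occupies the last bytes of the page
and the special space sits immediately before it. BtreeSpecial,
btree_special_offset and InitBtreeSpecialOffset are hypothetical names
invented for illustration, and real code would also need to MAXALIGN
the sizes involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192				/* default PostgreSQL block size */

typedef char *Page;

/* Hypothetical stand-in for an AM's special-space struct (16 bytes). */
typedef struct BtreeSpecial
{
	uint32_t	prev;
	uint32_t	next;
	uint32_t	level;
	uint16_t	flags;
	uint16_t	cycleid;
} BtreeSpecial;

/* Hypothetical global: byte offset of the btree special space in a page. */
static size_t btree_special_offset = 0;

/*
 * Run once during startup, after the size of the extra blob has been
 * read from the control file.  Assumed layout: the blob takes the last
 * extra_blob_size bytes and the special space comes right before it.
 */
static void
InitBtreeSpecialOffset(size_t extra_blob_size)
{
	assert(extra_blob_size + sizeof(BtreeSpecial) <= BLCKSZ);
	btree_special_offset = BLCKSZ - extra_blob_size - sizeof(BtreeSpecial);
}

/* Per-access cost: one load of a global plus one pointer addition. */
static inline BtreeSpecial *
PageGetBtreeSpecialSpace(Page page)
{
	return (BtreeSpecial *) (page + btree_special_offset);
}
```

The accessor never reads the page header, so the open question is
only whether a load from a global is measurably slower than the
header field that PageGetSpecialPointer reads today.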

If it is, then I think this whole effort may be in more trouble than I
can get it out of, because it's not just the location of the special
space that is an issue here, and indeed from what I can see that's not
even the most important issue. There are tons of constants that are
computed based on the amount of usable space in the page, and I don't
have a better idea than turning those constants into global variables
that are computed once ... well, perhaps in some cases we could
compile hot bits of code multiple times, once per possible value of
the compile-time constant, but I'm pretty sure we don't want to do that
for the entire index AM.
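
As a rough illustration of what turning such a constant into a global
might look like, the sketch below recomputes a MaxHeapTuplesPerPage-style
limit at startup. The sizes are what I believe a default 8kB build
uses (24-byte page header, 24-byte aligned heap tuple header, 4-byte
line pointer) and should be read as assumptions; the names
max_heap_tuples_per_page and InitPageSizeDerivedLimits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ					8192	/* default block size */
#define PAGE_HEADER_SIZE		24		/* assumed SizeOfPageHeaderData */
#define HEAP_TUPLE_HEADER_SIZE	24		/* assumed MAXALIGN'd tuple header */
#define ITEM_ID_SIZE			4		/* assumed sizeof(ItemIdData) */

/* Hypothetical global replacing a compile-time constant. */
static int	max_heap_tuples_per_page = 0;

/*
 * Run once during startup: derive the limit from the space that is
 * left after the page header and the extra blob are subtracted.
 */
static void
InitPageSizeDerivedLimits(size_t extra_blob_size)
{
	size_t		usable;

	assert(extra_blob_size < BLCKSZ - PAGE_HEADER_SIZE);
	usable = BLCKSZ - PAGE_HEADER_SIZE - extra_blob_size;
	max_heap_tuples_per_page =
		(int) (usable / (HEAP_TUPLE_HEADER_SIZE + ITEM_ID_SIZE));
}
```

With no blob this gives 291, and with a 64-byte blob it gives 289.
Code that sizes stack arrays from the constant would presumably still
need a compile-time upper bound, which is part of why this is not a
purely mechanical change.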

There's going to have to be some compromise here. On the one hand
you're going to have people who want to be able to do run-time
conversions between page formats even at the cost of extra runtime
overhead on top of what the basic feature necessarily implies. On the
other hand you're going to have people who don't think any overhead at
all is acceptable, even if it's purely nominal and only visible on a
microbenchmark. Such arguments can easily become holy wars. I think we
should take a pragmatic approach: big slowdowns are categorically
unacceptable, and every effort must be made to minimize overhead, but
if the only permissible amount of overhead is exactly zero, then
there's no hope of ever implementing any of these kinds of features. I
don't think that's actually what most people want.

--
Robert Haas
EDB: http://www.enterprisedb.com
