[PATCHES] Post-special page storage TDE support

From: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>
Subject: [PATCHES] Post-special page storage TDE support
Date: 2022-10-24 17:55:53
Message-ID: CAOxo6XKWHHUr1agOZxEHuL-UW8Me3YndUsJ=09tcDiw+Ld8YEw@mail.gmail.com
Lists: pgsql-hackers

Hi -hackers,

An additional piece that I am working on for improving infra for TDE
features is allowing the storage of additional per-page data. Rather
than hard-code the idea of a specific struct, this is utilizing a new,
more dynamic structure to associate page offsets with a particular
feature that may-or-may-not be present for a given cluster. I am
calling this generic structure a PageFeature/PageFeatureSet (better
names welcome), which is defined for a cluster at initdb/bootstrap
time, and reserves a given amount of trailing space on the Page which
is then parceled out to the consumers of said space.

While the immediate need that this feature fills is storage of
encryption tags for XTS-based encryption on the pages themselves, this
can also be used for any optional features; as an example I have
implemented expanded checksum support (both 32- and 64-bit), as well
as a self-describing "wasted space" feature, which just allocates
trailing space from the page (obviously intended as illustration
only).

There are 6 commits in this series:

0001 - adds `reserved_page_space` global, making various size
calculations and limits dynamic, adjusting access methods to offset
special space, and ensuring that we can safely reserve allocated space
from the end of pages.

0002 - test suite stability fixes - the change in the number of tuples
per page breaks some tests that assumed a particular output order

0003 - the "PageFeatures" commit, the meat of this feature (see
following description)

0004 - page_checksum32 feature - store the full 32-bit checksum across
the existing pd_checksum field as well as 2 bytes from
reserved_page_space. This is more of a demo of what could be done
here than a practical feature.

0005 - wasted space PageFeature - just use up space. An additional
feature we can turn on/off to see how multiple features interact.
Only for illustration.

0006 - 64-bit checksums - fully allocated from reserved_page_space.
Using an MIT-licensed 64-bit checksum, but if we determined we'd want
to do this we'd probably roll our own.
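
To make the 0004 description concrete, here is a minimal sketch of
splitting a 32-bit checksum between a 16-bit header field and 2 bytes
taken from the reserved trailing space. The helper names and the choice
of which half goes where are assumptions for illustration, not taken
from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers (not from the patch). Assumption: the low 16 bits
 * live in the existing header field, the high 16 bits in the 2 reserved
 * bytes at the tail of the page.
 */
static void
checksum32_store(uint32_t checksum, uint16_t *pd_checksum,
                 unsigned char *reserved)
{
    uint16_t    high = (uint16_t) (checksum >> 16);

    assert(pd_checksum != NULL && reserved != NULL);
    *pd_checksum = (uint16_t) (checksum & 0xFFFF);
    /* memcpy because the reserved bytes may not be 2-byte aligned */
    memcpy(reserved, &high, sizeof(high));
}

static uint32_t
checksum32_load(uint16_t pd_checksum, const unsigned char *reserved)
{
    uint16_t    high;

    assert(reserved != NULL);
    memcpy(&high, reserved, sizeof(high));
    return ((uint32_t) high << 16) | pd_checksum;
}
```

Storing and then loading through these two helpers round-trips the
full 32-bit value.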

From the commit message for PageFeatures:

Page features are a standardized way of assigning and using dynamic
space usage from the tail end of a disk page. These features are set
at cluster init time (so configured via `initdb` and initialized via
the bootstrap process) and affect all disk pages.

A PageFeatureSet is effectively a bitflag of all configured features,
each of which has a fixed size. If not using any PageFeatures, the
storage overhead of this is 0.
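
As a rough sketch of the bitflag idea, the total space to reserve is
the sum of the fixed sizes of whichever features have their bit set.
The feature IDs, sizes, and function name below are hypothetical and
chosen only for illustration; the only detail taken from this message
is that the set is currently a uint16:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The message says the cluster-wide set is currently a uint16. */
typedef uint16_t DemoFeatureSet;

/* Hypothetical feature IDs; each ID is a bit position in the set. */
enum
{
    DEMO_PF_ENCRYPTION_TAG = 0,
    DEMO_PF_CHECKSUM64 = 1,
    DEMO_PF_COUNT = 2
};

/* Fixed per-feature sizes in bytes (illustrative values only). */
static const size_t demo_feature_size[DEMO_PF_COUNT] = {16, 8};

/* Sum the sizes of all enabled features; an empty set costs 0 bytes. */
static size_t
demo_reserved_space(DemoFeatureSet set)
{
    size_t      total = 0;

    assert((set >> DEMO_PF_COUNT) == 0);    /* no unknown feature bits */
    for (int i = 0; i < DEMO_PF_COUNT; i++)
    {
        if (set & (1u << i))
            total += demo_feature_size[i];
    }
    return total;
}
```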

Rather than using a variable location struct, an implementation of a
PageFeature is responsible for an offset and a length in the page. The
current API returns only a pointer to the page location for the
implementation to manage, and no further checks are done to ensure
that only the expected memory is accessed.

Access to the underlying memory is synonymous with determining whether
a given cluster is using an underlying PageFeature, so code paths can
do something like:

    char *loc;

    if ((loc = ClusterGetPageFeatureOffset(page, PF_MY_FEATURE_ID)))
    {
        /*
         * ipso facto this feature is enabled in this cluster *and* we
         * know the memory address
         */
        ...
    }
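
For readers who want to see how a lookup of this shape could work, here
is a minimal sketch, assuming features are packed from the end of the
page in bit order. That layout, the `toy_` names, and the sizes are all
assumptions for illustration and do not describe the patch's actual
implementation of ClusterGetPageFeatureOffset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ToyFeatureSet;

/* Hypothetical feature IDs (bit positions) and fixed sizes in bytes. */
enum
{
    TOY_PF_TAG = 0,
    TOY_PF_CSUM64 = 1,
    TOY_PF_COUNT = 2
};

static const size_t toy_feature_len[TOY_PF_COUNT] = {16, 8};

/*
 * Return a pointer to the feature's bytes, or NULL if the feature is not
 * enabled. Assumed layout: feature 0 occupies the last bytes of the page,
 * the next enabled feature sits immediately before it, and so on.
 */
static char *
toy_feature_loc(char *page, size_t page_size, ToyFeatureSet set, int feature)
{
    size_t      from_end = 0;

    assert(feature >= 0 && feature < TOY_PF_COUNT);
    if (!(set & (1u << feature)))
        return NULL;

    for (int i = 0; i <= feature; i++)
    {
        if (set & (1u << i))
            from_end += toy_feature_len[i];
    }
    return page + page_size - from_end;
}
```

As in the snippet above, a non-NULL result tells the caller both that
the feature is enabled and where its bytes live. Nothing stops the
caller from writing past the feature's length.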

Since this is direct memory access to the underlying Page, ensure the
buffer is pinned. Explicit locking (assuming you stay in your lane)
should only need to guard against access from other backends of this
type if using shared buffers, so it will be use-case dependent.

This does have a runtime overhead due to moving some offset
calculations from compile time to runtime. It is thought that the
utility of this feature will outweigh the costs here.
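
The kind of change meant here can be sketched as follows: a size limit
that used to be a preprocessor constant becomes a function of a global
set at startup. The constants, the 8-byte alignment of the reserved
space, and all `sketch_`/`SKETCH_` names are assumptions for
illustration rather than the patch's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLCKSZ         ((size_t) 8192)
/* Illustrative fixed overhead: page header plus one line pointer. */
#define SKETCH_PAGE_OVERHEAD  ((size_t) 32)
#define SKETCH_ALIGN8(x)      (((size_t) (x) + 7) & ~(size_t) 7)

/* Before: the limit is a constant the compiler can fold. */
#define SKETCH_MAX_TUPLE_SIZE_STATIC \
    (SKETCH_BLCKSZ - SKETCH_PAGE_OVERHEAD)

/* After: the reserved tail size is only known once the cluster starts. */
static size_t sketch_reserved_page_space = 0;

static size_t
sketch_max_tuple_size(void)
{
    size_t      reserved = SKETCH_ALIGN8(sketch_reserved_page_space);

    assert(reserved < SKETCH_BLCKSZ - SKETCH_PAGE_OVERHEAD);
    return SKETCH_BLCKSZ - SKETCH_PAGE_OVERHEAD - reserved;
}
```

With nothing reserved the two forms agree. The runtime form costs a
load and a subtraction wherever the constant used to appear.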

Candidates for page features include 32-bit or 64-bit checksums,
encryption tags, or additional per-page metadata.

While we are not currently getting rid of the pd_checksum field, this
mechanism could be used to free up those 16 bits for some other
purpose. One such purpose might be to mirror the cluster-wide
PageFeatureSet, currently also a uint16, which would mean the entirety
of this scheme could be reflected in a given page, opening up
per-relation or even per-page setting/metadata here. (We'd presumably
need to snag a pd_flags bit to interpret pd_checksum that way, but it
would be an interesting use.)
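
A minimal sketch of that idea, assuming a spare pd_flags bit is claimed
to mean "pd_checksum holds a PageFeatureSet". The flag value, struct,
and function name are hypothetical; nothing like this exists in the
patch series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two header fields discussed here; not the real page header. */
typedef struct
{
    uint16_t    pd_checksum;
    uint16_t    pd_flags;
} ExampleHeader;

/* Hypothetical flag bit: pd_checksum is reinterpreted as a feature set. */
#define EXAMPLE_PD_HAS_FEATURE_SET 0x0008

/*
 * Pick the feature set that applies to this page: the per-page copy if
 * the flag is set, otherwise the cluster-wide default.
 */
static uint16_t
example_effective_features(const ExampleHeader *hdr, uint16_t cluster_features)
{
    assert(hdr != NULL);
    if (hdr->pd_flags & EXAMPLE_PD_HAS_FEATURE_SET)
        return hdr->pd_checksum;
    return cluster_features;
}
```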

Discussion is welcome and encouraged!

Thanks,

David

Attachment                                                    Content-Type              Size
0005-A-second-page-feature-just-to-allocate-more-space.patch  application/octet-stream  4.3 KB
0001-Add-reserved_page_space-to-Page-structure.patch          application/octet-stream  18.9 KB
0004-Add-page_checksums32-page-feature.patch                  application/octet-stream  16.5 KB
0003-Add-cluster-wide-Page-Features.patch                     application/octet-stream  30.4 KB
0006-Add-64-bit-checksum-page-feature.patch                   application/octet-stream  31.1 KB
0002-Make-the-output-of-select_views-test-stable.patch        application/octet-stream  218.4 KB
