storing an explicit nonce

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: storing an explicit nonce
Date: 2021-05-25 16:46:45
Message-ID: CA+TgmoaD8wMN6i1mmuo+4ZNeGE3Hd57ys8uV8UZm7cneqy3W2g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 18, 2021 at 2:59 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Ultimately, we need to make sure that LSNs aren't re-used. There's two
> > sources of LSNs today: those for relations which are being written into
> > the WAL and those for relations which are not (UNLOGGED relations,
> > specifically). The 'minimal' WAL level introduces complications with
>
> Well, the story is a little more complex than that --- we currently have
> four LSN uses:
>
> 1. real LSNs for WAL-logged relfilenodes
> 2. real LSNs for GiST indexes for non-WAL-logged relfilenodes of permanent relations
> 3. fake LSNs for GiST indexes for relfilenodes of non-permanent relations
> 4. zero LSNs for non-GiST non-permanent relations
>
> This patch changes it so #4 gets fake LSNs, and slightly adjusts #2 & #3
> so the LSNs are always unique.

Hi!

This approach has a few disadvantages. For example, right now, we only
need to WAL log hints for the first write to each page after a
checkpoint, but in this approach, if the same page is written multiple
times per checkpoint cycle, we'd need to log hints every time. In some
workloads that could be quite expensive, especially if we log a
full-page image (FPI) every time.

Also, I think that all sorts of non-permanent relations currently get
zero LSNs, not just GiST. Every unlogged table and every temporary
table would need to use fake LSNs. Moreover, for unlogged tables, the
buffer manager would need changes, because it is otherwise going to
assume that anything it sees in the pd_lsn field other than a zero is
a real LSN.

So I would like to propose an alternative: store the nonce in the
page. Now the next question is where to put it. I think that putting
it into the page header would be far too invasive, so I propose that
we instead store it at the end of the page, as part of the special
space. That means a great deal of code won't notice that anything is
different, because it has always assumed that the usable space on the
page ends where the special space begins, and it doesn't really care
where that is exactly. The code that knows about the special space
might care a little bit, but whatever private data it's storing is
going to be at the beginning of the special space, and the nonce would
be stored - in this proposal - at the end of the special space. So it
turns out that it doesn't really care that much either.
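To make the proposed layout concrete, here is a minimal C sketch of a page that reserves the nonce at the end of the special space. Everything in it is hypothetical: the SketchPage type, the stripped-down three-field header (not the real PageHeaderData), the assumed 8-byte alignment, and the helper names are illustrations, not code from the attached patches. It shows that code which only looks at pd_lower, pd_upper and pd_special is unaffected, while AM-private data still begins at pd_special and the nonce occupies the last bytes of the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192  /* assumed block size (PostgreSQL's default BLCKSZ) */
#define SKETCH_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)  /* assumed 8-byte alignment */

/* Hypothetical, stripped-down header: NOT the real PageHeaderData. */
typedef struct SketchPageHeader
{
    uint16_t    pd_lower;       /* end of line pointer array */
    uint16_t    pd_upper;       /* start of tuple data */
    uint16_t    pd_special;     /* start of special space */
} SketchPageHeader;

typedef union SketchPage
{
    SketchPageHeader hdr;
    char        bytes[SKETCH_BLCKSZ];
    uint64_t    align;          /* forces 8-byte alignment of the buffer */
} SketchPage;

/*
 * Reserve the AM's private special data plus the nonce at the end of the
 * page.  Code that only reads pd_special just sees a larger special space.
 */
static void
sketch_page_init(SketchPage *page, size_t private_size, size_t nonce_size)
{
    size_t      special = SKETCH_MAXALIGN(private_size + nonce_size);

    assert(special + sizeof(SketchPageHeader) <= SKETCH_BLCKSZ);
    memset(page, 0, sizeof(*page));
    page->hdr.pd_lower = (uint16_t) sizeof(SketchPageHeader);
    page->hdr.pd_special = (uint16_t) (SKETCH_BLCKSZ - special);
    page->hdr.pd_upper = page->hdr.pd_special;
}

/* AM-private data still starts at the beginning of the special space. */
static char *
sketch_special_private(SketchPage *page)
{
    return page->bytes + page->hdr.pd_special;
}

/* The nonce occupies the last nonce_size bytes of the page. */
static char *
sketch_page_nonce(SketchPage *page, size_t nonce_size)
{
    assert(page->hdr.pd_special + nonce_size <= SKETCH_BLCKSZ);
    return page->bytes + SKETCH_BLCKSZ - nonce_size;
}

/* Free space as ordinary code computes it; it never looks past pd_upper. */
static size_t
sketch_page_free_space(const SketchPage *page)
{
    return (size_t) (page->hdr.pd_upper - page->hdr.pd_lower);
}
```

With a 16-byte private area and a 16-byte nonce on an 8192-byte page, pd_special comes out at 8160, the private data occupies bytes 8160 to 8175, and the nonce occupies bytes 8176 to 8191; with a nonce size of 0 the page looks exactly as it would without the feature.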

Attached are a few WIP/POC patches from my colleague Bharath
implementing this. There are certainly some implementation
deficiencies here, which can be corrected if we decide this approach
is worth pursuing, but I think they are sufficient to show that the
approach is viable and also some of the consequences of going this
way. One thing that happens is that a bunch of values that used to be
constant - like TOAST_INDEX_TARGET and GinDataPageMaxDataSize - become
non-constant. I suggested to Bharath that he handle this by changing
those macros to take the nonce size as an argument, which is what the
patch does, although it missed pushing that idea all the way down in
some obscure cases (e.g. SIGLEN_MAX). The downside is that we will now
have more computation to do at runtime rather than at compile time. I
am not sure whether the impact would be large enough to worry about,
but I'm hopeful that the answer is "no".
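The macro change described above can be sketched as follows. The names and sizes are assumptions chosen to resemble a GIN-style data-page limit; they are not the real definitions of GinDataPageMaxDataSize or TOAST_INDEX_TARGET. The point is only that the old form is a compile-time constant, while the parameterized form, fed a nonce size that is known only at startup, has to be evaluated at runtime.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLCKSZ 8192          /* assumed block size (the default BLCKSZ) */
#define SKETCH_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)  /* assumed 8-byte MAXALIGN */
#define SKETCH_PAGE_HEADER_SIZE 24  /* assumed page header size */
#define SKETCH_OPAQUE_SIZE 8        /* assumed size of the AM's private special data */

/* Before: a compile-time constant. */
#define SKETCH_MAX_DATA_SIZE \
    (SKETCH_BLCKSZ - SKETCH_MAXALIGN(SKETCH_PAGE_HEADER_SIZE) - \
     SKETCH_MAXALIGN(SKETCH_OPAQUE_SIZE))

/* After: the nonce shares the special space, so its size becomes an argument. */
#define SKETCH_MAX_DATA_SIZE_N(nonce_size) \
    (SKETCH_BLCKSZ - SKETCH_MAXALIGN(SKETCH_PAGE_HEADER_SIZE) - \
     SKETCH_MAXALIGN(SKETCH_OPAQUE_SIZE + (nonce_size)))

/* Hypothetical global, set once at startup from the cluster's configuration. */
static size_t sketch_tde_nonce_size = 0;

/* Evaluated at runtime, because the nonce size is no longer a constant. */
static size_t
sketch_max_data_size(void)
{
    /* sanity check: the nonce must leave room for actual data */
    assert(sketch_tde_nonce_size < SKETCH_BLCKSZ / 2);
    return SKETCH_MAX_DATA_SIZE_N(sketch_tde_nonce_size);
}
```

One side effect worth noting: a constant like SKETCH_MAX_DATA_SIZE can dimension a static array, whereas SKETCH_MAX_DATA_SIZE_N(sketch_tde_nonce_size) cannot, so any such array would likely need to be sized for the worst case or allocated dynamically.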

As written, the patch makes initdb take a --tde-nonce-size argument,
but that's really just for demonstration purposes. I assume that, if
we decide to go this way, we'd have an initdb option that selects
whether to use encryption, or perhaps the specific encryption
algorithm to be used, and then the nonce size would be computed based
on that, or else set to 0 if encryption is not in use.
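A rough sketch of how such an initdb option might select the cipher and derive the nonce size, with 0 meaning encryption is off. The option strings, the enum and the sizes are all hypothetical; 12 bytes is used only because a 96-bit IV is the commonly recommended length for AES-GCM, and a real design might need additional per-page bytes (for example, for an authentication tag).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values for a future initdb encryption option. */
typedef enum SketchTdeCipher
{
    SKETCH_TDE_NONE,
    SKETCH_TDE_AES_GCM_128,
    SKETCH_TDE_AES_GCM_256
} SketchTdeCipher;

/* Map an option string to a cipher; returns 0 on success, -1 if unrecognized. */
static int
sketch_parse_cipher(const char *name, SketchTdeCipher *out)
{
    assert(out != NULL);
    if (name == NULL || strcmp(name, "none") == 0)
        *out = SKETCH_TDE_NONE;     /* option omitted: no encryption */
    else if (strcmp(name, "aes-gcm-128") == 0)
        *out = SKETCH_TDE_AES_GCM_128;
    else if (strcmp(name, "aes-gcm-256") == 0)
        *out = SKETCH_TDE_AES_GCM_256;
    else
        return -1;
    return 0;
}

/* Nonce bytes to reserve per page; 0 means no space is reserved at all. */
static size_t
sketch_nonce_size(SketchTdeCipher cipher)
{
    switch (cipher)
    {
        case SKETCH_TDE_NONE:
            return 0;
        case SKETCH_TDE_AES_GCM_128:
        case SKETCH_TDE_AES_GCM_256:
            return 12;              /* assumed: 96-bit IV, as commonly used with GCM */
    }
    return 0;                       /* not reached for valid enum values */
}
```

Under this scheme a cluster initialized without encryption reserves zero nonce bytes, so its pages keep today's layout and the size-dependent macros reduce to their current values.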

Comments?

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Provide-TDE-nonce-size-as-an-initdb-option.patch application/octet-stream 8.2 KB
v1-0003-Adjust-tests-for-configurable-TDE-nonce-size.patch application/octet-stream 19.2 KB
v1-0002-Add-TDE-nonce-bytes-to-page-pd_special-structure.patch application/octet-stream 128.6 KB
