Re: storing an explicit nonce

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>
Subject: Re: storing an explicit nonce
Date: 2021-05-26 02:23:46
Message-ID: CAOuzzgoLLmHhvXDS6n8q46Tyri_GZJJEaNDaB5ezrGyPfQqfNw@mail.gmail.com
Lists: pgsql-hackers

Greetings,

On Tue, May 25, 2021 at 22:11 Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, May 25, 2021 at 09:58:22PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > > On Tue, May 25, 2021 at 09:42:48PM -0400, Stephen Frost wrote:
> > > > The nonce needs to be a new one, if we include the hint bits in the
> > > > set of data which is encrypted.
> > > >
> > > > However, what I believe folks are getting at here is that we could
> > > > keep the LSN the same, but increase the nonce when the hint bits
> > > > change, but *not* WAL log either the nonce change or the hint bit
> > > > change (unless it's being logged for some other reason, in which
> > > > case log both), thus reducing the amount of WAL being produced.
> > > > What would matter is that both the hint bit change and the new
> > > > nonce hit disk at the same time, or neither do, or we replay back
> > > > to some state where the nonce and the hint bits 'match up' so that
> > > > the page decrypts (and the integrity check works).
> > >
> > > How do we prevent torn pages if we are writing the page with a new
> > > nonce, and no WAL-logged full page image?
> >
> > err, we'd still WAL the FPI, same as we do for checksums, that's what I
> > would expect and would think we'd need. As long as the FPI is in the
> > WAL since the last checkpoint, later changes to hint bits or the nonce
> > wouldn't matter- we'll replay the FPI and that'll have the right nonce
> > for the hint bits that were part of the FPI.
> >
> > Any subsequent changes to the hint bits wouldn't be WAL'd though and
> > neither would the changes to the nonce and that all should be fine
> > because we'll blow away the entire page on crash recovery to push it
> > back to what it was when we first wrote the page after the last
> > checkpoint. Naturally, other changes which have to be WAL'd would still
> > be done but those would be replayed in shared buffers on top of the
> > prior FPI and the nonce set to some $new value (one which we know
> > couldn't have been used prior, by incrementing by some value) when we go
> > to write out that new page.
>
> OK, I see what you are saying. If we use a nonce that is not the full
> page write LSN then we can use it for hint bit changes _after_ the first
> full page write during the checkpoint, and we don't need to WAL log that
> since it isn't a real LSN and we can throw it away on crash recovery.
> This is not possible if we are using the LSN for the full page write LSN
> for the hint bit nonce, though we could use a dummy WAL record to
> generate an LSN for this, right?

Yes, I think you’ve got it. To do it using LSNs and ensure that we always
have a unique nonce, we’d have to generate dummy WAL records in order to
get new LSNs, and that wouldn’t be great.

Andres mentioned other possible cases where the LSN doesn’t change even
though we change the page and, as he’s probably right, we would have to
figure out a solution in those cases too (potentially including cases like
crash recovery or replay on a replica, where we can’t really just go around
creating dummy WAL records to get new LSNs). If the nonce isn’t the LSN,
then those cases are fine: the LSN can stay the same, and it doesn’t matter
that the nonce is changed when we write out the page during crash recovery,
because it’s not tied to the WAL/LSN stream.
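
To make the difference concrete, here is a minimal sketch in Python (not
PostgreSQL code). The Page class, the field names and both write functions
are made up for illustration. It only shows that a hint-bit-only change
leaves the LSN alone, so an LSN-derived nonce repeats while a separate
counter does not.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int            # LSN of the last WAL record that touched the page
    nonce: int = 0      # hypothetical explicit nonce, stored next to the LSN
    hint_bits: int = 0


def set_hint_bit(page: Page, bit: int) -> None:
    # Hint-bit-only change: nothing is WAL-logged, so the LSN does not move.
    page.hint_bits |= bit


def write_with_lsn_nonce(page: Page) -> int:
    # Scheme 1: the LSN doubles as the nonce.
    return page.lsn


def write_with_explicit_nonce(page: Page) -> int:
    # Scheme 2: bump a separate counter on every write of the page.
    page.nonce += 1
    return page.nonce


page = Page(lsn=1000)
lsn_nonces = [write_with_lsn_nonce(page)]
explicit_nonces = [write_with_explicit_nonce(page)]

set_hint_bit(page, 0x01)  # page contents change, LSN does not
lsn_nonces.append(write_with_lsn_nonce(page))
explicit_nonces.append(write_with_explicit_nonce(page))

print(lsn_nonces)       # [1000, 1000]: same nonce, different page contents
print(explicit_nonces)  # [1, 2]: unique per write, LSN is still 1000
```

The sketch does not model crash recovery, where the new nonce would have to
skip past any value that might already have reached disk without being
WAL-logged.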

If I’ve got it right, that does mean that the nonces on the replica might
differ from those on the primary, though, and I’m not completely sure how I
feel about that. We might wish to explicitly document that, because of this,
users should use unique and distinct keys on each replica that are
different from the primary and each other (not a bad idea in general
anyway, but would be quite important with this strategy).
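
As a rough illustration of why distinct keys matter here: if two nodes
share a key and happen to use the same nonce for pages whose contents
differ, a stream-style cipher leaks the XOR of the two plaintexts. The
sketch below uses a toy SHA-256-based keystream purely as a stand-in; it is
an assumption for the example, not the cipher or mode being proposed, and
the keys, nonce and page contents are invented.

```python
import hashlib


def keystream(key: bytes, nonce: int, length: int) -> bytes:
    # Toy counter-mode keystream built from SHA-256 (illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        block = key + nonce.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: int, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


# Same logical page, but the hint bits differ between the two nodes.
primary_page = b"tuple A, hint bits 0x00"
replica_page = b"tuple A, hint bits 0x01"

shared_key = b"k" * 32
replica_key = b"r" * 32
nonce = 7  # both nodes independently arrive at the same nonce

c_primary = encrypt(shared_key, nonce, primary_page)
c_replica_shared = encrypt(shared_key, nonce, replica_page)
c_replica_own = encrypt(replica_key, nonce, replica_page)

# Shared key: the keystream cancels out and the plaintext XOR is exposed.
leaks_with_shared_key = (
    xor(c_primary, c_replica_shared) == xor(primary_page, replica_page)
)
# Distinct keys: the ciphertexts no longer cancel.
leaks_with_own_key = (
    xor(c_primary, c_replica_own) == xor(primary_page, replica_page)
)
```

With a per-node key, a nonce collision between primary and replica no
longer pairs the same keystream with two different page images.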

Thanks,

Stephen
