Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Toshi Harada <harada(dot)toshi(at)po(dot)ntt-tx(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
Date: 2019-03-14 13:28:50
Message-ID: 14646.1552570130@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for feedback!

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> However, this is also quite invasive. It changes a lot of code and it
> doesn't do so in a very predictable way. It's not like you went
> through and replaced every call to write() with a call to
> SpecialEncyptionMagicWrite(). Rather, there are new #ifdef
> USE_ENCRYPTION blocks all over the code base. I think it will be
> important to look for ways to refactor this functionality to reduce
> that sort of thing to the absolute minimum possible. Encryption needs
> to be something that the code needs to know about here and there, but
> not everywhere.

The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code
needs to compile even w/o OpenSSL (of course the encryption won't be enabled
in that case). I'll try to reduce the use of this construct only to the code
blocks that really reference the OpenSSL functions.

> Some of the changes are things that could perhaps be done as separate,
> preparatory patches. For instance, pgstat.c gets a heavy refactoring
> to make it do I/O in bigger chunks. There's no reason that you
> couldn't separate some of those changes out, clean them up, and get
> them committed separately. It would be more efficient even as things
> stand. OK, well, there is one reason: we may be getting a
> shared-memory stats collector soon, and if we do, then the need for
> all of this will go away. But the general point is valid: whatever is
> a useful cleanup apart from this patch should be done separately from
> this patch.

We'll omit the stats collector related changes so far because the patch [1] is
under active development, and that would require us to rebase our patch
often. And if the stats files will eventually go away, we won't need to
encrypt the statistics.

> As far as possible, we need to try to do things in the same way in the
> encrypted and non-encrypted cases. For example, it's pretty hard to
> feel good about code like this:
>
> + sz_hdr = sizeof(ReorderBufferDiskChange);
> + if (data_encrypted)
> +#ifdef USE_ENCRYPTION
> + sz_hdr = TYPEALIGN(ENCRYPTION_BLOCK, sz_hdr);
> +#else
> + ENCRYPTION_NOT_SUPPORTED_MSG;
> +#endif /* USE_ENCRYPTION */
>
> You won't be able to have much confidence that this code works both
> with and without encryption unless you test it both ways, because the
> file format is different in those two cases, and not just by being
> encrypted. That means that anyone who modifies this code in the
> future has two ways of breaking it. They can break the normal case,
> or they can break the encrypted case. If encryption were available as
> a sort of service that everyone could use, then that would probably be
> fine, but like I said above, I think something as invasive as this
> currently is will lead to a lot of complaints.

The problem I tried to solve here is that the whole encryption block (16
bytes) needs to be read and decrypted even if you need only part of its
data. In the snippet above I tried to ensure that the whole blocks are always
read, but I agree this approach is fragile.

Since buffile.c needs to handle this kind of problem (and it already does in
the last patch version), I think even other than temporary files could be
handled by this module. The patch below adds functions BufFileOpenTransient(),
BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(),
etc. respectively. Once we implement the encryption, these functions will also
hide handling of the encryption blocks from user. In this (preliminary) patch
As an example, I applied this approach to ReorderBufferSerializeChange() and
the function seems a bit simpler now. (ReorderBufferRestoreChange() would need
more work to adopt this approach.)

> The code needs a lot more documentation, not only SGML documentation
> but also code comments and probably a README file someplace.

ok, we'll do that.

> The interaction of this capability with certain tricks that PostgreSQL
> plays needs some thought -- and documentation, at least developer
> documentation, maybe user documentation. One area is recovery.
> Writing WAL relies on the fact that if you are in the midst of
> rewriting a block and the power fails, bytes that are the same in both
> the old and new block images will be undisturbed; encryption will make
> that false. How's that going to work?

Yes, if only a single bit is different, decryption will turn the whole
encryption block (16 bytes) into garbage. So we need to enforce
full_page_writes=on if encryption is enabled. The last version of the patch
does not do that.

> Hint bits also rely on this principle. I thought there might be some
> interaction between this work and wal_log_hints for this reason, but I see
> nothing of that sort in the patch.

I'm not sure if the hint bit is still a problem if we first copy the shared
buffer to backend-local memory and encrypt it there. That's what the patch
does.

> Full page writes seem related too; I don't know how something like this can
> be safe without both full_page_writes and wal_log_hints forced on. I don't
> know whether using encryption should forcibly override those settings or
> whether it should just refuse to start unless they are set properly, but I
> think it probably needs to do one or the other.

Maybe it's more polite to refuse to start so that user knows what's going
on. I'm not sure if PG ever changes any configuration variable forcibly.

> Hope this helps in some way. I think it would be good if this effort
> went forward in some way.

Sure, it helps! Thanks.

[1] https://commitfest.postgresql.org/22/1708/

--
Antonin Houska
https://www.cybertec-postgresql.com

Attachment Content-Type Size
buf_file_transient.patch text/x-diff 25.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2019-03-14 13:32:49 Re: Index Skip Scan
Previous Message Andrew Dunstan 2019-03-14 12:54:49 Re: pgsql: Add support for hyperbolic functions, as well as log10().