Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Toshi Harada <harada(dot)toshi(at)po(dot)ntt-tx(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
Date: 2019-05-22 15:16:56
Message-ID: CA+TgmoaUw3QaHoj_QnkyFVMFLnfsDX32ctOz_zLPReQ-RjAioQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Attached is the next version. It tries to address various problems pointed out
> upthread, including documentation.

It's not immediately evident to me, perhaps because I haven't looked
at this stuff in a while, what the purpose of 0001 and 0002 is. I
think it'd be helpful if you would generate these patches with 'git
format-patch' and include a meaningful commit message in each patch of
the series.

+ <para>
+ This setting specifies path to executable file that returns
+ either <literal>encryption_key=</literal> for key
+ and <literal>encryption_password=</literal> for password.
+ </para>

This is incomprehensible. Executables don't return strings. Also,
you can return "either a or b" or you can return "both a and b," but
you can't return "either a and b".

- Build with support for <acronym>SSL</acronym> (encrypted)
- connections. This requires the <productname>OpenSSL</productname>
- package to be installed. <filename>configure</filename> will check
- for the required header files and libraries to make sure that
+ Build with support for on-disk data encryption
+ or <acronym>SSL</acronym> (encrypted) connections. This requires
+ the <productname>OpenSSL</productname> package to be
+ installed. <filename>configure</filename> will check for the
+ required header files and libraries to make sure that

I think we should instead just remove the word 'connections'. We
don't want to have multiple places in the documentation listing all
the things for which OpenSSL is required.

Generally, the documentation changes here have a lot of stuff about
key-or-password, which is quite confusing. It seems like the idea is
that you accept either a key or a password from which a key is
derived, but I don't think that's explained super-clearly, and I
wonder whether it's unnecessary complexity. Why not just require a
key always?

Another general point is that the documentation, comments, and README
need some work on the English. They contain lots of information, but
they read somewhat awkwardly and are hard to understand in some
places.

Wherever you are adding a large hunk of code into the middle of an
existing function (e.g. XLogWrite), please consider putting the logic
in a new function and calling it from the original function, instead
of just inlining the entire thing.

xloginsert.c adds a new #include even though there are no other
changes to that file. That should not be necessary. If it is, you've
broken the principle the header files are compilable independently of
other header files (except postgres.h, which is a prerequisite for
every header file).

-XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
- Size count)
+XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
Size count)

Spurious hunk. Please try to avoid unnecessary changes.

+ * XXX Currently the function is only called with startptr at page
+ * boundary. If it should change, encryption needs to reflect this fact,
+ * i.e. read data from the beginning of the page. Actually this is a
+ * reason to adjust and use walsender.c:XLogRead().

This comment and the similar comment at the end of the function are
not very clear. It sorta sounds, though, like maybe the decryption
logic should move from here to the caller or something. You seem to
be saying that, after your changes, this function depends on the
caller using it in a certain way, which is often a sign that you
haven't abstracted things in the right way. And if walsender.c's
version of the logic is better, why not also do that stuff here?

@@ -52,7 +54,6 @@

uint32 bootstrap_data_checksum_version = 0; /* No checksum */

-
#define ALLOC(t, c) \
((t *) MemoryContextAllocZero(TopMemoryContext, (unsigned)(c) * sizeof(t)))

Another useless hunk.

+ {
+ RelFileNode fromNode = {srctablespace, src_dboid, InvalidOid};
+ RelFileNode toNode = {dsttablespace, dboid, InvalidOid};
+
+ copydir(srcpath, dstpath, &fromNode, &toNode);
+ }

I suggest instead declaring these variables in the surrounding scope
and initializing them here. This style is unlike most PostgreSQL
code.

+ case WAIT_EVENT_KDF_FILE_READ:
+ event_name = "KDFFileRead";
+ break;

New wait events need documentation.

+are encrypted differently. Furthermore, as the Postgres page starts with LSN,
+even if only the last encryption block of the page changes, the whole cipher
+text of the page will be different after the next encryption.

But for a hint-bit-only change, this would not be true, unless we
force wal_log_hints. Sounds like we're doing that, but it might be
worth mentioning it here also.

There's also the question of unlogged tables, which do not have an
LSN. This logic wouldn't apply there, which might be a problem.

+If the encryption is enabled, the following requirements need to be taken into
+account:
+
+1. The file buffer cannot be positioned at arbitrary offset of the file. If

I think it's no good at all to think about enforcing requirements like
this only when data_encryption is used. That's just going to lead to
bugs. I think we need to either find a way to mask this restriction
from the users of the buffile facility, or else revise all those users
so that they respect this new rule and enforce it with an
unconditional assert.

That brings me to another point, which is that a complex patch like
this really needs to be divided up into smaller patches for review. I
see you've started that a little bit (but see comments on commit
messages above) but you should try to go further with it. For
example, changing the buffile contract so that there is some new rule
about positioning should be done as a separate patch (or not done at
all). That patch should be written so that it applies before the main
encryption patch, have comments and asserts explaining the new rules,
etc.

+To store other kinds of data encrypted than the ones above, developers are
+advised to use BufFileWriteTransient() and BufFileReadTransient() functions

The previous paragraph is about BufFileWrite() and BufFileRead(), and
that section is titled "temporary files." Now here you say that for
other kinds of data (not relations, xlog, or temporary files) we
should use BufFileWriteTransient(). That seems pretty weird, because
data that it is not temporary files would presumably be, uh, permanent
files? So then why is Transient in the name? I imagine there are
good reasons here but it needs to be made clearer.

+pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with
+the --link option.)

That seems pretty unfortunate. Any way to do better?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2019-05-22 15:21:21 Re: PG 12 draft release notes
Previous Message Tom Lane 2019-05-22 15:11:20 Re: Patch to fix write after end of array in hashed agg initialization