Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Sehrope Sarkuni <sehrope(at)jackdb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-08-07 23:40:05
Message-ID: CAH7T-arosbP_JODsbh_qcLy_N-bo=1JnpqOLMsajh+=8XcLwKg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 7, 2019 at 1:39 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Wed, Aug 7, 2019 at 11:41:51AM -0400, Sehrope Sarkuni wrote:
> > On Wed, Aug 7, 2019 at 7:19 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> > On Wed, Aug 7, 2019 at 05:13:31PM +0900, Masahiko Sawada wrote:
> > > I understood. IIUC in your approach postgres processes encrypt WAL
> > > records when inserting to the WAL buffer. So WAL data is encrypted
> > > even on the WAL buffer.
> >
> >
> > I was originally thinking of not encrypting the shared WAL buffers but
> > that may have issues. If the buffers are already encrypted and contiguous
> > in shared memory, it's possible to write out many via a single
> > pg_pwrite(...) call as is currently done in XLogWrite(...).
>
> The shared buffers will not be encrypted --- they are encrypted only
> when being written to storage. We felt encrypting shared buffers will
> be too much overhead, for little gain. I don't know if we will encrypt
> while writing to the WAL buffers or while writing the WAL buffers to
> the file system.
>

My mistake on the wording. By "shared WAL buffers" I meant the shared
memory used for WAL buffers, XLogCtl->pages. Not the shared buffers for
pages.

> > If they're not encrypted you'd need to do more work in that critical
> > section. That'd involve allocating a commensurate amount of memory to hold
> > the encrypted pages and then encrypting them all prior to the single
> > pg_pwrite(...) call. Reusing one buffer is possible but it would require
> > encrypting and writing the pages one by one. Both of those seem like a bad
> > idea.
>
> Well, right now the 8k pages are part of the WAL stream, so I don't know
> that it would be any more overhead than other WAL writes.

The total work is the same, but when it happens, how much memory it uses,
and how many syscalls it takes could all change.

Right now the XLogWrite(...) code can write many WAL pages at once via a
single call to pg_pwrite(...):
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=f55352385732c6b0124eff5265462f3883fe7435;hb=HEAD#l2491

If the blocks are not encrypted then you either need to allocate and
encrypt everything (could be up to wal_buffers max size) to do it as one
write, or encrypt chunks of WAL and do multiple writes. I'm not sure how
big an issue this would be in practice as it'd be workload specific.
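
To make the trade-off concrete, here is a rough sketch (Python for brevity,
not PostgreSQL code). toy_encrypt, write_all_at_once and write_page_by_page
are hypothetical names, and the "cipher" is a hash-derived XOR pad standing in
for AES. It only shows how the syscall count and scratch memory differ
between the two options:

```python
import hashlib
import os
import tempfile

PAGE = 8192  # assumed WAL page size (8 kB), for illustration only


def toy_encrypt(page: bytes, page_no: int) -> bytes:
    """Placeholder cipher: XOR with a hash-derived pad. NOT real AES."""
    pad = hashlib.shake_256(b"demo-key" + page_no.to_bytes(8, "big")).digest(len(page))
    return bytes(a ^ b for a, b in zip(page, pad))


def write_all_at_once(fd, pages, offset):
    """Encrypt into a scratch copy as large as the data, then do one pwrite."""
    scratch = b"".join(toy_encrypt(p, i) for i, p in enumerate(pages))
    os.pwrite(fd, scratch, offset)
    return 1, len(scratch)  # (write syscalls, scratch bytes needed)


def write_page_by_page(fd, pages, offset):
    """Reuse one page-sized scratch buffer and do one pwrite per page."""
    calls = 0
    for i, p in enumerate(pages):
        os.pwrite(fd, toy_encrypt(p, i), offset + i * PAGE)
        calls += 1
    return calls, PAGE  # (write syscalls, scratch bytes needed)
```

Both functions leave identical bytes in the file. The first needs scratch
memory proportional to the amount being flushed, the second needs one page of
scratch but one syscall per page.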

> I am hoping we can
> generate the encryption bit stream in chunks earlier so we can just do
> the XOR as we are writing the data to the WAL buffers.
>

For pure CTR that sounds doable, as the bit stream is the same thing you get
by encrypting a run of zeros. Anything with a built-in MAC like GCM would not
work though (I'm not proposing we use that, just keeping it in mind).

You'd also increase your memory requirements (one allocation for the
encryption bit stream and one for the encrypted data, right?).
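
A minimal sketch of that idea, assuming a hypothetical keystream() helper
whose output is derived from SHA-256 rather than AES (a stand-in only, so the
example has no dependencies). The pad is generated ahead of time and each
record is XORed against the matching slice as it is copied into the buffer:

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def keystream(key: bytes, nonce: bytes, nbytes: int) -> bytes:
    """Stand-in for AES-CTR applied to nbytes of zeros. NOT real AES."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()[:BLOCK]
        counter += 1
    return bytes(out[:nbytes])


def xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))


key, nonce = b"k" * 32, b"segment-1"
pad = keystream(key, nonce, 64)  # generated earlier, before any record arrives

records = [b"first record....", b"second", b"third record, longer"]
wal_buffer = bytearray()
for record in records:
    start = len(wal_buffer)  # byte offset of this record in the stream
    wal_buffer += xor(record, pad[start:start + len(record)])
```

Note the two allocations: pad is as large as the data it will cover, in
addition to wal_buffer itself.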

> > Better to pay the encryption cost at the time of WAL record creation and
> > keep the writing process as fast and simple as possible.
>
> Yes, I don't think we know at the time of WAL record creation what
> _offset_ the records will have when they are written to WAL, so I am
> thinking we need to do it later, and as I said, I am hoping we can
> generate the encryption bit stream earlier.
>
> > > It works but I think the implementation might be complex; For example
> > > using openssl, we would use EVP functions to encrypt data by
> > > AES-256-CTR. We would need to make IV and pass it to them and these
> > > functions however don't manage the counter value of nonce as long as I
> > > didn't miss. That is, we need to calculate the correct counter value
> > > for each encryption and pass it to EVP functions. Suppose we encrypt
> > > 20 bytes of WAL. The first 16 bytes is encrypted with nonce of
> > > (segment_number, 0) and the next 4 bytes is encrypted with nonce of
> > > (segment_number, 1). After that suppose we encrypt 12 bytes of WAL. We
> > > cannot use nonce of (segment_number, 2) but should use nonce of
> > > (segment_number, 1). Therefore we would need 4 bytes padding and to
> > > encrypt it and then to throw that 4 bytes away.
> >
> > Since we want to have per-byte control over encryption, for both
> > heap/index pages (skip LSN and CRC), and WAL (encrypt to the last byte),
> > I assumed we would need to generate a bit stream of a specified size and
> > do the XOR ourselves against the data. I assume ssh does this, so we
> > would have to study the method.
> >
> >
> > The lower level non-EVP OpenSSL functions allow specifying the offset
> > within the 16-byte AES block from which the encrypt/decrypt should
> > proceed. It's the "num" parameter of their encrypt/decrypt functions. For
> > a continuous encrypted stream such as a WAL file, a "pread(...)" of a
> > possibly non-16-byte aligned section would involve determining the 16-byte
> > counter (byte_offset / 16) and the intra-block offset (byte_offset % 16).
> > I'm not sure how one handles initializing the internal encrypted counter
> > and that might be one more step that would need to be done. But it's
> > definitely possible to read / write less than a block via those APIs (not
> > the EVP ones).
> >
> > I don't think the EVP functions have parameters for the intra-block offset
> > but you can mimic it by initializing the IV/block counter and then
> > skipping over the intra-block offset by either reading or writing a dummy
> > partial block. The EVP read and write functions both deal with individual
> > bytes so once you've seeked to your desired offset you can read or write
> > the real individual bytes.
>
> Can we generate the bit stream in 1MB chunks or something and just XOR
> as needed?
>

With the provisos above, yes I think that would work though I don't think
it's a good idea. Better to start off using the functions directly and then
look into optimizing only if they're a bottleneck. As a first pass I'd
break it up as separate writes with the encryption happening at write time.
If that works fine there's no need to complicate things further.
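
For reference, the counter and intra-block offset arithmetic from the quoted
discussion can be sketched as follows. This is an illustration only:
keystream_block and ctr_xor_at are hypothetical names, and the block function
is a SHA-256 stand-in rather than AES or any OpenSSL call.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for AES(key, nonce || counter). NOT real AES."""
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()[:BLOCK]


def ctr_xor_at(key: bytes, nonce: bytes, byte_offset: int, data: bytes) -> bytes:
    """Encrypt or decrypt data located at byte_offset within the stream."""
    counter = byte_offset // BLOCK  # which 16-byte block we start in
    skip = byte_offset % BLOCK      # bytes of that block already consumed
    block = keystream_block(key, nonce, counter)
    out = bytearray()
    for b in data:
        if skip == BLOCK:           # current block exhausted, move to the next
            counter += 1
            skip = 0
            block = keystream_block(key, nonce, counter)
        out.append(b ^ block[skip])
        skip += 1
    return bytes(out)
```

With this, the 20-byte write followed by a 12-byte write from the example
upthread becomes ctr_xor_at(key, nonce, 0, first20) and then
ctr_xor_at(key, nonce, 20, next12): the second call starts at counter 1 with
4 bytes skipped, and the result matches encrypting all 32 bytes in one go.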

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
