Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-08-06 16:10:08
Message-ID: CAD21AoCGK_P_-kNgSrz0KdnubLdtO7vKpkWqA8C8N9aPCkGqiw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, Aug 7, 2019, 00:31 Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> Hi Bruce,
> (off-list)
>
> I think I'm missing something about the basics of encryption. Please let me
> ask about it off-list.
>

Sorry for the noise, it was not off-list. I made a mistake.

> On Tue, Aug 6, 2019 at 11:36 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> > On Tue, Aug 6, 2019 at 12:00:27PM +0900, Masahiko Sawada wrote:
> > > What I'm thinking about WAL encryption is that WAL records in the WAL
> > > buffers are not encrypted. When writing to disk, we copy the contents
> > > of the 8k WAL page to a temporary buffer, encrypt it, and then write
> > > it. And according to the current behavior, every time we write WAL we
> > > write whole 8k WAL pages rather than individual WAL records.
> > >
> > > The nonce for WAL encryption is {segment number, counter}. Suppose we
> > > write 100 bytes of WAL at the beginning of the first 8k WAL page in WAL
> > > segment 50. We encrypt the entire 8k WAL page with the nonce starting
> > > from {50, 0} and write it to the disk. After that, suppose we append 200
> > > bytes of WAL to the same WAL page. We again encrypt the entire 8k WAL
> > > page with the nonce starting from {50, 0} and write it to the disk. The
> > > two 8k WAL pages we wrote to the disk are different but we encrypted
> > > them with the same nonce, which I think is bad.
> >
> > OK, I think you are missing something. Let me go over the details.
> > First, I think we are all agreed we are using CTR for heap/index pages,
> > and for WAL, because CTR allows byte granularity, it is faster, and
> > might be more secure.
> >
> > So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
> > to encrypt each page. In CTR mode, we do that by creating an 8k bit
> > stream, which is created in 16-byte chunks with AES by incrementing the
> > counter used for each 16-byte chunk. We then XOR the bits with what we
> > want to encrypt, and skip the LSN and CRC parts of the page.
> >
> > For WAL, we effectively create a 16MB bitstream, though we can create it
> > in parts as needed. (Creating it in parts is easier in CTR mode.) The
> > nonce is the segment number, but each 16-byte chunk uses a different
> > counter. Therefore, even if you are encrypting the same 8k page several
> > times in the WAL, the 8k page would be different because of the LSN (and
> > other changes), and the bitstream you encrypt/XOR it with would be
> > different because the counter would be different for that offset in the
> > WAL.
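
To make the scheme described above concrete, here is a minimal sketch of a
keystream indexed by byte offset within a WAL segment. It is an illustration
under stated assumptions, not code from any patch in this thread: the function
names are hypothetical, and SHA-256 is used only as a stand-in for the AES
block operation because the Python standard library has no AES. The point is
the counter arithmetic (counter = offset / 16), not the cipher.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def keystream_block(key: bytes, segment_no: int, counter: int) -> bytes:
    # Stand-in for AES_encrypt(key, segment_no || counter). NOT real AES;
    # SHA-256 is used only so the sketch runs with the standard library.
    data = key + segment_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:BLOCK]


def ctr_xor(key: bytes, segment_no: int, offset: int, data: bytes) -> bytes:
    """XOR data with the keystream covering bytes [offset, offset + len(data))
    of the given segment. Encryption and decryption are the same operation."""
    out = bytearray(len(data))
    for i in range(len(data)):
        pos = offset + i
        ks = keystream_block(key, segment_no, pos // BLOCK)
        out[i] = data[i] ^ ks[pos % BLOCK]
    return bytes(out)


key = b"k" * 32            # hypothetical key
rec_a = b"A" * 100         # first 100-byte record
rec_b = b"B" * 200         # 200 bytes appended later

# Byte granularity: encrypting the append alone at offset 100 gives the same
# bytes as encrypting all 300 bytes at once from offset 0.
whole = ctr_xor(key, 50, 0, rec_a + rec_b)
parts = ctr_xor(key, 50, 0, rec_a) + ctr_xor(key, 50, 100, rec_b)

# The same plaintext at a different offset meets a different keystream.
at_0 = ctr_xor(key, 50, 0, rec_a)
at_8192 = ctr_xor(key, 50, 8192, rec_a)
```

In this sketch the keystream depends only on the key, the segment number, and
the byte position, so the same record written at two different offsets
encrypts differently, while a given offset always meets the same keystream.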
>
> Well, so you mean that, for example, we encrypt only the 100-byte WAL
> record when we append a 100-byte WAL record?
>
> For WAL encryption, if we encrypt the entire 8k WAL page and write the
> entire page, the encrypted-and-written page will contain 100 bytes of WAL
> record data and (8192-100) bytes of garbage (omitting the WAL page header
> for simplicity), although the WAL data in the WAL buffer is still
> unencrypted. And then if we append 200 bytes again, the
> encrypted-and-written page will contain 300 bytes of WAL record data and
> (8192-300) bytes of garbage, though the data in the WAL buffer is still
> unencrypted.
>
> In this case I think the first 100 bytes of two 8k WAL pages are the
> same because we encrypted both from the beginning of the page with the
> counter = 0. But the next 200 bytes are different; it's (encrypted)
> garbage in the former case but it's (encrypted) WAL record data in the
> latter case. I think that's a problem.
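
The case described above can be sketched as follows. This is only an
illustration of the concern, assuming the whole 8k page is re-encrypted from
counter 0 on each write and that the unused part of the page holds arbitrary
stale bytes (0xAA here, chosen for the example). As before, SHA-256 is a
stand-in for AES and all names are hypothetical.

```python
import hashlib

PAGE = 8192
BLOCK = 16


def page_keystream(key: bytes, segment_no: int, first_counter: int = 0) -> bytes:
    # Build an 8k keystream in 16-byte chunks, one counter value per chunk.
    # SHA-256 is a stand-in for the AES block operation (assumption).
    ks = bytearray()
    ctr = first_counter
    while len(ks) < PAGE:
        seed = key + segment_no.to_bytes(8, "big") + ctr.to_bytes(8, "big")
        ks += hashlib.sha256(seed).digest()[:BLOCK]
        ctr += 1
    return bytes(ks)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"\x01" * 32                 # hypothetical key
ks = page_keystream(key, 50)       # nonce starts at {50, 0} both times

rec1 = b"R1" * 50                  # 100 bytes of WAL record data
rec2 = b"R2" * 100                 # 200 bytes appended later

page_v1 = rec1 + b"\xaa" * (PAGE - 100)          # first write: record + garbage
page_v2 = rec1 + rec2 + b"\xaa" * (PAGE - 300)   # second write: both records

c1 = xor(page_v1, ks)
c2 = xor(page_v2, ks)

# The first 100 bytes on disk are identical across the two writes.
same_prefix = c1[:100] == c2[:100]

# For bytes 100..299 the keystream cancels out: XORing the two ciphertexts
# gives garbage XOR record data, with no key involved.
leak = xor(c1[100:300], c2[100:300])
```

Under these assumptions, someone who can see both on-disk versions of the page
obtains the XOR of the old garbage and the new record data for bytes 100 to
299. Whether that matters in practice depends on what the garbage contains,
which is part of what is being asked here.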
>
> On the other hand, if we encrypt the 8k WAL page with a different
> counter in the nonce after appending the 200-byte WAL record, the first
> 100 bytes (and of course the entire 8k page) will be different. However,
> since that is the same as changing an already-flushed WAL record on the
> disk, it's bad.
>
> Also, if we encrypt only the appended data instead of the entire 8k page,
> we would need to store somewhere how many bytes of the WAL page are
> valid. Otherwise reading WAL would not work correctly.
>
> Please advise me what I am missing.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>

Regards,
