Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-08-06 14:35:58
Lists: pgsql-hackers

On Tue, Aug 6, 2019 at 12:00:27PM +0900, Masahiko Sawada wrote:
> What I'm thinking about WAL encryption is that WAL records in the WAL
> buffer are not encrypted. When writing to the disk we copy the contents
> of 8k WAL page to a temporary buffer and encrypt it, and then write
> it. And according to the current behavior, every time we write WAL we
> write WAL per 8k WAL pages rather than WAL records.
> The nonce for WAL encryption is {segment number, counter}. Suppose we
> write 100 bytes WAL at beginning of the first 8k WAL page in WAL
> segment 50. We encrypt the entire 8k WAL page with the nonce starting
> from {50, 0} and write to the disk. After that, suppose we append 200
> bytes WAL to the same WAL page. We again encrypt the entire 8k WAL
> page with the nonce starting from {50, 0} and write to the disk. The
> two 8k WAL pages we wrote to the disk are different but we encrypted
> them with the same nonce, which I think is bad.

OK, I think you are missing something. Let me go over the details.
First, I think we are all agreed we are using CTR for heap/index pages,
and for WAL, because CTR allows byte granularity, it is faster, and
might be more secure.

So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
as the nonce to encrypt each page. In CTR mode, we do that by creating
an 8k bitstream, which is created in 16-byte chunks with AES by
incrementing the counter used for each 16-byte chunk. We then XOR the
bits with what we want to encrypt, and skip the LSN and CRC parts of
the page.
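
As a rough sketch of the page case (Python, stdlib only): SHA-256
truncated to 16 bytes stands in for the AES block operation here, so
this is NOT real AES-CTR, only the same counter/XOR structure. The
names (keystream_block, ctr_keystream, xor_page), the 12-byte nonce
layout, and the assumption that the LSN is bytes 0-8 and the CRC is
bytes 8-10 of the page are mine, for illustration only.

```python
import hashlib

BLOCK = 16        # AES block size in bytes
PAGE_SIZE = 8192  # 8k heap/index page

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for AES(key, nonce || counter). Assumption: real code
    # would call an AES primitive here; SHA-256 is only a placeholder.
    return hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()[:BLOCK]

def ctr_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Build the bitstream 16 bytes at a time, bumping the counter
    # for each chunk.
    out = bytearray()
    for counter in range((length + BLOCK - 1) // BLOCK):
        out += keystream_block(key, nonce, counter)
    return bytes(out[:length])

def xor_page(page: bytes, key: bytes, lsn: int, page_no: int,
             skip=((0, 8), (8, 10))) -> bytes:
    # Hypothetical nonce layout: 8-byte LSN followed by 4-byte page number.
    nonce = lsn.to_bytes(8, "big") + page_no.to_bytes(4, "big")
    stream = ctr_keystream(key, nonce, len(page))
    out = bytearray(a ^ b for a, b in zip(page, stream))
    # Leave the skipped ranges (assumed LSN and CRC offsets) unencrypted.
    for start, end in skip:
        out[start:end] = page[start:end]
    return bytes(out)
```

Since CTR is just an XOR against the bitstream, calling xor_page() a
second time with the same key, LSN, and page number gives back the
original page.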

For WAL, we effectively create a 16MB bitstream, though we can create it
in parts as needed. (Creating it in parts is easier in CTR mode.) The
nonce is the segment number, but each 16-byte chunk uses a different
counter. Therefore, even if you are encrypting the same 8k page several
times in the WAL, the 8k page would be different because of the LSN (and
other changes), and the bitstream you encrypt/XOR it with would be
different because the counter would be different for that offset in the
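
A similar sketch for the WAL case, under the same caveats: SHA-256 is
a placeholder for the AES block operation, and the function names and
the 12-byte segment-number nonce are hypothetical. It only illustrates
that the bitstream for a segment is addressed by byte offset, so it
can be generated in parts, and each 16-byte chunk gets its own counter.

```python
import hashlib

BLOCK = 16       # AES block size in bytes
WAL_PAGE = 8192  # 8k WAL page

def keystream_block(key: bytes, segno: int, counter: int) -> bytes:
    # Stand-in for AES(key, nonce || counter), with nonce = segment number.
    # Assumption: real code would use an AES primitive, not SHA-256.
    nonce = segno.to_bytes(12, "big")
    return hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()[:BLOCK]

def wal_keystream(key: bytes, segno: int, offset: int, length: int) -> bytes:
    # Bitstream bytes for [offset, offset + length) of one WAL segment.
    # The counter for a chunk is simply its index within the segment.
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    stream = b"".join(keystream_block(key, segno, c)
                      for c in range(first, last + 1))
    start = offset - first * BLOCK
    return stream[start:start + length]

def wal_xor(data: bytes, key: bytes, segno: int, offset: int) -> bytes:
    # Encrypt or decrypt data that lives at the given byte offset.
    stream = wal_keystream(key, segno, offset, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))
```

For example, wal_keystream(key, 50, 8192, 8192) returns the same bytes
as the second 8k of wal_keystream(key, 50, 0, 16384), so nothing forces
us to build the whole 16MB at once.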

Bruce Momjian <bruce(at)momjian(dot)us>

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
