Re: Transparent column encryption

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent column encryption
Date: 2023-01-10 17:26:53
Message-ID: 39414634-723D-43E4-9EC6-7E955201733C@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Dec 31, 2022, at 6:17 AM, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>
> Another update, with some merge conflicts resolved.

Hi Peter, thanks for the patch!

I wonder if logical replication could be made to work more easily with this feature. Specifically, subscribers of encrypted columns will need the encrypted column encryption key (CEK) and the name of the column master key (CMD) as exists on the publisher, but getting access to that is not automated as far as I can see. It doesn't come through automatically as part of a subscription, and publisher's can't publish the pg_catalog tables where the keys are kept (because publishing system tables is not supported.) Is it reasonable to make available the CEK and CMK to subscribers in an automated fashion, to facilitate setting up logical replication with less manual distribution of key information? Is this already done, and I'm just not recognizing that you've done it?

Can we do anything about the attack vector wherein a malicious DBA simply copies the encrypted datum from one row to another? Imagine the DBA Alice wants to murder a hospital patient Bob by removing the fact that Bob is deathly allergic to latex. She cannot modify the Bob's encrypted and authenticated record, but she can easily update Bob's record with the encrypted record of a different patient Charlie. Likewise, if she want's Bob to pay Charlie's bill, she can replace Charlie's encrypted credit card number with Bob's, and once Bob is dead, he won't dispute the charges.

An encrypted-and-authenticated column value should be connected with its row in some way that Alice cannot circumvent. In the patch as you have it written, the client application can include row information in the patient record (specifically, the patient's name, ssn, etc) and verify when any patient record is retrieved that this information matches. But that's hardly "transparent" to the client. It's something all clients will have to do, and easy to forget to do in some code path. Also, for encrypted fixed-width columns, it is not an option. So it seems the client needs to "salt" (maybe not the right term for what I have in mind) the encryption with some relevant other columns, and that's something the libpq client would need to understand, and something the patch's syntax needs to support. Something like:

CREATE TABLE patient_records (
-- Cryptographically connected to the encrypted record
patient_id BIGINT NOT NULL,
patient_ssn CHAR(11),

-- The encrypted record
patient_record TEXT ENCRYPTED WITH (column_encryption_key = cek1,
column_encryption_salt = (patient_id, patient_ssn)),

-- Extra stuff, not cryptographically connected to anything
next_of_kin TEXT,
phone_number BIGINT,
...
);

I have not selected any algorithms that include such "salt"ing (again, maybe the wrong word) because I'm just trying to discuss the general feature, not get into the weeds about which cryptographic algorithm to select.

Thoughts?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-10 17:39:46 Re: can while loop in ClockSweepTick function be kind of infinite loop in some cases?
Previous Message Ted Yu 2023-01-10 17:26:17 Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL