Re: Internal key management system

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Moon, Insung" <tsukiwamoon(dot)pgsql(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Sehrope Sarkuni <sehrope(at)jackdb(dot)com>, cary huang <hcary328(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: Internal key management system
Date: 2020-06-23 13:46:14
Message-ID: CA+fd4k5RCNZCDW-VxStYA8yDkLHXBC5TnFCkWXMa5P5U0usx6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 19 Jun 2020 at 15:44, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
>
> Hello Masahiko-san,
>
> > What I referred to "only one key" is KEK.
>
> Ok, sorry, I misunderstood.
>
> >>> Yeah, it depends on KMS, meaning we need different extensions for
> >>> different KMS. A KMS might support an interface that creates key if not
> >>> exist during GET but some KMS might support CREATE and GET separately.
> >>
> >> I disagree that it is necessary, but this is debatable. The KMS-side
> >> interface code could take care of that, eg:
> >>
> >> if command is "get X"
> >> if (X does not exist in KMS)
> >> create a new key stored in KMS, return it;
> >> else
> >> return KMS-stored key;
> >> ...
> >>
> >> So you can still have a "GET" only interface which adapts to the "final"
> >> KMS. Basically, the glue code which implements the interface for the KMS
> >> can include some logic to adapt to the KMS point of view.
> >
> > Is the above code is for the extension side, right?
>
> Such a code could be in the command with which pg communicates (eg through
> its stdin/stdout, or whatever) to get keys.
>
> pg talks to the command, the command can do anything, such as storing keys
> or communicating with an external service to retrieve them, anything
> really, that is the point.
>
> I'm advocating defining the pg/command protocol, something along "GET xxx"
> as you wrote, and possibly provide a possible/reasonable command
> implementation, which would be part of the code you put in your patch,
> only it would be in the command instead of postgres.
>
> > For example, if users want to use a cloud KMS, say AWS KMS, to store
> > DEKs and KEK they need an extension that is loaded to postgres and can
> > communicate with AWS KMS. I imagine that such extension needs to be
> > written in C,
>
> Why? I could write it in bash, probably. Ok, maybe not so good for suid,
> but in principle it could be anything. I'd probably write it in C, though.
>
> > the communication between the extension uses AWS KMS API, and the
> > communication between postgres core and the extension uses text
> > protocol.
>
> I'm not sure of the word "extension" above. For me the postgres side could
> be an extension as in "CREATE EXTENSION". The command itself could be
> provided in the extension code, but would not be in the "CREATE
> EXTENSION", it would be something run independently.

Oh, I imagined extensions that can be installed by CREATE EXTENSION or
specifying it to shared_preload_libraries.

If the command runs in the background to talk with postgres there are
some problems that we need to deal with. For instance, what if the
process downs? Does the postmaster re-execute it? How does it work in
single-user mode? etc. It seems to me it will bring another
complexity.

>
> > When postgres core needs a DEK identified by KEY-A, it asks
> > for the extension to get the DEK by passing something like “GET KEY-A”
> > message, and then the extension asks the existence of that key to AWK
> > KMS, creates if not exist and returns it to the postgres core. Is my
> > understanding right?
>
> Yes. The command in the use-case you outline would just be an
> intermediary, but for another use-case it would do the storing. The point
> of aiming at extensibility if that from pg point of view the external
> commands provide keys, but what these commands really do to do this can be
> anything.
>
> > When we have TDE feature in the future, we would also need to change
> > frontend tools such as pg_waldump and pg_rewind that read database
> > files so that they can read encrypted files. It means that these
> > frond-end tools also somehow need to communicate with the external
> > place to get DEKs in order to decrypt encrypted database files. In
> > your idea, what do you think about how we can support it?
>
> Hmmm. My idea was that the natural interface would be to get things
> through postgres. For a debug tool such as pg_waldump, probably it needs
> to be adapted if it needs to decrypt data to operate.
>
> Now I'm not sure I understood, because of the DEK are managed by postgres
> in your patch, waldump and other external commands would have no access to
> the decrypted data anyway, so the issue would be the same?

With the current patch, we will be able to add
--cluster-passphase-command command-line option to front-end tools
that want to read encrypted database files. The front-end tools
execute the specified command to get KEK and unwrap DEKs. The
functions such as running passphrase command, verifying the passphrase
is correct, and getting wrapped DEKs from the database cluster are
implemented in src/common so both can use these functions.

>
> With file-level encryption, obviously all commands which have to read and
> understand the files have to be adapted if they are to still work, which
> is another argument to have some interface rather than integrated
> server-side stuff, because these external commands would need to be able
> to get keys and use them as well.
>
> Or I misunderstood something.
>
> >> I'd like an extensible design to have anything in core. As I said in an
> >> other mail, if you want to handle a somehow restricted use case, just
> >> provide an external extension and do nothing in core, please. Put in core
> >> something that people with a slightly different use case or auditor can
> >> build on as well. The current patch makes a dozen hard-coded decisions
> >> which it should not, IMHO.
> >
> > It might have confused you that I included key manager and encryption
> > SQL functions to the patches but this key manager has been designed
> > dedicated to only TDE.
>
> Hmmm. This is NOT AT ALL what the patch does. The documentation in your
> patch talks about "column level encryption", which is an application
> thing. Now you seem to say that it does not matter and can be removed
> because the use case is elsewhere.
>
> > It might be better to remove both SQL interface
> > and SQL key we discussed from the patch set as they are actually not
> > necessary for TDE purposes.
>
> The documentation part of the patch, at no point, talks about TDE
> (transparent data encryption), which is a file-level encryption as far as
> I understand it, i.e. whole files are encrypted.

I should have described the positioning of these patches.. The current
patch is divided into 7 patches but there are two patches for
different purposes.

The first patch is to add an internal key manager. This is
PostgreSQL’s internal component to manage cryptographic keys mainly
for TDE. Originally I proposed both key manager and TDE[1] but these
are actually independent each other and we can discuss them
separately. Therefore, I started a new thread to discuss only the key
manager. According to the discussion so far, since TDE doesn't need to
dynamically register DEKs the key manager doesn't have an interface to
register DEK for now. We cannot do anything with only the first patch
but the plan is to implement TDE on top of the key manager. So we can
implement the key manager patch and TDE patch separately but these
actually depend on each other.

The second patch is to make the key manager use even without TDE. The
idea of this patch is to help the incremental development of TDE.
There was a discussion that since the development of both the key
manager and TDE will take a long time, it’s better to make the key
manager work alone by providing SQL interfaces to it. This patch adds
new SQL functions: pg_wrap() and pg_unwrap(), to wrap and unwrap the
arbitrary user secret by the encryption key called SQL internal key
which is managed and stored by the key manager. What this patch does
is to register the SQL internal key to the key manager and add SQL
functions that wrap and unwrap the given string in the same way of key
wrapping used by the key manager. These functions renamed to
pg_encrypt() and pg_decrypt().

Given that the purpose of the key manager is to help TDE, discussing
the SQL interface part (i.g., the second patch) deviates from the
original purpose. I think we should discuss the design and
implementation of the key manager first and then other additional
interfaces. So I’ve attached a new version patch and removed the
second patch part so that we can focus on only the key manager part.

>
> I'm lost, because if you want to do that you cannot easily use
> padding/HMAC and so because they would change block sizes, and probably
> you would use CRT instead of CBC to be able to decrypt data selectively.
>

The patch introducing TDE will add CTR mode. The padding/HMAC is for
only wrapping DEKs by the key manager. Since this patch adds only key
manager it has only the encryption methods it needs.

> So you certainly succeeded in confusing me deeply:-)
>
> > Aside from the security risk you mentioned, it was a natural design
> > decision for me that we have our key manager component in postgres core
> > that is responsible for managing encryption keys for our TDE.
>
> The patch really needs a README to explain what it really does, and why,
> and how, and what is the thread model, what are the choices (there should
> be as few as possible), how it can/could be extended.
>
> I've looked at the whole patch, and I could not find the place where files
> are actually encrypted/decrypted at a low level, that I would expect for
> file encryption implementation.

As I explained above, the patch introduces only the key manager which
will be a building block of TDE.

>
> > To make the key manager and TDE simple as much as possible, we discussed
> > that we will have cluster-wide TDE and key manager that manages a few
> > encryption keys used by TDE (e.g. one key for table/index encryption and
> > another key for WAL encryption), as the first step.
>
> Hmmm. Ok. So in fact all that is for TDE, *but* the patch does not do TDE,
> but provides a column-oriented SQL-level encryption, which is unrelated to
> your actual objective, which is to do file-level encryption in the end.
>
> However, for TDE, it may that you cannot do it with a pg extension because
> for the extension to work the database must work, which would require some
> "data" files not to be encrypted in the first place. That seems like a
> good argument to actually have something in core.
>
> Probably for TDE you only want the configuration file not to be encrypted.

Yeah, for TDE, what we have discussed is to encrypt only tables,
indexes, temporary files, and WAL (and possibly other database files
that could have or help to infer user sensitive data). And each
table/index page first several bytes in the page header are not
encrypted.

>
> I'd still advocate to have the key management system possibly outside of
> pg, and have pg interact with it to get keys when needed. Probably key ids
> would be the relative file names in that case. The approach of
> externalizing encryption/decryption would be totally impractical for
> performance reasons, though.
>
> I see value in Cary Huang suggestion on the thread to have dynamically
> loaded functions implement an interface. That would at least allow to
> remove some hardcoded choices such as what cypher is actually used, key
> sizes, and so on. One possible implementation would be to manage things
> more or less internally as you do, another to fork an external command and
> talk with it to do the same.
>
> However, I do not share the actual interface briefly outlined: I do not
> thinkpg should have to care about key management functions such as
> rotation, generation or derivation, storage… the interest of pg should be
> limited to retrieving the keys it needs. That does not mean such functions
> do not have security value and should not be implemented, I'd say that it
> should not be visible/hardcoded in the pg/kms interface, especially if
> this interface is expected to be generic.

Since the current key manager is designed for only TDE or something
encryption feature inside PostgreSQL, the usage of the key manager is
limited. It’s minimum and simple but it has minimum extensible that
PostgreSQL internal modules such as TDE can register their
cryptographic key. I agree to have more an extensible feature if it is
expected to cover generic use cases but currently it isn't.

>
> As I see it, a pg/kms C-level loadable interface would provide the
> following function:
>
> // options would be supplied by a guc and allow to initialize the
> // interface with the relevant data, whatever the underlying
> // implementation needs.
> error kms_init(char *options);
>
> // associate opaque key identifier to a local id
> error kms_key(local_id int, int key_id_len, byte *key_id);
>
> or maybe something like:
>
> // would return the local id attributed to the key
> error/int kms_key(key_id_len, key_id);
>
> // the actual functions should be clarified
> // for TDE file-level, probably the encrypted length is the same as the
> // input, you cannot have padding, hmac or whatever added.
> // for SQL app-level, the rules could be different
> error kms_(en|de)crypt(local_id int, int mode, int len,
> byte *in, byte *out);
>
> // maybe
> error kms_key_forget(local_id int);
> error kms_destroy(…);
>
> // maybe, to allow extensibility and genericity
> // eg kms_command("rotate keys with new kek=123");
> error kms_command(char *cmd);

My first proposal implements a similar thing; having C-level loadable
interface to get, generate, and rotating KEK. But the concern was the
development cost and complexity vs benefit. And I don't think it's a
good idea to support both key management and encryption/decryption as
kms interface.

>
> I'm a little bit unsure that there should be only one KMS active ever,
> though: a file-level vs app-level encryption could have quite different
> constraints. Also, should the app-level encryption be able to access keys
> loaded for file-level encryption?
>

You mean app-level encryption also uses encryption keys get from postgres?

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO%3D8N%3Dnc2xVZPB0d9e-VjJ%3DYaRnw%40mail.gmail.com

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v12-0003-Documentation-update.patch application/octet-stream 22.1 KB
v12-0002-Add-key-management-module.patch application/octet-stream 58.8 KB
v12-0001-Add-encryption-functions-for-both-frontend-and-b.patch application/octet-stream 12.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-06-23 13:53:54 Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)
Previous Message Dilip Kumar 2020-06-23 13:30:32 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions