Temporary file access API

From: Antonin Houska <ah(at)cybertec(dot)at>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Temporary file access API
Date: 2022-02-08 12:24:58
Message-ID: 4987.1644323098@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here I'm starting a new thread to discuss a topic that's related to the
Transparent Data Encryption (TDE), but could be useful even without that. The
problem has been addressed somehow in the Cybertec TDE fork, and I can post
the code here if it helps. However, after reading [1] (and the posts
upthread), I've got another idea, so let's try to discuss it first.

It makes sense to me if we first implement the buffering (i.e. writing/reading
certain amount of data at a time) and make the related functions aware of
encryption later: as long as we use a block cipher, we also need to read/write
(suitably sized) chunks rather than individual bytes (or arbitrary amounts of
data). (In theory, someone might need encryption but reject buffering, but I'm
not sure if this is a realistic use case.)

For the buffering, I imagine a "file stream" object that user creates on the
top of a file descriptor, such as

FileStream *FileStreamCreate(File file, int buffer_size)

or

FileStream *FileStreamCreateFD(int fd, int buffer_size)

and uses functions like

int FileStreamWrite(FileStream *stream, char *buffer, int amount)

and

int FileStreamRead(FileStream *stream, char *buffer, int amount)

to write and read data respectively.

Besides functions to close the streams explicitly (e.g. FileStreamClose() /
FileStreamFDClose()), we'd need to ensure automatic closing where that happens
to the file. For example, if OpenTemporaryFile() was used to obtain the file
descriptor, the user expects that the file will be closed and deleted on
transaction boundary, so the corresponding stream should be freed
automatically as well.

To avoid code duplication, buffile.c should use these streams internally as
well, as it also performs buffering. (Here we'd also need functions to change
reading/writing position.)

Once we implement the encryption, we might need add an argument to the
FileStreamCreate...() functions that helps to generate an unique IV, but the
...Read() / ...Write() functions would stay intact. And possibly one more
argument to specify the kind of cipher, in case we support more than one.

I think that's enough to start the discussion. Thanks for feedback in advance.

[1] https://www.postgresql.org/message-id/CA%2BTgmoYGjN_f%3DFCErX49bzjhNG%2BGoctY%2Ba%2BXhNRWCVvDY8U74w%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2022-02-08 12:35:18 Re: storing an explicit nonce
Previous Message Joe Conway 2022-02-08 11:59:41 Re: [PATCH v2] use has_privs_for_role for predefined roles