Re: Temporary file access API

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Temporary file access API
Date: 2022-08-08 18:27:27
Message-ID: 5037.1659983247@antos.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Jul 29, 2022 at 11:36 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > Attached is a new version. It allows the user to set "elevel" (i.e. ERROR is
> > not necessarily thrown on I/O failure, if the user prefers to check the number
> > of bytes read/written) and to specify the buffer size. Also, 0015 adds a
> > function to copy data from one file descriptor to another.
>
> I basically agree with Peter Eisentraut's earlier comment: "I don't
> understand what this patch set is supposed to do." The intended
> advantages, as documented in buffile.c, are:
>
> + * 1. Less code is needed to access the file.
> + *
> + * 2.. Buffering reduces the number of I/O system calls.
> + *
> + * 3. The buffer size can be controlled by the user. (The larger the buffer,
> + * the fewer I/O system calls are needed, but the more data needs to be
> + * written to the buffer before the user recognizes that the file access
> + * failed.)
> + *
> + * 4. It should make features like Transparent Data Encryption less invasive.
>
> It does look to me like there is a modest reduction in the total
> amount of code from these patches, so I do think they might be
> achieving (1).
>
> I'm not so sure about (2), though. A number of these places are cases
> where we only do a single write to the file - like in patches 0010,
> 0011, 0012, and 0014, for example. And even in 0013, where we do
> multiple I/Os, it makes no sense to introduce a second buffering
> layer. We're reading from a file into a buffer several kilobytes in
> size and then shipping the data out. The places where you want
> buffering are places where we might be doing system calls for a few
> bytes of I/O at a time, not places where we're already doing
> relatively large I/Os. An extra layer of buffering probably makes
> things slower, not faster.

The buffering seemed to me like a good opportunity to plug-in the encryption,
because the data needs to be encrypted in memory before it's written to
disk. Of course it'd be better to use the existing buffering for that than to
add another layer.

> I don't think that (3) is a separate advantage from (1) and (2), so I
> don't have anything separate to say about it.

I thought that the uncontrollable buffer size is one of the things you
complaint about in

https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

> I find it really unclear whether, or how, it achieves (4). In general,
> I don't think that the large number of raw calls to read() and write()
> spread across the backend is a particularly good thing. TDE is an
> example of a feature that might affect every byte we read or write,
> but you can also imagine instrumentation and tracing facilities that
> might want to be able to do something every time we do an I/O without
> having to modify a zillion separate call sites. Such features can
> certainly benefit from consolidation, but I suspect it isn't helpful
> to treat every I/O in exactly the same way. For instance, let's say we
> settle on a design where everything in the cluster is encrypted but,
> just to make things simple for the sake of discussion, let's say there
> are no stored nonces anywhere. Can we just route every I/O in the
> backend through a wrapper layer that does encryption and decryption?

> Probably not. I imagine postgresql.conf and pg_hba.conf aren't going
> to be encrypted, for example, because they're intended to be edited by
> users. And Bruce has previously proposed designs where the WAL would
> be encrypted with a different key than the data files. Another idea,
> which I kind of like, is to encrypt WAL on a record level rather than
> a block level. Either way, you can't just encrypt every single byte
> the same way. There are probably other relevant distinctions: some
> data is temporary and need not survive a server restart or be shared
> with other backends (except maybe in the case of parallel query) and
> some data is permanent. Some data is already block-structured using
> blocks of size BLCKSZ; some data is block-structured using some other
> block size; and some data is not block-structured. Some data is in
> standard-format pages that are part of the buffer pool and some data
> isn't. I feel like some of those distinctions probably matter to TDE,
> and maybe to other things that we might want to do.
>
> For instance, to go back to my earlier comments, if the data is
> already block-structured, we do not need to insert a second layer of
> buffering; if it isn't, maybe we should, not just for TDE but for
> other reasons as well. If the data is temporary, TDE might want to
> encrypt it using a temporary key which is generated on the fly and
> only stored in server memory, but if it's data that needs to be
> readable after a server restart, we're going to need to use a key that
> we'll still have access to in the future. Or maybe that particular
> thing isn't relevant, but I just can't believe that every single I/O
> needs exactly the same treatment, so there have got to be some
> patterns here that do matter. And I can't really tell whether this
> patch set has a theory about that and, if so, what it is.

I didn't want to use this API for relations (file reads/writes for these only
affects a couple of places in md.c), nor for WAL. The intention was to use the
API for temporary files, and also for other kinds of "auxiliary" files where
the current coding is rather verbose but not too special.

To enable the encryption, we'd need to add an extra argument (structure) to
the appropriate function, which provides the necessary information on the
encryption, such as cipher kind (block/stream), the encryption key or the
nonce. I just don't understand whether the TDE alone (4) justifies patch like
this, so I tried to make more use of it.

On the other hand, if code simplification (1) should be the only benefit, the
patch might be too controversial as well.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marcos Pegoraro 2022-08-08 18:34:27 Re: bug on log generation ?
Previous Message Tom Lane 2022-08-08 18:06:00 Re: automatically generating node support functions