Re: Temporary file access API

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Temporary file access API
Date: 2022-07-05 13:24:25
Message-ID: 85651.1657027465@antos.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:

> On 04.07.22 18:35, Antonin Houska wrote:
> >> Attached is a new version of the patch, to evaluate what the API use in the
> >> backend could look like. I haven't touched places where the file is accessed
> >> in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
> >> called.
> > Rebased patch set is attached here, which applies to the current master.
> > (A few more opportunities for the new API implemented here.)
>
> I don't understand what this patch set is supposed to do. AFAICT, the thread
> originally forked off from a TDE discussion and, considering the thread
> subject, was possibly once an attempt to refactor temporary file access to
> make integrating encryption easier? The discussion then talked about things
> like saving on system calls and excising all OS-level file access API use,
> which I found confusing, and the thread then just became a general TDE-related
> mini-discussion.

Yes, it's an attempt to make the encryption less invasive, but there are a few
other objectives, at least: 1) to make the read / write operations "less
low-level" (there are common things like error handling which are often just
copy & pasted from other places), 2) to have buffered I/O with configurable
buffer size (the current patch version still has fixed buffer size though)

It's true that the discussion ends up to be encryption-specific, however the
scope of the patch is broader. The first meassage of the thread references a
related discussion

https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

which is more important for this patch than the suggestions about encryption.

> The patches at hand extend some internal file access APIs and then sprinkle
> them around various places in the backend code, but there is no explanation
> why or how this is better. I don't see any real structural savings one might
> expect from a refactoring patch. No information has been presented how this
> might help encryption.

At this stage I expected feedback from the developers who have already
contributed to the discussion, because I'm not sure myself if this version
fits their ideas - that's why I didn't elaborate the documentation yet. I'll
try to summarize my understanding in the next version, but I'd appreciate if I
got some feedback for the current version first.

> I also suspect that changing around the use of various file access APIs needs
> to be carefully evaluated in each individual case. Various code has subtle
> expectations about atomic write behavior, syncing, flushing, error recovery,
> etc. I don't know if this has been considered here.

I considered that, but could have messed up at some places. Right now I'm
aware of one problem: pgstat.c does not expect the file access API to raise
ERROR - this needs to be fixed.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Brindle, Joshua 2022-07-05 13:27:06 Re: Patch proposal: New hooks in the connection path
Previous Message gkokolatos 2022-07-05 13:22:47 Re: Add LZ4 compression in pg_dump