Re: Temporary file access API

From: Antonin Houska <ah(at)cybertec(dot)at>
To: John Morris <john(at)precision-gps(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Temporary file access API
Date: 2022-09-23 13:45:58
Message-ID: 17018.1663940758@antos
Lists: pgsql-hackers

Hi,

John Morris <john(at)precision-gps(dot)com> wrote:

> I’m a latecomer to the discussion, but as a word of introduction, I’m working with Stephen, and I have started looking over the temp file proposal with the idea of helping it move along.
>
> I’ll start by summarizing the temp file proposal and its goals.
>
> From a high level, the proposed code:
>
> * Creates an fread/fwrite replacement (BufFileStream) for buffering data to a single file.
>
> * Updates BufFile, which reads/writes a set of files, to use BufFileStream internally.
>
> * Does not impact the normal (page cached) database I/O.
>
> * Updates all the other places where fread/fwrite and read/write are used.

Not really all; just those where the change seemed reasonable (i.e. where it
did not make the code more complex).

> * Creates and removes transient files.

The "stream API" is rather an additional layer on top of files that the user
needs to create / remove at a lower level.
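To make the summary concrete, here is a minimal sketch in C of an fwrite()-style buffered stream over a single file. All names (`SketchStream`, `stream_init`, `stream_write`, `stream_flush`, `STREAM_BUFSZ`) are hypothetical and are not the patch's BufFileStream API. As described above, the caller opens and removes the file itself. The stream only batches writes on the descriptor it is handed, so many small writes turn into a few large write() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical buffer size; the actual patch may choose differently. */
#define STREAM_BUFSZ 8192

/*
 * Hypothetical buffered stream over a descriptor the caller already
 * opened. The stream never creates or removes the underlying file.
 */
typedef struct SketchStream
{
	int			fd;				/* caller-owned descriptor */
	size_t		used;			/* bytes currently buffered */
	long		nwrites;		/* number of write() calls issued */
	char		buf[STREAM_BUFSZ];
} SketchStream;

static void
stream_init(SketchStream *s, int fd)
{
	s->fd = fd;
	s->used = 0;
	s->nwrites = 0;
}

/* Write out everything buffered so far; 0 on success, -1 on error. */
static int
stream_flush(SketchStream *s)
{
	size_t		off = 0;

	while (off < s->used)
	{
		ssize_t		n = write(s->fd, s->buf + off, s->used - off);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted: retry the same chunk */
			return -1;
		}
		s->nwrites++;
		off += (size_t) n;
	}
	s->used = 0;
	return 0;
}

/* Append len bytes, flushing whenever the buffer becomes full. */
static int
stream_write(SketchStream *s, const void *data, size_t len)
{
	const char *p = data;

	while (len > 0)
	{
		size_t		room = STREAM_BUFSZ - s->used;
		size_t		chunk = len < room ? len : room;

		memcpy(s->buf + s->used, p, chunk);
		s->used += chunk;
		p += chunk;
		len -= chunk;
		assert(s->used <= STREAM_BUFSZ);

		if (s->used == STREAM_BUFSZ && stream_flush(s) < 0)
			return -1;
	}
	return 0;
}
```

For example, writing 1000 records of 100 bytes through this sketch issues 13 write() calls (twelve full 8192-byte buffers plus one final partial flush) instead of 1000.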

> I see the following goals:
>
> * Unify all the “other” file accesses into a single, consistent API.
>
> * Integrate with VFDs.
>
> * Integrate transient files with transactions and tablespaces.

If you mean automatic closing/deletion of files at transaction end, this is
also a lower-level thing that I didn't try to change.

> * Create a consolidated framework where features like encryption and compression can be more easily added.
>
> * Maintain good streaming performance.
>
> Is this a fair description of the proposal?

Basically that's it.

> For myself, I’d like to map out how features like compression and encryption would fit into the framework, more as a sanity check than anything else, and I’d like to look closer at some of the implementation details. But at the moment, I want to make sure I have the
> proper high-level view of the temp file proposal.

I think the high-level design (i.e. how the API should be used) still needs
discussion. In particular, I don't know whether it should aim at making
encryption easy to adopt or not. If it does, then it makes sense to base it on
buffile.c, because encryption essentially takes place in memory. But if
buffering itself (w/o encryption) is not really useful in other places (see
Robert's comments in [1]), then we can design something simpler, w/o touching
buffile.c (which, in turn, won't be usable for encryption, compression and so on).

So I think that code simplification and easy adoption of in-memory data
changes (such as encryption or compression) are two rather distinct goals. I
admit that I'm running out of ideas on how to develop a framework that'd be
useful for both.
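The observation that encryption "essentially takes place in memory" is why it fits a buffering layer: the data already sits in a buffer when it is flushed. Below is a minimal sketch assuming a hypothetical per-stream callback. `BufferTransform`, `flush_block` and `xor_transform` are illustrative names that come neither from the patch nor from buffile.c, and the XOR step is a toy placeholder, not encryption.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical in-place transform applied to a block before it is written. */
typedef void (*BufferTransform) (unsigned char *data, size_t len, void *arg);

/* Toy transform: XOR every byte with a one-byte key. NOT encryption. */
static void
xor_transform(unsigned char *data, size_t len, void *arg)
{
	unsigned char key;

	assert(arg != NULL);
	key = *(unsigned char *) arg;
	for (size_t i = 0; i < len; i++)
		data[i] ^= key;
}

/*
 * Hypothetical flush step of a buffering layer. Because the data is
 * already in memory, the transform runs over the whole block right
 * before write(), and callers of the write routine never see it.
 * Note that the buffer is modified in place.
 */
static ssize_t
flush_block(int fd, unsigned char *buf, size_t len,
			BufferTransform transform, void *arg)
{
	if (transform != NULL)
		transform(buf, len, arg);
	return write(fd, buf, len);
}
```

An in-place hook like this only suits length-preserving transforms. Compression changes the data length, and real encryption would need per-block state such as an IV, so either would need more than this single callback.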

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZWP8UtkNVLd75Qqoh9VGOVy_0xBK%2BSHZAdNvnfaikKsQ%40mail.gmail.com

> From: Robert Haas <robertmhaas(at)gmail(dot)com>
> Date: Wednesday, September 21, 2022 at 11:54 AM
> To: Antonin Houska <ah(at)cybertec(dot)at>
> Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
> Subject: Re: Temporary file access API
>
> On Mon, Aug 8, 2022 at 2:26 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > > I don't think that (3) is a separate advantage from (1) and (2), so I
> > > don't have anything separate to say about it.
> >
> > I thought that the uncontrollable buffer size is one of the things you
> > complained about in
> >
> > https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com
>
> Well, I think that email is mostly complaining about there being no
> buffering at all in a situation where it would be advantageous to do
> some buffering. But that particular code I believe is gone now because
> of the shared-memory stats collector, and when looking through the
> patch set, I didn't see that any of the cases that it touched were
> similar to that one.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
