Re: Streaming I/O, vectored I/O (WIP)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Streaming I/O, vectored I/O (WIP)
Date: 2023-11-29 19:39:44
Message-ID: CA+hUKGJywvC5J+rcLs3+whK8YPBnKXVLn5o0JFJbNeb4CEdSFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 29, 2023 at 1:44 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> LGTM. I think this 0001 patch is ready for commit, independently of the
> rest of the patches.

Done.

> In v2-0002-Provide-vectored-variants-of-FileRead-and-FileWri-1.patch, fd.h:
>
> > +/* Filename components */
> > +#define PG_TEMP_FILES_DIR "pgsql_tmp"
> > +#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
> > +
>
> These seem out of place, we already have them in common/file_utils.h.

Yeah, they moved from there in f39b2658 and I messed up the rebase. Fixed.

> Other than that,
> v2-0002-Provide-vectored-variants-of-FileRead-and-FileWri-1.patch and
> v2-0003-Provide-vectored-variants-of-smgrread-and-smgrwri.patch look
> good to me.

One thing I wasn't 100% happy with was the treatment of ENOSPC. A few
callers believe that short writes set errno: they error out with a
message including %m. We have historically set errno = ENOSPC inside
FileWrite() if the write size was unexpectedly small AND the kernel
didn't set errno to a non-zero value (having set it to zero ourselves
earlier). In FileWriteV(), I didn't want to do that because it is
expensive to compute the total write size from the vector array and we
managed to measure an effect due to that in some workloads.

Note that the smgr patch actually handles short writes by continuing,
instead of raising an error. Short writes do already occur in the
wild on various systems for various rare technical reasons other than
ENOSPC I have heard (imagine transient failure to acquire some
temporary memory that the kernel chooses not to wait for, stuff like
that, though certainly many people and programs believe they should
not happen[1]), and it seems like a good idea to actually handle them
as our write sizes increase and the probability of short writes might
presumably increase.

With the previous version of the patch, we'd have to change a couple
of other callers not to believe that short writes are errors and set
errno (callers are inconsistent on this point). I don't really love
that we have "fake" system errors but I also want to stay focused
here, so in this new version V3 I tried a new approach: I realised I
can just always set errno without needing the total size, so that
(undocumented) aspect of the interface doesn't change. The point
being that it doesn't matter if you clobber errno with a bogus value
when the write was non-short. Thoughts?

[1] https://utcc.utoronto.ca/~cks/space/blog/unix/WritesNotShortOften

Attachment Content-Type Size
v3-0001-Provide-vectored-variants-of-FileRead-and-FileWri.patch text/x-patch 5.2 KB
v3-0002-Provide-vectored-variants-of-smgrread-and-smgrwri.patch text/x-patch 21.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-11-29 19:41:42 Re: remaining sql/json patches
Previous Message Andres Freund 2023-11-29 19:39:20 Re: Fix some memory leaks in ecpg.addons