Re: pg_preadv() and pg_pwritev()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_preadv() and pg_pwritev()
Date: 2020-12-22 11:06:50
Message-ID: CA+hUKG+TCoareSYcAu05yaMh9eUck6Hd8FQZt5VEvotNs3YOMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 21, 2020 at 11:40 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2020-12-20 16:26:42 +1300, Thomas Munro wrote:
> > > 1. port.h cannot assume that <limits.h> has already been included;
> > > nor do I want to fix that by including <limits.h> there. Do we
> > > really need to define a fallback value of IOV_MAX? If so,
> > > maybe the answer is to put the replacement struct iovec and
> > > IOV_MAX in some new header.
> >
> > Ok, I moved all this stuff into port/pg_uio.h.
>
> Can we come up with a better name than 'uio'? I find that a not exactly
> meaningful name.

Ok, let's try port/pg_iovec.h.

> Or perhaps we could just leave the functions in port/port.h, but extract
> the value of the define at configure time? That way only pread.c /
> pwrite.c would need it.

That won't work for the struct definition, so client code would need
to remember to do:

#ifdef HAVE_SYS_UIO_H
#include <sys/uio.h>
#endif

... which is a little tedious, or port.h would need to do that and
incur an overhead in every translation unit, which Tom objected to.
That's why I liked the separate header idea.

> > > 3. The patch as given won't prove anything except that the code
> > > compiles. Is it worth fixing at least one code path to make
> > > use of pg_preadv and pg_pwritev, so we can make sure this code
> > > is tested before there's layers of other new code on top?
> >
> > OK, here's a patch to zero-fill fresh WAL segments with pwritev().
>
> I think that's a good idea. However, I see two small issues: 1) If we
> write larger amounts at once, we need to handle partial writes. That's a
> large enough amount of IO that it's much more likely to hit a memory
> shortage or such in the kernel - we had to do that a while a go for WAL
> writes (which can also be larger), if memory serves.
>
> Perhaps we should have pg_pwritev/readv unconditionally go through
> pwrite.c/pread.c and add support for "continuing" partial writes/reads
> in one central place?

Ok, here's a new version with the following changes:

1. Define PG_IOV_MAX, a reasonable size for local variables, not
larger than IOV_MAX.
2 Use 32 rather than 1024, based on off-list complaint about 1024
potentially swamping the IO system unfairly.
3. Add a wrapper pg_pwritev_retry() to retry automatically on short
writes. (I didn't write pg_preadv_retry(), because I don't currently
need it for anything so I didn't want to have to think about EOF vs
short-reads-for-implementation-reasons.)
4. I considered whether pg_pwrite() should have built-in retry
instead of a separate wrapper, but I thought of an argument against
hiding the "raw" version: the AIO patch set already understands short
reads/writes and knows how to retry at a higher level, as that's
needed for native AIO too, so I think it makes sense to be able to
keep things the same and not solve the same problem twice. A counter
argument would be that you could get the retry underway sooner with a
tight loop, but I'm not expecting this to be common.

> > I'm drawing a blank on trivial candidate uses for preadv(), without
> > infrastructure from later patches.
>
> Can't immediately think of something either.

One idea I had for the future is for xlogreader.c to read the WAL into
a large multi-page circular buffer, which could wrap around using a
pair of iovecs, but that requires lots more work .

Attachment Content-Type Size
v3-0001-Add-pg_preadv-and-pg_pwritev.patch text/x-patch 14.9 KB
v3-0002-Use-vectored-I-O-to-zero-WAL-segments.patch text/x-patch 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2020-12-22 11:13:12 Re: Single transaction in the tablesync worker?
Previous Message Bharath Rupireddy 2020-12-22 10:32:24 Re: postgres_fdw - cached connection leaks if the associated user mapping/foreign server is dropped