From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Cc: | Antonin Houska <ah(at)cybertec(dot)at>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Subject: | Re: AIO v2.5 |
Date: | 2025-07-10 19:29:33 |
Message-ID: | 24mblqwtpzwncjcmfoqhpyuwzcejrnnyddska3h2z6fmmkh5t2@gldyx4346n3y |
Lists: | pgsql-hackers |
Hi,
On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote:
> On Wed, 9 Jul 2025 at 16:59, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > 3. I noticed that there is AIO code for writev-related operations
> > > (specifically, pgaio_io_start_writev is exposed, as is
> > > PGAIO_OP_WRITEV), but no practical way to exercise that code: it's
> > > not called from anywhere in the project, and there is no way for
> > > extensions to register the relevant callbacks required to make writev
> > > work well on buffered contents. Is that intentional?
> >
> > Yes. We obviously do want to support writes eventually, and it didn't seem
> > useful to not have the most basic code for writes in the AIO infrastructure.
> >
> > You could still use it to e.g. write out temporary file data or such.
>
> Yes, though IIUC that would require an implementation of at least
> PgAioTargetInfo for such a use case (it's definitely not a SMGR
> target), which currently isn't available and can't be registered
> dynamically by an extension. Or did I maybe miss something?
I can see some hacky ways around that, but they're just that, hacky...
> (PS. I'm not quite 100% sure that it is impossible to use, just that
> there are rather few handles available for using this part of the new
> tool, and it seems completely untested in the PG18 branch)
I'm not saying it's 100% ready to use without modifying core code, but for
something that's like 30 lines of code, as part of a considerably larger
subsystem, I just don't see a problem with writev not yet being covered. It's
just incremental development.
> -----
>
> Something else I've just noticed is the use of int32 in
> PgAIOHandle->result. In sync and worker mode, pg_preadv and pg_pwritev
> return ssize_t, which most modern systems can't fit in int32 (the
> output was int before, then size_t, then ssize_t: [0]).
I don't think there's anything that can actually do IO that's large enough to
be problematic. What's the potential scenario where you'd want to read/write
more than 3GB of data within one syscall? That just doesn't seem to make
sense.
> While not directly an issue in default PG18 due to the use of 1GB relation
> segments capping the max IO size for SMGR-managed IOs (and various other
> code-level constraints), this may have more issues when an extension starts
> bulk-reading data on a system compiled with RELSEG_SIZE >= 2GB; I can't find
> any protective checks against overflows in downcasting the IO result.
I don't think the relation size is the relevant piece here, it's just that it
doesn't make sense (and likely isn't possible) to read that much data at once.
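For illustration only, the kind of protective check being asked about could look roughly like the sketch below. It narrows an ssize_t result (as returned by pg_preadv / pg_pwritev) to int32 only when the value fits. The names narrow_io_result and narrow_io_result_checked are hypothetical and do not exist in PostgreSQL; this is a minimal sketch assuming a platform where ssize_t is wider than 32 bits, not a description of how the AIO code handles results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>			/* ssize_t */

/*
 * Hypothetical helper (not a PostgreSQL function): narrow an ssize_t I/O
 * result to int32 only if it fits. Returns false and leaves *out untouched
 * when the value is out of range for int32.
 */
static inline bool
narrow_io_result(ssize_t res, int32_t *out)
{
	/* Compare in 64 bits so the check is meaningful for any ssize_t width. */
	int64_t		wide = (int64_t) res;

	if (wide < INT32_MIN || wide > INT32_MAX)
		return false;

	*out = (int32_t) wide;
	return true;
}

/*
 * Hypothetical variant for call sites that consider an oversized result a
 * programming error: asserts that the value fits, then returns it.
 */
static inline int32_t
narrow_io_result_checked(ssize_t res)
{
	int32_t		out = 0;
	bool		ok = narrow_io_result(res, &out);

	assert(ok);
	(void) ok;					/* silence unused warning under NDEBUG */
	return out;
}
```

A caller would pass the raw syscall return value through narrow_io_result and treat a false return as an error instead of storing a truncated value. Whether such a check is worth having depends on whether an IO that large can be issued at all, which is the point under discussion above.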
Greetings,
Andres Freund