| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | Samuel Thibault <samuel(dot)thibault(at)gnu(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: GNU/Hurd portability patches |
| Date: | 2025-11-18 05:32:38 |
| Message-ID: | CA+hUKGKD0EOviHPG_W5gBZ+xiRFktvn=tBebrSBf8oQDecKnwA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Nov 18, 2025 at 12:31 PM Samuel Thibault
<samuel(dot)thibault(at)gnu(dot)org> wrote:
> On Mon, Nov 17, 2025 at 03:59:30PM +1300, Thomas Munro wrote:
> > . o O { An absurdly far-fetched thought while browsing glibc/hurd glue
> > code: if synchronous I/O is implemented as RPC on Mach ports, could
> > that mean that it's technically possible to submit now and consume
> > results later, for asynchronous I/O?
>
> Yes, it is completely possible.
Neat!
> > Possibly too private/undocumented anyway,
>
> It's not really documented much, but it's completely public. One
> can include <hurd/io_request.h> and call e.g. io_read_request(port,
> reply_port, offset, amount). One then has to run a msgserver loop on the
> reply_port to get the reply messages. An example can be seen in the hurd
> source in trans/streamio.c, for e.g. device_open_request() calls.
OK, to continue the thought experiment... someone could write
io_method=hurd, and it'd have to be more efficient than handing the
work off to I/O worker processes (what you get with the default
io_method=worker), since the worker process clearly has to do exactly
the same thing internally in a synchronous wrapper function anyway,
just with extra steps to reach it. At a guess, it could follow
io_method=io_uring's general design and have a reply port owned by
each backend (= process), and backends would almost always consume
replies from their own reply port. They'd need to be able to consume
from each other's reply port occasionally, but I assume that's
possible with an exclusive lock and a temporary transfer of receive
rights. Every process would have to receive duplicates of the full
set of ports after fork(), but at least that problem would go away in
an in-development multithreaded mode.
I doubt it'd be much good without readv/writev operations, though.
It looks like they aren't in io_request.defs yet? Does that also imply
that preadv() has to loop over the vectors sending tons of messages
and waiting for replies?
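
To make that question concrete, here is a minimal sketch of what such a loop might look like in portable C: one pread() per vector, each of which would presumably turn into its own message and reply on a system without a vectored request. The names looped_preadv() and demo_looped_preadv(), and the 8192-byte vector size, are made up for illustration; this is not taken from glibc or the Hurd sources, and the real implementation may well differ.

```c
#define _DEFAULT_SOURCE         /* for mkstemp() and pread() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_VECSZ 8192         /* illustrative vector size */
#define DEMO_NVECS 4            /* illustrative vector count */

/*
 * Hypothetical fallback: emulate preadv() with one pread() per vector.
 * Counts the calls in *ncalls.  Returns total bytes read, or -1 if the
 * first call fails.
 */
static ssize_t
looped_preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset,
              int *ncalls)
{
    ssize_t total = 0;

    assert(ncalls != NULL);
    for (int i = 0; i < iovcnt; i++)
    {
        ssize_t n = pread(fd, iov[i].iov_base, iov[i].iov_len, offset);

        (*ncalls)++;
        if (n < 0)
            return total > 0 ? total : -1;
        total += n;
        offset += n;
        if ((size_t) n < iov[i].iov_len)
            break;              /* short read or EOF: stop here */
    }
    return total;
}

/* Returns the number of pread() calls issued, or -1 on any failure. */
int
demo_looped_preadv(void)
{
    char        path[] = "/tmp/looped_preadv_XXXXXX";
    static char data[DEMO_NVECS][DEMO_VECSZ];
    static char bufs[DEMO_NVECS][DEMO_VECSZ];
    struct iovec iov[DEMO_NVECS];
    int         ncalls = 0;
    int         fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears when fd is closed */

    for (int i = 0; i < DEMO_NVECS; i++)
    {
        memset(data[i], 'a' + i, DEMO_VECSZ);
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DEMO_VECSZ;
    }

    if (write(fd, data, sizeof(data)) != (ssize_t) sizeof(data) ||
        looped_preadv(fd, iov, DEMO_NVECS, 0, &ncalls) != (ssize_t) sizeof(data) ||
        memcmp(data, bufs, sizeof(data)) != 0)
        ncalls = -1;

    close(fd);
    return ncalls;
}
```

With four vectors the sketch issues four separate reads, which is the per-vector cost the question is asking about.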
Standard POSIX AIO also lacks vectored I/O. It lacks many, many other
things one might want (though serious implementations in the old
commercial Unixen added unknown incompatible extensions negotiated
with database vendors, including reply ports), but scatter/gather
seems pretty fundamental for database buffer pool implementations:
we'd have to call aio_read()/aio_write() 16 or 32 times when we could
just ask a helper process to call preadv() once (assuming it's really
one operation), to transfer a contiguous range of blocks to/from
discontiguous buffers. Databases want to do that a lot. When
combined with direct I/O, that's actual IOPS out the window, but even
for buffered I/O it's a very high overhead for straight-line I/O. For
that reason we don't actually support pgaio implementations that don't
have readv/writev currently. When we tried it we had to inhibit I/O
combining at higher levels and it wasn't good.
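
For anyone not familiar with the pattern, the following is a rough sketch of the scatter read described above: a contiguous range of file blocks lands in separately allocated buffers with a single preadv() call. The function names, the 16-block count, and the temp-file demo are assumptions for illustration only and are not PostgreSQL code; the 8192-byte block size is just the usual default.

```c
#define _DEFAULT_SOURCE         /* for preadv() and mkstemp() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_BLCKSZ 8192        /* illustrative block size */
#define DEMO_NBLOCKS 16         /* illustrative combined I/O size */

/*
 * Read nblocks contiguous file blocks starting at first_block into
 * nblocks unrelated buffers, using one preadv() call.
 */
static ssize_t
read_block_range(int fd, unsigned first_block, char **bufs, int nblocks)
{
    struct iovec iov[DEMO_NBLOCKS];

    assert(nblocks >= 0 && nblocks <= DEMO_NBLOCKS);
    for (int i = 0; i < nblocks; i++)
    {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DEMO_BLCKSZ;
    }
    return preadv(fd, iov, nblocks, (off_t) first_block * DEMO_BLCKSZ);
}

/* Returns 0 if every buffer received the matching block, else -1. */
int
demo_scatter_read(void)
{
    char        path[] = "/tmp/scatter_demo_XXXXXX";
    char        block[DEMO_BLCKSZ];
    char       *bufs[DEMO_NBLOCKS];
    int         ok = 1;
    int         fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears when fd is closed */

    /* Fill block i of the file with the byte 'a' + i. */
    for (int i = 0; i < DEMO_NBLOCKS; i++)
    {
        memset(block, 'a' + i, sizeof(block));
        if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
            ok = 0;
    }

    /* Separately allocated buffers stand in for buffer pool slots. */
    for (int i = 0; i < DEMO_NBLOCKS; i++)
    {
        bufs[i] = malloc(DEMO_BLCKSZ);
        if (bufs[i] == NULL)
            ok = 0;
    }

    if (ok && read_block_range(fd, 0, bufs, DEMO_NBLOCKS) !=
        (ssize_t) DEMO_NBLOCKS * DEMO_BLCKSZ)
        ok = 0;

    for (int i = 0; ok && i < DEMO_NBLOCKS; i++)
        if (bufs[i][0] != 'a' + i || bufs[i][DEMO_BLCKSZ - 1] != 'a' + i)
            ok = 0;

    for (int i = 0; i < DEMO_NBLOCKS; i++)
        free(bufs[i]);
    close(fd);
    return ok ? 0 : -1;
}
```

Doing the same transfer with aio_read() would take one submission per buffer, 16 in this sketch, since each control block describes a single buffer.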
(And then to get more and more pie-in-the-sky: (1) O_DIRECT is highly
desirable for zero-copy DMA to/from a user space buffer pool, (2)
starting more than one I/O with a single context switch and likewise
for consuming replies, (3) registering/locking memory pages and
descriptors with a port so they don't have to be pinned/unpinned by
the I/O subsystem all the time. And then, if Hurd works the way I
think it might, (4) to avoid chains of pipe-like scheduling overheads
when starting a direct I/O and maybe also some already-cached buffered
I/O, you'd ideally want ports to have a "fast" send path that behaves
like the old Spring/Solaris doors, where the caller's thread would
yield directly to a thread in the receiving server, forming a chain:
database -> file system -> driver -> device that is sort of
synchronous and then returns control, like a kind of dual of a system
call that reaches through the chain of user space services, and
presumably the same sort of thing on the way back from the interrupt
handler on completion. Idea (4) might well be Hurd/Mach heresy for
all I know, being totally out of the loop on this stuff; or perhaps
you already have something like that...)