From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: Blocking I/O, async I/O and io_uring |
Date: | 2020-12-08 06:23:19 |
Message-ID: | 20201208062319.cto7qujfahayseuv@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2020-12-08 13:01:38 +0800, Craig Ringer wrote:
> Have you done much bpf / systemtap / perf based work on measurement and
> tracing of latencies etc? If not that's something I'd be keen to help with.
> I've mostly been using systemtap so far but I'm trying to pivot over to
> bpf.
Not much - there's still so much low-hanging fruit and so many architectural
things to finish that it hasn't seemed pressing yet.
> > I've got asynchronous writing of WAL mostly working, but need to
> > redesign the locking a bit further. Right now it's a win in some cases,
> > but not others. The latter to a significant degree due to unnecessary
> > blocking....
> That's where io_uring's I/O ordering operations looked interesting. But I
> haven't looked closely enough to see if they're going to help us with I/O
> ordering in a multiprocessing architecture like postgres.
The ordering ops aren't quite powerful enough to be a huge boon
performance-wise (yet). They can cut down on syscall and intra-process
context switch overhead to some degree, but otherwise it's no different
from userspace submitting another request upon receiving a completion.
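To illustrate the "submit another request upon completion" pattern, here is a rough sketch in Python. It is not io_uring or PostgreSQL code: a thread pool stands in for the async I/O engine, and `write_then_fsync` is a made-up helper name. As I understand it, linked SQEs (`IOSQE_IO_LINK`) would let both steps be queued in one submission; the sketch only shows the userspace round trip they would save.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_then_fsync(pool, fd, data, offset):
    """Order write -> fsync from userspace, one submission per step.

    `pool` is a stand-in for an async I/O engine (an assumption made
    for illustration only).
    """
    # Submit the write and wait for its completion.
    written = pool.submit(os.pwrite, fd, data, offset).result()
    # Only after that completion do we submit the dependent fsync.
    pool.submit(os.fsync, fd).result()
    return written


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            print(write_then_fsync(pool, fd, b"wal-record", 0))
    finally:
        os.close(fd)
        os.unlink(path)
```

The ordering guarantee is the same either way; the difference is only in how many submissions and completion round trips it takes to get there.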
> In an ideal world we could tell the kernel about WAL-to-heap I/O
> dependencies and even let it apply WAL then heap changes out-of-order so
> long as they didn't violate any ordering constraints we specify between
> particular WAL records or between WAL writes and their corresponding heap
> blocks. But I don't know if the io_uring interface is that capable.
It's not. And that kind of dependency inference wouldn't be cheap on
the PG side either.
I don't think it'd help that much for WAL apply anyway. You need
read-ahead of the WAL to avoid unnecessary waits for a lot of records
regardless. And the writes during WAL apply are mostly pretty asynchronous
(mainly writeback during buffer replacement).
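A rough sketch of what WAL read-ahead means here, in Python and not taken from PostgreSQL: decode a window of records ahead of the apply position and hint the kernel about the blocks they touch before redo needs them. `Record`, `replay_with_readahead`, `fadvise_prefetch` and the `distance` parameter are all hypothetical names, and using `posix_fadvise(WILLNEED)` as the prefetch mechanism is an assumption of the sketch.

```python
import os
from collections import deque
from typing import Callable, Iterable, NamedTuple

BLOCK_SIZE = 8192  # PostgreSQL's default block size


class Record(NamedTuple):
    """Hypothetical decoded WAL record: which block of which file it touches."""
    fd: int
    block: int


def fadvise_prefetch(rec: Record) -> None:
    # Ask the kernel to start reading the block; this does not wait for the read.
    os.posix_fadvise(rec.fd, rec.block * BLOCK_SIZE, BLOCK_SIZE,
                     os.POSIX_FADV_WILLNEED)


def replay_with_readahead(records: Iterable[Record],
                          apply: Callable[[Record], None],
                          distance: int = 32,
                          prefetch: Callable[[Record], None] = fadvise_prefetch) -> None:
    """Apply records in order while keeping up to `distance` records prefetched."""
    it = iter(records)
    window: deque = deque()
    while True:
        # Top up the window: look ahead and issue hints before the block is needed.
        while len(window) < distance:
            rec = next(it, None)
            if rec is None:
                break
            prefetch(rec)
            window.append(rec)
        if not window:
            return
        # Redo itself stays strictly in WAL order.
        apply(window.popleft())
```

With a large enough `distance`, the block for a record should usually be in memory by the time `apply` reaches it, which is why little synchronous read I/O would remain.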
A considerably more interesting case, imo, is avoiding blocking on a WAL
flush when needing to write a page out in an OLTPish workload. But I can
think of more efficient ways to do that too.
> How feasible do you think it'd be to take it a step further and structure
> redo as a pipelined queue, where redo calls enqueue I/O operations and
> completion handlers then return immediately? Everything still goes to disk
> in the order it's enqueued, and the callbacks will be invoked in order, so
> they can update appropriate shmem state etc. Since there's no concurrency
> during redo, it should be *much* simpler than normal user backend
> operations where we have all the tight coordination of buffer management,
> WAL write ordering, PGXACT and PGPROC, the clog, etc.
I think it'd be a fairly massive increase in complexity. And I don't see
a really large payoff: Once you have real readahead in the WAL there's
really not much synchronous IO left. What am I missing?
Greetings,
Andres Freund