Re: Blocking I/O, async I/O and io_uring

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Subject: Re: Blocking I/O, async I/O and io_uring
Date: 2020-12-08 12:49:04
Message-ID: b43db30b-8657-7b6d-cef0-fd5520f4b132@oss.nttdata.com
Lists: pgsql-hackers

On 2020/12/08 11:55, Craig Ringer wrote:
> Hi all
>
> A new kernel API called io_uring has recently come to my attention. I assume some of you (Andres?) have been following it for a while.
>
> io_uring appears to offer a way to make system calls including reads, writes, fsync()s, and more in a non-blocking, batched and pipelined manner, with or without O_DIRECT. Basically async I/O with usable buffered I/O and fsync support. It has ordering support which is really important for us.
>
> This should be on our radar. The main barriers to benefiting from linux-aio based async I/O in postgres in the past have been its reliance on direct I/O, the various kernel-version quirks, platform portability, and its maybe-async-except-when-it's-randomly-not nature.
>
> The kernel version and portability remain an issue with io_uring so it's not like this is something we can pivot over to completely. But we should probably take a closer look at it.
>
> PostgreSQL spends a huge amount of time waiting, doing nothing, for blocking I/O. If we can improve that then we could potentially realize some major increases in I/O utilization especially for bigger, less concurrent workloads. The most obvious candidates to benefit would be redo, logical apply, and bulk loading.
>
> But I have no idea how to even begin to fit this into PostgreSQL's executor pipeline. Almost all PostgreSQL's code is synchronous-blocking-imperative in nature, with a push/pull executor pipeline. It seems to have been recognised for some time that this is increasingly hurting our performance and scalability as platforms become more and more parallel.
>
> To benefit from AIO (be it POSIX, linux-aio, io_uring, Windows AIO, etc) we have to be able to dispatch I/O and do something else while we wait for the results. So we need the ability to pipeline the executor and pipeline redo.
>
> I thought I'd start the discussion on this and see where we can go with it. What incremental steps can be done to move us toward parallelisable I/O without having to redesign everything?
>
> I'm thinking that redo is probably a good first candidate. It doesn't depend on the guts of the executor. It is much less sensitive to ordering between operations in shmem and on disk since it runs in the startup process. And it hurts REALLY BADLY from its single-threaded blocking approach to I/O - as shown by an extension written by 2ndQuadrant that can double redo performance by doing read-ahead on btree pages that will soon be needed.
>
> Thoughts anybody?
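The read-ahead idea mentioned above can be approximated, in its simplest form, with a kernel hint. The following C sketch is hypothetical and is not how the 2ndQuadrant extension works (the message does not describe its mechanism). Given block numbers that redo will soon need, prefetch_blocks() calls posix_fadvise() with POSIX_FADV_WILLNEED so the kernel can start reading those pages into the page cache while the caller carries on. The helper name is made up, and BLCKSZ is assumed to be PostgreSQL's default 8 kB block size.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>

#define BLCKSZ 8192 /* assumed: PostgreSQL's default block size */

/*
 * Hypothetical helper: ask the kernel to start reading the listed blocks of
 * an already-open relation file into the page cache, so that later
 * synchronous reads during redo are more likely to hit cache.
 * Returns the number of hints the kernel accepted.
 */
int prefetch_blocks(int fd, const unsigned *blocks, int nblocks)
{
    int accepted = 0;

    for (int i = 0; i < nblocks; i++) {
        off_t off = (off_t) blocks[i] * BLCKSZ;

        /* posix_fadvise() returns an error number directly, not -1/errno. */
        if (posix_fadvise(fd, off, BLCKSZ, POSIX_FADV_WILLNEED) == 0)
            accepted++;
    }
    return accepted;
}
```

posix_fadvise() is only advice: the kernel may ignore it, and the later read() is still synchronous, so this hides read latency rather than making the I/O itself asynchronous.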

I was wondering whether async I/O might help improve the performance
of walreceiver. In physical replication, walreceiver receives, writes
and fsyncs WAL data. It also handles other tasks such as keepalives.
Because walreceiver is a single process, it currently cannot do any of
these other tasks while it is fsyncing WAL to disk, for example.

OTOH, if walreceiver could use async I/O to keep doing other tasks even
while WAL is being fsynced, ISTM that its performance might improve.
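To make the idea concrete, here is a minimal C sketch, not walreceiver code. It writes a buffer standing in for received WAL, queues an fsync with POSIX aio_fsync(), and keeps looping over "other work" until aio_error() stops reporting EINPROGRESS. The helper name write_and_async_fsync(), the file path, and the counter that stands in for keepalive handling are all hypothetical. It uses POSIX AIO rather than io_uring only because it needs no extra library; glibc implements POSIX AIO with user-space helper threads, so this shows the programming model, not the kernel-level asynchrony io_uring offers.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write buf to path, queue an
 * asynchronous fsync, and keep doing "other work" until the fsync finishes.
 * Returns 0 on success and -1 on failure; *other_work_out receives how many
 * times the loop body ran while the fsync was still in progress.
 */
int write_and_async_fsync(const char *path, const char *buf, size_t len,
                          long *other_work_out)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Ordinary buffered write of the data standing in for received WAL. */
    if (write(fd, buf, len) != (ssize_t) len) {
        close(fd);
        return -1;
    }

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE; /* completion is polled, no signal */

    /* Queue a data-only sync (like fdatasync()); this call does not wait. */
    if (aio_fsync(O_DSYNC, &cb) != 0) {
        close(fd);
        return -1;
    }

    const struct aiocb *const list[1] = { &cb };
    struct timespec tick = { 0, 1000000 }; /* wake up at least every 1 ms */
    long other_work = 0;
    int err;
    while ((err = aio_error(&cb)) == EINPROGRESS) {
        other_work++; /* stand-in for keepalives, receiving more WAL, ... */
        (void) aio_suspend(list, 1, &tick); /* sleep until done or timeout */
    }

    ssize_t ret = aio_return(&cb); /* reap the finished request exactly once */
    close(fd);
    if (other_work_out != NULL)
        *other_work_out = other_work;
    return (err == 0 && ret == 0) ? 0 : -1;
}
```

A small driver would call it as write_and_async_fsync("/tmp/wal.bin", buf, len, &spins); on glibc older than 2.34, link with -lrt. The aio_suspend() call with a short timeout stands in for whatever wait primitive a real process would use to wake up for either I/O completion or new work.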

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
