Re: Blocking I/O, async I/O and io_uring

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Subject: Re: Blocking I/O, async I/O and io_uring
Date: 2020-12-08 03:27:34
Message-ID: CA+hUKGKwArCVb=rdv252yX0GrzkiS+vw7ExAjK7O0bJDUkfzJQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 8, 2020 at 3:56 PM Craig Ringer
<craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> I thought I'd start the discussion on this and see where we can go with it. What incremental steps can be done to move us toward parallelisable I/O without having to redesign everything?
>
> I'm thinking that redo is probably a good first candidate. It doesn't depend on the guts of the executor. It is much less sensitive to ordering between operations in shmem and on disk since it runs in the startup process. And it hurts REALLY BADLY from its single-threaded blocking approach to I/O - as shown by an extension written by 2ndQuadrant that can double redo performance by doing read-ahead on btree pages that will soon be needed.

About the redo suggestion: https://commitfest.postgresql.org/31/2410/
does exactly that! It currently uses POSIX_FADV_WILLNEED because
that's what PrefetchSharedBuffer() does, but when combined with a
"real AIO" patch set (see earlier threads and conference talks on this
by Andres) and a few small tweaks to control batching of I/O
submissions, it does exactly what you're describing. I tried to keep
the WAL prefetcher project entirely disentangled from the core AIO
work, though, hence the "poor man's AIO" for now.
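
For anyone following along who hasn't looked at this before, here is a
rough standalone sketch of the "poor man's AIO" idea: hint a batch of
blocks to the kernel with POSIX_FADV_WILLNEED, then read them with
ordinary blocking pread() calls that will hopefully find the data
already in the page cache. This is not the code from the patch or from
PrefetchSharedBuffer(); the function names, the fixed 8kB BLOCK_SIZE
and the batch-then-read structure are assumptions made only for
illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose posix_fadvise() and pread() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumed block size, for illustration */

/*
 * Hypothetical helper: tell the kernel that one block of "fd" will be
 * needed soon.  Returns 0 on success or an errno value, exactly like
 * posix_fadvise() itself.
 */
static int
prefetch_block(int fd, off_t blockno)
{
    return posix_fadvise(fd, blockno * BLOCK_SIZE, BLOCK_SIZE,
                         POSIX_FADV_WILLNEED);
}

/*
 * Hypothetical helper: hint every block in the batch first, then read
 * them one by one with blocking pread().  Block i of the batch lands at
 * buf + i * BLOCK_SIZE.  Returns nblocks on success, -1 if any read is
 * short or fails.
 */
static int
read_blocks_with_prefetch(int fd, const off_t *blocknos, int nblocks,
                          char *buf)
{
    int         i;

    assert(nblocks >= 0);

    /* Phase 1: advice only; a failed hint is not treated as an error. */
    for (i = 0; i < nblocks; i++)
        (void) prefetch_block(fd, blocknos[i]);

    /* Phase 2: the reads still block, but ideally hit the page cache. */
    for (i = 0; i < nblocks; i++)
    {
        ssize_t     n = pread(fd, buf + (size_t) i * BLOCK_SIZE,
                              BLOCK_SIZE, blocknos[i] * BLOCK_SIZE);

        if (n != BLOCK_SIZE)
            return -1;
    }
    return nblocks;
}
```

The hint is only advisory, so whether it helps depends on the kernel
and the storage underneath. A "real AIO" interface differs in that the
reads themselves are submitted without blocking and completions are
collected later, instead of relying on the page cache as a go-between.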
