Re: Blocking I/O, async I/O and io_uring

From: Andres Freund <andres(at)anarazel(dot)de>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Blocking I/O, async I/O and io_uring
Date: 2020-12-08 07:04:13
Message-ID: 20201208070413.evydhusv4tnfcgjr@alap3.anarazel.de

Hi,

On 2020-12-08 04:24:44 +0000, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
> I'm looking forward to this from the async+direct I/O, since the
> throughput of some write-heavy workload decreased by half or more
> during checkpointing (due to fsync?)

Depends on why that is. The most common cause, I think, is that your WAL
volume increases drastically just after a checkpoint starts, because
initially all page modifications will trigger full-page writes. There's
a significant slowdown even if you prevent the checkpointer from doing
*any* writes at that point. I got the WAL AIO stuff to the point that I
see a good bit of speedup at high WAL volumes, and I see it helping in
this scenario.
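
To illustrate the full-page-write effect, here is a toy model, not taken
from the patch or from PostgreSQL itself. It assumes uniformly random page
updates, a made-up 100-byte ordinary WAL record, and that the first change
to a page after a checkpoint costs one extra 8192-byte page image. The
function name and all parameters are hypothetical.

```python
import random

PAGE_SIZE = 8192     # PostgreSQL's default block size, in bytes
RECORD_SIZE = 100    # assumed size of an ordinary WAL record (illustrative)


def wal_bytes_per_interval(n_pages, updates_per_interval, intervals, seed=0):
    """Return the WAL bytes generated in each interval after a checkpoint.

    Toy model: the first modification of a page since the checkpoint logs a
    full-page image in addition to the ordinary record; later modifications
    of the same page log only the ordinary record.
    """
    rng = random.Random(seed)
    touched = set()  # pages already modified since the checkpoint
    volumes = []
    for _ in range(intervals):
        total = 0
        for _ in range(updates_per_interval):
            page = rng.randrange(n_pages)
            if page not in touched:
                # First change since the checkpoint: add a full-page image.
                touched.add(page)
                total += PAGE_SIZE
            total += RECORD_SIZE
        volumes.append(total)
    return volumes
```

In this model the first interval after the checkpoint carries most of the
page images, and the per-interval volume falls back toward
updates_per_interval * RECORD_SIZE once most pages have been touched.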

There's of course also the issue that checkpoint writes cause other IO
(including WAL writes) to slow down and, importantly, cause a lot of
jitter leading to unpredictable latencies. I've seen some good and some
bad results around this with the patch, but there's a bunch of TODOs to
resolve before delving deeper really makes sense (the IO depth control
is not good enough right now).

A third issue is that sometimes the checkpointer can't really keep up - and
that I think I've seen pretty clearly addressed by the patch. I have
managed to get to ~80% of my NVMe disks' top write speed (> 2.5GB/s) with
the checkpointer, and I think I know what to do for the remainder.

> Would you mind sharing any preliminary results on this if you have
> something?

I ran numbers at some point, but since then enough has changed
(including many correctness issues fixed) that they don't seem really
relevant anymore. I'll try to include some in the post I'm planning to
do in a few weeks.

Greetings,

Andres Freund
