From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: aio/README.md comments |
Date: | 2025-08-30 16:20:39 |
Message-ID: | uebw3wuq3iudyx7xjgfqt7icqrtk4xv22cmwjittcy4s3rsaj2@d6sf52qwppbe |
Lists: | pgsql-hackers |
Hi,
On 2025-08-29 15:23:48 -0700, Jeff Davis wrote:
> On Fri, 2025-08-29 at 12:32 -0400, Andres Freund wrote:
> > I don't really see an advantage of sync in those cases either.
>
> It seems a bit early to say that it's just there for debugging. But
> it's just in a README, so I won't argue the point.
There might be some regressions that make io_method=sync beneficial, but short
to medium term, the goal ought to be for all non-ridiculous configurations
(I don't care about AIO performing well with s_b=16) to not regress
meaningfully, and for most things to be the same or better with AIO.
I don't see any reason for io_method=sync to be something we should have for
anything other than debugging medium to long term.
Why do you think differently?
> diff --git a/src/backend/storage/aio/README.md b/src/backend/storage/aio/README.md
> index 72ae3b3737d..8fa6bd6e9ca 100644
> --- a/src/backend/storage/aio/README.md
> +++ b/src/backend/storage/aio/README.md
> @@ -4,27 +4,38 @@
>
> ### Why Asynchronous IO
>
> -Until the introduction of asynchronous IO postgres relied on the operating
> -system to hide the cost of synchronous IO from postgres. While this worked
> -surprisingly well in a lot of workloads, it does not do as good a job on
> -prefetching and controlled writeback as we would like.
> -
> -There are important expensive operations like `fdatasync()` where the operating
> -system cannot hide the storage latency. This is particularly important for WAL
> -writes, where the ability to asynchronously issue `fdatasync()` or O_DSYNC
> -writes can yield significantly higher throughput.
I think this second paragraph was important and your rewrite largely removed
it?
> +Postgres depends on IO operations happening asynchronously for reasonable
> +performance: for instance, a sequential scan would be far slower without the
> +benefit of readahead. Historically, Postgres only used synchronous APIs for
> +IO, while assuming that the operating system would use the kernel buffer cache
> +to make those operations asynchronous in most cases (aside from, e.g.,
> +`fdatasync()`).
> +
> +The asynchronous IO APIs described here do not depend on that
> +assumption. Instead, they allow different low-level IO methods, which are
> +given more control and therefore rely less on the kernel's
> +behavior. Currently, only async read operations are supported, but the
> +infrastructure is designed to support async write operations in the future.
The infrastructure supports writes today; it's just that md.c and bufmgr.c
aren't ready to use it yet.
> ### Why Direct / unbuffered IO
>
> The main reasons to want to use Direct IO are:
>
> -- Lower CPU usage / higher throughput. Particularly on modern storage buffered
> - writes are bottlenecked by the operating system having to copy data from the
> - kernel's page cache to postgres buffer pool using the CPU. Whereas direct IO
> - can often move the data directly between the storage devices and postgres'
> - buffer cache, using DMA. While that transfer is ongoing, the CPU is free to
> - perform other work.
> +- Avoid extra memory copies between the kernel buffer cache and Postgres
> + shared buffers. These memory copies can become the bottleneck when the
> + underlying storage has high enough throughput, which is common for
> + solid-state drives or fast network block devices. Instead, direct IO can
> + often move the data directly between the Postgres buffer cache and the
> + device by using DMA, leaving the CPU free to perform other work.
> - Reduced latency - Direct IO can have substantially lower latency than
> buffered IO, which can be impactful for OLTP workloads bottlenecked by WAL
> write latency.
I preferred the prior formulation that had the main reasons at the start of
the bullet points.
> @@ -37,11 +48,24 @@ The main reasons *not* to use Direct IO are:
>
> - Without AIO, Direct IO is unusably slow for most purposes.
> - Even with AIO, many parts of postgres need to be modified to perform
> - explicit prefetching.
> + explicit prefetching (see read_stream.c).
> - In situations where shared_buffers cannot be set appropriately large,
> e.g. because there are many different postgres instances hosted on shared
> hardware, performance will often be worse than when using buffered IO.
Ok, although perhaps better to refer to the read stream section at the bottom?
> +### Writing WAL
> +
> +Using AIO and Direct IO can reduce the overhead of WAL logging
> +substantially:
> +
> +- AIO allows to start WAL writes eagerly, so they complete before needing to
> + wait
> +- AIO allows to have multiple WAL flushes in progress at the same time
> +- Direct IO can reduce the number of roundtrips to storage on some OSs
> + and storage HW (buffered IO and direct IO without O_DSYNC needs to
> + issue a write and after the write's completion a cache flush,
> + whereas O\_DIRECT + O\_DSYNC can use a single Force Unit Access
> + (FUA) write).
> ## AIO Usage Example
>
> @@ -196,25 +220,15 @@ processing to the AIO workers).
>
> ### IO can be started in critical sections
>
> -Using AIO for WAL writes can reduce the overhead of WAL logging substantially:
>
> -- AIO allows to start WAL writes eagerly, so they complete before needing to
> - wait
> -- AIO allows to have multiple WAL flushes in progress at the same time
> -- AIO makes it more realistic to use O\_DIRECT + O\_DSYNC, which can reduce
> - the number of roundtrips to storage on some OSs and storage HW (buffered IO
> - and direct IO without O_DSYNC needs to issue a write and after the write's
> - completion a cache flush, whereas O\_DIRECT + O\_DSYNC can use a single
> - Force Unit Access (FUA) write).
Direct IO alone does not reduce the number of roundtrips; only the combination
of DIO and O_DSYNC does. I think that got less clear in the rewrite.
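To make the distinction concrete, here is a minimal Linux-only sketch of the two
write patterns. This is not PostgreSQL code: the function names, the
`DEMO_BLOCK_SIZE` constant and the 4096-byte alignment are assumptions made up
for illustration. Whether the second pattern really ends up as a single FUA
write depends on the OS, filesystem and storage hardware.

```c
#define _GNU_SOURCE             /* for O_DIRECT on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: degrade to O_DSYNC only */
#endif

/* Assumed alignment for direct IO; real code would query the device. */
#define DEMO_BLOCK_SIZE 4096

/* Allocate one block with the alignment O_DIRECT typically requires. */
static void *
alloc_aligned_block(void)
{
    void   *p = NULL;

    if (posix_memalign(&p, DEMO_BLOCK_SIZE, DEMO_BLOCK_SIZE) != 0)
        return NULL;
    return p;
}

/* Pattern 1: write, then a separate cache flush once the write is done. */
static int
write_then_flush(const char *path, const void *buf, size_t len)
{
    int     fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len || fdatasync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Pattern 2: O_DIRECT + O_DSYNC, the write returns once data is durable. */
static int
write_direct_dsync(const char *path, const void *buf, size_t len)
{
    int     flags = O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC;
    int     fd;

    /* Direct IO needs block-aligned lengths (and an aligned buffer). */
    assert(len % DEMO_BLOCK_SIZE == 0);

    fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0600);   /* filesystem without O_DIRECT */
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len)
    {
        close(fd);
        return -1;
    }
    return close(fd);           /* no separate fdatasync() needed */
}
```

A caller would fill a buffer from `alloc_aligned_block()` and pass it to either
function. Dropping O_DSYNC from the second function would bring back the need
for an explicit `fdatasync()`, which is the point above: O_DIRECT on its own
does not save the roundtrip.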
Greetings,
Andres Freund