From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Should io_method=worker remain the default? |
Date: | 2025-09-03 08:34:47 |
Message-ID: | be4c26b3-2805-47f2-862b-271eddfd8994@vondra.me |
Lists: | pgsql-hackers |
On 9/3/25 08:47, Jeff Davis wrote:
>
> Has there already been a discussion about leaving the default as
> io_method=worker? There was an Open Item for this, which was closed as
> "Won't Fix", but the links don't explain why as far as I can see.
>
There was a discussion in the first thread mentioned in the commit
message, including some test results:
https://www.postgresql.org/message-id/e6db33f3-50de-43d3-9d9f-747c3b376e80%40vondra.me
There were no proposals to change the default (from worker), so the
conclusion was to stick with the current default. It wasn't quite clear
who's expected to make the proposal / final decision, though.
> I tested a concurrent scan-heavy workload (see below) where the data
> fits in memory, and "worker" seems to be 30% slower than "sync" with
> default settings.
>
I think this could be due to the "IPC overhead" discussed in the index
prefetch thread recently:
https://www.postgresql.org/message-id/1c9302da-c834-4773-a527-1c1a7029c5a3%40vondra.me
> I'm not suggesting that AIO overall is slow -- on the contrary, I'm
> excited about AIO. But if it regresses in some cases, we should make a
> conscious choice about the default and what kind of tuning advice needs
> to be offered.
>
> I briefly tried tuning to see if a different io_workers value would
> solve the problem, but no luck.
>
Right, that's what I saw too. It simply boils down to how many signals
a single process can send/receive. I'd bet that if you try the signal-echo
thing, it'll be close to the throughput with io_method=worker.
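For readers without the other thread at hand, here is a minimal sketch of what a signal-echo measurement could look like. It is not the program used in that thread: the function name, the choice of SIGUSR1 and the round-trip count are assumptions made for illustration, and it only approximates the wakeup pattern between a backend and an I/O worker. It forks a child, bounces one signal back and forth in strict ping-pong, and reports round trips per second (Unix only).

```python
import os
import signal
import time


def signal_echo(round_trips: int = 10000) -> float:
    """Return signal round trips per second between a parent and a forked child.

    Hypothetical helper for illustration; not taken from PostgreSQL or the thread.
    """
    sig = signal.SIGUSR1
    # Block the signal so it stays pending until sigwait() collects it.
    # The child inherits this mask across fork().
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    parent = os.getpid()
    child = os.fork()
    if child == 0:
        # Child: echo every signal straight back to the parent.
        for _ in range(round_trips):
            signal.sigwait({sig})
            os.kill(parent, sig)
        os._exit(0)

    start = time.perf_counter()
    for _ in range(round_trips):
        os.kill(child, sig)    # wake the child
        signal.sigwait({sig})  # wait for its echo
    elapsed = time.perf_counter() - start

    os.waitpid(child, 0)
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return round_trips / elapsed


if __name__ == "__main__":
    print(f"{signal_echo():.0f} signal round trips per second")
```

If the number this prints is in the same ballpark as the I/O rate seen with io_method=worker, that would be consistent with the signalling being the bottleneck, though a Python loop adds interpreter overhead that a C version would not have.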
> The good news is that io_uring seemed to solve the problem.
> Unfortunately, that's platform-specific, so it can't be the default. I
> didn't dig in very much, but it seemed to be at least as good as "sync"
> mode for this workload.
>
AFAICS that matches what we observed in the index prefetch thread.
>
> Test summary: 32 connections each perform repeated sequential scans.
> Each connection scans a different 1GB partition of the same table. I
> used partitioning and a predicate to make it easier to script in
> pgbench.
I'll try to reproduce this, but if it's due to the same IPC overhead,
that would be surprising (to me). In the index case it makes sense,
because the reads are random enough to prevent I/O combining. But for a
sequential workload I'd expect I/O combining to help. Could it be that
it ends up evicting buffers randomly, which (I guess) might interfere
with the combining? What's shared_buffers set to? Have you watched how
large the I/O requests are? iostat, iosnoop or strace would tell you.
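As one way to answer the request-size question from strace output, the sketch below buckets read-style syscalls by how many 8kB blocks each one returned. It assumes the trace was captured from whichever process actually issues the reads and that lines look like ordinary strace output; the function name and the sample lines are made up for illustration and are not results from any real run.

```python
import re
from collections import Counter

# Matches read-style syscalls in strace output and captures the byte count
# returned, e.g. `preadv(7, [...], 4, 81920) = 32768`.
READ_RE = re.compile(r"\b(pread64|preadv2?|read)\(.*\)\s+=\s+(\d+)\b")


def read_size_histogram(lines, block_size=8192):
    """Count successful reads by size, in blocks of block_size bytes (rounded up)."""
    histogram = Counter()
    for line in lines:
        match = READ_RE.search(line)
        if match is None:
            continue  # not a read call, or the call failed (= -1 ...)
        nbytes = int(match.group(2))
        if nbytes == 0:
            continue  # end of file
        histogram[-(-nbytes // block_size)] += 1
    return histogram


# Made-up lines for illustration only; not taken from any real trace.
sample = [
    'pread64(7, "..."..., 8192, 0) = 8192',
    'preadv(7, [{iov_base="...", iov_len=65536}, {iov_base="...", iov_len=65536}], 2, 8192) = 131072',
    'lseek(7, 0, SEEK_END) = 1073741824',
    '[pid 4242] preadv(7, [{iov_base="...", iov_len=8192}], 1, 139264) = 8192',
]

for blocks, count in sorted(read_size_histogram(sample).items()):
    print(f"{blocks:>3} block(s): {count} read(s)")
```

A histogram dominated by single-block reads would suggest that I/O combining is not kicking in; mostly multi-block reads would point elsewhere.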
regards
--
Tomas Vondra