Should io_method=worker remain the default?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Should io_method=worker remain the default?
Date: 2025-09-03 06:47:48
Message-ID: d68e2a4f8c356107e5167408ad80eaa2fac0f57d.camel@j-davis.com


Has there already been a discussion about leaving the default as
io_method=worker? There was an Open Item for this, which was closed as
"Won't Fix", but the links don't explain why as far as I can see.

I tested a concurrent scan-heavy workload (see below) where the data
fits in memory, and "worker" delivers about 30% less throughput than
"sync" with default settings (23.9 vs. 34.0 tps).

I'm not suggesting that AIO overall is slow -- on the contrary, I'm
excited about AIO. But if it regresses in some cases, we should make a
conscious choice about the default and what kind of tuning advice needs
to be offered.

I briefly tried different io_workers values to see if that would
solve the problem, but no luck: raising io_workers from 3 to 16 made
it worse.

The good news is that io_uring seemed to solve the problem.
Unfortunately, that's platform-specific, so it can't be the default. I
didn't dig in very much, but it seemed to be at least as good as "sync"
mode for this workload.

Regards,
Jeff Davis

Test summary: 32 connections each perform repeated sequential scans.
Each connection scans a different 1GB partition of the same table. I
used partitioning and a predicate to make it easier to script in
pgbench.

Test details:

Machine:
AMD Ryzen 9 9950X 16-Core Processor
64GB RAM
Local storage, NVMe SSD
Ubuntu 24.04 (Linux 6.11, liburing 2.5)

Note: the storage didn't matter much, because the data fits in
memory. To get consistent results, when changing between data
directories for the 17 and 18 tests, I had to drop the filesystem cache
first to make room, then run a few scans to warm it with the data from
the right data directory.
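The cache-refresh step described above might be scripted roughly as follows. This is a sketch, not the procedure actually used for these numbers: the helper name refresh_cache is hypothetical, dropping the page cache through /proc/sys/vm/drop_caches is Linux-specific and needs root, and it assumes the server for the data directory under test is already running.

```shell
# Hypothetical helper: run it after switching to the data directory
# under test and starting that server.
refresh_cache() {
    sync                                                  # flush dirty pages first
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null # drop page cache, dentries, inodes (Linux, root)
    for n in 1 2 3; do                                    # a few warm-up passes
        # -t 1: each of the 32 clients scans its own partition once
        ./bin/pgbench --dbname=postgres -n -c 32 -t 1 -f count.sql >/dev/null
    done
}
```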

For simplicity I disabled parallel query, but that didn't seem to have
a big effect. Everything else was set to the default.
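A minimal sketch of how the per-run settings could be applied, assuming each run is configured by appending to postgresql.conf. The helper name set_run_config is hypothetical, and the message does not say how parallel query was disabled; max_parallel_workers_per_gather = 0 is one common way.

```shell
# Hypothetical helper: append the settings for one benchmark run to
# postgresql.conf. When a parameter appears more than once, the last
# entry wins, so repeated calls leave the most recent run's values active.
set_run_config() {
    datadir=$1      # data directory of the server under test
    method=$2       # sync | worker | io_uring
    workers=$3      # io_workers; only has an effect when method=worker
    cat >> "$datadir/postgresql.conf" <<EOF
# benchmark run: io_method=$method io_workers=$workers
io_method = $method
io_workers = $workers
max_parallel_workers_per_gather = 0
EOF
}
```

For example, `set_run_config "$PGDATA" worker 16` followed by `./bin/pg_ctl -D "$PGDATA" restart`. io_method can only be set at server start, so switching methods needs a restart rather than a reload; io_uring additionally requires a server built with liburing support.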

Setup (checksums enabled):

=> create table t(sid int8, c0 int8, c1 int8, c2 int8, c3 int8,
     c4 int8, c5 int8, c6 int8, c7 int8) partition by range (sid);

$ (for i in $(seq 0 31); do
    echo "create table t$(printf "%02d" $i) partition of t" \
         "for values from ($i) to ($((i+1)));"
  done) | ./bin/psql postgres
$ (for i in $(seq 0 31); do
    echo "insert into t$(printf "%02d" $i) select $i, 0, 1, 2, 3," \
         "4, 5, 6, 7 from generate_series(0, 10000000);"
  done) | ./bin/psql postgres

=> vacuum analyze; checkpoint;
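As a rough cross-check of the "1GB partition" and "fits in memory" statements, the heap size can be estimated from the row layout. All constants below are assumptions about a default build (8 kB pages, 24-byte page header, 24-byte tuple header with no null bitmap, 4-byte line pointer, fillfactor 100). The free-space-map and visibility-map forks are ignored, so this is an estimate, not what pg_relation_size will report exactly.

```shell
# Rough size estimate for one partition under the assumptions above.
rows=10000001                 # generate_series(0, 10000000) includes both ends
tuple=$((24 + 9 * 8))         # tuple header + nine int8 columns = 96 bytes
per_page=$(( (8192 - 24) / (tuple + 4) ))     # whole tuples (plus line pointers) per page
pages=$(( (rows + per_page - 1) / per_page )) # round up to whole pages
bytes=$((pages * 8192))
echo "per partition: $per_page tuples/page, $pages pages, $((bytes / 1048576)) MiB"
echo "all 32 partitions: $((bytes * 32 / 1048576)) MiB"
```

This prints 81 tuples/page, 123457 pages and 964 MiB per partition, and 30864 MiB (about 30 GiB) for all 32 partitions, which is consistent with the data fitting in the 64GB of RAM.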

Script count.sql:

SELECT COUNT(*) FROM t WHERE sid=:client_id;

pgbench:

./bin/pgbench --dbname=postgres -M prepared -n -c 32 -T 60 \
-f count.sql
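When repeating the run across configurations, the tps figure can be pulled out of the pgbench summary automatically. A sketch with hypothetical helper names; it relies only on the "tps = ..." line that pgbench prints at the end of a run, and takes the last such line in case an older pgbench prints more than one.

```shell
# Hypothetical helper: read pgbench output on stdin, print the tps value.
extract_tps() {
    awk '$1 == "tps" && $2 == "=" { v = $3 } END { if (v != "") print v }'
}

# Hypothetical helper: same invocation as above, printing only tps.
run_count_bench() {
    ./bin/pgbench --dbname=postgres -M prepared -n -c 32 -T 60 \
        -f count.sql | extract_tps
}
```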

Results:

PG17:                                   tps = 36.209048
PG18 (io_method=sync):                  tps = 34.014890
PG18 (io_method=worker io_workers=3):   tps = 23.938509
PG18 (io_method=worker io_workers=16):  tps = 16.734360
PG18 (io_method=io_uring):              tps = 35.546825
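To make the comparison explicit, the reported tps values can be normalized against one configuration. A small sketch; the helper name relative_tps and the short labels are hypothetical, and the input values are the ones reported above.

```shell
# Hypothetical helper: print each configuration's tps as a percentage
# of the baseline label given as $1. Input lines: "<label> <tps>".
relative_tps() {
    awk -v base_label="$1" '
        { tps[$1] = $2; label[++n] = $1 }
        END {
            base = tps[base_label]
            for (i = 1; i <= n; i++)
                printf "%-16s %6.1f%%\n", label[i], 100 * tps[label[i]] / base
        }'
}

relative_tps pg18-sync <<'EOF'
pg17 36.209048
pg18-sync 34.014890
pg18-worker-3 23.938509
pg18-worker-16 16.734360
pg18-io_uring 35.546825
EOF
```

Relative to io_method=sync, this gives 106.5% for PG17, 70.4% for worker with 3 workers, 49.2% for worker with 16 workers and 104.5% for io_uring.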
