
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Should io_method=worker remain the default?
Date: 2025-09-05 20:25:49
Message-ID: a81f2f7ef34afc24a89c613671ea017e3651329c.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2025-09-03 at 11:55 -0400, Andres Freund wrote:
> I think the regression is not due to anything inherent to worker, but
> due to pressure on AioWorkerSubmissionQueueLock - at least that's what
> I'm seeing on an older two socket machine. It's possible the bottleneck
> is different on a newer machine (my newer workstation is busy on
> another benchmark rn).

I believe what's happening is that the parallelism of the IO completion
work (e.g. checksum verification) is reduced. In worker mode, the
completion work happens on the io workers (of which there are 3), while
in sync mode it happens in the backends (of which there are 32).

There may be lock contention too, but I don't think that's the primary
issue.
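A quick back-of-envelope sketch of that argument (illustrative only; the
3 and 32 are the process counts above, and the per-page checksum cost is
an invented constant):

```python
# Toy model: IO completion work (e.g. checksum verification) is CPU-bound,
# so aggregate completion throughput scales with how many processes do it.
# The absolute cost below is made up; only the ratio matters.

CHECKSUM_COST_US = 5.0  # assumed CPU time to verify one page, in microseconds

def completion_rate(n_procs: int) -> float:
    """Pages per second that n_procs can collectively verify."""
    return n_procs * 1_000_000 / CHECKSUM_COST_US

worker_mode = completion_rate(3)   # completions run on the 3 io workers
sync_mode = completion_rate(32)    # each of 32 backends completes its own IO

print(f"worker mode: {worker_mode:,.0f} pages/s")
print(f"sync mode:   {sync_mode:,.0f} pages/s ({sync_mode / worker_mode:.1f}x)")
```

If completion work is the bottleneck, sync mode has roughly a 10x higher
ceiling at these process counts, which would be more than enough to
explain the regression.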

I attached a test patch for illustration. It simplifies the code inside
the LWLock to enqueue/dequeue only, and reduces the wakeups by issuing
them pseudo-randomly, only when enqueuing. Reducing the wakeups should
reduce the number of signals generated without hurting my case, because
the workers are never idle. And reducing the instructions executed while
holding the LWLock should reduce lock contention. But the patch barely
makes a difference: still around 24tps.
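My reading of the patch's approach, sketched with Python threads
standing in for backends and io workers (the names and the 0.25 wakeup
probability are mine, not taken from the patch):

```python
# Minimal critical section: the lock protects only enqueue/dequeue.
# On enqueue, a sleeping worker is woken only pseudo-randomly; busy
# workers poll the queue anyway, so most signals are unnecessary.

import random
import threading
from collections import deque

queue = deque()
lock = threading.Lock()
wakeup = threading.Condition(lock)
DONE = object()  # sentinel telling a worker to exit

def enqueue(item, wake_probability=0.25):
    with lock:                       # hold the lock only to enqueue
        queue.append(item)
    if random.random() < wake_probability:
        with wakeup:
            wakeup.notify()          # occasional wakeup: fewer signals

def worker(results):
    while True:
        with lock:                   # hold the lock only to dequeue
            item = queue.popleft() if queue else None
        if item is DONE:
            return
        if item is None:
            with wakeup:             # queue empty: nap until woken (or timeout)
                wakeup.wait(timeout=0.001)
        else:
            results.append(item)     # stand-in for completing the IO

results = []
workers = [threading.Thread(target=worker, args=(results,)) for _ in range(3)]
for t in workers:
    t.start()
for i in range(100):
    enqueue(i)
for _ in workers:
    enqueue(DONE, wake_probability=1.0)  # one exit sentinel per worker
for t in workers:
    t.join()
print(len(results))  # all 100 items get consumed despite the sparse wakeups
```

The timeout on the wait is what makes the sparse wakeups safe here: a
worker that misses a signal still rechecks the queue shortly after, so
no entry is stranded.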

What *does* make a difference is changing io_worker_queue_size. A lower
value of 16 effectively starves the workers of work to do, and I get a
speedup to about 28tps. A higher value of 512 gives the workers more
chance to issue the IOs -- and more responsibility to complete them --
and it drops to 17tps. Furthermore, while the test is running, the io
workers are constantly at 100% CPU (mostly verifying checksums) and the
backends are at 50% (20% when io_worker_queue_size=512).
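One way to rationalize the queue-size effect (this assumes, from my
reading of the behavior rather than the code, that a backend whose
submission doesn't fit in the queue falls back to doing the IO itself,
completion work included):

```python
# Toy model: worker_share is the fraction of completion work that lands
# on the 3 io workers; the rest is done by the 32 backends (the assumed
# fallback when the submission queue is full). Throughput is limited by
# whichever group is busier per process.

IO_WORKERS = 3
BACKENDS = 32

def relative_throughput(worker_share: float) -> float:
    """Throughput relative to a single process doing all the work."""
    per_worker = worker_share / IO_WORKERS
    per_backend = (1 - worker_share) / BACKENDS
    return 1 / max(per_worker, per_backend)

for share in (1.0, 0.9, 0.5):  # larger queue -> workers absorb more of the work
    print(f"workers handle {share:.0%} -> {relative_throughput(share):.1f}x")
```

Under this model, a larger queue routes more of the completion work to
the 3 workers and lowers the throughput ceiling, which matches the
direction of the 28tps vs. 17tps results above.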

As an aside, I'm building with meson using -Dc_args="-msse4.2
-Wtype-limits -Werror=missing-braces". But I notice that the meson build
doesn't seem to use -funroll-loops or -ftree-vectorize when building
checksums.c. Is that intentional? If not, perhaps slower checksum
calculations explain my results.

Regards,
Jeff Davis

Attachment Content-Type Size
test-aio.patch text/x-patch 5.4 KB
