Re: Automatically sizing the IO worker pool

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Automatically sizing the IO worker pool
Date: 2026-04-08 02:09:16
Message-ID: CA+hUKGK=vELXFXNj2L=vTkof6s_EQzTjYXXrUVwOOW0rahEfVg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 8, 2026 at 12:30 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2026-04-08 11:18:51 +1200, Thomas Munro wrote:
> > > /* Choose one worker to wake for this batch. */
> > > if (worker == -1)
> > >     worker = pgaio_worker_choose_idle(-1);
> >
> > Well I didn't want to wake a worker if we'd failed to enqueue
> > anything.
>
> I think it's worth waking up workers if there are idle ones and the queue is
> full?

True, but I prefer to test nsync because there is another reason to break:

commit 29a0fb215779d10fae0cbeb8ce57805f244bad9b
Author: Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
Date: Wed Mar 11 12:11:04 2026 +0100

Conditional locking in pgaio_worker_submit_internal

I haven't finished digesting that commit, and will follow up shortly
on that topic once this patch is in.
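To illustrate the shape I mean (a hypothetical sketch only, with made-up names and a toy fixed-size queue, not the actual patch code): since we break out of the loop as soon as an enqueue fails, and may break for other reasons too, testing nsync afterwards covers all exit paths with one condition.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_CAP 4

static int queue_len;

/* Toy stand-in for the shared submission queue. */
static bool
try_enqueue(void)
{
    if (queue_len >= QUEUE_CAP)
        return false;           /* queue full: caller breaks out */
    queue_len++;
    return true;
}

/* Returns number of IOs enqueued; *woke_worker says if we'd wake one. */
static int
submit_batch(int nios, bool *woke_worker)
{
    int nsync = 0;

    for (int i = 0; i < nios; i++)
    {
        if (!try_enqueue())
            break;              /* one of possibly several reasons to break */
        nsync++;
    }

    /* Wake a worker only if we actually enqueued something. */
    *woke_worker = (nsync > 0);
    return nsync;
}
```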

> I suspect the primary reason is that pgaio_worker_request_grow() is triggered
> even when io_worker_control->nworkers is >= io_max_workers.

Yeah. V6 already addressed that directly.

> I suspect there's also pingpong between submission not finding any workers
> idle, requesting growth, and workers being idle for a short period, then the
> same thing starting again.
>
> Seems like there should be two fields. One saying "notify postmaster again"
> and one "postmaster start a worker". The former would only be cleared by
> postmaster after the timeout.

Good idea. V7 has two tweaks:

* separate grow and grow_signal_sent flags, as you suggested
* it also applies the io_worker_launch_delay to cancelled grow requests

This seems to work pretty well for avoiding useless postmaster
wakeups. You get a few due to cancelled grow requests, but not more
frequently than io_worker_launch_delay allows, while the pool is
vacillating during workload changes. It soon makes its mind up and
stabilises on a good size. To be clear, there is no change in overall
effect, only a reduction in useless wakeups.
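The handshake between the two flags looks roughly like this (hypothetical sketch with illustrative names, not the patch itself): "grow" records the request, while "grow_signal_sent" suppresses redundant postmaster wakeups and is only cleared by the postmaster once io_worker_launch_delay has elapsed.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct WorkerControl
{
    bool grow;              /* please start a worker */
    bool grow_signal_sent;  /* a postmaster signal is already in flight */
} WorkerControl;

/* Submitter side: returns true if the postmaster should be signalled now. */
static bool
request_grow(WorkerControl *ctl)
{
    ctl->grow = true;
    if (ctl->grow_signal_sent)
        return false;       /* don't wake the postmaster again */
    ctl->grow_signal_sent = true;
    return true;
}

/* Postmaster side, once io_worker_launch_delay has passed. */
static void
postmaster_timeout(WorkerControl *ctl)
{
    ctl->grow_signal_sent = false;  /* permit the next wakeup */
}
```

The point of clearing grow_signal_sent only in the postmaster after the timeout is that a flapping pool can set grow as often as it likes without generating more than one wakeup per delay period.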

I retested the value of request cancellation. If you comment that
call out, we do tend to overshoot, so I think it's worth having. But
you were quite right to complain about the postmaster wakeup rate it
produced.

> > Our goal is simple: process every IO immediately. We have immediate
> > feedback that is simple: there's an IO in the queue and there is no
> > idle worker. The only action we can take is simple: add one more
> > worker. So we don't need to suffer through the maths required to
> > figure out the ideal k for our M/G/k queue system (I think that's what
> > we have?) or any of the inputs that would require*. The problem is
> > that on its own, the test triggered far too easily because a worker
> > that is not marked idle might in fact be just about to pick up that IO
>
> Is that case really concerning? As long as you have some rate limiting about
> the start rate, starting another worker when there are no idle workers seems
> harmless? Afaict it's fairly self limiting.

I retested without the depth test and I continue to think we need it.
Without it, the pool overshoots by quite a lot. You should be able to
set io_max_workers=32 without fear of creating a ton of useless worker
processes no matter what your workload.
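The trigger condition amounts to something like this (hypothetical sketch; the threshold value and names are illustrative, not what the patch uses): grow only when nobody is idle and the queue has built up past a small tolerance, so the outliers seen on systems that are keeping up don't fire it.

```c
#include <assert.h>
#include <stdbool.h>

#define GROW_DEPTH_THRESHOLD 4  /* illustrative value only */

static bool
should_request_grow(int nidle_workers, int queue_depth)
{
    /* An idle worker may be about to pick the IO up; don't grow. */
    if (nidle_workers > 0)
        return false;

    /* Tolerate brief spikes: let the queue grow a bit before reacting. */
    return queue_depth > GROW_DEPTH_THRESHOLD;
}
```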

> > on the one hand, and because there might be rare
> > spikes/clustering on the other, so I cooled it off a bit by
> > additionally testing if the queue appears to be growing or spiking
> > beyond some threshold. I think it's OK to let the queue grow a bit
> > before we are triggered anyway, so the precise value used doesn't seem
> > too critical. Someone might be able to come up with a more defensible
> > value, but in the end I just wanted a value that isn't triggered by
> > the outliers I see in real systems that are keeping up. We could tune
> > it lower and overshoot more, but this setting seems to work pretty
> > well. It doesn't seem likely that a real system could achieve a
> > steady state that is introducing latency but isn't increasing over
> > time, and pool size adjustments are bound to lag anyway.
>
> Yea, I don't think the precise logic matters that much as long as we ramp up
> reasonably fast without being crazy and ramp down a bit slower.

Cool.

Attachment Content-Type Size
v7-0001-aio-Adjust-I-O-worker-pool-size-automatically.patch text/x-patch 47.0 KB
