| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Automatically sizing the IO worker pool |
| Date: | 2026-04-08 02:09:16 |
| Message-ID: | CA+hUKGK=vELXFXNj2L=vTkof6s_EQzTjYXXrUVwOOW0rahEfVg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Apr 8, 2026 at 12:30 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2026-04-08 11:18:51 +1200, Thomas Munro wrote:
> > >     /* Choose one worker to wake for this batch. */
> > >     if (worker == -1)
> > >         worker = pgaio_worker_choose_idle(-1);
> >
> > Well I didn't want to wake a worker if we'd failed to enqueue
> > anything.
>
> I think it's worth waking up workers if there are idle ones and the queue is
> full?
True, but I prefer to test nsync because there is another reason to break:
commit 29a0fb215779d10fae0cbeb8ce57805f244bad9b
Author: Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
Date: Wed Mar 11 12:11:04 2026 +0100
Conditional locking in pgaio_worker_submit_internal
I haven't finished digesting that commit, and will follow up shortly
on that topic once this patch is in.
> I suspect the primary reason is that pgaio_worker_request_grow() is triggered
> even when io_worker_control->nworkers is >= io_max_workers.
Yeah. V6 already addressed that directly.
> I suspect there's also pingpong between submission not finding any workers
> idle, requesting growth, and workers being idle for a short period, then the
> same thing starting again.
>
> Seems like there should be two fields. One saying "notify postmaster again"
> and one "postmaster start a worker". The former would only be cleared by
> postmaster after the timeout.
Good idea. V7 has two tweaks:
* separate grow and grow_signal_sent flags, as you suggested
* it also applies the io_worker_launch_delay to cancelled grow requests
This seems to work pretty well for avoiding useless postmaster
wakeups. You get a few due to cancelled grow requests, but not more
frequently than io_worker_launch_delay allows, while the pool is
vacillating during workload changes. It soon makes its mind up and
stabilises on a good size. To be clear, there is no change in overall
effect, only a reduction in useless wakeups.
I retested the value of request cancellation. If you comment that
call out, we do tend to overshoot, so I think it's worth having. But
you were quite right to complain about the postmaster wakeup rate it
produced.
> > Our goal is simple: process every IO immediately. We have immediate
> > feedback that is simple: there's an IO in the queue and there is no
> > idle worker. The only action we can take is simple: add one more
> > worker. So we don't need to suffer through the maths required to
> > figure out the ideal k for our M/G/k queue system (I think that's what
> > we have?) or any of the inputs that would require*. The problem is
> > that on its own, the test triggered far too easily because a worker
> > that is not marked idle might in fact be just about to pick up that IO
>
> Is that case really concerning? As long as you have some rate limiting about
> the start rate, starting another worker when there are no idle workers seems
> harmless? Afaict it's fairly self limiting.
I retested without the depth test and I continue to think we need it.
Without it, the pool overshoots by quite a lot. You should be able to
set io_max_workers=32 without fear of creating a ton of useless worker
processes no matter what your workload.
> > on the one hand, and because there might be rare
> > spikes/clustering on the other, so I cooled it off a bit by
> > additionally testing if the queue appears to be growing or spiking
> > beyond some threshold. I think it's OK to let the queue grow a bit
> > before we are triggered anyway, so the precise value used doesn't seem
> > too critical. Someone might be able to come up with a more defensible
> > value, but in the end I just wanted a value that isn't triggered by
> > the outliers I see in real systems that are keeping up. We could tune
> > it lower and overshoot more, but this setting seems to work pretty
> > well. It doesn't seem likely that a real system could achieve a
> > steady state that is introducing latency but isn't increasing over
> > time, and pool size adjustments are bound to lag anyway.
>
> Yea, I don't think the precise logic matters that much as long as we ramp up
> reasonably fast without being crazy and ramp up a bit faster.
Cool.
| Attachment | Content-Type | Size |
|---|---|---|
| v7-0001-aio-Adjust-I-O-worker-pool-size-automatically.patch | text/x-patch | 47.0 KB |