From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Should io_method=worker remain the default?
Date: 2025-09-03 18:50:05
Message-ID: f34fb0bbacc1eb25a04e946880251cff87ab6291.camel@j-davis.com
Lists: pgsql-hackers
On Wed, 2025-09-03 at 11:55 -0400, Andres Freund wrote:
> 32 parallel seq scans of a large relation, with default shared
> buffers, fully cached in the OS page cache, seems like a pretty
> absurd workload.
Those are the default settings, and users often just keep going with
the defaults as long as things work, giving little thought to any kind
of tuning or optimization until they hit a wall. Fully cached data is
common, as are scan-heavy workloads. Calling it "absurd" is an
exaggeration.
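
For anyone who wants to poke at it, the shape of the test is roughly
the following (an illustrative sketch -- the table name, row count,
and duration are placeholders, not my exact script):

    # A relation comfortably larger than the default shared_buffers
    # (128MB) but small enough to stay resident in the OS page cache.
    psql -c "CREATE TABLE big AS
             SELECT g AS id, repeat('x', 100) AS pad
             FROM generate_series(1, 100000000) g"

    echo "SELECT count(*) FROM big;" > seqscan.sql

    # 32 clients doing repeated sequential scans at default settings.
    pgbench -n -f seqscan.sql -c 32 -j 32 -T 60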
> That's not to say we shouldn't spend some effort to avoid regressions
> for it, but it also doesn't seem to be worth focusing all that much
> on it.
Fair, but we should acknowledge where the new defaults do better and
where they do worse, and provide some guidance on what to look for and
how to tune it. We should also not be in too much of a rush to get rid
of "sync" mode until we have a better idea of where the tradeoffs are.
> Or is there a real-world scenario this is actually emulating?
This test was my first try at reproducing a smaller (but still
noticeable) regression seen on a more realistic benchmark. I'm not 100%
sure whether I reproduced the same effect or a different one, but I
don't think we should dismiss it so quickly.
> *If* we actually care about this workload, we can make
> pgaio_worker_submit_internal() acquire that lock conditionally, and
> perform the IOs synchronously instead.
I like the idea of some kind of fallback for multiple reasons. I
noticed that if I set io_workers=1, and then I SIGSTOP that worker,
then sequential scans make no progress at all until I send SIGCONT. A
fallback to synchronous sounds more robust, and more similar to what we
do with walwriter and bgwriter. (That may be 19 material, though.)
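
In case anyone wants to see the stall, what I did is roughly this
(illustrative commands; "big" is a placeholder table, and the exact
label the worker shows up under may vary):

    # with io_method = 'worker' and io_workers = 1
    psql -c "SELECT pid, backend_type FROM pg_stat_activity
             WHERE backend_type LIKE '%io worker%'"
    kill -STOP $IO_WORKER_PID          # pid from the query above
    psql -c "SELECT count(*) FROM big" # hangs until the worker resumes
    kill -CONT $IO_WORKER_PID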
> But I'm really not sure doing > 30GB/s of repeated reads from the
> page cache is a particularly useful thing to optimize.
A long time ago, the expectation was that Postgres might be running on
a machine alongside other software, perhaps with many instances of
Postgres on the same machine. In that case, a low shared_buffers
relative to overall system memory makes sense, which would cause a lot
of copying back and forth between the OS page cache and shared
buffers. That was also the era of magnetic disks, where such memory
copies seemed almost free by comparison -- perhaps we just don't care
about that case any more?
> If I instead just increase s_b, I get 2x the throughput...
Increase to what? I tried a number of settings. Obviously >32GB makes
it a non-issue because everything is cached. Values between 128MB and
32GB didn't seem to help, and in some cases were even lower, but I
haven't looked into why yet. It might have something to do with
crowding out the page cache.
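
To be concrete, the sweep was along these lines (values and commands
are illustrative, not my exact harness):

    # shared_buffers changes require a restart between runs.
    for sb in 128MB 1GB 4GB 8GB 16GB 32GB 64GB; do
        psql -c "ALTER SYSTEM SET shared_buffers = '$sb'"
        pg_ctl restart -D "$PGDATA"
        pgbench -n -f seqscan.sql -c 32 -j 32 -T 60
    done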
Regards,
Jeff Davis