From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-08 16:17:50
Message-ID: 786a0ddc-f07d-418b-b4bc-5d3fd4176bf0@vondra.me
Lists: pgsql-hackers
On 10/8/25 17:14, Ants Aasma wrote:
> On Mon, 6 Oct 2025 at 12:24, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> On 10/6/25 07:29, Tom Lane wrote:
>>> Another angle is that I expect that the ongoing AIO work will largely
>>> destroy the existing model altogether, at least if you think in terms
>>> of the model as trying to predict query execution time. But if what
>>> we're trying to model is net resource demands, with an eye to
>>> minimizing the total system load not execution time of any one query,
>>> maybe we can continue to work with something close to what we've
>>> traditionally done.
>>>
>>> No answers here, just more questions ...
>>>
>>
>> I had the same thought when working on the (index) prefetching, which
>> of course now relies on AIO. Without concurrency, there wasn't much
>> difference between optimizing for resources and time, but AIO changes
>> that. In fact, parallel query has a similar effect, because it also
>> spreads the work to multiple concurrent processes.
>>
>> Parallel query simply divides the cost between workers, as if each used a
>> fraction of resources. And the cost of the parallel plan is lower than
>> summing up the per-worker costs. Maybe AIO should do something similar?
>> That is, estimate the I/O concurrency and lower the cost a bit?
>
> OS and/or disk read-ahead is muddying the water here. Most modern
> storage devices can saturate their bandwidth with enough concurrent
> or large enough requests. The read-ahead effectively increases
> request concurrency behind PostgreSQL's back while the random I/O is
> running with concurrency 1. It would be very interesting to see what
> debug_io_direct does, and also fio numbers for direct I/O.
>
I think I've done some of the runs with direct I/O (e.g. the Azure runs
were doing that), and the conclusions were mostly the same. I did a
couple of runs on the other machines, but I don't have results I could
present. I'll try to get some; maybe it'll be different.
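For anyone who wants to compare, a minimal fio job for the direct random-read
case might look like this (file path, size, and runtime are placeholders;
io_uring assumed available):

```ini
[randread-direct]
rw=randread
bs=8k                        ; match PostgreSQL's block size
direct=1                     ; bypass the page cache, like debug_io_direct
ioengine=io_uring
iodepth=1                    ; concurrency 1, as in the random-read case
filename=/path/to/testfile   ; placeholder
size=8G                      ; placeholder
runtime=60
time_based
```

Raising iodepth should then show how much of the gap is just queue depth.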
But even if this were due to read-ahead, why shouldn't that be reflected
in the costs? Surely it can be counted as I/O cost.
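As a point of reference for the "divide the cost between workers" idea
quoted above, the planner's existing parallel cost division can be sketched
roughly like get_parallel_divisor() in costsize.c (sketched from memory;
check the source for the authoritative version):

```python
def parallel_divisor(parallel_workers, leader_participates=True):
    # Each worker counts as 1.0 towards the divisor; a participating
    # leader contributes a share that shrinks as worker count grows,
    # reflecting that it spends more time servicing workers.
    divisor = float(parallel_workers)
    if leader_participates:
        leader_contribution = 1.0 - 0.3 * parallel_workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor

print(parallel_divisor(2))  # ~2.4: 2 workers plus a 0.4 leader share
```

An AIO analogue might similarly divide the I/O cost by an estimated
effective queue depth rather than by a worker count.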
FWIW this is not just about read-ahead done by the kernel. The devices
have this kind of heuristics too [1], in which case it's going to affect
direct I/O too.
[1] https://vondra.me/posts/fun-and-weirdness-with-ssds/
> It seems to me too that the increased capability to utilize I/O
> concurrency from AIO significantly changes how this needs to be
> modeled. In addition to random/sequential distinction there is now
> also a prefetchable/unpredictable distinction. And it would be good to
> incorporate some cost for "wasting" resources. If we applied
> effective_io_concurrency blindly, then scanning 1600 predictable pages
> would be cheaper than 100 unpredictable. And it would be correct that
> it is faster, but maybe not by enough to justify the extra resource
> usage.
>
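To put rough numbers on the 1600-vs-100 example (the latency and queue
depth here are made-up values, purely for illustration):

```python
# Hypothetical figures: 100 us per random read, and the device can keep
# 16 requests in flight (think effective_io_concurrency = 16).
io_latency_us = 100.0
concurrency = 16

prefetchable_pages = 1600
unpredictable_pages = 100

# Prefetchable reads overlap, so wall-clock time divides by the queue
# depth; unpredictable reads are issued one at a time.
t_prefetchable = prefetchable_pages * io_latency_us / concurrency
t_unpredictable = unpredictable_pages * io_latency_us

print(t_prefetchable, t_unpredictable)  # equal wall-clock time...
print(prefetchable_pages / unpredictable_pages)  # ...but 16x the pages read
```

Equal elapsed time, 16x the I/O actually performed, which is exactly the
resource-vs-time tension.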
> I think Laurenz was on the right track with introducing a tunable that
> allows sliding between time and resource-use optimization. It doesn't
> have to be all or nothing: for 2x faster execution, one could
> allow 10% or 2x or any amount of extra resources to be used.
>
> But for that to have a chance of working reasonably the cost model
> needs to be better. Right now the per-page costs effectively encode
> expected cache hit ratios and I/O cost vs. page-access CPU cost in a very
> coarse and unintuitive way. I don't think the actual buffers-read
> number is anywhere close to how many times the planner accounts
> page_cost for anything but a sequential scan of a huge table.
>
True. The challenge is to improve this without making the whole cost model
so complex that it's effectively impossible to get right.
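As an illustration of how a per-page cost encodes a cache hit ratio: the
PostgreSQL docs note that the default random_page_cost of 4 can be read as
raw random I/O being much slower than sequential (say ~40x) combined with
~90% of random reads being cached. A toy sketch (the 40x and 90% figures
are the docs' illustrative assumptions, not measurements):

```python
def effective_random_page_cost(raw_random_vs_seq, cache_hit_ratio,
                               cached_cost=0.0):
    # A cached read costs ~0 in I/O terms; an uncached one costs the raw
    # device random-vs-sequential ratio. The blended value is what the
    # single random_page_cost knob has to stand in for.
    return (cache_hit_ratio * cached_cost
            + (1 - cache_hit_ratio) * raw_random_vs_seq)

print(effective_random_page_cost(40.0, 0.90))  # ~4.0, the default
```

Making any of those hidden inputs explicit is exactly where the model gets
hard to keep simple.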
regards
--
Tomas Vondra