Re: Should we update the random_page_cost default value?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-10 17:22:52
Message-ID: a3ac3e07-0150-4319-a69b-aa367ddf67a5@vondra.me
Lists: pgsql-hackers

On 10/8/25 18:17, Tomas Vondra wrote:
>
>
> On 10/8/25 17:14, Ants Aasma wrote:
>> On Mon, 6 Oct 2025 at 12:24, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>> On 10/6/25 07:29, Tom Lane wrote:
>>>> Another angle is that I expect that the ongoing AIO work will largely
>>>> destroy the existing model altogether, at least if you think in terms
>>>> of the model as trying to predict query execution time. But if what
>>>> we're trying to model is net resource demands, with an eye to
>>>> minimizing the total system load not execution time of any one query,
>>>> maybe we can continue to work with something close to what we've
>>>> traditionally done.
>>>>
>>>> No answers here, just more questions ...
>>>>
>>>
>>> I had the same thought, when working on the (index) prefetching. Which
>>> of course now relies on AIO. Without concurrency, there wasn't much
>>> difference between optimizing for resources and time, but AIO changes
>>> that. In fact, parallel query has a similar effect, because it also
>>> spreads the work to multiple concurrent processes.
>>>
>>> Parallel query simply divides the cost between workers, as if each uses a
>>> fraction of resources. And the cost of the parallel plan is lower than
>>> summing up the per-worker costs. Maybe AIO should do something similar?
>>> That is, estimate the I/O concurrency and lower the cost a bit?
>>
>> OS and/or disk read-ahead is muddying the water here. Most modern
>> storage can saturate their bandwidth capability with enough concurrent
>> or large enough requests. The read-ahead is effectively increasing
>> request concurrency behind PostgreSQL's back while random is running
>> with concurrency 1. It would be very interesting to see what
>> debug_io_direct does, and also fio numbers for direct io.
>>
>
> I think I've done some of the runs with direct I/O (e.g. the Azure runs
> were doing that), and the conclusions were mostly the same. I did a
> couple runs on the other machines, but I don't have results that I could
> present. I'll try to get some, maybe it'll be different.
>

Here are results with debug_io_direct (and shared_buffers=256MB in both
cases), from the single NVMe device.

               seqscan (s)   index scan (s)   random_page_cost
 ---------------------------------------------------------------
 buffered I/O      115           40404              16.6
 direct I/O        108           53053              23.4

I believe the difference is mostly due to page cache - with 182GB data
on 64GB RAM, that's about 30% cache hit ratio, give or take. And the
buffered runs are about 25% faster - not exactly 30%, but close. Also,
funnily enough, the seqscans are faster with direct I/O (so without
kernel read-ahead).
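
The back-of-the-envelope arithmetic above can be sketched as follows
(a rough estimate only; it assumes the page cache holds roughly
RAM-size worth of the data set, and uses the index scan timings from
the table):

```python
# Rough cache-hit estimate: with 182GB of data and 64GB of RAM,
# about a third of reads can be served from the page cache.
ram_gb, data_gb = 64, 182
hit_ratio = ram_gb / data_gb            # ~0.35

# Observed index scan times (seconds) from the table above.
direct_s, buffered_s = 53053, 40404
speedup = 1 - buffered_s / direct_s     # ~0.24, i.e. ~25% faster

print(f"cache hit ratio ~{hit_ratio:.0%}, buffered speedup ~{speedup:.0%}")
```

The observed ~24% speedup is somewhat below the ~35% cache-hit
estimate, which is consistent with cached reads not being entirely
free (buffer manager overhead, copies from the page cache).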

It's just one run, of course. But the results seem reasonable.

regards

--
Tomas Vondra
