Re: Should we update the random_page_cost default value?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-07 15:32:24
Message-ID: aqz34bqmh6v6r6bplgflid3buhdkv45dkkbx6y6gq34dx4gp42@rcz2est27arz
Lists: pgsql-hackers

Hi,

On 2025-10-07 16:23:36 +0200, Tomas Vondra wrote:
> On 10/7/25 14:08, Tomas Vondra wrote:
> > ...
> >>>>>> I think doing this kind of measurement via normal SQL query processing is
> >>>>>> almost always going to have too much other influences. I'd measure using fio
> >>>>>> or such instead. It'd be interesting to see fio numbers for your disks...
> >>>>>>
> >>>>>> fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=5 --ioengine pvsync --iodepth 1
> >>>>>> vs --rw randread
> >>>>>>
> >>>>>> gives me 51k/11k for sequential/rand on one SSD and 92k/8.7k for another.
> >>>>>>
> >>>>>
> >>>>> I can give it a try. But do we really want to strip "our" overhead with
> >>>>> reading data?
> >
> > I got this on the two RAID devices (NVMe and SATA):
> >
> > NVMe: 83.5k / 15.8k
> > SATA: 28.6k / 8.5k
> >
> > So the same ballpark / ratio as your test. Not surprising, really.
> >
>
> FWIW I do see about this number in iostat. There's a 500M test running
> right now, and iostat reports this:
>
> Device        r/s      rkB/s  ...  rareq-sz  ...  %util
> md1      15273.10  143512.80  ...      9.40  ...  93.64
>
> So it's not like we're issuing far fewer I/Os than the SSD can handle.

Not really related to this thread:

IME iostat's utilization is pretty much useless for anything other than "is
something happening at all", and even that is not reliable. I don't know the
full reason for it, but I long ago learned to just discount it.

I ran
fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=100 --ioengine pvsync --iodepth 1 --rate_iops=40000

a few times in a row, while watching iostat. Sometimes utilization is 100%,
sometimes it's 0.2%. If I instead run without rate limiting, utilization
never goes above 71%, despite doing more iops.
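
To watch it yourself, something like

iostat -x 1

in another terminal suffices; that prints extended per-device statistics at a
one second interval, with %util as the last column.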

And it gets completely useless if you use a deeper iodepth, because there's
just not a good way to compute something like a single utilization number
once you take parallel IO processing into account.

fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=100 --ioengine io_uring --iodepth 1 --rw randread
iodepth    util   iops
      1     94%   9.3k
      2   99.6%  18.4k
      4    100%  35.9k
      8    100%  68.0k
     16    100%   123k
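
To make that concrete (back-of-the-envelope, via Little's law, i.e.
concurrency = iops * latency): at iodepth 1, 9.3k iops implies roughly 107us
per IO. Utilization only measures the fraction of wall-clock time with at
least one request in flight, so it pins at 100% as soon as the queue is
rarely empty (here already at iodepth 4), even though iops still nearly
doubles going from 8 to 16 in-flight requests.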

Greetings,

Andres Freund
