From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-07 15:32:24
Message-ID: aqz34bqmh6v6r6bplgflid3buhdkv45dkkbx6y6gq34dx4gp42@rcz2est27arz
Lists: pgsql-hackers
Hi,
On 2025-10-07 16:23:36 +0200, Tomas Vondra wrote:
> On 10/7/25 14:08, Tomas Vondra wrote:
> > ...
> >>>>>> I think doing this kind of measurement via normal SQL query processing is
> >>>>>> almost always going to have too much other influences. I'd measure using fio
> >>>>>> or such instead. It'd be interesting to see fio numbers for your disks...
> >>>>>>
> >>>>>> fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=5 --ioengine pvsync --iodepth 1
> >>>>>> vs --rw randread
> >>>>>>
> >>>>>> gives me 51k/11k for sequential/rand on one SSD and 92k/8.7k for another.
> >>>>>>
> >>>>>
> >>>>> I can give it a try. But do we really want to strip "our" overhead with
> >>>>> reading data?
> >
> > I got this on the two RAID devices (NVMe and SATA):
> >
> > NVMe: 83.5k / 15.8k
> > SATA: 28.6k / 8.5k
> >
> > So the same ballpark / ratio as your test. Not surprising, really.
> >
>
> FWIW I do see about this number in iostat. There's a 500M test running
> right now, and iostat reports this:
>
> Device r/s rkB/s ... rareq-sz ... %util
> md1 15273.10 143512.80 ... 9.40 ... 93.64
>
> So it's not like we're issuing far fewer I/Os than the SSD can handle.
Not really related to this thread:
IME iostat's utilization is pretty much useless for anything other than "is
something happening at all", and even that is not reliable. I don't know the
full reason for it, but I long ago learned to just discount it.
I ran
fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=100 --ioengine pvsync --iodepth 1 --rate_iops=40000
a few times in a row, while watching iostat. Sometimes utilization is 100%,
sometimes it's 0.2%. Whereas if I run without rate limiting, utilization
never goes above 71%, despite doing more iops.
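(The "watching iostat" part is nothing fancy, just extended per-device stats
next to the fio run, roughly the line below; md1 is only an example device
name here, use whatever your array/SSD shows up as:)
# extended per-device stats every second, run alongside fio
iostat -dx md1 1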
And then it gets completely useless if you use a deeper iodepth, because
there's just no good way to compute something like a utilization number once
you take parallel I/O processing into account.
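(Rough sketch of what I think is going on: %util is essentially just the delta
of the "time spent doing I/Os" counter for the device over the sampling
interval, i.e. the fraction of wall-clock time with at least one request in
flight. Something like the below gives about the same number; md1 is again
just an example device name:)
# Sample the "time spent doing I/Os" counter (13th column of
# /proc/diskstats, in ms) and divide by the wall-clock interval.
# The counter ticks the same whether 1 or 16 requests are queued,
# which is why %util pegs at 100% long before the device is saturated.
dev=md1
t0=$(awk -v d="$dev" '$3 == d {print $13}' /proc/diskstats)
sleep 1
t1=$(awk -v d="$dev" '$3 == d {print $13}' /proc/diskstats)
echo "util: $(( (t1 - t0) / 10 ))%"   # ms busy out of 1000ms elapsed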
fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=100 --ioengine io_uring --iodepth 1 --rw randread
iodepth  util    iops
1        94%     9.3k
2        99.6%   18.4k
4        100%    35.9k
8        100%    68.0k
16       100%    123k
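(The table is the same invocation with only --iodepth varied; sketch of the
loop below, with the redundant --rw read dropped since randread wins anyway:)
# Sweep --iodepth over the values in the table above, everything else
# as in the fio command line above.
for d in 1 2 4 8 16; do
    fio --directory /srv/fio --size=8GiB --name test --invalidate=0 \
        --bs=$((8*1024)) --rw randread --buffered 0 --time_based=1 \
        --runtime=100 --ioengine io_uring --iodepth "$d"
done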
Greetings,
Andres Freund