From: | "Peter Geoghegan" <pg(at)bowt(dot)ie> |
---|---|
To: | "Andres Freund" <andres(at)anarazel(dot)de> |
Cc: | "Tomas Vondra" <tomas(at)vondra(dot)me>, "Thomas Munro" <thomas(dot)munro(at)gmail(dot)com>, "Nazir Bilal Yavuz" <byavuz81(at)gmail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Melanie Plageman" <melanieplageman(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Georgios" <gkokolatos(at)protonmail(dot)com>, "Konstantin Knizhnik" <knizhnik(at)garret(dot)ru>, "Dilip Kumar" <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-08-14 20:12:40 |
Message-ID: | DC2FB4TJBCYR.2U40BNPFOO51E@bowt.ie |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu Aug 14, 2025 at 3:41 PM EDT, Andres Freund wrote:
> Hm, that is somewhat curious.
>
> I wonder if there's some wait time that's not being captured by "I/O
> Timings". A first thing to do would be to just run strace --summary-only while
> running the query, and see if there are syscall wait times that seem too long.
For the slow, sequential backwards scan query:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.271216 4 66808 io_uring_enter
0.00 0.000004 4 1 sendto
0.00 0.000001 0 2 1 recvfrom
0.00 0.000000 0 5 lseek
0.00 0.000000 0 1 epoll_wait
0.00 0.000000 0 4 openat
------ ----------- ----------- --------- --------- ----------------
100.00 0.271221 4 66821 1 total
For the fast, random backwards scan query:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.99 0.351518 4 77819 io_uring_enter
0.00 0.000007 2 3 1 epoll_wait
0.00 0.000006 6 1 sendto
0.00 0.000003 1 3 2 recvfrom
0.00 0.000002 2 1 read
0.00 0.000002 2 1 1 rt_sigreturn
0.00 0.000002 2 1 getpid
0.00 0.000002 1 2 kill
0.00 0.000000 0 3 lseek
------ ----------- ----------- --------- --------- ----------------
100.00 0.351542 4 77834 4 total
> What effective_io_concurrency and io_max_concurrency setting are you using? If
> there are no free IO handles that's currently not nicely reported (because
> it's unclear how exactly to do so, see comment above pgaio_io_acquire_nb()).
effective_io_concurrency is 100. io_max_concurrency is 64. Nothing out of
the ordinary there.
> iostat -xmy /path/to/block/device
>
> I'd like to see the difference in average IO size (rareq-sz), queue depth
> (aqu-sz) and completion time (r_await) between the fast and slow cases.
I'll show one second interval output.
Slow, sequential backwards scan query
-------------------------------------
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 24613.00 192.29 0.00 0.00 0.20 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.92 53.20
avg-cpu: %user %nice %system %iowait %steal %idle
0.22 0.00 0.44 0.85 0.00 98.50
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 25320.00 197.81 0.00 0.00 0.20 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.18 51.20
Fast, random backwards scan query
---------------------------------
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 27140.59 212.04 0.00 0.00 0.20 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.50 23.37
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.84 0.00 0.00 98.66
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 50401.00 393.76 0.00 0.00 0.20 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 10.06 41.60
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-08-14 20:40:25 | Re: POC: Parallel processing of indexes in autovacuum |
Previous Message | Tom Lane | 2025-08-14 19:45:41 | Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words |