| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | pg_stat_io_histogram |
| Date: | 2026-01-26 09:40:52 |
| Message-ID: | CAKZiRmwvE4uJLKTgPXeBA4m+d4tTghayoefcaM9=z3_S7i72GA@mail.gmail.com |
| Lists: | pgsql-hackers |
I'm proposing that we add pg_stat_io_histogram, a view that would track and show an
I/O latency profile, so that we could quickly identify I/O outliers. From time to
time users complain that 'PostgreSQL is slow or stuck' (usually COMMIT is
slow), when it is quite apparent that the cause lies somewhere in the I/O
stack. That is easy to prove once one has proper measurement tools in
place and is able to correlate them, but IMHO it takes way too much time and
energy to cross-correlate all of that information (iostat -x at 1s intervals,
wait events sampled every 1s, and so on), especially when one would like to
provide a rapid response.
Right now the patch does not include per-backend/PID tracking; if there is
interest I'll add it, but I would first like to hear whether that's a good idea
at all. The current implementation uses a fast bucket calculation to keep the
overhead low and tries to cover the most useful range of devices with its buckets
(128us..256ms, which covers NVMe/SSD/HDD as well as abnormally high latencies,
since from time to time I try to help with I/O that is stuck for *seconds*,
usually a sign of I/O multipath issues, a device resetting, or hypervisor woes).
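In case it helps the discussion, here is a minimal sketch (with made-up names,
not necessarily what the patch does) of how such power-of-two bucketing can be
done in O(1) with a single bit scan instead of a loop:

#include "postgres.h"
#include "port/pg_bitutils.h"

#define IOHIST_MIN_LATENCY_US	128
#define IOHIST_NBUCKETS			12	/* 128us, 256us, ..., ~262ms */

static inline int
io_latency_to_bucket(uint64 latency_us)
{
	int		bucket;

	/* everything at or below 128us lands in the first bucket */
	if (latency_us <= IOHIST_MIN_LATENCY_US)
		return 0;

	/* integer log2 of (latency / 128us) picks the bucket directly */
	bucket = pg_leftmost_one_pos64(latency_us / IOHIST_MIN_LATENCY_US);

	/* clamp abnormally high latencies into the last bucket */
	return Min(bucket, IOHIST_NBUCKETS - 1);
}

The example below shows what querying the histogram currently looks like: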
postgres=# select
  substring(backend_type, 1, 8) as backend, object, context, io_type,
  bucket_latency_us as lat_us,
  round(bucket_latency_us / 1000.0, 3) as lat_ms,
  bucket_count as count
from pg_stat_get_io_histogram()
where bucket_count > 0
order by 1, 2, 3, 4, 5;
backend | object | context | io_type | lat_us | lat_ms | count
----------+----------+-----------+-----------+--------+--------+-------
autovacu | relation | normal | read | 128 | 0.128 | 54
autovacu | relation | normal | read | 256 | 0.256 | 7
autovacu | relation | normal | read | 512 | 0.512 | 1
autovacu | relation | vacuum | read | 128 | 0.128 | 8
autovacu | relation | vacuum | read | 256 | 0.256 | 5
backgrou | relation | bulkread | read | 128 | 0.128 | 658
backgrou | relation | normal | read | 128 | 0.128 | 5
checkpoi | relation | normal | fsync | 2048 | 2.048 | 37
checkpoi | relation | normal | fsync | 4096 | 4.096 | 7
checkpoi | relation | normal | fsync | 16384 | 16.384 | 4
checkpoi | relation | normal | fsync | 32768 | 32.768 | 1
checkpoi | relation | normal | fsync | 65536 | 65.536 | 1
checkpoi | relation | normal | write | 128 | 0.128 | 2059
checkpoi | relation | normal | write | 256 | 0.256 | 2
checkpoi | relation | normal | write | 512 | 0.512 | 1
checkpoi | relation | normal | writeback | 128 | 0.128 | 64
checkpoi | relation | normal | writeback | 256 | 0.256 | 1
client b | relation | bulkread | read | 128 | 0.128 | 675
client b | relation | bulkread | read | 256 | 0.256 | 1
client b | relation | bulkwrite | extend | 128 | 0.128 | 260
client b | relation | bulkwrite | extend | 512 | 0.512 | 1
client b | relation | bulkwrite | write | 128 | 0.128 | 14404
client b | relation | normal | extend | 128 | 0.128 | 6
client b | relation | normal | read | 128 | 0.128 | 273
client b | relation | normal | read | 256 | 0.256 | 6
client b | relation | vacuum | read | 128 | 0.128 | 907
client b | relation | vacuum | read | 256 | 0.256 | 3
client b | relation | vacuum | read | 512 | 0.512 | 2
Of course most I/O calls today hit the page cache, so one would expect them to
be < 128us most of the time, but above you can also see degraded
fsync/fdatasync (BTW that was achieved via a device-mapper delay device).
My hope is that the above would help tremendously when dealing
with flaky storage, I/O path issues, or even hypervisors being paused.
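For instance, reusing the column names from the query above, one could quickly
quantify how many checkpointer fsyncs were slower than ~1ms (a hypothetical
follow-up query, assuming bucket_latency_us is the bucket's lower bound):

select round(100.0 * sum(bucket_count) filter (where bucket_latency_us >= 1024)
             / nullif(sum(bucket_count), 0), 1) as pct_over_1ms
  from pg_stat_get_io_histogram()
 where backend_type = 'checkpointer' and io_type = 'fsync';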
An alternative idea I had was to simply log slow I/O outliers, but meh.. then
one would have to answer all the usual questions: what should the threshold be
(=> a GUC?), what about the risk of spamming the log, and so on
(and I wouldn't be fond of proposing yet another log_* GUC ;)).
Any hints, co-authors, or help are more than welcome!
-J.
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Add-pg_stat_io_histogram-view-to-provide-more-det.patch | text/x-patch | 30.3 KB |