pg_stat_io_histogram

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: pg_stat_io_histogram
Date: 2026-01-26 09:40:52
Message-ID: CAKZiRmwvE4uJLKTgPXeBA4m+d4tTghayoefcaM9=z3_S7i72GA@mail.gmail.com
Lists: pgsql-hackers

I'm proposing that we add pg_stat_io_histogram, which would track and show
an I/O latency profile, so we could quickly identify I/O outliers. From time
to time users complain that 'PostgreSQL is slow or stuck' (usually COMMIT is
slow), when it is quite apparent that the cause lies somewhere in the I/O
stack. It is easy enough to prove once one has proper measurement tools in
place and is able to correlate the data, but IMHO it takes way too much time
and energy to cross-correlate all of that information (iostat -x at 1s
intervals, wait event samples every 1s, and so on), especially if one would
like to provide a rapid response.

Right now the patch does not include per-backend/PID tracking; if there is
interest in this, I'll add it, but I would first like to hear whether it's a
good idea at all. The current implementation uses a fast bucket calculation
to avoid overhead and tries to cover the most useful range of devices with
its buckets (128us..256ms, which covers NVMe/SSD/HDD as well as abnormally
high latencies, since from time to time I try to help with I/O stuck for
*seconds*, usually a sign of I/O multipath issues, a device resetting, or
hypervisor woes).
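
To make the bucketing concrete, here is a rough SQL-level sketch of the
mapping (the patch computes this in C on the I/O completion path; the exact
bucket edges and the upper-bound reporting assumed below are only meant to
illustrate the idea, not to specify the implementation):

-- illustration only: clamp a latency (in microseconds) into power-of-two
-- buckets between 128us and ~256ms; the sample latencies are made up
select us as latency_us,
       least(262144,
             128 * power(2, greatest(0, ceil(log(2.0, greatest(us, 128) / 128.0))))
            )::int as bucket_latency_us
from (values (37), (140), (3000), (750000)) as samples(us);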

postgres=# select
substring(backend_type,1,8) as backend,object,context,io_type,
bucket_latency_us as lat_us,
round(bucket_latency_us/1000.0, 3) as lat_ms,
bucket_count as count
from pg_stat_get_io_histogram()
where
bucket_count > 0
order by 1,2,3,4,5;
backend | object | context | io_type | lat_us | lat_ms | count
----------+----------+-----------+-----------+--------+--------+-------
autovacu | relation | normal | read | 128 | 0.128 | 54
autovacu | relation | normal | read | 256 | 0.256 | 7
autovacu | relation | normal | read | 512 | 0.512 | 1
autovacu | relation | vacuum | read | 128 | 0.128 | 8
autovacu | relation | vacuum | read | 256 | 0.256 | 5
backgrou | relation | bulkread | read | 128 | 0.128 | 658
backgrou | relation | normal | read | 128 | 0.128 | 5
checkpoi | relation | normal | fsync | 2048 | 2.048 | 37
checkpoi | relation | normal | fsync | 4096 | 4.096 | 7
checkpoi | relation | normal | fsync | 16384 | 16.384 | 4
checkpoi | relation | normal | fsync | 32768 | 32.768 | 1
checkpoi | relation | normal | fsync | 65536 | 65.536 | 1
checkpoi | relation | normal | write | 128 | 0.128 | 2059
checkpoi | relation | normal | write | 256 | 0.256 | 2
checkpoi | relation | normal | write | 512 | 0.512 | 1
checkpoi | relation | normal | writeback | 128 | 0.128 | 64
checkpoi | relation | normal | writeback | 256 | 0.256 | 1
client b | relation | bulkread | read | 128 | 0.128 | 675
client b | relation | bulkread | read | 256 | 0.256 | 1
client b | relation | bulkwrite | extend | 128 | 0.128 | 260
client b | relation | bulkwrite | extend | 512 | 0.512 | 1
client b | relation | bulkwrite | write | 128 | 0.128 | 14404
client b | relation | normal | extend | 128 | 0.128 | 6
client b | relation | normal | read | 128 | 0.128 | 273
client b | relation | normal | read | 256 | 0.256 | 6
client b | relation | vacuum | read | 128 | 0.128 | 907
client b | relation | vacuum | read | 256 | 0.256 | 3
client b | relation | vacuum | read | 512 | 0.512 | 2

Of course most I/O calls today hit the page cache, so one would expect them
to be < 128us most of the time, but above you can also see degraded
fsync/fdatasync (BTW, that was achieved via a device-mapper delay device).
My hope is that the above would help tremendously when dealing with flaky
storage, I/O path issues, or even hypervisors being paused.
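
For example, a quick ad-hoc query on top of the new function (the 1ms
cut-off below is arbitrary, just one way of slicing the histogram) could
immediately show which backend/object/context/io_type combination is
producing the slow I/Os:

-- share of I/O completions slower than ~1ms per backend/object/context/io_type
select backend_type, object, context, io_type,
       sum(bucket_count) filter (where bucket_latency_us > 1000) as slow_ios,
       sum(bucket_count) as total_ios,
       round(100.0 * (sum(bucket_count) filter (where bucket_latency_us > 1000))
             / nullif(sum(bucket_count), 0), 2) as pct_slow
from pg_stat_get_io_histogram()
group by 1, 2, 3, 4
having sum(bucket_count) > 0
order by pct_slow desc nulls last;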

An alternative idea I had was simply to add logging of slow I/O outliers,
but meh... then one would have to answer all those questions: what should
the threshold be (=> a GUC?), what about the risk of spamming the log, and
so on (and I wouldn't be fond of proposing yet another log_* GUC ;))

Any hints, co-authors, or help are more than welcome!

-J.

Attachment Content-Type Size
v1-0001-Add-pg_stat_io_histogram-view-to-provide-more-det.patch text/x-patch 30.3 KB
