| From: | Nikolay Samokhvalov <nik(at)postgres(dot)ai> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Proposal: Add rows_filtered column to pg_stat_statements for index opportunity detection |
| Date: | 2025-12-27 23:08:12 |
| Message-ID: | CAM527d-r+RsaAeYsyAPmYtnmWB3rJFJtixUq4bnJW59nN=Zo3w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
(I know, pg_stat_statements is already too wide, but bear with me, there is
a really important use case that the proposed column solves)
When managing many clusters with thousands of query patterns, we need
automated ways to detect missing indexes. Currently, if we want to find
candidate queries that potentially can be improved using new indexes,
looking at pg_stat_statements we need to use us total_exec_time,
mean_exec_time, shared_blks_(hit|read) -- useful, but noisy for index
detection since heavy aggregate queries rank high yet often can't benefit
from additional indexes.
Rows removed by filter is a strong signal that an index would help. EXPLAIN
shows this per-execution, but for fleet-wide automated analysis we need it
aggregated in pg_stat_statements.
The proposed patch adds:
- rows_filtered column: sum of nfiltered1 + nfiltered2 from all plan nodes
- GUC pg_stat_statements.track_rows_filtered (default: off)
- Uses INSTRUMENT_ROWS only (no timing overhead)
Implementation: sums nfiltered1+nfiltered2 at ExecutorEnd. Uses
INSTRUMENT_ROWS (not INSTRUMENT_ALL) — just counter increments, no timing
overhead. Controlled by new GUC track_rows_filtered (default: off,
PGC_SUSET). Stats file header bumped for upgrade compatibility. Includes
regression tests and docs.
Nik
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-pgss-rows_filtered.patch | application/octet-stream | 28.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Peter Eisentraut | 2025-12-27 22:47:30 | Re: Remove MsgType type |