Add rows removed by hash join clause to instrumentation

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Subject: Add rows removed by hash join clause to instrumentation
Date: 2026-01-16 23:51:50
Message-ID: 9f6c6846-036f-4298-a315-5ef674d25eb7@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Playing around [1] to understand how, in practice, an engineer should
identify potential indexes, I found that nfiltered1 and nfiltered2 are
sufficient enough to detect issues with leaf table scan operators. But
the situation is worse when it comes to joins.

The idea behind JOIN optimisation is that sometimes a highly selective,
parameterised NestLoop is more performant than a HashJoin. What we need
is to identify that only a tiny part of the hash table or a sorted
MergeJoin input has been used to produce the JOIN result.

Thanks to [2 - 5], we have metrics showing how many tuples are removed
by joinqual and otherquals in a JOIN operator. That’s good for starters.
But some cases aren’t covered yet: how many tuples filtered by
hashclauses or mergeclauses.

In the attached file, you can see that for the same query, NestLoop
exposes 100k filtered tuples, but HashJoin shows nothing. Original
threads argued that ‘Filtered out’ characteristics should expose extra
work done by the operator. Hashing operation, especially on a set of
variable-length columns sometimes quite expensive. Further filtering,
involving hashclauses looks pretty similar to the joinqual filtering.

For me, ‘filtered’ value represents a flag that some massive part of the
input is not needed at all and using proper parameterisation and
indexing, we could optimise such a JOIN with NestLoop or MergeJoin.

From this point of view, it seems logical to add a nfiltered3
instrumentation field and account rows, filtered out by a ‘special’ join
clause like hashclauses or mergeclauses.

In the attachment, I propose a sketch on how to calculate these metrics.
MergeJoin looks more complicated and I don't propose it for now, but
HashJoin is quite trivial.

Known issues:
- Hash clause failures are counted in nfiltered1, which is shared with
join filter removals. If both are present, counts are combined.
- The metric only counts outer tuples with zero hash-value matches,
not hash collisions within buckets.

Thoughts?

[1] Proposal: Add rows_filtered column to pg_stat_statements for index
opportunity
detectionhttps://www.postgresql.org/message-id/CAM527d-r%2BRsaAeYsyAPmYtnmWB3rJFJtixUq4bnJW59nN%3DZo3w%40mail.gmail.com

[2] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
https://www.postgresql.org/message-id/flat/4A16A8AF(dot)2080508(at)anarazel(dot)de[3]
EXPLAIN and nfiltered - Mailing list pgsql-hackers
https://www.postgresql.org/message-id/4CE54A13.7090609@cs.helsinki.fi

[4] Re: REVIEW: EXPLAIN and nfiltered
https://www.postgresql.org/message-id/9053.1295888538%40sss.pgh.pa.us

[5] EXPLAIN and nfiltered, take two
https://www.postgresql.org/message-id/flat/4E68B108.1090907%402ndquadrant.com

--
regards, Andrei Lepikhov,
pgEdge

Attachment Content-Type Size
filtered-comparison.sql application/sql 2.1 KB
0001-Add-Rows-Removed-by-Hash-Matching-to-EXPLAIN-ANALYZE.patch text/plain 6.6 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2026-01-17 00:12:16 Re: pg_stat_statements: Fix nested tracking for implicitly closed cursors
Previous Message Jacob Champion 2026-01-16 23:44:55 Re: Serverside SNI support in libpq