Re: Hash-based MCV matching for large IN-lists

From: Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash-based MCV matching for large IN-lists
Date: 2026-02-07 07:42:01
Message-ID: CAHza6qdAH99C0xa27YDTixiNVFa99j90QYquUPAxL0JwahmggA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thank you for this patch.
I've been studying how PostgreSQL handles selectivity estimation, and this
optimization for large IN-lists looks very useful.

I ran some tests for the special cases David mentioned:

- NULL + ALL: correctly returns selectivity ≈ 0 (rows=1)
- isInequality: <> ALL estimates match NOT IN
- Cross-type: int = ANY(bigint[]) works correctly
- Duplicate values: IN (1,1,1,2,2,3) preserves existing behavior

I noticed a few minor points:

1. The comment in MCVInHashEntry struct contains a typo:
"number of occurrences od current value" -> "of"

2. The ALL + NULL early-return logic appears in two places (lines 2579-2591
and 2644-2656). I initially considered consolidating this by checking for
NULL elements before building the hash table, but realized this would add
an extra loop in the common case where there are no NULLs.
Perhaps a brief comment explaining why this check is duplicated (to
avoid the overhead of a separate NULL-scanning loop) would help future
readers understand the design choice?

3. Minor style suggestion: adding a brief SQL example in the header comment
(e.g., "WHERE x IN (1,2,3,...)" or "WHERE x = ANY(ARRAY[...])") might help
future readers quickly understand the use case.

Thanks again for working on this optimization. It's been very educational
to follow the discussion and understand how selectivity estimation works in
PostgreSQL.

Regards,
Tatsuya Kawata

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2026-02-07 08:54:31 Re: refactor architecture-specific popcount code
Previous Message Henson Choi 2026-02-07 05:41:26 Re: Row pattern recognition