Re: Hash-based MCV matching for large IN-lists

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Cc: David Geier <geidav(dot)pg(at)gmail(dot)com>, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash-based MCV matching for large IN-lists
Date: 2026-03-20 15:58:43
Message-ID: 98f8b6e5-22ec-46ca-9d58-197c4259b65d@tantorlabs.com
Lists: pgsql-hackers

On 3/11/26 11:01, Zsolt Parragi wrote:

> + /*
> + * For ALL semantics, if the array contains NULL, assume
> + * operator is strict. The ScalarArrayOpExpr cannot
> + * evaluate to TRUE, so return zero.
> + */
>
>
>
> + nonconst_sel = var_eq_non_const(&vardata, operator,
> + clause->inputcollid,
> + other_op, var_on_left,
> + isInequality);
>
> + if (isInequality)
> + individual_s = 1.0 - individual_s - nullfrac;
>
> Isn't this the double negation issue again, which was once
> mentioned/fixed earlier?

Right. I fixed it by using 'invert' for the non-constant case. If there is a
more elegant way to structure this, suggestions are very welcome.
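
For anyone skimming the thread, here is a minimal standalone sketch of the
double negation problem. It assumes the helper already converts "=" into "<>"
selectivity when it receives a negate flag, as the quoted call suggests. All
names below (sketch_eq_sel and friends) are hypothetical and are not the code
in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for an equality-selectivity helper that handles
 * "<>" itself when negate is true.  Not the real var_eq_non_const().
 */
static double
sketch_eq_sel(double eq_sel, double nullfrac, bool negate)
{
    double      s = eq_sel;

    assert(eq_sel >= 0.0 && nullfrac >= 0.0 && eq_sel + nullfrac <= 1.0);

    if (negate)
        s = 1.0 - s - nullfrac; /* "<>" matches non-NULL rows that are not "=" */
    return s;
}

/* Buggy caller: the helper negates, then the caller negates again. */
static double
sketch_ne_sel_twice(double eq_sel, double nullfrac)
{
    double      s = sketch_eq_sel(eq_sel, nullfrac, true);

    return 1.0 - s - nullfrac;  /* collapses back to eq_sel */
}

/* Fixed caller: negation happens in exactly one place. */
static double
sketch_ne_sel_once(double eq_sel, double nullfrac)
{
    return sketch_eq_sel(eq_sel, nullfrac, true);
}
```

With eq_sel = 0.2 and nullfrac = 0.1, the single-negation path gives about
0.7, while the double-negation path gives about 0.2 again, which is the
estimate for "=" rather than for "<>".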

> + int count; /* number of occurrences of current value in */
>
> That's a truncated comment

Fixed.

After commit c95cd29 I rebased this patch. During the rebase, I also
added the NULL-handling path. In particular, I added an Assert(useOr)
in the relevant branch to document and enforce the expected execution flow.
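
As a rough illustration of the NULL semantics discussed above, the sketch
below shows one way a NULL array element can affect the combined estimate
under ANY versus ALL. It assumes a strict operator and treats the array
elements as independent. The function name and the array-of-doubles interface
are hypothetical, a negative entry merely stands in for a NULL element, and
none of this is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch.  elem_sel[i] is the estimated selectivity of
 * "var op element i"; a negative entry marks a NULL element.
 */
static double
sketch_saop_sel(const double *elem_sel, size_t n, bool use_or)
{
    double      s = use_or ? 0.0 : 1.0;

    for (size_t i = 0; i < n; i++)
    {
        double      e = elem_sel[i];

        if (e < 0.0)
        {
            /* Strict operator: comparing with NULL never yields TRUE. */
            if (!use_or)
                return 0.0;     /* ALL can no longer evaluate to TRUE */
            continue;           /* ANY: this element matches no rows */
        }

        assert(e <= 1.0);

        if (use_or)
            s = s + e - s * e;  /* independent OR combination */
        else
            s = s * e;          /* independent AND combination */
    }
    return s;
}
```

For the elements {0.1, NULL, 0.2}, this sketch returns 0 under ALL and about
0.28 under ANY, since the NULL element is simply skipped in the OR case.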

Additionally, after commit 374a639, I prepared a set of regression-style
tests to verify that the selectivity estimates remain unchanged before
and after applying the patch. However, these tests rely on stable row
estimates from EXPLAIN, which are not guaranteed to be consistent across
platforms. For that reason, they are not suitable for inclusion in the
upstream test suite. I will keep these tests locally to validate
correctness before and after the patch.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Attachments:
  v9-0001-Use-hash-based-MCV-matching-for-ScalarArrayOpExpr.patch (text/x-patch, 18.9 KB)
  hash_based_any_tests.patch (text/x-patch, 14.1 KB)
