Re: Reduce planning time for large NOT IN lists containing NULL

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, David Geier <geidav(dot)pg(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reduce planning time for large NOT IN lists containing NULL
Date: 2026-03-03 21:55:31
Message-ID: 390a46f3-dbc4-4dc1-b49d-5cc61dd36026@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/3/26 04:08, David Rowley wrote:

> I had a look at this and wondered if we guarantee that no rows will
> match, then why can't we perform constant folding on the
> ScalarArrayOpExpr when !useOr and the array contains a NULL element
> and the operator is strict. Seemingly, one of the reasons for that is
> down to the expression returning NULL vs false. Take the following two
> tests from expressions.out:
>
> select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 2, null);
> ?column?
> ----------
>
> (1 row)
>
> select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1, null);
> ?column?
> ----------
> f
> (1 row)
>
> Here we see that we return false when we find the left operand in the
> array, but NULL when we don't find it and the array contains NULL. So,
> unless the left operand is a const, we wouldn't know how to simplify
> the ScalarArrayOpExpr during planning as the false or NULL would only
> be known during evaluation of the expression.
>
> However, when the expression being simplified is an EXPRKIND_QUAL, it
> shouldn't matter if the result is false or NULL as both mean the same
> and there shouldn't be any code that cares about the difference.
> Currently, we don't pass the "kind" down into
> eval_const_expressions(), but I don't really see why we couldn't. It
> would be a fair bit of work figuring out with confidence what the
> extra arg should be passed as in all the existing call sites of that
> function. We'd have to document in the header comment for
> eval_const_expressions() that constant-folding on EXPRKIND_QUAL
> expressions can enable additional optimisations which disregard the
> difference between NULL and false.
>
> For the patch, I imagine it's still a useful optimisation as the
> ScalarArrayOpExpr might not be in an EXPRKIND_QUAL.

I agree that this could be a useful optimization, at least
for EXPRKIND_QUAL, where NULL and false are semantically equivalent.
However, passing EXPRKIND through eval_const_expressions() would requir
careful auditing of all call sites. If I explore this further and have
something concrete, I'll start a separate thread on that topic.

> There are a couple of things I don't like:
>
> 1) The new test is in expressions.sql. The comment at the top of that
> file reads: "expression evaluation tests that don't fit into a more
> specific file". The new test isn't anything to do with expression
> evaluation. It's about planner estimation. I see that
> misc_function.sql has the explain_mask_costs() function. I'm not sure
> that's the right place either, as the usages of that function are for
> testing SupportRequestRows prosupport functions. I wonder if we need a
> dedicated row_estimate.sql or selectivity_est.sql file. The
> explain_mask_costs() wouldn't be out of place if they were moved into
> a new test like that. It was me that started putting those in
> misc_function.sql, and I don't object to them being moved to a new
> test. I'd be as a separate commit, however.

I've moved explain_mask_costs() and all related tests into a new
regression test file selectivity_est.sql. This is done as separate patch
v6-0001 containing only the test refactoring, with no behavioral changes.

> 2) The new test creates a new table and inserts 1000 rows. There does
> not seem to be anything special about the new table. Why don't you use
> one of the ones from test_setup.sql?

I've switched the test to use tenk1 from test_setup.sql instead of
creating a new table.

>
> 3) Looking at var_eq_const(), it seems like it's coded to assume the
> operator is always strict, per "If the constant is NULL, assume
> operator is strict and return zero". If that's good enough for
> var_eq_const(), then it should be good enough for the new code. I
> think it would be good if you wrote that or something similar in the
> new code so that the reader knows taking the short-circuit with
> non-strict functions is on purpose.

The updated comments are included in the v6-0002, and the test is now in
selectivity_est.sql

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Attachment Content-Type Size
v6-0002-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch text/x-patch 4.6 KB
v6-0001-Move-planner-row-estimation-tests-to-selectivity..patch text/x-patch 29.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2026-03-03 22:08:21 Re: [oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?)
Previous Message Tom Lane 2026-03-03 21:27:58 Re: Areas for Solaris support modernization