| From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
|---|---|
| To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Reduce planning time for large NOT IN lists containing NULL |
| Date: | 2026-02-18 14:11:56 |
| Message-ID: | eaa2598c-5356-4e1e-9ec3-5fd6eb1cd704@tantorlabs.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
In this thread [0] an interesting idea came up about avoiding
unnecessary work during selectivity estimation for x <> ALL (NULL, ...)
or x NOT IN (NULL, ...)
Semantically, if the array contains at least one NULL, the selectivity
of x NOT IN (...) is always 0.0, regardless of the other elements in the
list.
Currently, the planner still iterates over all array elements and
invokes the operator's selectivity estimator for each of them. For large
IN / ALL lists, this increases planning time.
For constant arrays I propose adding a simple pre-check before entering
the per-element loop: detect whether the array contains at least one
NULL element (e.g., via memchr() for the deconstructed array case). If
so, and we are in the ALL / NOT IN case, we can immediately return
selectivity = 0.0 and skip all further computation. This would avoid
extra per-element estimation work while preserving semantics.
In cases where array elements are not known to be constants in advance,
such a pre-check is less straightforward, because each element must
first be inspected to determine whether it is a Const and whether it is
NULL. That already requires iterating through the list, so introducing a
separate early pass would not actually reduce the amount of work.
Therefore, it like makes sense to keep the current behavior in that
situation.
Thoughts?
--
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch | text/x-patch | 1.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Aleksander Alekseev | 2026-02-18 14:57:55 | Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions |
| Previous Message | Nazir Bilal Yavuz | 2026-02-18 13:38:07 | Re: Speed up COPY FROM text/CSV parsing using SIMD |