Re: Reduce planning time for large NOT IN lists containing NULL

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reduce planning time for large NOT IN lists containing NULL
Date: 2026-02-20 09:06:55
Message-ID: 7791ec74-369d-48c7-969c-cf4ad18024f2@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

Thanks for review

On 2/19/26 18:38, David Geier wrote:
> +1 on the general idea.
>
>> For constant arrays I propose adding a simple pre-check before entering
>> the per-element loop: detect whether the array contains at least one
>> NULL element (e.g., via memchr() for the deconstructed array case). If
>> so, and we are in the ALL / NOT IN case, we can immediately return
>> selectivity = 0.0 and skip all further computation. This would avoid
>> extra per-element estimation work while preserving semantics.
> How much overhead does the memchr() call add? It seems like this
> approach optimizes the edge case at the expense of the common case,
> which doesn't seem like a good trade-off.
>
> How about instead adding a flag to ArrayType which indicates if the
> array contains NULL or not. This flag could be set in
> construct_md_array() which already iterates over all elements. The flag
> would need to be kept up-to-date, e.g. in array_set_element() and
> possibly other functions modifying the array.

It seems we might reinventing the wheel.

There is already ARR_HASNULL() which allows us to detect the presence of
NULL in ArrayType.

>> In cases where array elements are not known to be constants in advance,
>> such a pre-check is less straightforward, because each element must
>> first be inspected to determine whether it is a Const and whether it is
>> NULL. That already requires iterating through the list, so introducing a
>> separate early pass would not actually reduce the amount of work.
>> Therefore, it like makes sense to keep the current behavior in that
>> situation.
> Agreed.

After thinking about this more, is seems reasonable to short-circuit еру
loop when we detect a NULL element by checking whether the element is a
Const and NULL.

I've attached v2 patch.

--
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Attachment Content-Type Size
v2-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch text/x-patch 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2026-02-20 09:08:03 Re: Skipping schema changes in publication
Previous Message Maxim Orlov 2026-02-20 08:47:12 Re: POC: make mxidoff 64 bits