Quick Links

Re: Reduce planning time for large NOT IN lists containing NULL

From:	Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To:	David Geier <geidav(dot)pg(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Cc:	PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Reduce planning time for large NOT IN lists containing NULL
Date:	2026-02-24 21:14:01
Message-ID:	4c761b02-5a60-4076-aa0c-9c6fef06e2c1@tantorlabs.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2/24/26 11:29, David Geier wrote:

> Using array_contains_nulls() seems fine. In case the IN list doesn't
> contain NULL, the function can immediately bail thanks to the
> !ARR_HASNULL() check in the beginning.
>
> It only needs to iterate over the NULL-bitmap, if it exists. This is the
> case if there's actually a NULL element in the array, or if the array
> initially contained NULL and all NULLs got removed subsequently.
>
> If we ever find the latter case to matter we could remove the
> NULL-bitmap in array_set_element() / array_set_element_expanded(), when
> the last NULL element got removed.
>
>> Could you clarify what exactly this additional test meant to verify?
> Zsolt's test case creates an array that initially contains NULL. The
> NULL element is subsequently replaced by a non-NULL value but
> array_set_element_expanded() keeps the NULL-bitmap around. With that,
> your ARR_ISNULL() check bails and causes the selectivity estimation to
> incorrectly return 0.

Ah, right - thanks for the clarification. I agree.

Regarding the regression test: the suggestion test case is good, but
there is not a straightforward way to expose the estimated row count
without also showing the costs, and costs are unstable. To avoid that, I
reused the parsing approach already present in stats_ext.sql to extract
only the estimated row count from EXPLAIN.

Since the test table contains exactly 1000 rows and we run ANALYZE, all
rows are included in the statistics sample. Therefore the estimate for x
<> ALL(array[1, 99, 2]) is deterministically 997 rows, and the test
stable and ensures we detect the incorrect early-zero estimate.

Let me know if you'd prefer a different approach. I've attached v4 patch.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Attachment	Content-Type	Size
v4-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch	text/x-patch	4.9 KB

In response to

Re: Reduce planning time for large NOT IN lists containing NULL at 2026-02-24 08:29:27 from David Geier

Responses

Re: Reduce planning time for large NOT IN lists containing NULL at 2026-02-25 13:43:54 from Zsolt Parragi

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2026-02-24 21:51:43	Re: Cleaning up array_ref() and array_set()
Previous Message	Tom Lane	2026-02-24 21:07:39	Re: Optimize SELECT * in EXISTS