Quick Links

Bad estimation for NOT IN clause with big null fraction

From:	Donghang Lin <donghanglin(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Bad estimation for NOT IN clause with big null fraction
Date:	2024-04-05 08:20:52
Message-ID:	CAA=D8a0BfJZzGBTvwsmBmBZ9_W65j0BhshObCTbera0ppeQUjw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi hackers

Discussion[1] and the relevant commit[2] improved the selectivity
calculation for IN/NOT IN.

This is the current logic for NOT IN selectivity calculation and it loops
over the array elements.

else
{
s1 = s1 * s2;
if (isInequality)
s1disjoint += s2 - 1.0;
}

By calculating s2 for each array element, it calls neqsel and returns 1 -
eqsel - nullfrac.
If I expand the s1disjoint calculation for a NOT IN (2,5,8) clause,
It eventually becomes 1 - eqsel(2) - eqsel(5) - eqsel(8) - 3*nullfrac.
If nullfrac is big, s1disjoint will be less than 0 quickly when the array
has more elements,
and the selectivity algorithm falls back to the one prior to commit[2]
which had bad estimation for NOT IN as well.

It seems to me that nullfrac should be subtracted only once. Is it feasible
that we have a new variable s1disjoint2
that add back nullfrac when we get back the result for each s2 and subtract
it once at the end of the loop as a 2nd heuristic?
We then maybe prefer s1disjoint2 over s1disjoint and then s1?

Donghang Lin
(ServiceNow)

[1]
https://www.postgresql.org/message-id/flat/CA%2Bmi_8aPEAzBgWZpNTABGM%3DcSq7mRMyPWbMsU8eGmUfH75OTLA%40mail.gmail.com
[2]
https://github.com/postgres/postgres/commit/66a7e6bae98592d1d98d9ef589753f0e953c5828

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2024-04-05 08:41:01	Re: LogwrtResult contended spinlock
Previous Message	Amit Langote	2024-04-05 08:09:53	Re: remaining sql/json patches