Re: Patch: improve selectivity estimation for IN/NOT IN

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: improve selectivity estimation for IN/NOT IN
Date: 2012-03-08 04:07:19
Message-ID: 407.1331179639@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> writes:
>> the attached patch improves the array selectivity estimation for = ANY
>> and <> ALL, hence for the IN/NOT IN operators, to avoid the
>> shortcoming described in
>> <http://archives.postgresql.org/pgsql-performance/2012-03/msg00006.php>.

I've committed a modified version of this patch.

> I'm not sure offhand which way is better. It could be argued that yours
> is more appropriate because if the operator isn't btree equality, but acts
> enough like it to use eqsel() as estimator, then it's still appropriate
> for scalararraysel() to treat it as equality. On the other side of the
> coin, an operator might be equality but have reason to use some
> operator-specific estimator rather than eqsel().

After some reflection I decided it was probably sane to use both
methods, that is apply the disjoint-probabilities calculation if the
operator appears to be equality/inequality according to either rule.
However, I also put in the safety valve I suggested in the -performance
thread, that the code fall back to the old calculation if the assumption
of disjoint probabilities yields an impossible result. The patch as you
had it would have just clamped such a result to 1 or 0, which didn't
seem to me to be the best we could do.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-03-08 04:23:45 Re: Custom Operators Cannot be Found for Composite Type Values
Previous Message Fujii Masao 2012-03-08 03:29:29 Re: pg_stat_statements and planning time