Re: Making clausesel.c Smarter

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making clausesel.c Smarter
Date: 2017-04-07 21:33:07
Message-ID: CAGTBQpZ6ekdwgQ2p558ic0ma+SJiANboF0zdXmPDvanCQUePZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 7, 2017 at 2:28 AM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>> DEFAULT_INEQ_SEL)
>> + {
>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>> implies we have no stats */
>> + s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> + s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> + nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> + notnullsel = 1.0 - nullsel;
>> +
>> + /* Adjust for double-exclusion of NULLs */
>> + s2 += nullsel;
>> + if (s2 <= 0.0) {
>> + if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> + {
>> + /* Most likely contradictory clauses found */
>> + s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> + }
>> + else
>> + {
>> + /* Could be a rounding error */
>> + s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> + }
>> + }
>> + }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
>
> I notice this does change the estimates for contradictory clauses such as:
>
> create table a (value int);
> insert into a select x/100 from generate_Series(1,10000) x;
> analyze a;
> explain analyze select * from a where value between 101 and -1;
>
> We now get:
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------
> Seq Scan on a (cost=0.00..195.00 rows=1 width=4) (actual
> time=1.904..1.904 rows=0 loops=1)
> Filter: ((value >= 101) AND (value <= '-1'::integer))
> Rows Removed by Filter: 10000
> Planning time: 0.671 ms
> Execution time: 1.950 ms
> (5 rows)
>
> where before we'd get:
>
> QUERY PLAN
> ----------------------------------------------------------------------------------------------
> Seq Scan on a (cost=0.00..195.00 rows=50 width=4) (actual
> time=0.903..0.903 rows=0 loops=1)
> Filter: ((value >= 101) AND (value <= '-1'::integer))
> Rows Removed by Filter: 10000
> Planning time: 0.162 ms
> Execution time: 0.925 ms
> (5 rows)
>
> Which comes from the 10000 * 0.005 selectivity estimate from tuples *
> DEFAULT_RANGE_INEQ_SEL
>
> I've attached a patch to this effect.

+ /*
+ * A zero or slightly negative s2 should be converted into
+ * a small positive value; we probably are dealing with a
+ * very tight range and got a bogus result due to roundoff
+ * errors. However, if s2 is very negative, then we
+ * probably have default selectivity estimates on one or
+ * both sides of the range that we failed to recognize
+ * above for some reason.
+ */
+ if (s2 <= 0.0)

That comment seems outdated

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

Also, can you add a test case? Cost values could make the test
fragile, so if that gives you trouble I'll understand, but it'd be
best to try and test this if possible.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-04-07 21:38:45 Re: Compilation warning with MSVC in pg_depend.c
Previous Message Joe Conway 2017-04-07 21:22:19 Re: partitioned tables and contrib/sepgsql