Re: Making clausesel.c Smarter

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making clausesel.c Smarter
Date: 2017-04-06 21:05:41
Message-ID: CAGTBQpbbHV0f8JgRr43dVNS559Ve-OS8tCt6aBVcxvFmq52MiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>>> If you ask me, I'd just leave:
>>>
>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>>> DEFAULT_INEQ_SEL)
>>> + {
>>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>>> implies we have no stats */
>>> + s2 = DEFAULT_RANGE_INEQ_SEL;
>>> + }
>>> + else
>>> + {
>>> ...
>>> + s2 = rqlist->hibound + rqlist->lobound - 1.0;
>>> + nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>>> + notnullsel = 1.0 - nullsel;
>>> +
>>> + /* Adjust for double-exclusion of NULLs */
>>> + s2 += nullsel;
>>> + if (s2 <= 0.0) {
>>> + if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>>> + {
>>> + /* Most likely contradictory clauses found */
>>> + s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>>> + }
>>> + else
>>> + {
>>> + /* Could be a rounding error */
>>> + s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>> + }
>>> + }
>>> + }
>>>
>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>>> cautious) estimation of the amount of rounding error that could be
>>> there with 32-bit floats.
>>>
>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>>> an estimation based on stats, maybe approximate, but one that makes
>>> sense (ie: preserves the monotonicity of the CPF), and as such
>>> negative values are either sign of a contradiction or rounding error.
>>> The previous code left the possibility of negatives coming out of
>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>>> but that doesn't seem possible coming out of scalarineqsel.
>>>
>>> But I'd like it if Tom could comment on that, since he's the one that
>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>>> in 2004).
>>>
>>> Barring that, I'd just change the
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>>
>>> To
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>
>>> Which makes a lot more sense.
>>
>> I think to fix this properly the selfuncs would have to return the
>> estimate, and nullfrac separately, so the nullfrac could just be
>> applied once per expression. There's already hacks in there to get rid
>> of the double null filtering. I'm not proposing that as something for
>> this patch. It would be pretty invasive to change this.
>
> There's no need, you can compute the nullfrac with nulltestsel. While
> there's some rework involved, it's not a big deal (it just reads the
> stats tuple), and it's a clean solution.

I'm marking this Waiting on Author until we figure out what to do with
those issues (the null issue quoted above, and the quadratic behavior)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-04-06 21:11:35 Re: [COMMITTERS] pgsql: Increase parallel bitmap scan test coverage.
Previous Message Andres Freund 2017-04-06 21:05:30 Re: tuplesort_gettuple_common() and *should_free argument