Re: Making clausesel.c Smarter

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making clausesel.c Smarter
Date: 2017-04-06 23:53:31
Message-ID: CAKJS1f9L-SRUMVE9Mp3zUf603_8y4iwGRVBi2Gg7baNvu8PGXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7 April 2017 at 09:05, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>>>> If you ask me, I'd just leave:
>>>>
>>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>>>> DEFAULT_INEQ_SEL)
>>>> + {
>>>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>>>> implies we have no stats */
>>>> + s2 = DEFAULT_RANGE_INEQ_SEL;
>>>> + }
>>>> + else
>>>> + {
>>>> ...
>>>> + s2 = rqlist->hibound + rqlist->lobound - 1.0;
>>>> + nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>>>> + notnullsel = 1.0 - nullsel;
>>>> +
>>>> + /* Adjust for double-exclusion of NULLs */
>>>> + s2 += nullsel;
>>>> + if (s2 <= 0.0) {
>>>> + if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>>>> + {
>>>> + /* Most likely contradictory clauses found */
>>>> + s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>>>> + }
>>>> + else
>>>> + {
>>>> + /* Could be a rounding error */
>>>> + s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>> + }
>>>> + }
>>>> + }
>>>>
>>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>>>> cautious) estimation of the amount of rounding error that could be
>>>> there with 32-bit floats.
>>>>
>>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>>>> an estimation based on stats, maybe approximate, but one that makes
>>>> sense (ie: preserves the monotonicity of the CPF), and as such
>>>> negative values are either sign of a contradiction or rounding error.
>>>> The previous code left the possibility of negatives coming out of
>>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>>>> but that doesn't seem possible coming out of scalarineqsel.
>>>>
>>>> But I'd like it if Tom could comment on that, since he's the one that
>>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>>>> in 2004).
>>>>
>>>> Barring that, I'd just change the
>>>>
>>>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>>>
>>>> To
>>>>
>>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>>
>>>> Which makes a lot more sense.
>>>
>>> I think to fix this properly the selfuncs would have to return the
>>> estimate, and nullfrac separately, so the nullfrac could just be
>>> applied once per expression. There's already hacks in there to get rid
>>> of the double null filtering. I'm not proposing that as something for
>>> this patch. It would be pretty invasive to change this.
>>
>> There's no need, you can compute the nullfrac with nulltestsel. While
>> there's some rework involved, it's not a big deal (it just reads the
>> stats tuple), and it's a clean solution.
>
> I'm marking this Waiting on Author until we figure out what to do with
> those issues (the null issue quoted above, and the quadratic behavior)

I've attached a rebased patch as the existing one no longer applies.

I've not yet had time to wrap my head around the null handling stuff.
I'll try to get to that today.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
smarter_clausesel_2017-04-07.patch application/octet-stream 16.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2017-04-07 00:17:05 Re: pgbench - allow to store select results into variables
Previous Message Tom Lane 2017-04-06 23:47:50 Re: Performance improvement for joins where outer side is unique