Re: WIP patch: distinguish selectivity of < from <= and > from >=

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch: distinguish selectivity of < from <= and > from >=
Date: 2017-07-04 18:43:08
Message-ID: CAGz5QCLkXU1PsDmj5AZdYY0m+23D3304=qRv+W11m_vpucLeEQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 4, 2017 at 10:56 PM, Kuntal Ghosh
<kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
> On Tue, Jul 4, 2017 at 9:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> writes:
>>> On Tue, Jul 4, 2017 at 9:23 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> ... I have to admit that I've failed to wrap my brain around exactly
>>>> why it's correct. The arguments that I've constructed so far seem to
>>>> point in the direction of applying the opposite correction, which is
>>>> demonstrably wrong. Perhaps someone whose college statistics class
>>>> wasn't quite so long ago can explain this satisfactorily?
>>
>>> I guess that you're referring to the last case, i.e.
>>> explain analyze select * from tenk1 where thousand between 10 and 10;
>>
>> No, the thing that is bothering me is why it seems to be correct to
>> apply a positive correction for ">=", a negative correction for "<",
>> and no correction for "<=" or ">". That seems weird and I can't
>> construct a plausible explanation for it. I think it might be a
>> result of the fact that, given a discrete distribution rather than
>> a continuous one, the histogram boundary values should be understood
>> as having some "width" rather than being zero-width points on the
>> distribution axis. But the arguments I tried to fashion on that
>> basis led to other rules that didn't actually work.
>>
>> It's also possible that this logic is in fact wrong and it just happens
>> to give the right answer anyway for uniformly-distributed cases.
>>
> So, I think there are two points here:
> 1. When should we apply (add/subtract) the correction?
> 2. What should the correction be?
>
> The first point:
> There can be two further cases:
> a) histfrac - actual_selectivity(p<=0) = 0.
Sorry for the typo. I meant (p<=10) for all the cases.
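
To make the discrete-distribution point concrete, here is a minimal sketch that is not taken from the patch. It assumes a hypothetical integer column with 1000 distinct values, each appearing 10 times, and computes the exact selectivity of each operator at p = 10 by brute force. The names `column` and `selectivity` are made up for illustration and have nothing to do with the planner code.

```python
from fractions import Fraction

# Hypothetical stand-in for a uniformly distributed integer column:
# 1000 distinct values (0..999), each appearing 10 times, 10000 rows total.
column = [v for v in range(1000) for _ in range(10)]


def selectivity(rows, predicate):
    """Exact fraction of rows that satisfy the predicate."""
    matches = sum(1 for v in rows if predicate(v))
    return Fraction(matches, len(rows))


p = 10
sel_lt = selectivity(column, lambda v: v < p)
sel_le = selectivity(column, lambda v: v <= p)
sel_ge = selectivity(column, lambda v: v >= p)
sel_gt = selectivity(column, lambda v: v > p)
sel_eq = selectivity(column, lambda v: v == p)

# On a discrete column, "<=" and "<" differ by exactly the frequency of p,
# and so do ">=" and ">".
print("p <  10:", sel_lt)
print("p <= 10:", sel_le)
print("p >= 10:", sel_ge)
print("p >  10:", sel_gt)
print("p =  10:", sel_eq)
```

Under these assumptions the gap between the inclusive and exclusive operators is the frequency of the single value p, which is the quantity any correction would have to account for. The sketch says nothing about which operator the histogram fraction naturally corresponds to.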

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
