Re: strange row count estimates with conditions on multiple column

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: strange row count estimates with conditions on multiple column
Date: 2010-12-12 19:50:50
Message-ID: 8171.1292183450@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> Dne 17.11.2010 06:58, Tom Lane napsal(a):
>>> BTW I think the default estimate used to be 1000, so it was changed in
>>> one of the 8.x releases? Can you point me to the docs? I've even tried
>>> to find that in the sources, but unsuccessfully.
>>
>> It's DEFAULT_RANGE_INEQ_SEL, and AFAIR it hasn't changed in quite a while.
>> But I wouldn't be surprised if the behavior of this example changed when
>> we boosted the default statistics target.

> I've been thinking about this and I think it might be improved. If I
> understand the logic corretly, it says 'use half of the histogram bin
> size'. But the value

> #define DEFAULT_RANGE_INEQ_SEL 0.005

> says it's always 0.5%, which is not not true if STATISTICS TARGET is not
> 100. This could actually yield 10x more precise estimates when the
> STATISTICS TARGET is set to 1000.

Huh? The default estimates are completely unrelated to the size of the
histogram, and certainly unrelated to the default size of the
histogram. We use those estimates when we don't have relevant stats.
It's pure wishful thinking to suppose that changing the statistics
target would have any impact on what the estimate ought to be in such
a case.

I believe the actual reasoning for setting the default estimates that
are under 1% is that we wanted to encourage indexscan choices in such
cases. Once it's small enough for that, making it even smaller doesn't
really help --- and that does risk making bad join choices. You don't
want the thing coming up with one-row estimates unless there's real
evidence for such an estimate.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message tv 2010-12-12 20:19:11 Re: strange row count estimates with conditions on multiple column
Previous Message Daniel Popowich 2010-12-12 15:55:16 range intervals in window function frames