Re: Row count estimation bug in BETWEEN?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Yaroslav <ladayaroslav(at)yandex(dot)ru>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Row count estimation bug in BETWEEN?
Date: 2015-06-14 16:09:05
Message-ID: 10092.1434298145@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Yaroslav <ladayaroslav(at)yandex(dot)ru> writes:
> Tom Lane-2 wrote
>> PG doesn't try to estimate inequalities exactly, because it usually
>> doesn't make enough of a difference to matter. Currently we don't
>> even bother to distinguish say ">" from ">=" for estimation purposes,
>> though certainly we would need to in order to deal with zero-width ranges
>> with any great amount of precision.

> Thank you for your answer!

> I'm sorry, but after looking into documentation and sources
> (scalarineqsel function in selfuncs.c, clauselist_selectivity and
> addRangeClause functions in clausesel.c) and experimenting a little I've
> got an impression that PostgreSQL actually bothers to distinguish ">"
> from ">=" for estimation purposes sometimes (probably, when MCV is
> used), but in my example it uses histogram and indeed doesn't
> distinguish them.

Well, I was oversimplifying a bit. When testing the MCV list we use the
original operator, so that if the comparison constant is equal to some
MCV entry, it will indeed matter whether you said ">" or ">=". When
dealing with the histogram, however, we don't pay attention to the
difference. The assumption is that the histogram represents a
continuous distribution of values in which no one value occurs often
enough to be interesting (if it did, it would be in the MCV list...).
Therefore it does not matter much whether any specific histogram entry
is exactly "=". And of course, for comparison values that are between
histogram entries, we have no idea whatsoever whether there are any
"=" entries in the table; so even if the code did distinguish ">" from
">=", it would be unclear what to do with the knowledge.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Anton Bushmelev 2015-06-14 23:30:55 pg_last_xact_replay_timestamp lies
Previous Message Yaroslav 2015-06-14 10:59:02 Re: Row count estimation bug in BETWEEN?