Re: a wrong index choose when statistics is out of date

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-03-08 10:20:39
Message-ID: ed69f2d6-a63c-4fe2-b9ec-613b4b9cf484@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/3/2024 17:32, David Rowley wrote:
> On Thu, 7 Mar 2024 at 21:17, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> I would like to ask David why the var_eq_const estimator doesn't have an
>> option for estimation with a histogram. Having that would relieve a
>> problem with skewed data. Detecting the situation with incoming const
>> that is out of the covered area would allow us to fall back to ndistinct
>> estimation or something else. At least, histogram usage can be
>> restricted by the reltuples value and ratio between the total number of
>> MCV values and the total number of distinct values in the table.
>
> If you can think of a way how to calculate it, you should propose a patch.
>
> IIRC, we try to make the histogram buckets evenly sized based on the
> number of occurrences. I've not followed the code in default, I'd
> guess that doing that allows us to just subtract off the MCV
> frequencies and assume the remainder is evenly split over each
> histogram bucket, so unless we had an n_distinct per histogram bucket,
> or at the very least n_distinct_for_histogram_values, then how would
> the calculation look for what we currently record?
Yeah, It is my mistake; I see nothing special here with such a kind of
histogram: in the case of a coarse histogram net, the level of
uncertainty in one bin is too high to make a better estimation. I am
just pondering detection situations when estimation constant is just out
of statistics scope to apply to alternative, more expensive logic
involving the number of index pages out of the boundary, index tuple
width, and distinct value. The Left and right boundaries of the
histogram are suitable detectors for such a situation.

--
regards,
Andrei Lepikhov
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-03-08 10:26:17 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Previous Message Bertrand Drouvot 2024-03-08 10:19:11 Spurious pgstat_drop_replslot() call