Re: Bad estimate with partial index

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: André Hänsel <andre(at)webkr(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bad estimate with partial index
Date: 2022-04-20 13:39:25
Message-ID: 34f4218c-f267-a851-d744-75e242661848@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/20/22 09:58, Tomas Vondra wrote:
> On 4/19/22 23:08, Tom Lane wrote:
>> I wrote:
>>> it looks like the problem is that the extended stats haven't been used
>>> while forming the estimate of the number of index entries retrieved, so
>>> we overestimate the cost of using this index.
>>> That seems like a bug. Tomas?
>>
>> I dug into this enough to locate the source of the problem.
>> btcostestimate includes the partial index clauses in what it
>> sends to clauselist_selectivity, but per the comments for
>> add_predicate_to_index_quals:
>>
>> * Note that indexQuals contains RestrictInfo nodes while the indpred
>> * does not, so the output list will be mixed. This is OK for both
>> * predicate_implied_by() and clauselist_selectivity(), but might be
>> * problematic if the result were passed to other things.
>>
>> That comment was true when it was written, but it's been falsified
>> by the extended-stats patches, which have added a whole lot of logic
>> in and under clauselist_selectivity that ignores clauses that are not
>> RestrictInfos.
>>
>> While we could perhaps fix this by having add_predicate_to_index_quals
>> add RestrictInfos, I'm inclined to feel that the extended-stats code
>> is in the wrong. The contract for clauselist_selectivity has always
>> been that it could optimize if given RestrictInfos rather than bare
>> clauses, not that it would fail to work entirely without them.
>> There are probably more places besides add_predicate_to_index_quals
>> that are relying on that.
>>
>
> Yes, that seems like a fair assessment. I'll look into fixing this, not
> sure how invasive it will get, though.
>

So, here's a WIP fix that improves the example shared by Andre, and does
not seem to break anything (or at least not any regression test).

The whole idea is that instead of bailing out for non-RestrictInfo case,
it calculates the necessary information for the clause from scratch.
This means relids and pseudoconstant flag, which are checked to decide
if the clause is compatible with extended stats.

But when inspecting how to calculate pseudoconstant, I realized that
maybe that's not really needed. Per distribute_qual_to_rels() we only
set it to 'true' when bms_is_empty(relids), and we already check that
relids is a singleton, so it can't be empty - which means pseudoconstant
can't be true either.

Andre, are you in position to test this fix with your application? Which
Postgres version are you using, actually?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
extended-stats-fix.patch text/x-patch 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-04-20 13:50:36 Re: Odd off-by-one dirty buffers and checkpoint buffers written
Previous Message Robert Haas 2022-04-20 13:26:07 Re: using an end-of-recovery record in all cases