Re: Postgres picks suboptimal index after building of an extended statistics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Postgres picks suboptimal index after building of an extended statistics
Date: 2022-07-07 22:07:36
Message-ID: 3119052.1657231656@sss.pgh.pa.us
Lists: pgsql-hackers

Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
> On 12/8/21 04:26, Tomas Vondra wrote:
>> I wonder if we should teach clauselist_selectivity about UNIQUE indexes,
>> and improve the cardinality estimates directly, not just costing for
>> index scans.

> I tried to implement this in different ways, but it causes additional
> overhead and code complexity: analyzing the list of indexes and matching
> each index's clauses against the input clauses on every selectivity
> estimation. I don't like that approach and propose a new patch in the
> attachment.

I looked at this briefly. I do not think that messing with
btcostestimate/genericcostestimate is the right response at all.
The problem can be demonstrated with no index whatever, as in the
attached shortened version of the original example. I get

                     QUERY PLAN
----------------------------------------------------
 Seq Scan on a  (cost=0.00..46.02 rows=1 width=12)
   Filter: ((x = 1) AND (y = 1) AND (z = 1))
(2 rows)

before adding the extended stats, and

                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on a  (cost=0.00..46.02 rows=28 width=12)
   Filter: ((x = 1) AND (y = 1) AND (z = 1))
(2 rows)

afterwards. So the extended stats have made the rowcount
estimate significantly worse, which seems like an indicator of a
bug somewhere in extended stats. The more so because I can crank
default_statistics_target all the way to 10000 without these
estimates changing. If we can't get a dead-on estimate for a
2001-row table at that stats level, we're doing something wrong,
surely?
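
For reference, here is a minimal sketch of the kind of no-index setup that
badestimate.sql exercises (the data distribution and the statistics object
name a_stx are assumptions for illustration, not copied from the attachment):

-- 2001-row table with correlated columns and no index at all;
-- the exact data in badestimate.sql may differ
CREATE TABLE a (x int, y int, z int);
INSERT INTO a SELECT i % 100, i % 100, i % 100
  FROM generate_series(1, 2001) AS i;
ANALYZE a;

-- baseline row estimate for the three-way AND
EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;

-- add extended statistics and compare the estimate
CREATE STATISTICS a_stx (ndistinct, dependencies) ON x, y, z FROM a;
ANALYZE a;
EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;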

Also, I found that if I ask only for ndistinct stats,
I still get rows=1. The fishiness seems to be directly
a problem with dependencies stats.
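
A hypothetical way to separate the two statistics kinds and confirm that
(the object names here are made up; only the statistics kind in parentheses
matters):

-- drop the combined object from the earlier sketch, if present
DROP STATISTICS IF EXISTS a_stx;

-- ndistinct only: per the observation above, the estimate stays at rows=1
CREATE STATISTICS a_ndist (ndistinct) ON x, y, z FROM a;
ANALYZE a;
EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;

-- dependencies only: this is where the estimate goes off
DROP STATISTICS a_ndist;
CREATE STATISTICS a_deps (dependencies) ON x, y, z FROM a;
ANALYZE a;
EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;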

regards, tom lane

Attachment: badestimate.sql (text/plain, 550 bytes)
