Quick Links

Re: Postgres picks suboptimal index after building of an extended statistics

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Postgres picks suboptimal index after building of an extended statistics
Date:	2022-07-07 22:07:36
Message-ID:	3119052.1657231656@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
> On 12/8/21 04:26, Tomas Vondra wrote:
>> I wonder if we should teach clauselist_selectivity about UNIQUE indexes,
>> and improve the cardinality estimates directly, not just costing for
>> index scans.

> I tried to implement this in different ways. But it causes additional
> overhead and code complexity - analyzing a list of indexes and match
> clauses of each index with input clauses in each selectivity estimation.
> I don't like that way and propose a new patch in attachment.

I looked at this briefly. I do not think that messing with
btcostestimate/genericcostestimate is the right response at all.
The problem can be demonstrated with no index whatever, as in the
attached shortened version of the original example. I get

QUERY PLAN
---------------------------------------------------
Seq Scan on a (cost=0.00..46.02 rows=1 width=12)
Filter: ((x = 1) AND (y = 1) AND (z = 1))
(2 rows)

before adding the extended stats, and

QUERY PLAN
----------------------------------------------------
Seq Scan on a (cost=0.00..46.02 rows=28 width=12)
Filter: ((x = 1) AND (y = 1) AND (z = 1))
(2 rows)

afterwards. So the extended stats have made the rowcount
estimate significantly worse, which seems like an indicator of a
bug somewhere in extended stats. The more so because I can crank
default_statistics_target all the way to 10000 without these
estimates changing. If we can't get a dead-on estimate for a
2001-row table at that stats level, we're doing something wrong,
surely?

Also, I found that if I ask only for ndistinct stats,
I still get rows=1. The fishiness seems to be directly
a problem with dependencies stats.

regards, tom lane

Attachment	Content-Type	Size
badestimate.sql	text/plain	550 bytes

In response to

Re: Postgres picks suboptimal index after building of an extended statistics at 2021-08-30 06:44:28 from Andrey Lepikhov

Responses

Re: Postgres picks suboptimal index after building of an extended statistics at 2022-07-11 07:57:36 from Andrey Lepikhov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Justin Pryzby	2022-07-07 22:20:15	Re: explain analyze rows=%.0f
Previous Message	Nathan Bossart	2022-07-07 22:07:16	Re: remove more archiving overhead