| From: | Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Improve row estimation with multi-column unique indexes |
| Date: | 2026-06-24 08:24:05 |
| Message-ID: | CAO6_XqoiwRoVFS1x1oaUpwUi5kKwOj1S1JVYBO2QX91ORJJoxg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
Currently, the row estimation of a query matching a multi-column
unique index only relies on combining columns' selectivity. This leads
to a row estimation >1 despite matching the unique index, and possibly
avoiding using said index.
This issue is visible with the following queries:
CREATE TABLE visitor (id BIGINT GENERATED ALWAYS AS IDENTITY, org INT,
name TEXT);
CREATE UNIQUE INDEX ON visitor(org, name);
INSERT INTO visitor(org, name) SELECT *, 'Support' FROM
generate_series(1, 500000);
INSERT INTO visitor(org, name) SELECT 500000, 'User_' || i FROM
generate_series(1, 2000) as g(i);
-- Test with only multi column index
ANALYZE visitor;
EXPLAIN SELECT * FROM visitor WHERE visitor.org = 500000 AND
visitor.name = 'Support' ORDER BY visitor.id LIMIT 1;
-- Test with index on id
CREATE UNIQUE INDEX ON visitor(id);
EXPLAIN SELECT * FROM visitor WHERE visitor.org = 500000 AND
visitor.name = 'Support' ORDER BY visitor.id LIMIT 1;
Which gives the following plans:
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Limit (cost=2861.98..2861.98 rows=1 width=20)
-> Sort (cost=2861.98..2867.35 rows=2149 width=20)
Sort Key: id
-> Index Scan using visitor_org_name_idx on visitor
(cost=0.42..2851.24 rows=2149 width=20)
Index Cond: ((org = 500000) AND (name = 'Support'::text))
QUERY PLAN
--------------------------------------------------------------------------------------------
Limit (cost=0.42..9.15 rows=1 width=20)
-> Index Scan using visitor_id_idx on visitor
(cost=0.42..18753.42 rows=2149 width=20)
Filter: ((org = 500000) AND (name = 'Support'::text))
Despite matching the unique index, the planner estimates 2149 rows.
Adding an index on id, the planner sees the new index as cheaper and
switches to it.
There's already an unique index matching done in btcostestimate, but
it only adjusts numIndexTuples to 1. The attached patch improves this
behavior by:
- Adjusting the index's selectivity to match the numIndexTuples=1 value
- cost->indexSelectivity is used to propagate this new selectivity,
which is then used in genericcostestimate, similar to what's done with
numIndexTuples
- Index's path row is also adjusted in cost_index if index selectivity
is better than the initial row estimation
With those changes, the query plan becomes:
Limit (cost=8.45..8.46 rows=1 width=20)
-> Sort (cost=8.45..8.46 rows=1 width=20)
Sort Key: id
-> Index Scan using visitor_org_name_idx on visitor
(cost=0.42..8.44 rows=1 width=20)
Index Cond: ((org = 500000) AND (name = 'Support'::text))
Regards,
Anthonin Bonnefoy
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Adjust-row-estimation-and-index-selectivity-with-.patch | application/octet-stream | 5.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-06-24 08:30:46 | Re: psql: add tab completion for subscription wal_receiver_timeout |
| Previous Message | Fujii Masao | 2026-06-24 08:23:36 | Re: psql: add tab completion for subscription wal_receiver_timeout |