Quick Links

Improve row estimation with multi-column unique indexes

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Improve row estimation with multi-column unique indexes
Date:	2026-06-24 08:24:05
Message-ID:	CAO6_XqoiwRoVFS1x1oaUpwUi5kKwOj1S1JVYBO2QX91ORJJoxg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Currently, the row estimation of a query matching a multi-column
unique index only relies on combining columns' selectivity. This leads
to a row estimation >1 despite matching the unique index, and possibly
avoiding using said index.

This issue is visible with the following queries:

CREATE TABLE visitor (id BIGINT GENERATED ALWAYS AS IDENTITY, org INT,
name TEXT);
CREATE UNIQUE INDEX ON visitor(org, name);
INSERT INTO visitor(org, name) SELECT *, 'Support' FROM
generate_series(1, 500000);
INSERT INTO visitor(org, name) SELECT 500000, 'User_' || i FROM
generate_series(1, 2000) as g(i);
-- Test with only multi column index
ANALYZE visitor;
EXPLAIN SELECT * FROM visitor WHERE visitor.org = 500000 AND
visitor.name = 'Support' ORDER BY visitor.id LIMIT 1;
-- Test with index on id
CREATE UNIQUE INDEX ON visitor(id);
EXPLAIN SELECT * FROM visitor WHERE visitor.org = 500000 AND
visitor.name = 'Support' ORDER BY visitor.id LIMIT 1;

Which gives the following plans:
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Limit (cost=2861.98..2861.98 rows=1 width=20)
-> Sort (cost=2861.98..2867.35 rows=2149 width=20)
Sort Key: id
-> Index Scan using visitor_org_name_idx on visitor
(cost=0.42..2851.24 rows=2149 width=20)
Index Cond: ((org = 500000) AND (name = 'Support'::text))

QUERY PLAN
--------------------------------------------------------------------------------------------
Limit (cost=0.42..9.15 rows=1 width=20)
-> Index Scan using visitor_id_idx on visitor
(cost=0.42..18753.42 rows=2149 width=20)
Filter: ((org = 500000) AND (name = 'Support'::text))

Despite matching the unique index, the planner estimates 2149 rows.
Adding an index on id, the planner sees the new index as cheaper and
switches to it.

There's already an unique index matching done in btcostestimate, but
it only adjusts numIndexTuples to 1. The attached patch improves this
behavior by:
- Adjusting the index's selectivity to match the numIndexTuples=1 value
- cost->indexSelectivity is used to propagate this new selectivity,
which is then used in genericcostestimate, similar to what's done with
numIndexTuples
- Index's path row is also adjusted in cost_index if index selectivity
is better than the initial row estimation

With those changes, the query plan becomes:
Limit (cost=8.45..8.46 rows=1 width=20)
-> Sort (cost=8.45..8.46 rows=1 width=20)
Sort Key: id
-> Index Scan using visitor_org_name_idx on visitor
(cost=0.42..8.44 rows=1 width=20)
Index Cond: ((org = 500000) AND (name = 'Support'::text))

Regards,
Anthonin Bonnefoy

Attachment	Content-Type	Size
v1-0001-Adjust-row-estimation-and-index-selectivity-with-.patch	application/octet-stream	5.8 KB

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Chao Li	2026-06-24 08:30:46	Re: psql: add tab completion for subscription wal_receiver_timeout
Previous Message	Fujii Masao	2026-06-24 08:23:36	Re: psql: add tab completion for subscription wal_receiver_timeout