| From: | feichanghong <feichanghong(at)qq(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Optimize cardinality estimation when unique keys are fully covered |
| Date: | 2025-11-21 17:11:56 |
| Message-ID: | tencent_E033CE3A613FEE4CC0384E11F494B4DE7D09@qq.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Nov 22, 2025, at 00:54, feichanghong <feichanghong(at)qq(dot)com> wrote:
>
> Hi,
>
> Recently, I ran into a cardinality estimation problem where the query
> predicates fully cover a primary key or unique index. The issue can be
> reproduced with the following case:
>
> ```sql
> create table t1(a int, b int, primary key(a, b));
>
> DO $$
> DECLARE
> i integer;
> BEGIN
> FOR i IN 1..100000 LOOP
> INSERT INTO t1 VALUES (i, 0);
> END LOOP;
> END;
> $$ LANGUAGE plpgsql;
>
>
> DO $$
> DECLARE
> i integer;
> BEGIN
> FOR j IN 1..10 LOOP
> FOR i IN 1..100000 LOOP
> INSERT INTO t1 VALUES (i%10+10*(j-1), i);
> END LOOP;
> END LOOP;
> END;
> $$ LANGUAGE plpgsql;
>
>
>
> create table t2(a int);
> insert into t2 select 1;
>
> analyze t1, t2;
>
> postgres=# explain analyze select * from t1 where a = 1 and b = 0;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
> Recheck Cond: ((a = 1) AND (b = 0))
> Heap Blocks: exact=1
> Buffers: shared hit=4
> -> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
> Index Cond: ((a = 1) AND (b = 0))
> Index Searches: 1
> Buffers: shared hit=3
> Planning Time: 0.146 ms
> Execution Time: 0.105 ms
> (10 rows)
>
> postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
> Buffers: shared hit=5
> -> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
> Buffers: shared hit=1
> -> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
> Index Cond: ((a = t2.a) AND (b = 0))
> Heap Fetches: 0
> Index Searches: 1
> Buffers: shared hit=4
> Planning Time: 0.257 ms
> Execution Time: 0.114 ms
> (11 rows)
>
> ```
>
> It can be observed that the Index Cond fully covers the primary key index.
> Naturally, the correct rows estimate should be 1, but the optimizer
> estimates the two conditions independently, resulting in an overestimated
> row count. This overestimation has a greater impact in scenarios with
> partitioned tables and multi-table joins, making the planner more inclined
> to choose hash or merge joins.
>
> We may consider checking in cardinality estimation whether the restrictlist
> fully covers a unique index. If so, we can directly set the estimated rows
> to 1. The attached patch provides a very early demo of this approach.
>
Apologies for the garbled text due to my email client. Please find the
reproduction SQL below again:
```sql
create table t1(a int, b int, primary key(a, b));
DO $$
DECLARE
i integer;
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i, 0);
END LOOP;
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE
i integer;
BEGIN
FOR j IN 1..10 LOOP
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i%10+10*(j-1), i);
END LOOP;
END LOOP;
END;
$$ LANGUAGE plpgsql;
create table t2(a int);
insert into t2 select 1;
analyze t1, t2;
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
Recheck Cond: ((a = 1) AND (b = 0))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
Index Cond: ((a = 1) AND (b = 0))
Index Searches: 1
Buffers: shared hit=3
Planning Time: 0.146 ms
Execution Time: 0.105 ms
(10 rows)
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
Buffers: shared hit=5
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
Buffers: shared hit=1
-> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
Index Cond: ((a = t2.a) AND (b = 0))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=4
Planning Time: 0.257 ms
Execution Time: 0.114 ms
(11 rows)
```
Best Regards,
Fei Changhong
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Laurenz Albe | 2025-11-21 17:18:06 | Re: System views for versions reporting |
| Previous Message | feichanghong | 2025-11-21 16:54:10 | Optimize cardinality estimation when unique keys are fully covered |