Re: Optimize cardinality estimation when unique keys are fully covered

From: feichanghong <feichanghong(at)qq(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize cardinality estimation when unique keys are fully covered
Date: 2025-11-21 17:11:56
Message-ID: tencent_E033CE3A613FEE4CC0384E11F494B4DE7D09@qq.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Nov 22, 2025, at 00:54, feichanghong <feichanghong(at)qq(dot)com> wrote:
>
> Hi,
>
> Recently, I ran into a cardinality estimation problem where the query
> predicates fully cover a primary key or unique index. The issue can be
> reproduced with the following case:
>
> ```sql
> create table t1(a int, b int, primary key(a, b));
>
> DO $$
> DECLARE
> i integer;
> BEGIN
> FOR i IN 1..100000 LOOP
> INSERT INTO t1 VALUES (i, 0);
> END LOOP;
> END;
> $$ LANGUAGE plpgsql;
>
>
> DO $$
> DECLARE
> i integer;
> BEGIN
> FOR j IN 1..10 LOOP
> FOR i IN 1..100000 LOOP
> INSERT INTO t1 VALUES (i%10+10*(j-1), i);
> END LOOP;
> END LOOP;
> END;
> $$ LANGUAGE plpgsql;
>
>
>
> create table t2(a int);
> insert into t2 select 1;
>
> analyze t1, t2;
>
> postgres=# explain analyze select * from t1 where a = 1 and b = 0;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
> Recheck Cond: ((a = 1) AND (b = 0))
> Heap Blocks: exact=1
> Buffers: shared hit=4
> -> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
> Index Cond: ((a = 1) AND (b = 0))
> Index Searches: 1
> Buffers: shared hit=3
> Planning Time: 0.146 ms
> Execution Time: 0.105 ms
> (10 rows)
>
> postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
> Buffers: shared hit=5
> -> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
> Buffers: shared hit=1
> -> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
> Index Cond: ((a = t2.a) AND (b = 0))
> Heap Fetches: 0
> Index Searches: 1
> Buffers: shared hit=4
> Planning Time: 0.257 ms
> Execution Time: 0.114 ms
> (11 rows)
>
> ```
>
> It can be observed that the Index Cond fully covers the primary key index.
> Naturally, the correct rows estimate should be 1, but the optimizer
> estimates the two conditions independently, resulting in an overestimated
> row count. This overestimation has a greater impact in scenarios with
> partitioned tables and multi-table joins, making the planner more inclined
> to choose hash or merge joins.
>
> We may consider checking in cardinality estimation whether the restrictlist
> fully covers a unique index. If so, we can directly set the estimated rows
> to 1. The attached patch provides a very early demo of this approach.
>
Apologies for the garbled text due to my email client. Please find the
reproduction SQL below again:

```sql
create table t1(a int, b int, primary key(a, b));

DO $$
DECLARE
i integer;
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i, 0);
END LOOP;
END;
$$ LANGUAGE plpgsql;

DO $$
DECLARE
i integer;
BEGIN
FOR j IN 1..10 LOOP
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i%10+10*(j-1), i);
END LOOP;
END LOOP;
END;
$$ LANGUAGE plpgsql;

create table t2(a int);
insert into t2 select 1;

analyze t1, t2;

postgres=# explain analyze select * from t1 where a = 1 and b = 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
Recheck Cond: ((a = 1) AND (b = 0))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
Index Cond: ((a = 1) AND (b = 0))
Index Searches: 1
Buffers: shared hit=3
Planning Time: 0.146 ms
Execution Time: 0.105 ms
(10 rows)

postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
Buffers: shared hit=5
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
Buffers: shared hit=1
-> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
Index Cond: ((a = t2.a) AND (b = 0))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=4
Planning Time: 0.257 ms
Execution Time: 0.114 ms
(11 rows)

```

Best Regards,
Fei Changhong

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2025-11-21 17:18:06 Re: System views for versions reporting
Previous Message feichanghong 2025-11-21 16:54:10 Optimize cardinality estimation when unique keys are fully covered