Re: Regression in join selectivity estimations when using foreign keys

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Regression in join selectivity estimations when using foreign keys
Date: 2017-05-20 01:59:43
Message-ID: CAKJS1f_86xU5ht4Tw8e=jyoegZKDZVCWO-2YXRUHJLMrSskhXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 May 2017 at 20:28, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> A vastly simplified example case is:
>
> create table fkest (a int, b int, c int unique, primary key(a,b));
> create table fkest1 (a int, b int, primary key(a,b));
>
> insert into fkest select x/10,x%10, x from generate_Series(1,400) x;
> insert into fkest1 select x/10,x%10 from generate_Series(1,400) x;
>
> alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b)
> references fkest;
>
> analyze fkest;
> analyze fkest1;
>
> explain (costs on) select * from fkest f
> left join fkest1 f1 on f.a = f1.a and f.b = f1.b
> left join fkest1 f2 on f.a = f2.a and f.b = f2.b
> left join fkest1 f3 on f.a = f3.a and f.b = f3.b
> where f.c = 1;

I should have shown the EXPLAIN ANALYZE of this instead.

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=24.15..41.89 rows=996 width=36) (actual
time=0.430..0.463 rows=1 loops=1)
Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
-> Hash Left Join (cost=12.15..28.36 rows=100 width=28) (actual
time=0.255..0.288 rows=1 loops=1)
Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
-> Nested Loop Left Join (cost=0.15..16.21 rows=10
width=20) (actual time=0.046..0.079 rows=1 loops=1)
-> Seq Scan on fkest f (cost=0.00..8.00 rows=1
width=12) (actual time=0.013..0.045 rows=1 loops=1)
Filter: (c = 1)
Rows Removed by Filter: 399
-> Index Only Scan using fkest1_pkey on fkest1 f1
(cost=0.15..8.17 rows=1 width=8) (actual time=0.031..0.031 rows=1
loops=1)
Index Cond: ((a = f.a) AND (b = f.b))
Heap Fetches: 1
-> Hash (cost=6.00..6.00 rows=400 width=8) (actual
time=0.180..0.180 rows=400 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on fkest1 f2 (cost=0.00..6.00 rows=400
width=8) (actual time=0.006..0.041 rows=400 loops=1)
-> Hash (cost=6.00..6.00 rows=400 width=8) (actual
time=0.162..0.162 rows=400 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on fkest1 f3 (cost=0.00..6.00 rows=400 width=8)
(actual time=0.010..0.040 rows=400 loops=1)
Planning time: 0.409 ms
Execution time: 0.513 ms
(19 rows)

which you can obviously see the poor estimate propagating to up the plan tree.

If we add another left join the final estimate is even worse:

explain analyze select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
left join fkest1 f4 on f.a = f4.a and f.b = f4.b where f.c = 1;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=36.15..69.06 rows=9915 width=44) (actual
time=0.535..0.569 rows=1 loops=1)
Hash Cond: ((f.a = f4.a) AND (f.b = f4.b))
-> Hash Left Join (cost=24.15..41.89 rows=996 width=36) (actual
time=0.371..0.404 rows=1 loops=1)
Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
-> Hash Left Join (cost=12.15..28.36 rows=100 width=28)
(actual time=0.208..0.241 rows=1 loops=1)
Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
-> Nested Loop Left Join (cost=0.15..16.21 rows=10
width=20) (actual time=0.029..0.062 rows=1 loops=1)
-> Seq Scan on fkest f (cost=0.00..8.00 rows=1
width=12) (actual time=0.014..0.047 rows=1 loops=1)
Filter: (c = 1)
Rows Removed by Filter: 399
-> Index Only Scan using fkest1_pkey on fkest1
f1 (cost=0.15..8.17 rows=1 width=8) (actual time=0.012..0.012 rows=1
loops=1)
Index Cond: ((a = f.a) AND (b = f.b))
Heap Fetches: 1
-> Hash (cost=6.00..6.00 rows=400 width=8) (actual
time=0.168..0.168 rows=400 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on fkest1 f2 (cost=0.00..6.00
rows=400 width=8) (actual time=0.008..0.043 rows=400 loops=1)
-> Hash (cost=6.00..6.00 rows=400 width=8) (actual
time=0.156..0.156 rows=400 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on fkest1 f3 (cost=0.00..6.00 rows=400
width=8) (actual time=0.006..0.035 rows=400 loops=1)
-> Hash (cost=6.00..6.00 rows=400 width=8) (actual
time=0.155..0.155 rows=400 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on fkest1 f4 (cost=0.00..6.00 rows=400 width=8)
(actual time=0.004..0.034 rows=400 loops=1)
Planning time: 0.864 ms
Execution time: 0.698 ms

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-05-20 02:18:28 Re: [POC] Faster processing at Gather node
Previous Message Tom Lane 2017-05-20 00:49:00 Re: [Bug fix]If recovery.conf has target_session_attrs=read-write, the standby fails to start.