Regression in join selectivity estimations when using foreign keys

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Regression in join selectivity estimations when using foreign keys
Date: 2017-05-18 08:28:56
Message-ID: CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been analyzing a reported regression case between a 9.5 plan and
a 9.6 plan. I tracked this down to the foreign key join selectivity
code, specifically the use_smallest_selectivity code which is applied
to outer joins where the referenced table is on the outer side of the
join.

A vastly simplified example case is:

create table fkest (a int, b int, c int unique, primary key(a,b));
create table fkest1 (a int, b int, primary key(a,b));

insert into fkest select x/10,x%10, x from generate_Series(1,400) x;
insert into fkest1 select x/10,x%10 from generate_Series(1,400) x;

alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b)
references fkest;

analyze fkest;
analyze fkest1;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Hash Left Join (cost=24.15..41.89 rows=996 width=36)
Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
-> Hash Left Join (cost=12.15..28.36 rows=100 width=28)
Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
-> Nested Loop Left Join (cost=0.15..16.21 rows=10 width=20)
-> Seq Scan on fkest f (cost=0.00..8.00 rows=1 width=12)
Filter: (c = 1)
-> Index Only Scan using fkest1_pkey on fkest1 f1
(cost=0.15..8.17 rows=1 width=8)
Index Cond: ((a = f.a) AND (b = f.b))
-> Hash (cost=6.00..6.00 rows=400 width=8)
-> Seq Scan on fkest1 f2 (cost=0.00..6.00 rows=400 width=8)
-> Hash (cost=6.00..6.00 rows=400 width=8)
-> Seq Scan on fkest1 f3 (cost=0.00..6.00 rows=400 width=8)
(13 rows)

-- now drop the foreign key to simulate the behaviour pre-9.6

alter table fkest1 drop constraint fkest1_a_b_fkey;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.44..32.62 rows=1 width=36)
-> Nested Loop Left Join (cost=0.29..24.41 rows=1 width=28)
-> Nested Loop Left Join (cost=0.15..16.21 rows=1 width=20)
-> Seq Scan on fkest f (cost=0.00..8.00 rows=1 width=12)
Filter: (c = 1)
-> Index Only Scan using fkest1_pkey on fkest1 f1
(cost=0.15..8.17 rows=1 width=8)
Index Cond: ((a = f.a) AND (b = f.b))
-> Index Only Scan using fkest1_pkey on fkest1 f2
(cost=0.15..8.17 rows=1 width=8)
Index Cond: ((a = f.a) AND (b = f.b))
-> Index Only Scan using fkest1_pkey on fkest1 f3
(cost=0.15..8.17 rows=1 width=8)
Index Cond: ((a = f.a) AND (b = f.b))
(11 rows)

The problem is that the use_smallest_selectivity in
get_foreign_key_join_selectivity() here is calculating 1/10 instead of
1/100 for each join level. Really it should be multiplying the
selectivities instead of taking the minimum selectivity, but even that
does not seem correct.

That bad estimate propagates up the join tree until some pretty insane
estimates come back.

The following is the offending code:

foreach(cell, removedlist)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
Selectivity csel;

csel = clause_selectivity(root, (Node *) rinfo,
0, jointype, sjinfo);
thisfksel = Min(thisfksel, csel);
}

This code is only applied to outer joins. I've studied this a bit and
generally, I'm unsure why we're treating outer joins in a special way
at all. Some comments indicate that this is in regards to NULLs, but
I'm not sure why we've code special code for outer joins and none for
INNER joins.

It's quite easy to show how this regresses in general in regards to
NULLs on INNER joins with:

create table part (partid int primary key);
create table sale (saleid int primary key, partid int references part);
insert into part select generate_series(1,1000);
insert into sale select x,NULL from generate_series(1,1000) x;
analyze part;
analyze sale;
explain analyze select * from part p inner join sale s on p.partid =
s.partid; -- using foreign key join estimations
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=27.50..55.12 rows=1000 width=12) (actual
time=0.733..0.733 rows=0 loops=1)
Hash Cond: (s.partid = p.partid)
-> Seq Scan on sale s (cost=0.00..15.00 rows=1000 width=8)
(actual time=0.018..0.332 rows=1000 loops=1)
-> Hash (cost=15.00..15.00 rows=1000 width=4) (actual
time=0.336..0.336 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 44kB
-> Seq Scan on part p (cost=0.00..15.00 rows=1000 width=4)
(actual time=0.014..0.132 rows=1000 loops=1)
Planning time: 0.468 ms
Execution time: 0.787 ms
(8 rows)

alter table sale drop constraint sale_partid_fkey; -- simulate pre-9.6
behaviour by dropping the fkey

explain analyze select * from part p inner join sale s on p.partid =
s.partid; -- normal join estimations.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=27.50..55.12 rows=1 width=12) (actual
time=0.546..0.546 rows=0 loops=1)
Hash Cond: (s.partid = p.partid)
-> Seq Scan on sale s (cost=0.00..15.00 rows=1000 width=8)
(actual time=0.014..0.102 rows=1000 loops=1)
-> Hash (cost=15.00..15.00 rows=1000 width=4) (actual
time=0.387..0.387 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 44kB
-> Seq Scan on part p (cost=0.00..15.00 rows=1000 width=4)
(actual time=0.009..0.110 rows=1000 loops=1)
Planning time: 0.278 ms
Execution time: 0.572 ms

I've attached fixes, based on master, for both of these cases.

Patches:

0001: Is a minimal fix for the repored regression, but may cause
further regression as it does nothing to account for NULLs valued
columns in outer joins, but at least it is consistent with how INNER
joins are estimated regarding NULLs.

0002: Adds further code to apply the nullfrac from pg_statistics.

I've given this a round of testing and I can't find any case where it
gives worse estimates than the standard join estimation as seen when
you drop the foreign key, but I'd like to challenge someone else to
give it a go and see if they can.... Not because I'm super confident,
more just because I want this to be correct.

Thoughts?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Minimal-fix-for-foreign-key-join-estimations.patch application/octet-stream 8.3 KB
0002-Apply-nullfrac-during-foreign-key-join-estimations.patch application/octet-stream 4.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-05-18 08:38:40 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Tsunakawa, Takayuki 2017-05-18 08:05:58 Re: [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur