200 = 199 + 1?

From: Marko Tiikkaja <marko(at)joh(dot)to>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: 200 = 199 + 1?
Date: 2017-09-27 10:52:33
Message-ID: CAL9smLC_ZfUO1YxEWMoiuxhEA_VriRhk5U1BQj7wXJdWLYXWKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I just came across this very peculiar behavior:

=# create table foo(id int primary key);
CREATE TABLE
=# insert into foo select generate_series(1, 1000000);
INSERT 0 1000000
=# set enable_hashjoin to false; set enable_mergejoin to false;
SET
SET
=# explain select * from foo where id in (select i from generate_series(1,
200) i limit 199);
QUERY PLAN
--------------------------------------------------------------------------------------------
Nested Loop (cost=4.90..1648.52 rows=199 width=4)
-> HashAggregate (cost=4.48..6.47 rows=199 width=4)
Group Key: i.i
-> Limit (cost=0.00..1.99 rows=199 width=4)
-> Function Scan on generate_series i (cost=0.00..10.00
rows=1000 width=4)
-> Index Only Scan using foo_pkey on foo (cost=0.42..8.24 rows=1
width=4)
Index Cond: (id = i.i)
(7 rows)

=# explain select * from foo where id in (select i from generate_series(1,
200) i limit 200);
QUERY PLAN
--------------------------------------------------------------------------------------------
Nested Loop (cost=4.93..1653.00 rows=500000 width=4)
-> HashAggregate (cost=4.50..6.50 rows=200 width=4)
Group Key: i.i
-> Limit (cost=0.00..2.00 rows=200 width=4)
-> Function Scan on generate_series i (cost=0.00..10.00
rows=1000 width=4)
-> Index Only Scan using foo_pkey on foo (cost=0.42..8.22 rows=1
width=4)
Index Cond: (id = i.i)
(7 rows)

So it seems that once the HashAggregate estimates to return 200 rows or
more, something extremely weird happens and the Nested Loop's estimate goes
wild. I've recently seen numerous instances of this kind of a problem
where the row estimates from a nested loop's child nodes are very
reasonable but the loop itself goes absolutely nuts. I can't see how this
can possibly be justified.

I wonder if the nested loop shouldn't have some kind of a cap on its own
estimate if it's wildly off of what you'd get by multiplying the child
nodes' estimates with each other?

.m

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2017-09-27 11:06:55 Re: Float value 'Infinity' is cast to numeric 1 on Windows
Previous Message Michael Paquier 2017-09-27 10:48:14 Re: Enhancements to passwordcheck