Re: Planner doesn't look at LIMIT?

From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Planner doesn't look at LIMIT?
Date: 2005-07-22 16:09:37
Message-ID: 758d5e7f05072209094156c984@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 7/22/05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dawid Kuroczko <qnex42(at)gmail(dot)com> writes:
> > qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>
> > Limit (cost=15912.20..15912.31 rows=1 width=272)
> > -> Hash Join (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> This is quite strange. The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost. And it is preferred in simple tests for me. It seems there must
> be something specific to your installation that's causing the planner to
> go wrong. Can you develop a self-contained test case that behaves this
> way for you?

Why, certainly. I did test it also on Gentoo Linux PostgreSQL 8.0.1 (yeah,
a bit older one), but the behaviour is the same. The test looks like this:

-- First lets make a "small" lookup table -- 400000 rows.
CREATE TABLE lookup (
lookup_id serial PRIMARY KEY,
value integer NOT NULL
);
INSERT INTO lookup (value) SELECT * FROM generate_series(1, 400000);
VACUUM ANALYZE lookup;
-- Then lets make a huge data table...
CREATE TABLE huge_data (
huge_data_id serial PRIMARY KEY,
lookup_id integer NOT NULL
);
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM lookup;
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 800 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 1 600 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 3 200 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 6 400 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000
-- You may want to put ANALYZE and EXPLAIN between each of these
-- steps. In my cases, at 12.8 mln rows PostgreSQL seems to go for hashjoin
-- in each case. YMMV, so you may try to push it up to 1024 mln rows.
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 25 600 000
ANALYZE huge_data;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;

My EXPLAIN FROM Linux (SMP P-III box), with PostgreSQL 8.0.1, during making
this test case:

qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------
Limit (cost=0.00..3.21 rows=1 width=12)
-> Nested Loop (cost=0.00..19557596.04 rows=6094777 width=12)
-> Seq Scan on huge_data (cost=0.00..95372.42 rows=6399942 width=8)
-> Index Scan using lookup_pkey on lookup (cost=0.00..3.02
rows=1 width=8)
Index Cond: ("outer".lookup_id = lookup.lookup_id)
(5 rows)

Time: 4,333 ms
qnex=# INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM
huge_data; -- 12 800 000
INSERT 0 6400000
Time: 501014,692 ms
qnex=# ANALYZE huge_data;
ANALYZE
Time: 4243,453 ms
qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------
Limit (cost=11719.00..11719.09 rows=1 width=12)
-> Hash Join (cost=11719.00..1212739.73 rows=12800185 width=12)
Hash Cond: ("outer".lookup_id = "inner".lookup_id)
-> Seq Scan on huge_data (cost=0.00..190747.84 rows=12800184 width=8)
-> Hash (cost=5961.00..5961.00 rows=400000 width=8)
-> Seq Scan on lookup (cost=0.00..5961.00 rows=400000 width=8)
(6 rows)

Regards,
Dawid

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-07-22 16:20:20 Re: Planner doesn't look at LIMIT?
Previous Message Germán Poó Caamaño 2005-07-22 14:59:53 Re: Constraint Exclusion on all tables

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-07-22 16:20:20 Re: Planner doesn't look at LIMIT?
Previous Message Madison Kelly 2005-07-22 15:09:32 Solved (was: Re: Another index question)