Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?

From: Karl Denninger <karl(at)denninger(dot)net>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?
Date: 2010-05-08 20:35:19
Message-ID: 4BE5CB07.60508@denninger.net
Lists: pgsql-performance

I have a message posted in pgsql-general that outlines what I thought
was an indexing problem - it's not, so I'm bringing it here.

I dumped the table from our production system and loaded it onto a test
machine, then kept trimming and refining the query until I got it down to
the minimal case that shows the issue. Note that the actual query is
frequently MUCH more complicated, but without the LIMIT shown below the
planner seems to do a decent job of figuring out how to get it done.

The actual table in question has ~2m rows totaling several gigabytes of
space.

Here's an abstract of the schema:

Table "public.post"
Column | Type |
Modifiers
-----------+--------------------------+--------------------------------------------------------
subject | text |
message | text |
inserted | timestamp with time zone |
modified | timestamp with time zone |
replied | timestamp with time zone |
ordinal | integer | not null default
nextval('post_ordinal_seq'::regclass)

Indexes:
"post_pkey" PRIMARY KEY, btree (ordinal)
"idx_message" gin (to_tsvector('english'::text, message))
"idx_subject" gin (to_tsvector('english'::text, subject))

There's a bunch of other stuff in the table and many more indices, plus
foreign references, but stripping the table down to JUST THIS shows the
problem.
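
For anyone who wants to set up a similar test, here is roughly the DDL for the
stripped-down table, reconstructed from the \d output above - treat it as a
sketch, since the production table has many more columns, indexes and foreign
keys. The post_modified and post_ordinal indexes that show up in the plans
below are among that "other stuff"; plain btrees on modified and ordinal are
all that's needed to reproduce the plan shapes.

create sequence post_ordinal_seq;

create table post (
    subject  text,
    message  text,
    inserted timestamp with time zone,
    modified timestamp with time zone,
    replied  timestamp with time zone,
    ordinal  integer not null default nextval('post_ordinal_seq'::regclass),
    constraint post_pkey primary key (ordinal)
);

-- the two full-text indexes from the abstract above
create index idx_message on post using gin (to_tsvector('english'::text, message));
create index idx_subject on post using gin (to_tsvector('english'::text, subject));

-- not shown in the abstract, but referenced by the plans below
create index post_modified on post (modified);
create index post_ordinal on post (ordinal);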

ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc;
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=31795.16..31819.68 rows=9808 width=436) (actual time=14.222..17.213 rows=3421 loops=1)
   Sort Key: modified
   Sort Method:  quicksort  Memory: 3358kB
   ->  Bitmap Heap Scan on post  (cost=1418.95..31144.90 rows=9808 width=436) (actual time=1.878..7.514 rows=3421 loops=1)
         Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
         ->  Bitmap Index Scan on idx_message  (cost=0.00..1416.49 rows=9808 width=0) (actual time=1.334..1.334 rows=3434 loops=1)
               Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
 Total runtime: 20.547 ms
(8 rows)

Ok, very nice. 20ms. I like that.

Now let's limit the return to 100 items:

ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100;
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5348.69 rows=100 width=436) (actual time=198.047..2607.077 rows=100 loops=1)
   ->  Index Scan Backward using post_modified on post  (cost=0.00..524599.31 rows=9808 width=436) (actual time=198.043..2606.864 rows=100 loops=1)
         Filter: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
 Total runtime: 2607.231 ms
(4 rows)

Bad. Notice that the optimizer decided to walk the post_modified index
backward and apply the text-search condition as a filter to every row it
visits! That's BACKWARD in more ways than one; what I want is for the
planner to first execute the bitmap scan on the GIN index, then sort that
(small) result set and apply the LIMIT.
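
If I have to spell that plan out in SQL myself, the usual trick is to fence
the text-search part off in a sub-select so it can't be flattened into the
outer query. Something like this sketch - untested here, and it assumes
OFFSET 0 still acts as an optimization fence:

select *
  from (select *
          from post
         where to_tsvector('english', message) @@ to_tsquery('violence')
        offset 0) s
 order by modified desc
 limit 100;

With the fence in place the inner query should get planned on its own (the
bitmap scan on idx_message), and the sort plus LIMIT then only has to deal
with the few thousand matching rows.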

But it gets much worse - let's search for a word that does NOT appear
anywhere in the message base. (Remember, the table has ~2m rows and
consumes several gigabytes on disk, so any plan that has to read most of
the table is instant bad news.)

ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('hosehead') order by modified;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=31795.16..31819.68 rows=9808 width=436) (actual time=0.407..0.407 rows=0 loops=1)
   Sort Key: modified
   Sort Method:  quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on post  (cost=1418.95..31144.90 rows=9808 width=436) (actual time=0.402..0.402 rows=0 loops=1)
         Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
         ->  Bitmap Index Scan on idx_message  (cost=0.00..1416.49 rows=9808 width=0) (actual time=0.399..0.399 rows=0 loops=1)
               Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
 Total runtime: 0.441 ms
(8 rows)

Very fast, as you'd expect - it returned nothing. Now let's try it with
a "LIMIT":

ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('hosehead') order by modified limit 100;
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5348.69 rows=100 width=436) (actual time=254217.850..254217.850 rows=0 loops=1)
   ->  Index Scan using post_modified on post  (cost=0.00..524599.31 rows=9808 width=436) (actual time=254217.847..254217.847 rows=0 loops=1)
         Filter: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
 Total runtime: 254217.891 ms
(4 rows)

ticker=#

Oh crap. It actually went through and looked at the entire freaking
table - every row, one message at a time - because nothing ever matched
and the LIMIT was never satisfied.

An attempt to re-write the query into something that FORCES the planner
to do the right thing fails too - it behaves only until the LIMIT comes
back. First, the rewrite without a LIMIT:

ticker=# explain analyze select * from post where ordinal in (select ordinal from post where to_tsvector('english', message) @@ to_tsquery('hosehead')) order by modified;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=94886.44..94910.96 rows=9808 width=436) (actual time=0.884..0.884 rows=0 loops=1)
   Sort Key: public.post.modified
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=31173.42..94236.19 rows=9808 width=436) (actual time=0.879..0.879 rows=0 loops=1)
         ->  HashAggregate  (cost=31173.42..31271.50 rows=9808 width=4) (actual time=0.877..0.877 rows=0 loops=1)
               ->  Bitmap Heap Scan on post  (cost=1422.95..31148.90 rows=9808 width=4) (actual time=0.850..0.850 rows=0 loops=1)
                     Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
                     ->  Bitmap Index Scan on idx_message  (cost=0.00..1420.50 rows=9808 width=0) (actual time=0.848..0.848 rows=0 loops=1)
                           Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
         ->  Index Scan using post_ordinal on post  (cost=0.00..6.41 rows=1 width=436) (never executed)
               Index Cond: (public.post.ordinal = public.post.ordinal)
 Total runtime: 0.985 ms
(12 rows)

Fast, if convoluted.

ticker=# explain analyze select * from post where ordinal in (select ordinal from post where to_tsvector('english', message) @@ to_tsquery('hosehead')) order by modified limit 100;
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..19892.88 rows=100 width=436) (actual time=270563.091..270563.091 rows=0 loops=1)
   ->  Nested Loop Semi Join  (cost=0.00..1951093.77 rows=9808 width=436) (actual time=270563.088..270563.088 rows=0 loops=1)
         ->  Index Scan using post_modified on post  (cost=0.00..509887.63 rows=1961557 width=436) (actual time=0.015..3427.627 rows=1953674 loops=1)
         ->  Index Scan using post_ordinal on post  (cost=0.00..0.73 rows=1 width=4) (actual time=0.134..0.134 rows=0 loops=1953674)
               Index Cond: (public.post.ordinal = public.post.ordinal)
               Filter: (to_tsvector('english'::text, public.post.message) @@ to_tsquery('hosehead'::text))
 Total runtime: 270563.147 ms
(7 rows)

ticker=#

Ok, that didn't work either.
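
If this box were on 8.4 or later, a CTE would be another fence worth trying,
since WITH queries are planned as standalone subqueries - again just a
sketch, not tested against this data:

with hits as (
    select *
      from post
     where to_tsvector('english', message) @@ to_tsquery('hosehead')
)
select * from hits order by modified limit 100;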

Interestingly enough, if I crank up the limit to 500, it starts behaving!

ticker=# explain analyze select * from post where ordinal in (select ordinal from post where to_tsvector('english', message) @@ to_tsquery('hosehead')) order by modified limit 500;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=94724.91..94726.16 rows=500 width=436) (actual time=1.475..1.475 rows=0 loops=1)
   ->  Sort  (cost=94724.91..94749.43 rows=9808 width=436) (actual time=1.473..1.473 rows=0 loops=1)
         Sort Key: public.post.modified
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=31173.43..94236.19 rows=9808 width=436) (actual time=1.468..1.468 rows=0 loops=1)
               ->  HashAggregate  (cost=31173.43..31271.51 rows=9808 width=4) (actual time=1.466..1.466 rows=0 loops=1)
                     ->  Bitmap Heap Scan on post  (cost=1422.95..31148.91 rows=9808 width=4) (actual time=1.440..1.440 rows=0 loops=1)
                           Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
                           ->  Bitmap Index Scan on idx_message  (cost=0.00..1420.50 rows=9808 width=0) (actual time=1.438..1.438 rows=0 loops=1)
                                 Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
               ->  Index Scan using post_ordinal on post  (cost=0.00..6.41 rows=1 width=436) (never executed)
                     Index Cond: (public.post.ordinal = public.post.ordinal)
 Total runtime: 1.600 ms
(13 rows)

Why is the planner "taking into consideration" the LIMIT (I know the docs
say it does) and choosing a plan that walks an index across a table of
nearly 2 million rows, filtering every row as it goes?! I don't see how
that makes sense, LIMITed query or not.
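
My guess at what's happening - this is just arithmetic on the numbers in the
plans above, not anything dug out of the planner source: the estimate says
9808 rows match, so the planner assumes that walking post_modified in order
will hit the 100th match after reading only about 1% of the table, and it
prices the Limit node at that fraction of the full index scan:

select 524599.31 * 100 / 9808;   -- about 5348.69, exactly the cost on the Limit node

That bet pays off when the search term is common ('violence') and is
catastrophic when the term matches nothing ('hosehead'), because the scan
has to read the entire table before it can give up.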

If it matters, setting enable_seqscan to off does not change the results.

-- Karl
