512, 600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

From: Bryce Nesbitt <bryce2(at)obviously(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: 512, 600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Date: 2010-02-10 04:04:18
Message-ID: 4B723042.7090406@obviously.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I've got a very slow query, which I can make faster by doing something
seemingly trivial.
The query has been trouble for years (always slow, sometimes taking hours):

512,600ms Original, filter on articles.indexed (622 results)
7,500ms Remove "AND articles.indexed" (726 results, undesirable).
7,675ms Extra join for "AND articles.indexed" (622 results, same as
original).

Hardware is Postgres 8.3 on a Sunfire X4240 under Debian Lenny, with a
fresh ANALYZE. What I don't understand is why the improvement? Is the
second way of doing things actually superior, or is this just a query
planner edge case?

Original (512,600ms)
---------------------
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
(SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
INTERSECT
SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
(SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
WHERE lower(bp_categories.category) = 'law')
AND articles.indexed;

Extra join (7,675ms)
---------------------------------------------
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts JOIN articles using (context_key)
WHERE contexts.context_key IN
(
SELECT contexts.context_key FROM contexts
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
(SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
INTERSECT
SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
(SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
WHERE lower(bp_categories.category) = 'law')
)
AND articles.indexed;

# select indexed,count(*) from articles group by indexed;
indexed | count
---------+--------
t | 354605
f | 513552

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=131663.39..140005.40 rows=4769 width=4) (actual
time=511893.241..512599.348 rows=622 loops=1)
Hash Cond: (matview_82034.context_key = articles.context_key)
-> Seq Scan on matview_82034 (cost=0.00..6596.00 rows=465600
width=4) (actual time=0.019..463.278 rows=438220 loops=1)
-> Hash (cost=131663.38..131663.38 rows=1 width=16) (actual
time=511659.671..511659.671 rows=622 loops=1)
-> Nested Loop IN Join (cost=46757.70..131663.38 rows=1
width=16) (actual time=1142.789..511656.211 rows=622 loops=1)
Join Filter: (a.context_key = articles.context_key)
-> Nested Loop (cost=46757.70..46789.06 rows=2
width=12) (actual time=688.057..839.297 rows=1472 loops=1)
-> Nested Loop (cost=46757.70..46780.26 rows=2
width=8) (actual time=688.022..799.945 rows=1472 loops=1)
-> Subquery Scan "IN_subquery"
(cost=46757.70..46757.97 rows=5 width=4) (actual time=687.963..743.587
rows=1652 loops=1)
-> SetOp Intersect
(cost=46757.70..46757.93 rows=5 width=4) (actual time=687.961..738.955
rows=1652 loops=1)
-> Sort
(cost=46757.70..46757.81 rows=46 width=4) (actual time=687.943..709.972
rows=19527 loops=1)
Sort Key: "*SELECT*
1".context_key
Sort Method: quicksort
Memory: 1684kB
-> Append
(cost=0.00..46756.43 rows=46 width=4) (actual time=8.385..657.839
rows=19527 loops=1)
-> Subquery Scan
"*SELECT* 1" (cost=0.00..23378.21 rows=23 width=4) (actual
time=8.383..215.613 rows=4002 loops=1)
-> Nested
Loop (cost=0.00..23377.98 rows=23 width=4) (actual time=8.380..207.499
rows=4002 loops=1)
-> Index
Scan using words_word on words (cost=0.00..5.47 rows=1 width=4) (actual
time=0.102..0.105 rows=1 loops=1)

Index Cond: ((word)::text = 'insider'::text)
-> Index
Scan using article_words_wc on article_words (cost=0.00..23219.17
rows=12268 width=8) (actual time=8.272..199.224 rows=4002 loops=1)

Index Cond: (public.article_words.word_key = public.words.word_key)
-> Subquery Scan
"*SELECT* 2" (cost=0.00..23378.21 rows=23 width=4) (actual
time=5.397..404.164 rows=15525 loops=1)
-> Nested
Loop (cost=0.00..23377.98 rows=23 width=4) (actual time=5.394..372.883
rows=15525 loops=1)
-> Index
Scan using words_word on words (cost=0.00..5.47 rows=1 width=4) (actual
time=0.054..0.056 rows=1 loops=1)

Index Cond: ((word)::text = 'trading'::text)
-> Index
Scan using article_words_wc on article_words (cost=0.00..23219.17
rows=12268 width=8) (actual time=5.331..341.535 rows=15525 loops=1)

Index Cond: (public.article_words.word_key = public.words.word_key)
-> Index Scan using article_key_idx on
articles (cost=0.00..4.44 rows=1 width=4) (actual time=0.026..0.029
rows=1 loops=1652)
Index Cond: (articles.context_key =
"IN_subquery".context_key)
Filter: articles.indexed
-> Index Scan using contexts_pkey on contexts
(cost=0.00..4.39 rows=1 width=4) (actual time=0.018..0.021 rows=1
loops=1472)
Index Cond: (contexts.context_key =
articles.context_key)
-> Nested Loop (cost=0.00..111539.51 rows=1261757
width=4) (actual time=0.019..306.679 rows=39189 loops=1472)
-> Seq Scan on bp_categories (cost=0.00..1315.59
rows=16613 width=4) (actual time=0.008..57.960 rows=14529 loops=1472)
Filter: (lower(category) = 'law'::text)
-> Index Scan using virtual_ancestor_key_idx on
virtual_ancestors a (cost=0.00..5.18 rows=116 width=8) (actual
time=0.005..0.010 rows=3 loops=21386112)
Index Cond: (a.ancestor_key =
bp_categories.context_key)
Total runtime: 512600.354 ms
(37 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jayadevan M 2010-02-10 04:09:02 PostgreSQL - case studies
Previous Message Dimi Paun 2010-02-10 01:43:47 Re: DISTINCT vs. GROUP BY