Re: severe performance issue with planner

From: "Eric Brown" <bigwhitecow(at)hotmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: severe performance issue with planner
Date: 2004-03-23 01:30:29
Message-ID: BAY13-F143A1R1q0iLo000112cf@hotmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

ok, at the advice of jurka, I will post new results:

Here's the query as I have changed it now:
SELECT
w8.wid,
w8.variant,
w8.num_variants,
sum_text(w8.unicode) as unicodes,
sum_text(w8.pinyin) as pinyins
FROM
(words as w8 JOIN
(words as w7 JOIN
(words as w6 JOIN
(words as w5 JOIN
(words as w4 JOIN
(words as w3 JOIN
(words as w2 JOIN
(words as w0 JOIN words as w1

ON(w1.wid = w0.wid AND w1.variant = w0.variant AND w1.sequence = w0.sequence
+ 1 AND w1.pinyin LIKE 'fu_'))
ON(w2.wid = w1.wid AND w2.variant = w1.variant AND w2.sequence = w1.sequence
+ 1 AND w2.pinyin LIKE 'ji_'))
ON(w3.wid = w2.wid AND w3.variant = w2.variant AND w3.sequence = w2.sequence
+ 1 AND w3.pinyin LIKE 'guan_'))
ON(w4.wid = w3.wid AND w4.variant = w3.variant AND w4.sequence = w3.sequence
+ 1 AND w4.pinyin LIKE 'kai_'))
ON(w5.wid = w4.wid AND w5.variant = w4.variant AND w5.sequence = w4.sequence
+ 1 AND w5.pinyin LIKE 'fang_'))
ON(w6.wid = w5.wid AND w6.variant = w5.variant AND w6.sequence = w5.sequence
+ 1 AND w6.pinyin LIKE 'xi_'))
ON(w7.wid = w6.wid AND w7.variant = w6.variant AND w7.sequence = w6.sequence
+ 1 AND w7.pinyin LIKE 'tong_'))
ON(w8.wid = w7.wid AND w8.variant = w7.variant))

WHERE
w0.wid > 0 AND
w0.pinyin = 'zheng4' AND
w0.def_exists = 't' AND
w0.sequence = 0
GROUP BY
w8.wid,
w8.variant,
w8.num_variants,
w8.page_order,
w0.sequence ,
w1.sequence ,
w2.sequence ,
w3.sequence ,
w4.sequence ,
w5.sequence ,
w6.sequence ,
w7.sequence
ORDER BY
w8.page_order;

And here's the output of explain analyze:


QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=54.26..54.26 rows=1 width=43) (actual time=14.916..14.917 rows=1
loops=1)
Sort Key: w8.page_order
-> HashAggregate (cost=54.24..54.25 rows=1 width=43) (actual
time=14.891..14.892 rows=1 loops=1)
-> Nested Loop (cost=0.00..54.10 rows=4 width=43) (actual
time=3.676..14.446 rows=12 loops=1)
-> Nested Loop (cost=0.00..48.09 rows=1 width=64) (actual
time=3.638..14.269 rows=1 loops=1)
Join Filter: ("outer"."sequence" = ("inner"."sequence"
+ 1))
-> Nested Loop (cost=0.00..42.06 rows=1 width=56)
(actual time=3.581..14.181 rows=1 loops=1)
Join Filter: (("inner".wid = "outer".wid) AND
("inner".variant = "outer".variant) AND ("inner"."sequence" =
("outer"."sequence" + 1)) AND ("outer"."sequence" = ("inner"."sequence" +
1)))
-> Nested Loop (cost=0.00..36.05 rows=1
width=48) (actual time=2.152..12.443 rows=1 loops=1)
Join Filter: (("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
-> Nested Loop (cost=0.00..30.03 rows=1
width=40) (actual time=2.104..12.368 rows=1 loops=1)
Join Filter: (("outer".variant =
"inner".variant) AND ("outer".wid = "inner".wid) AND ("inner"."sequence" =
("outer"."sequence" + 1)))
-> Nested Loop (cost=0.00..24.02
rows=1 width=32) (actual time=2.040..11.226 rows=1 loops=1)
Join Filter:
("outer"."sequence" = ("inner"."sequence" + 1))
-> Nested Loop
(cost=0.00..18.00 rows=1 width=24) (actual time=1.979..11.147 rows=1
loops=1)
Join Filter:
(("inner".variant = "outer".variant) AND ("inner".wid = "outer".wid))
-> Nested Loop
(cost=0.00..12.00 rows=1 width=16) (actual time=0.258..8.765 rows=1 loops=1)
-> Index Scan
using pinyin_index on words w3 (cost=0.00..5.99 rows=1 width=8) (actual
time=0.084..2.399 rows=304 loops=1)
Index Cond:
(((pinyin)::text >= 'guan'::character varying) AND ((pinyin)::text <
'guao'::character varying))
Filter:
((pinyin)::text ~~ 'guan_'::text)
-> Index Scan
using words2_pkey on words w1 (cost=0.00..6.00 rows=1 width=8) (actual
time=0.018..0.018 rows=0 loops=304)
Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
Filter:
((pinyin)::text ~~ 'fu_'::text)
-> Index Scan using
pinyin_index on words w7 (cost=0.00..5.99 rows=1 width=8) (actual
time=0.025..1.863 rows=338 loops=1)
Index Cond:
(((pinyin)::text >= 'tong'::character varying) AND ((pinyin)::text <
'tonh'::character varying))
Filter:
((pinyin)::text ~~ 'tong_'::text)
-> Index Scan using
words2_pkey on words w6 (cost=0.00..6.00 rows=1 width=8) (actual
time=0.037..0.052 rows=1 loops=1)
Index Cond: ((w6.wid =
"outer".wid) AND (w6.variant = "outer".variant))
Filter: ((pinyin)::text
~~ 'xi_'::text)
-> Index Scan using pinyin_index on
words w4 (cost=0.00..5.99 rows=1 width=8) (actual time=0.028..0.874
rows=165 loops=1)
Index Cond: (((pinyin)::text >=
'kai'::character varying) AND ((pinyin)::text < 'kaj'::character varying))
Filter: ((pinyin)::text ~~
'kai_'::text)
-> Index Scan using words2_pkey on words
w2 (cost=0.00..6.00 rows=1 width=8) (actual time=0.023..0.047 rows=1
loops=1)
Index Cond: (("outer".wid = w2.wid)
AND ("outer".variant = w2.variant))
Filter: ((pinyin)::text ~~
'ji_'::text)
-> Index Scan using pinyin_index on words w5
(cost=0.00..5.99 rows=1 width=8) (actual time=0.025..1.436 rows=259 loops=1)
Index Cond: (((pinyin)::text >=
'fang'::character varying) AND ((pinyin)::text < 'fanh'::character varying))
Filter: ((pinyin)::text ~~ 'fang_'::text)
-> Index Scan using words2_pkey on words w0
(cost=0.00..6.01 rows=1 width=8) (actual time=0.030..0.058 rows=1 loops=1)
Index Cond: (("outer".wid = w0.wid) AND (w0.wid >
0) AND ("outer".variant = w0.variant))
Filter: (((pinyin)::text = 'zheng4'::text) AND
(def_exists = true) AND ("sequence" = 0))
-> Index Scan using words2_pkey on words w8
(cost=0.00..6.00 rows=1 width=27) (actual time=0.019..0.103 rows=12 loops=1)
Index Cond: ((w8.wid = "outer".wid) AND (w8.variant =
"outer".variant))
Total runtime: 15.987 ms
(44 rows)

Time: 838.446 ms

As you can see, there still appears to be some issue with planning taking
~820ms, while the actual query should only take 16ms (and as you can see i
used the "folding" technique which takes some load off of the planner.

Here is the old query/analysis as run right now:
SELECT
w8.wid,
w8.variant,
w8.num_variants,
sum_text(w8.unicode) as unicodes,
sum_text(w8.pinyin) as pinyins
FROM
words as w0, words as w1,
words as w2, words as w3,
words as w4, words as w5,
words as w6, words as w7,
words as w8
WHERE
w0.wid > 0 AND
w0.pinyin = 'zheng4' AND
w0.def_exists = 't' AND
w0.sequence = 0 AND
w1.wid = w0.wid AND
w1.pinyin LIKE 'fu_' AND
w1.variant = w0.variant AND
w1.sequence = (w0.sequence + 1) AND
w2.wid = w1.wid AND
w2.pinyin LIKE 'ji_' AND
w2.variant = w1.variant AND
w2.sequence = (w1.sequence + 1) AND
w3.wid = w2.wid AND
w3.pinyin LIKE 'guan_' AND
w3.variant = w2.variant AND
w3.sequence = (w2.sequence + 1) AND
w4.wid = w3.wid AND
w4.pinyin LIKE 'kai_' AND
w4.variant = w3.variant AND
w4.sequence = (w3.sequence + 1) AND
w5.wid = w4.wid AND
w5.pinyin LIKE 'fang_' AND
w5.variant = w4.variant AND
w5.sequence = (w4.sequence + 1) AND
w6.wid = w5.wid AND
w6.pinyin LIKE 'xi_' AND
w6.variant = w5.variant AND
w6.sequence = (w5.sequence + 1) AND
w7.wid = w6.wid AND
w7.pinyin LIKE 'tong_' AND
w7.variant = w6.variant AND
w7.sequence = (w6.sequence + 1) AND
w8.wid = w7.wid AND
w8.variant = w7.variant
GROUP BY
w8.wid,
w8.variant,
w8.num_variants,
w8.page_order ,
w0.sequence ,
w1.sequence ,
w2.sequence ,
w3.sequence ,
w4.sequence ,
w5.sequence ,
w6.sequence ,
w7.sequence
ORDER BY
w8.page_order;

wepy=> EXPLAIN ANALYZE wepy=> explain ANALYZE
QUERY
PLAN
wepy-> wepy->
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
wepy-> wepy-> Sort (cost=54.15..54.16 rows=1 width=43) (actual
time=1043.148..1043.149 rows=1 loops=1)
wepy-> wepy-> Sort Key: w8.page_order
wepy-> wepy-> -> HashAggregate (cost=54.14..54.14 rows=1 width=43)
(actual time=1043.121..1043.122 rows=1 loops=1)
wepy-> wepy-> -> Nested Loop (cost=0.00..54.10 rows=1 width=43)
(actual time=9.627..1042.565 rows=12 loops=1)
wepy-> wepy-> Join Filter: (("inner"."sequence" =
("outer"."sequence" + 1)) AND ("outer"."sequence" = ("inner"."sequence" +
1)))
wepy-> wepy-> -> Nested Loop (cost=0.00..48.08 rows=1
width=83) (actual time=9.557..1041.784 rows=12 loops=1)
wepy-> wepy-> Join Filter: (("outer".wid = "inner".wid)
AND ("outer".variant = "inner".variant) AND ("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
wepy-> wepy-> -> Nested Loop (cost=0.00..11.99 rows=1
width=16) (actual time=3.152..290.176 rows=6 loops=1)
wepy-> wepy-> Join Filter: (("inner".wid =
"outer".wid) AND ("inner".variant = "outer".variant) AND ("inner"."sequence"
= ("outer"."sequence" + 1)))
wepy-> wepy-> -> Index Scan using pinyin_index
on words w4 (cost=0.00..5.99 rows=1 width=8) (actual time=0.084..1.233
rows=165 loops=1)
wepy-> wepy-> Index Cond: (((pinyin)::text
>= 'kai'::character varying) AND ((pinyin)::text < 'kaj'::character
varying))
wepy-> wepy-> Filter: ((pinyin)::text ~~
'kai_'::text)
wepy-> wepy-> -> Index Scan using pinyin_index
on words w5 (cost=0.00..5.99 rows=1 width=8) (actual time=0.018..1.411
rows=259 loops=165)
wepy-> wepy-> Index Cond: (((pinyin)::text
>= 'fang'::character varying) AND ((pinyin)::text < 'fanh'::character
varying))
wepy-> wepy-> Filter: ((pinyin)::text ~~
'fang_'::text)
wepy-> wepy-> -> Nested Loop (cost=0.00..36.06 rows=1
width=67) (actual time=4.500..125.184 rows=12 loops=6)
wepy-> wepy-> -> Nested Loop (cost=0.00..30.05
rows=1 width=40) (actual time=4.446..125.003 rows=1 loops=6)
wepy-> wepy-> Join Filter:
("inner"."sequence" = ("outer"."sequence" + 1))
wepy-> wepy-> -> Nested Loop
(cost=0.00..24.03 rows=1 width=32) (actual time=4.391..124.920 rows=1
loops=6)
wepy-> wepy-> -> Nested Loop
(cost=0.00..18.01 rows=1 width=24) (actual time=4.339..124.781 rows=3
loops=6)
wepy-> wepy-> Join Filter:
(("inner".variant = "outer".variant) AND ("inner".wid = "outer".wid) AND
("inner"."sequence" = ("outer"."sequence" + 1)))
wepy-> wepy-> -> Nested Loop
(cost=0.00..12.00 rows=1 width=16) (actual time=0.154..10.898 rows=18
loops=6)
wepy-> wepy-> -> Index
Scan using pinyin_index on words w3 (cost=0.00..5.99 rows=1 width=8)
(actual time=0.027..2.358 rows=304 loops=6)
wepy-> wepy-> Index
Cond: (((pinyin)::text >= 'guan'::character varying) AND ((pinyin)::text <
'guao'::character varying))
wepy-> wepy->
Filter: ((pinyin)::text ~~ 'guan_'::text)
wepy-> wepy-> -> Index
Scan using words2_pkey on words w6 (cost=0.00..6.00 rows=1 width=8) (actual
time=0.025..0.025 rows=0 loops=1824)
wepy-> wepy-> Index
Cond: ((w6.wid = "outer".wid) AND (w6.variant = "outer".variant))
wepy-> wepy->
Filter: ((pinyin)::text ~~ 'xi_'::text)
wepy-> wepy-> -> Index Scan
using pinyin_index on words w7 (cost=0.00..5.99 rows=1 width=8) (actual
time=0.017..5.788 rows=338 loops=108)
wepy-> wepy-> Index Cond:
(((pinyin)::text >= 'tong'::character varying) AND ((pinyin)::text <
'tonh'::character varying))
wepy-> wepy-> Filter:
((pinyin)::text ~~ 'tong_'::text)
wepy-> wepy-> -> Index Scan using
words2_pkey on words w0 (cost=0.00..6.01 rows=1 width=8) (actual
time=0.026..0.035 rows=0 loops=18)
wepy-> wepy-> Index Cond:
(("outer".wid = w0.wid) AND (w0.wid > 0) AND ("outer".variant = w0.variant))
wepy-> wepy-> Filter:
(((pinyin)::text = 'zheng4'::text) AND (def_exists = true) AND ("sequence" =
0))
wepy-> wepy-> -> Index Scan using
words2_pkey on words w1 (cost=0.00..6.00 rows=1 width=8) (actual
time=0.021..0.047 rows=1 loops=6)
wepy-> wepy-> Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
wepy-> wepy-> Filter: ((pinyin)::text
~~ 'fu_'::text)
wepy-> wepy-> -> Index Scan using words2_pkey on
words w8 (cost=0.00..6.00 rows=1 width=27) (actual time=0.021..0.085
rows=12 loops=6)
wepy-> wepy-> Index Cond: ((w8.wid =
"outer".wid) AND (w8.variant = "outer".variant))
wepy-> wepy-> -> Index Scan using words2_pkey on words w2
(cost=0.00..6.00 rows=1 width=8) (actual time=0.021..0.046 rows=1 loops=12)
wepy-> wepy-> Index Cond: ((w2.wid = "outer".wid) AND
(w2.variant = "outer".variant))
wepy-> wepy-> Filter: ((pinyin)::text ~~ 'ji_'::text)
wepy-> wepy-> Total runtime: 1044.239 ms
wepy-> wepy-> (43 rows)
wepy-> wepy->
wepy-> wepy-> Time: 3939.938 ms

Even the total runtime has improved dramatically with the use of explicit
joins and folding. However, I still need to cut the query time down much
more... My last resort is to dig through the source code and see if I can
understand how the planner works, but I suspect it's thousands of lines of
code :\

BTW, I have since upgraded to 7.4.2 (from 7.4.1), and you can verify these
planner times even with an empty table. The table spec is below:

Table "public.words"
Column | Type | Modifiers
--------------+----------------------+-----------
wid | integer | not null
sequence | smallint | not null
variant | smallint | not null
char_count | smallint | not null
unicode | character varying(5) | not null
pinyin | character varying(8) | not null
simpvar | character varying(5) |
zvar | character varying(5) |
compatvar | character varying(5) |
def_exists | boolean | not null
num_variants | smallint |
page_order | integer |
pinyins | character varying |
unicodes | character varying |
Indexes:
"words2_pkey" primary key, btree (wid, variant, "sequence")
"page_index" btree (page_order)
"pinyin_index" btree (pinyin)
"unicode_index" btree (unicode)

_________________________________________________________________
Get rid of annoying pop-up ads with the new MSN Toolbar FREE!
http://clk.atdmt.com/AVE/go/onm00200414ave/direct/01/

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2004-03-23 05:19:50 Re: [PERFORM] Databases Vs. Schemas
Previous Message Subbiah, Stalin 2004-03-23 00:05:45 Benchmarking postgres on Solaris/Linux