Re: [pgsql-de-allgemein] Performance bricht abrupt ein bei großen Ergebnismengen

From: Volker Sievert <sievert(at)molgen(dot)mpg(dot)de>
To: pgsql-de-allgemein(at)postgresql(dot)org
Subject: Re: [pgsql-de-allgemein] Performance bricht abrupt ein bei großen Ergebnismengen
Date: 2011-09-30 13:28:35
Message-ID: 4E85C403.4010206@molgen.mpg.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-de-allgemein

On 30.09.2011 07:43, Hans-Jürgen Schönig wrote:
> hallo,
>
> ich denke, dir kippt der plan irgendwo um. das kann man ansich schon ganz gut in den griff bekommen, denke ich (stichwort planner cost parameter und co).
> eine ferndiagnose ist aber schwer, wenn die query nicht vollständig ist.
> wie meine kollegen schon gesagt haben; poste mal mit verschiedenen datenmengen ein ...
>
> explain (buffers true, costs true, analyze true) ... (sofern deine version von pg schon mehr als nur plain "explain analyze" kann.

Vielen Dank für die Hinweise,

ich muss zugeben, dass mich die Interpretation des Explain
Analyse-Output etwas überfordert (bin Biolge, kein Informatiker...) -
versucht hatte ich das schon vorher mal und bin nicht so recht schlau
daraus geworden.

Als nicht-Informatiker fallen mir bei der "großen" Query die vielen
Aufrufe von "Materialize" auf - das würde ich mal so interpretieren,
dass da ständig Zwischenergebnisse auf Platte geschrieben werden - nur,
was kann man dagegen tun? Mehr Metall werde ich kaum dafür bekommen...

das ganze ist Postgresql Version 9.0 und läuft unter Linux 2.6.36 (X64)
auf 4 GiB RAM und teilt sich die Resourcen mit einer Java-Anwendung.

Zumindest laut "top" steigt der RAM-Verbrauch aber nicht auf unmäßig
hohe Werte, mehr als 15-20% will Postgres während der Ausführung dieser
Querys nicht haben.

Läufe von Explain mit einer real existierenden Query:

----------------------------
explain (buffers true, costs true, analyze true)
with aa as (select vquest_aa.sequence_id, abdb_sequence, vdj,
vregion, fr1_imgt, fr2_imgt, fr3_imgt,
fr4_imgt, cdr1, cdr2, cdr3, jregion, junction
from vquest_aa inner join sequence on vquest_aa.sequence_id =
sequence.sequence_id limit XXX) select
sequence_id, abdb_sequence, s1.seq as vdj_seq, s2.seq as vregion_seq,
s3.seq as fr1_seq, s4.seq as fr2_seq, s5.seq as fr3_seq, s6.seq as fr4_seq,
s7.seq as cdr1_seq, s8.seq as cdr2_seq, s9.seq as cdr3_seq,
s10.seq as jregion, s11.seq as junction

from aa left outer join sequences as s1 on s1.id = vdj
left outer join sequences as s2 on s2.id = vregion
left outer join sequences as s3 on s3.id = fr1_imgt
left outer join sequences as s4 on s4.id = fr2_imgt
left outer join sequences as s5 on s5.id = fr3_imgt
left outer join sequences as s6 on s6.id = fr4_imgt
left outer join sequences as s7 on s7.id = cdr1
left outer join sequences as s8 on s8.id = cdr2
left outer join sequences as s9 on s9.id = cdr3
left outer join sequences as s10 on s10.id = jregion
left outer join sequences as s11 on s11.id = junction
---------------------------------------------------

ergeben folgenden Output bei XXX = 200:

---------------------------------------------------
"Nested Loop Left Join (cost=1695.94..25780.53 rows=200 width=1231)
(actual time=55.106..87.102 rows=200 loops=1)"
" Buffers: shared hit=9534 read=49"
" CTE aa"
" -> Limit (cost=0.00..1695.94 rows=200 width=104) (actual
time=43.635..49.006 rows=200 loops=1)"
" Buffers: shared hit=794 read=13"
" -> Nested Loop (cost=0.00..13051100.46 rows=1539098
width=104) (actual time=43.631..48.474 rows=200 loops=1)"
" Buffers: shared hit=794 read=13"
" -> Seq Scan on vquest_aa (cost=0.00..44881.98
rows=1539098 width=96) (actual time=16.268..17.140 rows=200 loops=1)"
" Buffers: shared read=4"
" -> Index Scan using sequence_pkey on sequence
(cost=0.00..8.44 rows=1 width=16) (actual time=0.148..0.150 rows=1
loops=200)"
" Index Cond: ((sequence.sequence_id)::text =
(vquest_aa.sequence_id)::text)"
" Buffers: shared hit=794 read=9"
" -> Nested Loop Left Join (cost=0.00..21895.44 rows=200 width=1144)
(actual time=55.097..84.777 rows=200 loops=1)"
" Buffers: shared hit=8742 read=48"
" -> Nested Loop Left Join (cost=0.00..19706.30 rows=200
width=1057) (actual time=55.088..82.395 rows=200 loops=1)"
" Buffers: shared hit=7950 read=47"
" -> Nested Loop Left Join (cost=0.00..17517.15 rows=200
width=970) (actual time=55.080..80.111 rows=200 loops=1)"
" Buffers: shared hit=7152 read=47"
" -> Nested Loop Left Join (cost=0.00..15328.01
rows=200 width=883) (actual time=55.070..77.807 rows=200 loops=1)"
" Buffers: shared hit=6352 read=47"
" -> Nested Loop Left Join
(cost=0.00..13138.86 rows=200 width=796) (actual time=55.062..75.404
rows=200 loops=1)"
" Buffers: shared hit=5551 read=47"
" -> Nested Loop Left Join
(cost=0.00..10949.72 rows=200 width=709) (actual time=55.054..73.134
rows=200 loops=1)"
" Buffers: shared hit=4758 read=47"
" -> Nested Loop Left Join
(cost=0.00..8760.58 rows=200 width=622) (actual time=55.045..70.840
rows=200 loops=1)"
" Buffers: shared hit=3956
read=47"
" -> Nested Loop Left Join
(cost=0.00..6571.43 rows=200 width=535) (actual time=55.035..68.538
rows=200 loops=1)"
" Buffers: shared
hit=3156 read=47"
" -> Nested Loop Left
Join (cost=0.00..4382.29 rows=200 width=448) (actual
time=55.026..66.194 rows=200 loops=1)"
" Buffers: shared
hit=2354 read=47"
" -> Nested Loop
Left Join (cost=0.00..2193.14 rows=200 width=361) (actual
time=55.016..63.817 rows=200 loops=1)"
" Buffers:
shared hit=1557 read=43"
" -> CTE
Scan on aa (cost=0.00..4.00 rows=200 width=274) (actual
time=43.645..49.767 rows=200 loops=1)"
"
Buffers: shared hit=794 read=13"
" -> Index
Scan using biosequences_pkey on sequences s1 (cost=0.00..10.93 rows=1
width=103) (actual time=0.063..0.064 rows=1 loops=200)"
"
Index Cond: (s1.id = aa.vdj)"
"
Buffers: shared hit=763 read=30"
" -> Index Scan
using biosequences_pkey on sequences s2 (cost=0.00..10.93 rows=1
width=103) (actual time=0.004..0.006 rows=1 loops=200)"
" Index
Cond: (s2.id = aa.vregion)"
" Buffers:
shared hit=797 read=4"
" -> Index Scan using
biosequences_pkey on sequences s3 (cost=0.00..10.93 rows=1 width=103)
(actual time=0.004..0.006 rows=1 loops=200)"
" Index Cond:
(s3.id = aa.fr1_imgt)"
" Buffers: shared
hit=802"
" -> Index Scan using
biosequences_pkey on sequences s4 (cost=0.00..10.93 rows=1 width=103)
(actual time=0.004..0.005 rows=1 loops=200)"
" Index Cond: (s4.id =
aa.fr2_imgt)"
" Buffers: shared hit=800"
" -> Index Scan using
biosequences_pkey on sequences s5 (cost=0.00..10.93 rows=1 width=103)
(actual time=0.004..0.005 rows=1 loops=200)"
" Index Cond: (s5.id =
aa.fr3_imgt)"
" Buffers: shared hit=802"
" -> Index Scan using biosequences_pkey
on sequences s6 (cost=0.00..10.93 rows=1 width=103) (actual
time=0.004..0.005 rows=1 loops=200)"
" Index Cond: (s6.id = aa.fr4_imgt)"
" Buffers: shared hit=793"
" -> Index Scan using biosequences_pkey on
sequences s7 (cost=0.00..10.93 rows=1 width=103) (actual
time=0.004..0.006 rows=1 loops=200)"
" Index Cond: (s7.id = aa.cdr1)"
" Buffers: shared hit=801"
" -> Index Scan using biosequences_pkey on sequences
s8 (cost=0.00..10.93 rows=1 width=103) (actual time=0.004..0.006 rows=1
loops=200)"
" Index Cond: (s8.id = aa.cdr2)"
" Buffers: shared hit=800"
" -> Index Scan using biosequences_pkey on sequences s9
(cost=0.00..10.93 rows=1 width=103) (actual time=0.004..0.006 rows=1
loops=200)"
" Index Cond: (s9.id = aa.cdr3)"
" Buffers: shared hit=798"
" -> Index Scan using biosequences_pkey on sequences s10
(cost=0.00..10.93 rows=1 width=103) (actual time=0.004..0.006 rows=1
loops=200)"
" Index Cond: (s10.id = aa.jregion)"
" Buffers: shared hit=792 read=1"
" -> Index Scan using biosequences_pkey on sequences s11
(cost=0.00..10.93 rows=1 width=103) (actual time=0.004..0.006 rows=1
loops=200)"
" Index Cond: (s11.id = aa.junction)"
" Buffers: shared hit=792 read=1"
"Total runtime: 87.683 ms"
--------------------------------------------

Und bei XXX=200000:

--------------------------------------------
Merge Right Join (cost=13165433.17..14401451.91 rows=200000 width=1231)
(actual time=1865697.697..1939302.226 rows=200000 loops=1)
Merge Cond: (s11.id = aa.junction)
Buffers: shared hit=2828843 read=7011755, temp written=21429
CTE aa
-> Limit (cost=86666.71..120575.23 rows=200000 width=104) (actual
time=8463.992..26659.383 rows=200000 loops=1)
Buffers: shared hit=6 read=59381, temp written=21429
-> Hash Join (cost=86666.71..347609.45 rows=1539098
width=104) (actual time=8463.988..26136.955 rows=200000 loops=1)
Hash Cond: ((sequence.sequence_id)::text =
(vquest_aa.sequence_id)::text)
Buffers: shared hit=6 read=59381, temp written=21429
-> Seq Scan on sequence (cost=0.00..101023.38
rows=6171438 width=16) (actual time=5.537..7866.609 rows=4693476 loops=1)
Buffers: shared hit=3 read=29893
-> Hash (cost=44881.98..44881.98 rows=1539098
width=96) (actual time=8396.404..8396.404 rows=1470174 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 97185kB
Buffers: shared hit=3 read=29488, temp written=11090
-> Seq Scan on vquest_aa (cost=0.00..44881.98
rows=1539098 width=96) (actual time=0.011..4288.422 rows=1470174 loops=1)
Buffers: shared hit=3 read=29488
-> Index Scan using biosequences_pkey on sequences s11
(cost=0.00..1173213.59 rows=23722060 width=103) (actual
time=36.189..118986.813 rows=23722060 loops=1)
Buffers: shared hit=257174 read=632027
-> Materialize (cost=13044857.94..13045857.94 rows=200000
width=1144) (actual time=1778162.821..1779107.784 rows=200000 loops=1)
Buffers: shared hit=2571669 read=6379728, temp written=21429
-> Sort (cost=13044857.94..13045357.94 rows=200000
width=1144) (actual time=1778160.740..1778467.544 rows=200000 loops=1)
Sort Key: aa.junction
Sort Method: quicksort Memory: 109142kB
Buffers: shared hit=2571669 read=6379728, temp written=21429
-> Merge Right Join (cost=11691423.56..12927442.30
rows=200000 width=1144) (actual time=1683740.302..1777062.899
rows=200000 loops=1)
Merge Cond: (s10.id = aa.jregion)
Buffers: shared hit=2571669 read=6379728, temp
written=21429
-> Index Scan using biosequences_pkey on sequences
s10 (cost=0.00..1173213.59 rows=23722060 width=103) (actual
time=0.012..157747.588 rows=23722058 loops=1)
Buffers: shared hit=257159 read=632042
-> Materialize (cost=11691423.56..11692423.56
rows=200000 width=1057) (actual time=1576243.597..1577200.558
rows=200000 loops=1)
Buffers: shared hit=2314510 read=5747686,
temp written=21429
-> Sort (cost=11691423.56..11691923.56
rows=200000 width=1057) (actual time=1576225.828..1576540.692
rows=200000 loops=1)
Sort Key: aa.jregion
Sort Method: quicksort Memory: 109141kB
Buffers: shared hit=2314510
read=5747686, temp written=21429
-> Merge Right Join
(cost=10344824.68..11580843.42 rows=200000 width=1057) (actual
time=1482245.997..1575195.906 rows=200000 loops=1)
Merge Cond: (s9.id = aa.cdr3)
Buffers: shared hit=2314510
read=5747686, temp written=21429
-> Index Scan using
biosequences_pkey on sequences s9 (cost=0.00..1173213.59 rows=23722060
width=103) (actual time=0.014..154690.793 rows=23722053 loops=1)
Buffers: shared hit=257156
read=632045
-> Materialize
(cost=10344824.68..10345824.68 rows=200000 width=970) (actual
time=1377571.940..1378572.983 rows=200000 loops=1)
Buffers: shared hit=2057354
read=5115641, temp written=21429
-> Sort
(cost=10344824.68..10345324.68 rows=200000 width=970) (actual
time=1377561.968..1377879.536 rows=200000 loops=1)
Sort Key: aa.cdr3
Sort Method:
quicksort Memory: 109137kB
Buffers: shared
hit=2057354 read=5115641, temp written=21429
-> Merge Right Join
(cost=9005743.80..10241762.54 rows=200000 width=970) (actual
time=1284751.972..1376566.143 rows=200000 loops=1)
Merge Cond:
(s8.id = aa.cdr2)
Buffers: shared
hit=2057354 read=5115641, temp written=21429
-> Index Scan
using biosequences_pkey on sequences s8 (cost=0.00..1173213.59
rows=23722060 width=103) (actual time=0.013..156466.277 rows=23722052
loops=1)
Buffers:
shared hit=257169 read=632032
-> Materialize
(cost=9005743.80..9006743.80 rows=200000 width=883) (actual
time=1176878.668..1177863.431 rows=200000 loops=1)
Buffers:
shared hit=1800185 read=4483609, temp written=21429
-> Sort
(cost=9005743.80..9006243.80 rows=200000 width=883) (actual
time=1176876.609..1177193.551 rows=200000 loops=1)

Sort Key: aa.cdr2

Sort Method: quicksort Memory: 109106kB

Buffers: shared hit=1800185 read=4483609, temp written=21429
->
Merge Right Join (cost=7674184.42..8910203.16 rows=200000 width=883)
(actual time=1086039.220..1176022.857 rows=200000 loops=1)

Merge Cond: (s7.id = aa.cdr1)

Buffers: shared hit=1800185 read=4483609, temp written=21429

-> Index Scan using biosequences_pkey on sequences s7
(cost=0.00..1173213.59 rows=23722060 width=103) (actual
time=0.016..134532.516 rows=23722051 loops=1)

Buffers: shared hit=257162 read=632039

-> Materialize (cost=7674184.42..7675184.42 rows=200000 width=796)
(actual time=998770.854..999725.786 rows=200000 loops=1)

Buffers: shared hit=1543023 read=3851570, temp written=21429

-> Sort (cost=7674184.42..7674684.42 rows=200000 width=796)
(actual time=998768.908..999079.526 rows=200000 loops=1)

Sort Key: aa.cdr1

Sort Method: quicksort Memory: 109020kB

Buffers: shared hit=1543023 read=3851570, temp written=21429

-> Merge Right Join (cost=6350143.04..7586161.78
rows=200000 width=796) (actual time=924025.962..997633.963 rows=200000
loops=1)

Merge Cond: (s6.id = aa.fr4_imgt)

Buffers: shared hit=1543023 read=3851570, temp
written=21429

-> Index Scan using biosequences_pkey on
sequences s6 (cost=0.00..1173213.59 rows=23722060 width=103) (actual
time=0.010..118666.804 rows=23722057 loops=1)

Buffers: shared hit=257166 read=632035

-> Materialize (cost=6350143.04..6351143.04
rows=200000 width=709) (actual time=836887.849..837838.133 rows=200000
loops=1)

Buffers: shared hit=1285857 read=3219535,
temp written=21429

-> Sort (cost=6350143.04..6350643.04
rows=200000 width=709) (actual time=836885.917..837193.904 rows=200000
loops=1)

Sort Key: aa.fr4_imgt

Sort Method: quicksort Memory: 109020kB

Buffers: shared hit=1285857
read=3219535, temp written=21429

-> Merge Right Join
(cost=5034123.16..6269641.90 rows=200000 width=709) (actual
time=762781.034..835798.442 rows=200000 loops=1)

Merge Cond: (s5.id = aa.fr3_imgt)

Buffers: shared hit=1285857
read=3219535, temp written=21429

-> Index Scan using
biosequences_pkey on sequences s5 (cost=0.00..1173213.59 rows=23722060
width=103) (actual time=0.010..118368.344 rows=23722056 loops=1)

Buffers: shared hit=257155
read=632046

-> Sort
(cost=5034123.16..5034623.16 rows=200000 width=622) (actual
time=675955.766..676260.201 rows=200000 loops=1)

Sort Key: aa.fr3_imgt

Sort Method: quicksort
Memory: 108667kB

Buffers: shared
hit=1028702 read=2587489, temp written=21429

-> Merge Right Join
(cost=3780994.78..5016513.52 rows=200000 width=622) (actual
time=601634.880..674774.955 rows=200000 loops=1)

Merge Cond: (s4.id =
aa.fr2_imgt)

Buffers: shared
hit=1028702 read=2587489, temp written=21429

-> Index Scan using
biosequences_pkey on sequences s4 (cost=0.00..1173213.59 rows=23722060
width=103) (actual time=0.010..118787.249 rows=23722055 loops=1)

Buffers:
shared hit=257164 read=632037

-> Sort
(cost=3780994.78..3781494.78 rows=200000 width=535) (actual
time=514392.602..514698.836 rows=200000 loops=1)

Sort Key:
aa.fr2_imgt

Sort Method:
quicksort Memory: 108667kB

Buffers:
shared hit=771538 read=1955452, temp written=21429

-> Merge
Right Join (cost=2527866.40..3763385.14 rows=200000 width=535) (actual
time=439805.541..512861.294 rows=200000 loops=1)

Merge
Cond: (s3.id = aa.fr1_imgt)

Buffers:
shared hit=771538 read=1955452, temp written=21429

->
Index Scan using biosequences_pkey on sequences s3
(cost=0.00..1173213.59 rows=23722060 width=103) (actual
time=0.011..118178.639 rows=23722054 loops=1)


Buffers: shared hit=257180 read=632021

-> Sort
(cost=2527866.40..2528366.40 rows=200000 width=448) (actual
time=353204.303..353511.857 rows=200000 loops=1)


Sort Key: aa.fr1_imgt


Sort Method: quicksort Memory: 108667kB


Buffers: shared hit=514358 read=1323431, temp written=21429

->
Merge Right Join (cost=1274738.02..2510256.76 rows=200000 width=448)
(actual time=278817.368..351719.870 rows=200000 loops=1)


Merge Cond: (s2.id = aa.vregion)


Buffers: shared hit=514358 read=1323431, temp written=21429


-> Index Scan using biosequences_pkey on sequences s2
(cost=0.00..1173213.59 rows=23722060 width=103) (actual
time=0.010..117622.111 rows=23722060 loops=1)


Buffers: shared hit=257172 read=632029


-> Sort (cost=1274738.02..1275238.02 rows=200000 width=361)
(actual time=192528.719..192836.816 rows=200000 loops=1)


Sort Key: aa.vregion


Sort Method: quicksort Memory: 67612kB


Buffers: shared hit=257186 read=691402, temp written=21429


-> Merge Right Join (cost=21609.64..1257128.38 rows=200000
width=361) (actual time=118193.401..191254.339 rows=200000 loops=1)


Merge Cond: (s1.id = aa.vdj)


Buffers: shared hit=257186 read=691402, temp written=21429


-> Index Scan using biosequences_pkey on sequences s1
(cost=0.00..1173213.59 rows=23722060 width=103) (actual
time=0.010..121574.630 rows=23722059 loops=1)


Buffers: shared hit=257180 read=632021


-> Sort (cost=21609.64..22109.64 rows=200000
width=274) (actual time=27974.028..28279.362 rows=200000 loops=1)


Sort Key: aa.vdj


Sort Method: quicksort Memory: 34270kB


Buffers: shared hit=6 read=59381, temp written=21429


-> CTE Scan on aa (cost=0.00..4000.00
rows=200000 width=274) (actual time=8464.001..27457.434 rows=200000 loops=1)


Buffers: shared hit=6 read=59381, temp
written=21429
Total runtime: 1939690.783 ms
------------------------------------------

>
> was mir noch aufgefallen ist: "from vquest_aa limit XXX".
> das wird dir ein ziemlich zufälliges ergebnis geben ... das ergebnis ist ziemlich vom disk layout abhängig. will man das?

Im Endergebnis nicht. Da sollte die Query eigentlich mal alles zurück
geben, diese "harte" Grenze gibt es aber auch, wenn man die
Ergebnismenge auf sinnvollere Art begrenzt, z.B. mit einem IN (...)
Statement.

>
> meine wette: der plan für die joins kippt um ...
>
> liebe grüße,
>
> hans

Vielen Dank und liebe Grüße,

Volker

>
>

--
Volker Sievert
AG Konthur
Max Planck Institute for Molecular Genetics
Department of Vertebrate Genomics (Prof. H. Lehrach)
Ihnestr.63, 14195 Berlin, Germany
Tel.: +49/30/ 8413-1578
Fax.: +49/30/ 8413-1718 or -1365
sievert(at)molgen(dot)mpg(dot)de
http://www.molgen.mpg.de/~in-vitro/

In response to

Responses

Browse pgsql-de-allgemein by date

  From Date Subject
Next Message Andreas 'ads' Scherbaum 2011-09-30 13:46:23 Re: [pgsql-de-allgemein] Re: [pgsql-de-allgemein] Performance bricht abrupt ein bei großen Ergebnismengen
Previous Message Susanne Ebrecht 2011-09-30 10:55:52 Re: [pgsql-de-allgemein] Performance bricht abrupt ein bei großen Ergebnismengen