Re: Performance problem. Could it be related to 8.3-beta4?

From: Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL General ML" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Performance problem. Could it be related to 8.3-beta4?
Date: 2008-01-05 14:34:27
Message-ID: a595de7a0801050634pb2e942cx17d9d865cd13206c@mail.gmail.com
Lists: pgsql-general

2008/1/4, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com> writes:
> > The same insert query takes 20 minutes in the production system and 2
> > hours in the new one.
>
> Hmph. It's the same plan, so it's not a planner mistake. Could you
> post EXPLAIN ANALYZE rather than just EXPLAIN for both cases?

New server's explain analyze now with debug_assertions = false as
suggested by Greg:

fahstats=> explain analyze
fahstats-> insert into usuarios (
fahstats(> data,
fahstats(> usuario,
fahstats(> pontos,
fahstats(> wus
fahstats(> )
fahstats-> select
fahstats-> (select data_serial from data_serial) as data,
fahstats-> ui.usuario_serial as usuario,
fahstats-> sum(pontos) as pontos,
fahstats-> sum(wus) as wus
fahstats-> from usuarios_temp as ut inner join usuarios_indice as ui
fahstats-> on ut.usuario = ui.usuario_nome and ut.n_time = ui.n_time
fahstats-> group by data, ui.usuario_serial
fahstats-> ;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=323715.27..347866.73 rows=878235 width=20) (actual time=11072.973..13236.655 rows=878022 loops=1)
   ->  HashAggregate  (cost=323715.27..336888.79 rows=878235 width=12) (actual time=11072.960..12494.611 rows=878022 loops=1)
         InitPlan
           ->  Seq Scan on data_serial  (cost=0.00..1.01 rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=1)
         ->  Merge Join  (cost=101826.80..263869.07 rows=5984519 width=12) (actual time=4515.658..10039.560 rows=883856 loops=1)
               Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = (ui.usuario_nome)::text))
               ->  Index Scan using usuarios_temp_ndx on usuarios_temp ut  (cost=0.00..52895.92 rows=883856 width=23) (actual time=0.097..2188.894 rows=883856 loops=1)
               ->  Sort  (cost=101826.80..104022.38 rows=878235 width=19) (actual time=4515.538..4846.120 rows=883896 loops=1)
                     Sort Key: ui.n_time, ui.usuario_nome
                     Sort Method:  quicksort  Memory: 90130kB
                     ->  Seq Scan on usuarios_indice ui  (cost=0.00..15126.35 rows=878235 width=19) (actual time=0.022..300.928 rows=878062 loops=1)
 Trigger for constraint datas: time=21853.261 calls=878022
 Total runtime: 6889463.591 ms
(13 rows)

Time: 6889813.327 ms
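As an aside, the debug_assertions setting mentioned above can be confirmed directly from psql; this is just a quick sanity check (the parameter only reads "on" when the server was built with --enable-cassert):

```sql
-- Show whether assertion checking is enabled in this server build.
SHOW debug_assertions;
```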

The same explain analyze on the production server:

 Subquery Scan "*SELECT*"  (cost=317921.51..342084.63 rows=878659 width=20) (actual time=32184.729..37067.875 rows=878645 loops=1)
   ->  HashAggregate  (cost=317921.51..331101.40 rows=878659 width=12) (actual time=32184.714..34808.398 rows=878645 loops=1)
         InitPlan
           ->  Seq Scan on data_serial  (cost=0.00..1.01 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1)
         ->  Merge Join  (cost=102030.95..261051.62 rows=5686888 width=12) (actual time=12060.037..29542.750 rows=884482 loops=1)
               Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = "inner"."?column4?"))
               ->  Index Scan using usuarios_temp_ndx on usuarios_temp ut  (cost=0.00..55089.16 rows=884482 width=26) (actual time=0.196..8751.167 rows=884482 loops=1)
               ->  Sort  (cost=102030.95..104227.60 rows=878659 width=22) (actual time=12059.789..13205.930 rows=884522 loops=1)
                     Sort Key: ui.n_time, (ui.usuario_nome)::text
                     ->  Seq Scan on usuarios_indice ui  (cost=0.00..15285.59 rows=878659 width=22) (actual time=0.051..1375.554 rows=878685 loops=1)
 Trigger for constraint datas: time=34157.875 calls=878645
 Total runtime: 808956.038 ms
(12 rows)

Time: 809219.070 ms
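Both plans also charge a noticeable amount of time to "Trigger for constraint datas". In case it helps, here is one way to look up that constraint's definition from the catalogs (a sketch only; the table and constraint names are taken from the plan output above):

```sql
-- List the constraints defined on usuarios; "datas" should be among them.
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'usuarios'::regclass;
```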

> Also, it'd be worth watching "vmstat 1" output while the queries are
> running on each system, to see if there's any obvious discrepancy
> in behavior at that level.

# vmstat 1 in new system while running the insert query:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  8     64  25320  58176 3342492    0    0   156   355   27   88  1  0 88 11  0
 0  8     64  25344  58176 3342492    0    0     0     0  168  212  0  1  0 99  0
 0  8     64  25344  58176 3342492    0    0     0     0  143  185  0  1  0 99  0
 0  8     64  25468  58176 3342492    0    0     0     0  166  189  0  1  0 99  0
 0  8     64  25568  58176 3342492    0    0     0     0  159  210  0  1  0 99  0
 0  8     64  25708  58176 3342492    0    0     0     0  151  185  0  1  0 99  0

A few minutes later on the same system, with the insert query still running:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  5     64  25060  58352 3348704    0    0   156   356   27   88  1  0 87 11  0
 0  5     64  24988  58352 3348732    0    0    48     0  113  156  1  1 50 49  0
 0  5     64  24920  58352 3348808    0    0    48     0  116  150  0  0 50 50  0
 0  6     64  24920  58352 3348884    0    0    64     0  146  163  0  0 41 58  0
 0  6     64  24860  58352 3348964    0    0    56     0  136  204  0  0  0 100 0
 0  6     64  24800  58352 3349044    0    0    56     0  128  194  0  0  0 99  0

About 40 minutes later still running the query:
# vmstat 1 # new system
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  4     64  25700  60492 3334264    0    0   155   358   28   88  1  0 87 12  0
 0  4     64  25672  60492 3334292    0    0    80     0  167  160  0  0  0 100 0
 0  4     64  25672  60492 3334372    0    0    72     0  157  157  0  0  0 99  0
 0  2     64  25544  60492 3334448    0    0    72     0  175  170  0  0  0 100 0
 0  1     64  25224  60500 3334732    0    0   248   100  143  278  0  0 11 88  0
 0  1     64  25024  60500 3334900    0    0   128     8   23  168  0  0 50 50  0
 0  3     64  24956  60500 3335028    0    0   124  3336  264  168  0  0 46 53  0
 0  2     64  24896  60500 3335112    0    0    80     0  180  168  0  0 50 50  0
 0  1     64  24672  60508 3335360    0    0   168    48   35  220  0  0 50 49  0
 0  1     64  24332  60508 3335772    0    0   452   104   76  400  0  0 50 49  0
 0  3     64  25132  60508 3334556    0    0   652  7992  255  534  0  1 36 63  0
 0  3     64  24996  60508 3334688    0    0    64     0  154  126  0  0  0 100 0

# vmstat 1 -p /dev/sdb3 # new system - don't know if this has any relevance.
sdb3 reads read sectors writes requested writes
1643755 81073154 23523967 188191736
1643763 81073282 23524094 188192752
1643783 81073602 23524106 188192848
1643790 81073714 23524106 188192848
1643882 81075178 23524144 188193152
1643933 81075986 23524908 188199264
1643943 81076138 23525230 188201840

Production system while running the insert query:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0   1116  76224  16196 1704500    0    0   228   869    8   11 12  2 72 14  0
 1  0   1116  71264  16196 1704500    0    0     0     0 1009   32 59 41  0  0  0
 1  0   1116  66304  16196 1704500    0    0     0     0 1009   23 56 44  0  0  0
 1  0   1116  60984  16196 1704500    0    0     0     0 1009   27 59 41  0  0  0
 1  0   1116  55904  16204 1704492    0    0     0    12 1011   31 60 40  0  0  0
 1  0   1116  50424  16204 1704500    0    0     0     0 1009   29 66 34  0  0  0
 1  0   1116  50848  16204 1699168    0    0     0     8 1015   37 84 16  0  0  0

A few minutes later:
# vmstat 1 # production system
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1   1108  50832  10452 1670888    0    0   229   869    9   11 12  2 72 14  0
 2  3   1108  52232  10448 1668680    0    0  5820 36556 1478 1094 59 30  0 11  0
 2  7   1108  50908  10460 1670172    0    0  4476  8416 1417 1126 10  8  0 82  0
 0  8   1108  52420  10392 1668368    0    0  6460 10568 1509 1465  8  8  0 84  0
 0  2   1108  50528  10400 1671140    0    0  4304 10188 1355  973  5  8  0 87  0
 0  2   1108  51848  10416 1669616    0    0   552    32 1153  399  0  2  0 98  0

I forgot to mention that the RAID 1 array is software RAID.

Regards, Clodoaldo Pinto Neto
