Context-switch storm in 8.1.15

From: Iñigo Martinez Lasala <imartinez(at)vectorsf(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Context-switch storm in 8.1.15
Date: 2008-12-30 11:02:36
Message-ID: 1FBF68E78578447EAD1570BC0DDF1041@DEIMOS
Lists: pgsql-admin

Hi everybody.

Recently our company was awarded a contract to maintain an online store.
The website was developed with J2EE, with Postgres 8.1 as the database
backend. The system worked without problems for several months,
but with Christmas, traffic to the web portal has risen sharply.
The database suffers from a performance problem under high load: a large
number of context switches occur, reaching up to 200,000 cs per second.
The system has 16 GB of RAM and 4 Intel Xeon MP CPUs with HT enabled, RAID10
iSCSI storage, and kernel 2.4.21 (RHAS 3).

Half of the CPU power is lost to system time, as you can see.

Vmstat on high load
 r  b swpd   free   buff    cache si so bi  bo   in     cs us sy id wa
19  0    0 281852 150316 13732396  0  0 32  80 1071 128209 41 43 16  0
75  0    0 282040 150316 13732396  0  0  0   0  719 148023 40 38 22  0
 3  0    0 284208 150324 13732412  0  0 16 484  728 145371 39 40 21  0
12  0    0 278364 150324 13732508  0  0 80  56  660 157533 35 42 23  1
 6  0    0 284972 150324 13732580  0  0 32 200  685 142014 39 41 20  0
 8  0    0 296424 150324 13732624  0  0 40 136  554 139601 41 39 20  0
85  0    0 265004 150324 13732664  0  0 32  48  642 142437 48 32 20  0
32  0    0 267432 150324 13732680  0  0  0 788 1003 144409 37 42 21  0
13  0    0 270468 150324 13732676  0  0  0  24  724 146663 42 40 19

Vmstat 20 seconds after stopping the portal:
 r  b swpd    free   buff    cache si so bi   bo  in     cs us sy id wa
 8  0    0  962388 206744 13771548  0  0  0    0 131 199784 11 38 51  0
 3  0    0  970212 206744 13771548  0  0  0 1856 305 203639 12 40 48  0
10  0    0  975036 206744 13771588  0  0  0  128 212 201899 11 36 52  0
 3  0    0  970272 206744 13771652  0  0 16  232 685 202672 14 41 44  0
 6  0    0 1008320 206744 13771656  0  0  0   40 198 196298 14 46 39  0
 3  0    0 1034836 206744 13771656  0  0  0    0 147 202731 12 39 50  0
 3  0    0 1037764 206752 13771656  0  0  0  952 202 202933 11 39 50  0
 5  0    0 1078132 206752 13771656  0  0  0    0 154 203408 18 35 47  0
 6  0    0 1110572 206752 13771656  0  0  0    0 153 196864 18 41 41  0
 4  0    0 1105440 206752 13771824  0  0 16  592 461 207538 12 37 51  1
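
To put rough numbers on the samples above, here is a minimal Python sketch that averages the context-switch rate and the system-time percentage. It assumes each sample has been joined onto a single line and that the 16 fields follow the usual Linux vmstat order (r b swpd free buff cache si so bi bo in cs us sy id wa). The names parse_vmstat and summarize are made up for illustration.

```python
from statistics import mean

# Assumed column order of Linux vmstat output (not stated in the post itself).
COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
           "bi", "bo", "in", "cs", "us", "sy", "id", "wa"]


def parse_vmstat(text):
    """Return one dict per complete 16-column sample; skip everything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue  # header, blank line, or truncated sample
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows):
    """Average context switches per second and system-time percentage."""
    return {"cs": mean(r["cs"] for r in rows),
            "sy": mean(r["sy"] for r in rows)}


# First three high-load samples from the post, one sample per line.
HIGH_LOAD = """\
19 0 0 281852 150316 13732396 0 0 32 80 1071 128209 41 43 16 0
75 0 0 282040 150316 13732396 0 0 0 0 719 148023 40 38 22 0
3 0 0 284208 150324 13732412 0 0 16 484 728 145371 39 40 21 0
"""

if __name__ == "__main__":
    print(summarize(parse_vmstat(HIGH_LOAD)))
```

Truncated samples, such as the last high-load line, are skipped rather than guessed at.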


I've read about this problem in versions prior to 8.2. However, at the
moment it is not possible to migrate to 8.2 due to the number of stored
procedures; we don't have enough time to test ALL of them in order
to migrate to 8.2 (or 8.3).
However, we have run light tests with 8.2 under high load, and there the
problem is solved or mitigated.

Now the question: is there a backported patch for 8.1 that solves the
context-switch storm?

The patch I'm looking for is this one or a similar one (this one is for 8.2):
---
A Itagaki Takahiro/Tom Lane patch which arranges for GetSnapshotData
to copy live-subtransaction XIDs from the PGPROC array into
snapshots, and use this information to avoid visits to pg_subtrans
in HeapTupleSatisfiesSnapshot. This appears to solve the
pg_subtrans-related context swap storm problem that's been reported
by several people for 8.1. While at it, modify GetSnapshotData to
not take an exclusive lock on ProcArrayLock, as closer analysis
shows that shared lock is always sufficient.
---
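
The idea in the quoted description can be illustrated with a toy Python model. This is only a sketch of the concept, not PostgreSQL code: SubtransLog, in_progress_old and in_progress_new are hypothetical names, plain sets stand in for the snapshot arrays, and cases the real implementation must handle (for example, a backend with more subtransactions than its cache can hold) are ignored.

```python
class SubtransLog:
    """Toy stand-in for pg_subtrans: maps a subtransaction XID to its parent."""

    def __init__(self, parent_of):
        self.parent_of = dict(parent_of)
        self.lookups = 0  # counts simulated visits to the shared log

    def topmost(self, xid):
        """Walk parent links until a top-level XID is reached."""
        while xid in self.parent_of:
            self.lookups += 1
            xid = self.parent_of[xid]
        return xid


def in_progress_old(xid, xip, subtrans):
    """Older approach: resolve the top-level XID first, then check the snapshot."""
    return subtrans.topmost(xid) in xip


def in_progress_new(xid, xip, subxip):
    """Patched approach: the snapshot already carries live subtransaction XIDs."""
    return xid in xip or xid in subxip


if __name__ == "__main__":
    log = SubtransLog({101: 100, 102: 101})  # 102 -> 101 -> 100 (top level)
    xip = {100}          # top-level XIDs in progress when the snapshot was taken
    subxip = {101, 102}  # their live subtransaction XIDs, copied into the snapshot
    print(in_progress_old(102, xip, log), log.lookups)
    print(in_progress_new(102, xip, subxip))
```

Both checks give the same answer, but the second never touches the shared log, which is the structure the quoted text links to the context-switch storm.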

Thanks in advance.
