System overload / context switching / oom, 8.3

From: Rob <rclemley(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: System overload / context switching / oom, 8.3
Date: 2010-02-02 19:11:27
Message-ID: 4B6878DF.3000004@yahoo.com
Lists: pgsql-performance

pg 8.3.9, Debian Etch, 8gb ram, quadcore xeon, megaraid (more details at end)
~240 active databases, 800+ db connections via tcp.

Everything goes along fairly well, with load average from 0.5 to 4.0. Disk
IO is writing about 12-20 MB every 4 or 5 seconds. Cache memory is about
4gb. Then, under load, we see swapping, then a context switch storm, and
then the oom-killer.

I'm hoping to find some ideas for spreading out the load of bgwriter
and/or autovacuum somehow or possibly reconfiguring memory to help
alleviate the problem, or at least to avoid crashing.

(Hardware/software/configuration specs are below the following dstat output).

I've been able to recreate the context switch storm (without the crash)
by running 4 simultaneous 'vacuum analyze' tasks during a pg_dump.
During these times, htop shows all 8 cpu going red bar 100% for a second
or two or three, and this is when I see the context switch storm.
The following stat data however is from a production workload crash.

During the dstat output below, postgresql was protected by oom_adj -17.
vm.overcommit_memory set to 2, but at this time vm.overcommit_ratio was
still at 50 (has since been changed to 90, should this be 100?). The
memory usage was fairly constant 4056M 91M 3906M, until the end and after
heavier swapping it went to 4681M 984k 3305M (used/buf/cache).
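
For reference on the overcommit_ratio question: with vm.overcommit_memory = 2
the kernel caps committed address space at roughly swap plus
overcommit_ratio percent of physical RAM. A minimal sketch of that
arithmetic follows. The swap size is not stated in this message, so SWAP_MB
is a hypothetical placeholder, and huge pages are ignored.

```python
def commit_limit_mb(ram_mb, swap_mb, overcommit_ratio):
    """Approximate CommitLimit (in MB) for vm.overcommit_memory = 2.

    Simplified: swap + RAM * ratio / 100, ignoring huge pages.
    """
    return swap_mb + ram_mb * overcommit_ratio / 100.0


RAM_MB = 8 * 1024   # 8 GB, from the specs in this message
SWAP_MB = 2 * 1024  # hypothetical: swap size is not given in this message

if __name__ == "__main__":
    for ratio in (50, 90, 100):
        limit = commit_limit_mb(RAM_MB, SWAP_MB, ratio)
        print(f"overcommit_ratio={ratio}: commit limit ~ {limit:.0f} MB")
```

The actual value on a running system can be read from the CommitLimit and
Committed_AS lines in /proc/meminfo.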

dstat output under light to normal load:
---procs--- ---paging-- -dsk/total- ---system-- ----total-cpu-usage----
run blk new|__in_ _out_|_read _writ|_int_ _csw_|usr sys idl wai hiq siq
0 2 5| 0 0 | 608k 884k| 756 801 | 11 2 83 4 0 0
1 0 4| 0 0 | 360k 1636k|1062 1147 | 13 1 83 2 0 0
2 2 5| 0 0 | 664k 1404k| 880 998 | 13 2 82 4 0 0
0 4 4| 0 0 |2700k 6724k|1004 909 | 10 1 72 16 0 0
0 2 4| 0 0 | 13M 14M|1490 1496 | 13 2 72 12 0 0
1 1 4| 0 0 | 21M 1076k|1472 1413 | 12 2 74 11 0 0
0 3 5| 0 0 | 15M 1712k|1211 1192 | 10 1 76 12 0 0
1 0 4| 0 0 |7384k 1124k|1277 1403 | 15 2 75 9 0 0
0 7 4| 0 0 |8864k 9528k|1431 1270 | 11 2 63 24 0 0
1 3 4| 0 0 |2520k 15M|2225 3410 | 13 2 66 19 0 0
2 1 5| 0 0 |4388k 1720k|1823 2246 | 14 2 70 13 0 0
2 0 4| 0 0 |2804k 1276k|1284 1378 | 12 2 80 6 0 0
0 0 4| 0 0 | 224k 884k| 825 900 | 12 2 86 1 0 0

dstat output under heavy load, just before the crash (swap use had been
increasing for several seconds or minutes):
---procs--- ---paging-- -dsk/total- ---system-- ----total-cpu-usage----
run blk new|__in_ _out_|_read _writ|_int_ _csw_|usr sys idl wai hiq siq
2 22 9| 124k 28k| 12M 1360k|1831 2536 | 7 4 46 44 0 0
4 7 8| 156k 80k| 14M 348k|1742 2625 | 5 3 53 38 0 0
1 14 7| 60k 232k|9028k 24M|1278 1642 | 4 3 50 42 0 0
0 24 7| 564k 0 | 15M 5832k|1640 2199 | 7 2 41 50 0 0
1 26 7| 172k 0 | 13M 1052k|1433 2121 | 5 3 54 37 0 0
0 15 6| 36k 0 |6912k 35M|1295 3486 | 2 3 58 37 0 0
3 30 2| 0 0 |9724k 13M|1373 2378 | 4 3 48 45 0 0
5 20 4|4096B 0 | 10M 26M|2945 87k | 0 1 44 55 0 0
1 29 8| 0 0 | 19M 8192B| 840 19k | 0 0 12 87 0 0
4 33 3| 0 0 |4096B 0 | 14 39 | 17 17 0 67 0 0
3 31 0| 64k 0 | 116k 0 | 580 8418 | 0 0 0 100 0 0
0 36 0| 0 0 |8192B 0 | 533 12k | 0 0 9 91 0 0
2 32 1| 0 0 | 0 0 | 519 12k | 0 0 11 89 0 0
2 34 1| 0 0 | 16k 0 | 28 94 | 9 0 0 91 0 0
1 32 0| 0 0 | 20k 0 | 467 2295 | 1 0 13 87 0 0
2 32 0| 0 0 | 0 0 | 811 21k | 0 0 12 87 0 0
4 35 3| 0 0 | 44k 0 | 582 11k | 0 0 0 100 0 0
3 37 0| 0 0 | 0 0 | 16 67 | 0 9 0 91 0 0
2 35 0| 0 0 | 0 0 | 519 8205 | 0 2 21 77 0 0
0 37 0| 0 0 | 0 0 | 11 60 | 0 4 12 85 0 0
1 35 1| 0 0 | 20k 0 | 334 2499 | 0 0 23 77 0 0
0 36 1| 0 0 | 80k 0 | 305 8144 | 0 1 23 76 0 0
0 35 3| 0 0 | 952k 0 | 541 2537 | 0 0 16 84 0 0
2 35 2| 0 0 | 40k 0 | 285 8162 | 0 0 24 75 0 0
2 35 0| 100k 0 | 108k 0 | 550 9595 | 0 0 37 63 0 0
0 40 3| 0 0 | 16k 0 |1092 26k | 0 0 26 74 0 0
4 37 3| 0 0 | 96k 0 | 790 12k | 0 0 34 66 0 0
2 39 2| 0 0 | 24k 0 | 77 116 | 8 8 0 83 0 0
2 37 1| 0 0 | 0 0 | 354 2457 | 0 0 29 71 0 0
2 37 0|4096B 0 | 28k 0 |1909 57k | 0 0 27 73 0 0
0 39 1| 0 0 | 32k 0 |1060 25k | 0 0 12 88 0 0

SPECS:

PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)
Installed from the debian etch-backports package.

Linux 2.6.18-6-686-bigmem #1 SMP Thu Nov 5 17:30:05 UTC 2009 i686
GNU/Linux (Debian Etch)

8 GB RAM
4 Quad Core Intel(R) Xeon(R) CPU E5440 @ 2.83GHz stepping 06
L1 I cache: 32K, L1 D cache: 32K, L2 cache: 6144K

LSI Logic SAS based MegaRAID driver (battery backed/write cache enabled)
Dell PERC 6/i
# 8 SEAGATE Model: ST973451SS Rev: SM04 (72 GB) ANSI SCSI revision: 05

RAID Configuration:
sda RAID1 2 disks (with pg_xlog wal files on its own partition)
sdb RAID10 6 disks (pg base dir only)

POSTGRES:

261 databases
238 active databases (w/connection processes)
863 connections to those 238 databases

postgresql.conf:
max_connections = 1100
shared_buffers = 800MB
max_prepared_transactions = 0
work_mem = 32MB
maintenance_work_mem = 64MB
max_fsm_pages = 3300000
max_fsm_relations = 10000
vacuum_cost_delay = 50ms
bgwriter_delay = 150ms
bgwriter_lru_maxpages = 250
bgwriter_lru_multiplier = 2.5
wal_buffers = 8MB
checkpoint_segments = 32
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9
effective_cache_size = 5000MB
default_statistics_target = 100
log_min_duration_statement = 1000
log_checkpoints = on
log_connections = on
log_disconnections = on
log_temp_files = 0
track_counts = on
autovacuum = on
log_autovacuum_min_duration = 0
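
As a rough sanity check on the memory settings above, the sketch below
estimates what the configuration could ask for if every backend ran one
work_mem-sized sort or hash at the same time. This is an illustrative upper
bound under that assumption, not a measurement: work_mem is a per-operation
limit, a single query may use several such allocations, and most backends
normally use far less. The function name and the ops_per_backend parameter
are made up for this example.

```python
def worst_case_mb(shared_buffers_mb, work_mem_mb, backends, ops_per_backend=1):
    """Shared buffers plus one work_mem allocation per operation per backend."""
    return shared_buffers_mb + work_mem_mb * backends * ops_per_backend


SHARED_BUFFERS_MB = 800  # shared_buffers from postgresql.conf above
WORK_MEM_MB = 32         # work_mem from postgresql.conf above
CURRENT_BACKENDS = 863   # connections reported in this message
MAX_CONNECTIONS = 1100   # max_connections from postgresql.conf above
RAM_MB = 8 * 1024        # 8 GB physical RAM

if __name__ == "__main__":
    for label, n in (("current", CURRENT_BACKENDS), ("max", MAX_CONNECTIONS)):
        need = worst_case_mb(SHARED_BUFFERS_MB, WORK_MEM_MB, n)
        print(f"{label}: {n} backends -> up to {need} MB ({need / RAM_MB:.1f}x RAM)")
```

Even one work_mem-sized operation per connection would exceed physical RAM
several times over, which is why the number of simultaneously active
backends matters more here than the total connection count.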

Thanks for any ideas!
Rob
