Kernel kills postgres process - help needed

From: Hervé Piedvache <bill(dot)footcow(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Kernel kills postgres process - help need
Date: 2008-01-09 21:57:06
Message-ID: 200801092257.06912.bill.footcow@gmail.com
Lists: pgsql-general

Hi,

I have a serious problem with a PostgreSQL server. Since I added 8 GB of
memory to a server that already had 8 GB, I have had recurring trouble.
Nothing else has changed. It's a Dell server, and all of Dell's memory
diagnostics pass.
When there are many connections (persistent connections from 6 Apache/PHP
web servers using PDO, about 110 processes on each web server) or
long-running queries, the kernel seems to kill my PostgreSQL processes. It's
hard for me to tell exactly when this happens. The server then becomes
completely unstable and most of the time needs a reboot.

I'm on Linux kernel 2.6.15 with PostgreSQL 8.1.10.
My database is 56 GB in size.
RAM = 16 GB

kernel shmmax : 941604096
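As a sanity check on shmmax, the sketch below adds up the largest shared-memory consumers implied by the configuration in this post (shared_buffers, wal_buffers, max_fsm_pages, max_fsm_relations) and compares the total with kernel.shmmax. It is a rough lower bound under stated assumptions: a default 8 kB block size, about 6 bytes per free-space-map page slot (as the 8.x documentation describes it, treat as an assumption), and the "~70 bytes each" per FSM relation from the config comment. Lock tables and other shared structures are ignored.

```python
# Rough lower bound on the shared memory PostgreSQL 8.1 requests at startup,
# compared against kernel.shmmax. Per-item sizes are assumptions; lock tables,
# per-connection slots and other shared structures are ignored.

SHMMAX = 941_604_096          # kernel.shmmax from the post, in bytes
BLCKSZ = 8192                 # assumed default 8 kB block size

SHARED_BUFFERS = 40_000       # shared_buffers, counted in blocks
WAL_BUFFERS = 128             # wal_buffers, counted in blocks
MAX_FSM_PAGES = 25_000_000    # free space map page slots
MAX_FSM_RELATIONS = 2_000     # free space map relation slots

FSM_PAGE_SLOT_BYTES = 6       # assumption: ~6 bytes per page slot (8.x docs)
FSM_RELATION_BYTES = 70       # "~70 bytes each", per the config comment


def shared_memory_lower_bound() -> dict:
    """Return the main shared-memory consumers, in bytes."""
    return {
        "shared_buffers": SHARED_BUFFERS * BLCKSZ,
        "wal_buffers": WAL_BUFFERS * BLCKSZ,
        "fsm_pages": MAX_FSM_PAGES * FSM_PAGE_SLOT_BYTES,
        "fsm_relations": MAX_FSM_RELATIONS * FSM_RELATION_BYTES,
    }


if __name__ == "__main__":
    parts = shared_memory_lower_bound()
    total = sum(parts.values())
    for name, size in parts.items():
        print(f"{name:15} {size:>13,} bytes")
    print(f"{'total':15} {total:>13,} bytes")
    print(f"{'shmmax':15} {SHMMAX:>13,} bytes")
    print("fits under shmmax" if total < SHMMAX else "exceeds shmmax")
```

On these assumptions the shared segment is roughly 480 MB, comfortably below shmmax, which suggests shmmax itself is not what is failing.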

PostgreSQL config:
max_connections = 2048
shared_buffers = 40000
#temp_buffers = 1000 # min 100, 8KB each
work_mem = 2048 # min 64, size in KB
maintenance_work_mem = 512000 # min 1024, size in KB
max_stack_depth = 4096 # min 100, size in KB
max_fsm_pages = 25000000
max_fsm_relations = 2000 # min 100, ~70 bytes each
max_files_per_process = 255 # min 25
fsync = on
wal_buffers = 128 # min 4, 8KB each
commit_delay = 500 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000
checkpoint_segments = 160
effective_cache_size = 600000 # typically 8KB each
random_page_cost = 2
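Unlike shared_buffers, work_mem is private to each backend and applies per sort or hash step, so a single query can use several multiples of it. The sketch below turns the posted settings into worst-case figures for the 877 connections mentioned later in the post and for max_connections = 2048. The "3 operations per query" case is a hypothetical illustration, not a measured value; effective_cache_size is included only to show that it is a planner hint and allocates nothing.

```python
# Back-of-the-envelope private (per-backend) memory from the posted settings.
# work_mem is a per-operation limit: each sort or hash step in a query may use
# up to this much, so one complex query can use several multiples of it.

MAX_CONNECTIONS = 2048
WORK_MEM_KB = 2048                    # work_mem, in kB
MAINTENANCE_WORK_MEM_KB = 512_000     # maintenance_work_mem, in kB
EFFECTIVE_CACHE_SIZE_PAGES = 600_000  # effective_cache_size, in 8 kB pages
BLCKSZ_KB = 8                         # assumed default block size


def work_mem_ceiling_kb(connections: int, ops_per_query: int = 1) -> int:
    """Upper bound if every connection runs ops_per_query sort/hash steps at once."""
    return connections * WORK_MEM_KB * ops_per_query


def gib(kb: int) -> float:
    """Convert kB to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    for conns in (877, MAX_CONNECTIONS):
        for ops in (1, 3):  # 3 is a hypothetical multi-sort query
            kb = work_mem_ceiling_kb(conns, ops)
            print(f"{conns:5} connections x {ops} op(s): {gib(kb):6.2f} GiB")
    print(f"maintenance_work_mem per VACUUM/CREATE INDEX: "
          f"{gib(MAINTENANCE_WORK_MEM_KB):.2f} GiB")
    # effective_cache_size is only a planner estimate; nothing is allocated for it.
    print(f"effective_cache_size hint: "
          f"{gib(EFFECTIVE_CACHE_SIZE_PAGES * BLCKSZ_KB):.2f} GiB")
```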

Syslog when crashing:
Jan 9 20:30:47 db2 kernel: oom-killer: gfp_mask=0x84d0, order=0
Jan 9 20:30:48 db2 kernel: Mem-info:
Jan 9 20:30:48 db2 kernel: DMA per-cpu:
Jan 9 20:30:48 db2 kernel: cpu 0 hot: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 0 cold: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 1 hot: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 1 cold: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 2 hot: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 2 cold: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 3 hot: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 3 cold: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: DMA32 per-cpu: empty
Jan 9 20:30:48 db2 kernel: Normal per-cpu:
Jan 9 20:30:48 db2 kernel: cpu 0 hot: low 0, high 186, batch 31 used:5
Jan 9 20:30:48 db2 kernel: cpu 0 cold: low 0, high 62, batch 15 used:59
Jan 9 20:30:48 db2 kernel: cpu 1 hot: low 0, high 186, batch 31 used:22
Jan 9 20:30:48 db2 kernel: cpu 1 cold: low 0, high 62, batch 15 used:49
Jan 9 20:30:48 db2 kernel: cpu 2 hot: low 0, high 186, batch 31 used:33
Jan 9 20:30:48 db2 kernel: cpu 2 cold: low 0, high 62, batch 15 used:60
Jan 9 20:30:48 db2 kernel: cpu 3 hot: low 0, high 186, batch 31 used:3
Jan 9 20:30:48 db2 kernel: cpu 3 cold: low 0, high 62, batch 15 used:55
Jan 9 20:30:48 db2 kernel: HighMem per-cpu:
Jan 9 20:30:48 db2 kernel: cpu 0 hot: low 0, high 186, batch 31 used:5
Jan 9 20:30:48 db2 kernel: cpu 0 cold: low 0, high 62, batch 15 used:5
Jan 9 20:30:48 db2 kernel: cpu 1 hot: low 0, high 186, batch 31 used:11
Jan 9 20:30:48 db2 kernel: cpu 1 cold: low 0, high 62, batch 15 used:4
Jan 9 20:30:48 db2 kernel: cpu 2 hot: low 0, high 186, batch 31 used:17
Jan 9 20:30:48 db2 kernel: cpu 2 cold: low 0, high 62, batch 15 used:14
Jan 9 20:30:48 db2 kernel: cpu 3 hot: low 0, high 186, batch 31 used:14
Jan 9 20:30:48 db2 kernel: cpu 3 cold: low 0, high 62, batch 15 used:9
Jan 9 20:30:48 db2 kernel: Free pages: 497624kB (490232kB HighMem)
Jan 9 20:30:48 db2 kernel: Active:3604892 inactive:234379 dirty:20273 writeback:210 unstable:0 free:124406 slab:49119 mapped:547571 pagetables:139724
Jan 9 20:30:48 db2 kernel: DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:1 all_unreclaimable? yes
Jan 9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 880 17392
Jan 9 20:30:48 db2 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 880 17392
Jan 9 20:30:48 db2 kernel: Normal free:3804kB min:3756kB low:4692kB high:5632kB active:508kB inactive:464kB present:901120kB pages_scanned:975 all_unreclaimable? yes
Jan 9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 0 132096
Jan 9 20:30:48 db2 kernel: HighMem free:490108kB min:512kB low:18148kB high:35784kB active:14419044kB inactive:937112kB present:16908288kB pages_scanned:0 all_unreclaimable? no
Jan 9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 0 0
Jan 9 20:30:48 db2 kernel: DMA: 1*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
Jan 9 20:30:48 db2 kernel: DMA32: empty
Jan 9 20:30:48 db2 kernel: Normal: 35*4kB 0*8kB 7*16kB 5*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3804kB
Jan 9 20:30:48 db2 kernel: HighMem: 29171*4kB 43358*8kB 1620*16kB 8*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 490108kB
Jan 9 20:30:48 db2 kernel: Swap cache: add 161, delete 160, find 98/138, race 0+0
Jan 9 20:30:48 db2 kernel: Free swap = 15623168kB
Jan 9 20:30:48 db2 kernel: Total swap = 15623172kB
Jan 9 20:30:48 db2 kernel: Free swap: 15623168kB
Jan 9 20:30:48 db2 kernel: oom-killer: gfp_mask=0x84d0, order=0
Jan 9 20:30:48 db2 kernel: Mem-info:
Jan 9 20:30:48 db2 kernel: DMA per-cpu:
Jan 9 20:30:48 db2 postgres[7634]: [2-1] LOG: background writer process (PID 7639) was terminated by signal 9
Jan 9 20:30:48 db2 kernel: cpu 0 hot: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 kernel: cpu 0 cold: low 0, high 0, batch 1 used:0
Jan 9 20:30:48 db2 postgres[7634]: [3-1] LOG: terminating any other active server processes
Jan 9 20:30:48 db2 postgres[4058]: [2-1] WARNING: terminating connection because of crash of another server process
Jan 9 20:30:48 db2 postgres[4058]: [2-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
Jan 9 20:30:48 db2 postgres[4058]: [2-3] process exited abnormally and possibly corrupted shared memory.
Jan 9 20:30:48 db2 postgres[4044]: [2-1] WARNING: terminating connection because of crash of another server process
Jan 9 20:30:48 db2 postgres[4058]: [2-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
Jan 9 20:30:48 db2 postgres[4023]: [2-1] WARNING: terminating connection because of crash of another server process
Jan 9 20:30:48 db2 postgres[4023]: [2-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
Jan 9 20:30:48 db2 postgres[4023]: [2-3] process exited abnormally and possibly corrupted shared memory.
Jan 9 20:30:48 db2 postgres[4023]: [2-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
etc.
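The per-zone lines in the OOM report are easier to compare side by side. The sketch below parses the "free", "min", "present" and "all_unreclaimable?" fields from those lines (copied from the report with the syslog prefix dropped) and prints how far each zone is above its minimum. It only restates what the kernel logged: the DMA and Normal zones are flagged unreclaimable with the Normal zone just 48 kB above its minimum, while HighMem still has about 478 MiB free. The regular expression is an illustrative assumption about this log format, not a general parser for kernel OOM output.

```python
import re

# Zone summary lines copied from the OOM report above (syslog prefix dropped).
OOM_ZONES = """\
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:1 all_unreclaimable? yes
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Normal free:3804kB min:3756kB low:4692kB high:5632kB active:508kB inactive:464kB present:901120kB pages_scanned:975 all_unreclaimable? yes
HighMem free:490108kB min:512kB low:18148kB high:35784kB active:14419044kB inactive:937112kB present:16908288kB pages_scanned:0 all_unreclaimable? no
"""

# One match per zone line; ".*?" does not cross newlines, so fields stay per line.
ZONE_RE = re.compile(
    r"(?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB"
    r".*?present:(?P<present>\d+)kB.*?all_unreclaimable\? (?P<unreclaimable>yes|no)"
)


def parse_zones(report: str) -> dict:
    """Map zone name -> free/min/present (kB) and the all_unreclaimable flag."""
    zones = {}
    for m in ZONE_RE.finditer(report):
        zones[m["zone"]] = {
            "free_kb": int(m["free"]),
            "min_kb": int(m["min"]),
            "present_kb": int(m["present"]),
            "unreclaimable": m["unreclaimable"] == "yes",
        }
    return zones


if __name__ == "__main__":
    for name, z in parse_zones(OOM_ZONES).items():
        headroom = z["free_kb"] - z["min_kb"]
        print(f"{name:8} present {z['present_kb']:>9} kB  free {z['free_kb']:>7} kB  "
              f"above min {headroom:>7} kB  unreclaimable={z['unreclaimable']}")
```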

At that moment I had 877 connections, which is nothing unusual for our activity.
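The connection count can be related to the "pagetables:139724" figure in the kernel report with a rough estimate. The sketch below assumes, none of which the post states: a 32-bit x86 kernel with PAE (suggested by the HighMem zone in the report), 4 kB pages, 8-byte page-table entries, every backend eventually touching the whole shared_buffers pool, and page tables allocated from low memory. Upper-level page tables and the other shared structures are ignored, so this is an order-of-magnitude check only.

```python
# Rough estimate of page-table memory for backends that all map the shared
# buffer pool. Assumptions (not stated in the post): 32-bit x86 kernel with
# PAE, 4 kB pages, 8-byte page-table entries, every backend touching the whole
# buffer pool, page tables allocated from low memory. Upper levels ignored.

PAGE_SIZE = 4096                      # bytes per x86 base page
PTE_SIZE = 8                          # bytes per PAE page-table entry

SHARED_BUFFERS_BYTES = 40_000 * 8192  # shared_buffers x assumed 8 kB blocks
BACKENDS = 877                        # connections at the time of the crash

LOGGED_PAGETABLES_PAGES = 139_724     # "pagetables:" from the OOM report
NORMAL_ZONE_KB = 901_120              # "Normal ... present:" from the report


def pte_bytes_per_backend(mapped_bytes: int) -> int:
    """Bytes of last-level page tables needed to map mapped_bytes once."""
    pages = -(-mapped_bytes // PAGE_SIZE)  # ceiling division
    return pages * PTE_SIZE


def total_pte_bytes(backends: int, mapped_bytes: int) -> int:
    """Page-table bytes if every backend maps the same region."""
    return backends * pte_bytes_per_backend(mapped_bytes)


if __name__ == "__main__":
    mib = 1024 * 1024
    estimate = total_pte_bytes(BACKENDS, SHARED_BUFFERS_BYTES)
    logged = LOGGED_PAGETABLES_PAGES * PAGE_SIZE
    normal = NORMAL_ZONE_KB * 1024
    print(f"estimated page tables: {estimate / mib:7.1f} MiB")
    print(f"logged page tables:    {logged / mib:7.1f} MiB")
    print(f"Normal (low) zone:     {normal / mib:7.1f} MiB")
```

Under these assumptions the estimate (about 535 MiB) is close to the roughly 546 MiB of page tables the kernel logged, against about 880 MiB of low memory in the Normal zone. That is consistent with, but does not prove, low memory being consumed by per-process page tables as the connection count grows.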

If anybody has an idea (a bad configuration parameter, or anything else) for
solving this problem, any help would be really appreciated.

Regards,
--
Hervé
