segfault in postmaster (pg 7.3.2 & FreeBSD)

From: Herve Boulouis <boulouis(at)corp(dot)nerim(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: segfault in postmaster (pg 7.3.2 & FreeBSD)
Date: 2003-05-14 10:59:38
Message-ID: 20030514125938.O12831@amonbophis.noc.nerim.net
Lists: pgsql-bugs

Hi,

============================================================================
POSTGRESQL BUG REPORT TEMPLATE
============================================================================

Your name : herve boulouis
Your email address : boulouis(at)nerim(dot)net

System Configuration
---------------------
Architecture (example: Intel Pentium) : intel pIII (dell poweredge)

Operating System (example: Linux 2.0.26 ELF) : FreeBSD 4.7-STABLE

PostgreSQL version (example: PostgreSQL-7.3.2): PostgreSQL-7.3.2 from ports (pkg postgresql-7.3.2_1)

Compiler used (example: gcc 2.95.2) : gcc version 2.95.4 20020320 [FreeBSD]

Please enter a FULL description of your problem:
------------------------------------------------

My postgresql stores radius accounting tickets (from a freeradius server and a script I wrote).
It typically serves no more than 3 updates or inserts per second.

The problem is that half an hour ago one of the postmaster processes died with sig11. Here's the backtrace
(sorry, no debug symbols):

norfair:/usr/local/pgsql# gdb /usr/local/bin/postgres postgres.core
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...(no debugging symbols found)...
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
#0 0x81a3955 in set_config_by_name ()
(gdb) bt
#0 0x81a3955 in set_config_by_name ()
#1 0x283c0756 in ?? ()
#2 0x81a0dcb in set_config_option ()
#3 0x81a33f3 in SetConfigOption ()
#4 0x8092720 in BootstrapMain ()
#5 0x8120525 in PostmasterMain ()
#6 0x811f5c1 in PostmasterMain ()
#7 0xbfbfffac in ?? ()
#8 0x811e858 in PostmasterMain ()
#9 0x80f9dcf in main ()
#10 0x8069aee in _start ()

Nothing special happened at the time of the crash, 11:50 (French local time), except that freeradius
got a SIGHUP to reload its configuration files.

Note that a second process (pid 84239) core dumped after the first, so the core file is from that process
and not from the first sig11 (see logs below).

Here are the logs of the problem :

May 14 11:49:45 norfair /kernel: pid 84238 (postgres), uid 70: exited on signal 11 (core dumped)
May 14 11:49:45 norfair /kernel: May 14 11:49:45 norfair /kernel: pid 84238 (postgres), uid 70: exited on signal 11 (core dumped)
May 14 11:49:45 norfair postgres[743]: [1] LOG: server process (pid 84238) was terminated by signal 11
May 14 11:49:45 norfair postgres[743]: [2] LOG: terminating any other active server processes
May 14 11:49:45 norfair postgres[81722]: [1-1] WARNING: Message from PostgreSQL backend:
May 14 11:49:45 norfair postgres[81722]: [1-2] The Postmaster has informed me that some other backend
May 14 11:49:45 norfair postgres[81722]: [1-3] died abnormally and possibly corrupted shared memory.
May 14 11:49:45 norfair postgres[81722]: [1-4] I have rolled back the current transaction and am
May 14 11:49:45 norfair postgres[81722]: [1-5] going to terminate your database system connection and exit.
May 14 11:49:45 norfair postgres[81722]: [1-6] Please reconnect to the database system and repeat your query.
May 14 11:49:45 norfair postgres[81721]: [1-1] WARNING: Message from PostgreSQL backend:
May 14 11:49:45 norfair postgres[81721]: [1-2] The Postmaster has informed me that some other backend
May 14 11:49:45 norfair postgres[81721]: [1-3] died abnormally and possibly corrupted shared memory.
May 14 11:49:45 norfair postgres[81721]: [1-4] I have rolled back the current transaction and am
May 14 11:49:45 norfair postgres[81721]: [1-5] going to terminate your database system connection and exit.
May 14 11:49:45 norfair postgres[81721]: [1-6] Please reconnect to the database system and repeat your query.
May 14 11:49:45 norfair postgres[81720]: [1-1] WARNING: Message from PostgreSQL backend:
May 14 11:49:45 norfair postgres[81720]: [1-2] The Postmaster has informed me that some other backend
May 14 11:49:45 norfair postgres[81720]: [1-3] died abnormally and possibly corrupted shared memory.
May 14 11:49:45 norfair postgres[81720]: [1-4] I have rolled back the current transaction and am
May 14 11:49:45 norfair postgres[81720]: [1-5] going to terminate your database system connection and exit.
May 14 11:49:45 norfair postgres[81720]: [1-6] Please reconnect to the database system and repeat your query.
May 14 11:49:45 norfair postgres[81719]: [1-1] WARNING: Message from PostgreSQL backend:
May 14 11:49:45 norfair postgres[81719]: [1-2] The Postmaster has informed me that some other backend
May 14 11:49:45 norfair postgres[81719]: [1-3] died abnormally and possibly corrupted shared memory.
May 14 11:49:45 norfair postgres[81719]: [1-4] I have rolled back the current transaction and am
May 14 11:49:45 norfair postgres[81719]: [1-5] going to terminate your database system connection and exit.
May 14 11:49:45 norfair postgres[81719]: [1-6] Please reconnect to the database system and repeat your query.
May 14 11:49:45 norfair postgres[81718]: [1-1] WARNING: Message from PostgreSQL backend:
May 14 11:49:45 norfair postgres[81718]: [1-2] The Postmaster has informed me that some other backend
May 14 11:49:45 norfair postgres[81718]: [1-3] died abnormally and possibly corrupted shared memory.
May 14 11:49:45 norfair postgres[81718]: [1-4] I have rolled back the current transaction and am
May 14 11:49:45 norfair postgres[81718]: [1-5] going to terminate your database system connection and exit.
May 14 11:49:45 norfair postgres[81718]: [1-6] Please reconnect to the database system and repeat your query.
May 14 11:49:45 norfair postgres[81717]: [1-1] WARNING: Message from PostgreSQL backend:
May 14 11:49:45 norfair postgres[81717]: [1-2] The Postmaster has informed me that some other backend
May 14 11:49:45 norfair postgres[81717]: [1-3] died abnormally and possibly corrupted shared memory.
May 14 11:49:45 norfair postgres[81717]: [1-4] I have rolled back the current transaction and am
May 14 11:49:45 norfair /kernel: pid 84239 (postgres), uid 70: exited on signal 11 (core dumped)
May 14 11:49:45 norfair /kernel: May 14 11:49:45 norfair /kernel: pid 84239 (postgres), uid 70: exited on signal 11 (core dumped)
May 14 11:49:45 norfair postgres[743]: [4] LOG: startup process (pid 84239) was terminated by signal 11
May 14 11:49:45 norfair postgres[743]: [5] LOG: aborting startup due to startup process failure
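
To make the order of the two crashes easier to see, here is a minimal sketch that pulls the "was terminated by signal" lines out of a log like the one above. The function name crashed_processes and the SAMPLE list are only illustrative, and the regular expression assumes the 7.3-style message wording shown in these logs.

```python
import re

# Matches the postmaster's own report of a dead child, as worded in the logs above.
TERMINATED_RE = re.compile(
    r"LOG:\s+(server|startup) process \(pid (\d+)\) was terminated by signal (\d+)"
)


def crashed_processes(lines):
    """Return (role, pid, signal) tuples in the order the postmaster logged them."""
    result = []
    for line in lines:
        match = TERMINATED_RE.search(line)
        if match:
            role, pid, signal_number = match.groups()
            result.append((role, int(pid), int(signal_number)))
    return result


# Three lines copied from the log excerpt above.
SAMPLE = [
    "May 14 11:49:45 norfair postgres[743]: [1] LOG: server process (pid 84238) was terminated by signal 11",
    "May 14 11:49:45 norfair postgres[743]: [2] LOG: terminating any other active server processes",
    "May 14 11:49:45 norfair postgres[743]: [4] LOG: startup process (pid 84239) was terminated by signal 11",
]

for role, pid, signal_number in crashed_processes(SAMPLE):
    print(f"{role} process {pid} died with signal {signal_number}")
```

On these lines it reports pid 84238 as a server process and pid 84239 as the startup process, which is the process the core file comes from.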

It should also be said that freeradius automatically gets a SIGHUP at minutes 10, 30 and 50, and at 10 and 30
of the hour of the crash (11) I got plenty of auth failures, although nothing had been changed on either side
except that I had bumped the freeradius max sql connections from 20 to 25 to handle the load.

Example :

May 14 11:09:52 norfair postgres[84139]: [1] FATAL: Password authentication failed for user "radius"
May 14 11:09:52 norfair postgres[84140]: [1] FATAL: Password authentication failed for user "radius"
May 14 11:29:47 norfair postgres[84176]: [1] FATAL: Password authentication failed for user "radius"
May 14 11:29:47 norfair postgres[84177]: [1] FATAL: Password authentication failed for user "radius"

So maybe there had been corruption in one of the pgsql processes long before the crash.
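
To check how closely the auth failures track the SIGHUP schedule, something like the following could be run over the syslog. This is only a sketch: the names SIGHUP_MINUTES, offset_from_sighup and failures_near_sighup are made up for illustration, the 60 second window is an arbitrary choice, and the regular expression assumes the syslog line format shown in the example above.

```python
import re

# Assumption taken from the report: freeradius gets a SIGHUP at minutes 10, 30 and 50.
SIGHUP_MINUTES = (10, 30, 50)

# Captures hour, minute, second and backend pid from a FATAL syslog line.
FATAL_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) \S+ postgres\[(\d+)\]: \[\d+\] FATAL:"
)


def offset_from_sighup(minute, second):
    """Signed distance in seconds to the nearest scheduled SIGHUP (negative = before it)."""
    position = minute * 60 + second
    best = None
    for sighup_minute in SIGHUP_MINUTES:
        diff = position - sighup_minute * 60
        # Wrap around the hour so 59:58 counts as close to a SIGHUP at minute 0.
        diff = (diff + 1800) % 3600 - 1800
        if best is None or abs(diff) < abs(best):
            best = diff
    return best


def failures_near_sighup(lines, window=60):
    """Return (time, pid, offset) for FATAL lines within `window` seconds of a SIGHUP."""
    hits = []
    for line in lines:
        match = FATAL_RE.search(line)
        if match is None:
            continue
        hour, minute, second, pid = match.groups()
        offset = offset_from_sighup(int(minute), int(second))
        if abs(offset) <= window:
            hits.append((f"{hour}:{minute}:{second}", int(pid), offset))
    return hits


# Two lines copied from the example above.
SAMPLE = [
    'May 14 11:09:52 norfair postgres[84139]: [1] FATAL: Password authentication failed for user "radius"',
    'May 14 11:29:47 norfair postgres[84176]: [1] FATAL: Password authentication failed for user "radius"',
]

for timestamp, pid, offset in failures_near_sighup(SAMPLE):
    print(f"{timestamp} pid {pid}: {offset:+d}s from nearest SIGHUP")
```

Both sample failures land a few seconds before the scheduled minute, so a small clock difference between the scheduler and syslog would be worth ruling out before reading too much into the exact offsets.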

After the crash, I reloaded the postgres and the freeradius without changing anything and all is
working fine now.

Is this a known bug ?

Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

Sorry, this happened half an hour ago and I don't know how to reproduce it :))

I still have the core file for examination if anyone wants it.

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

dunno.

Regards

--
Herve Boulouis - Nerim
