BUG #2858: postgres periodically restarts (problem with MemoryContextAllocZeroAligned)...

From: "Robert Locke" <rob(at)mobius(dot)ph>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #2858: postgres periodically restarts (problem with MemoryContextAllocZeroAligned)...
Date: 2006-12-22 10:39:19
Message-ID: 200612221039.kBMAdJ7r033547@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 2858
Logged by: Robert Locke
Email address: rob(at)mobius(dot)ph
PostgreSQL version: 8.1.4
Operating system: FreeBSD 6.1-RELEASE-p6
Description: postgres periodically restarts (problem with
MemoryContextAllocZeroAligned)...
Details:

We recently began experiencing a problem with postgres where the server
periodically restarts, with messages such as the following in the log
file:

Dec 22 14:15:56 MOv2DB postgres[38675]: [100-1] WARNING: terminating
connection because of crash of another server process
Dec 22 14:15:56 MOv2DB postgres[38675]: [100-2] DETAIL: The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server
Dec 22 14:15:56 MOv2DB postgres[38675]: [100-3] process exited abnormally
and possibly corrupted shared memory.
Dec 22 14:15:56 MOv2DB postgres[38675]: [100-4] HINT: In a moment you
should be able to reconnect to the database and repeat your command.

"dmesg" would reveal errors such as:

pid 34866 (postgres), uid 70: exited on signal 11 (core dumped)
pid 43893 (postgres), uid 70: exited on signal 11 (core dumped)
pid 43907 (postgres), uid 70: exited on signal 11 (core dumped)
pid 46337 (postgres), uid 70: exited on signal 11 (core dumped)

We enabled query logging and found that the process would sometimes die when
a function called "removeAccount" was executed:

46337 2006-12-22 14:21:56 PHT 10.48.14.246 LOG: statement: SELECT * FROM
core."removeAccount"(5130175)
45166 2006-12-22 14:21:59 PHT LOG: server process (PID 46337) was
terminated by signal 11

This function simply executes a number of delete statements to remove a user
from the system. We discovered, however, that it was a little slow (3 to 4
seconds) because the final delete removed the record from a table that is
referenced by foreign keys in a number of other tables.

Adding a couple of indices greatly improved the performance of the function,
and the problem has now disappeared. However, we are concerned that this
might indicate a more severe problem with Postgres which might cause further
issues down the road.
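
For illustration, here is a minimal sketch of the kind of change described above. The table, column, and index names are hypothetical and are not the actual schema. PostgreSQL does not create an index on the referencing column of a foreign key automatically, so each delete from the referenced table triggers a lookup in every referencing table; without an index that lookup is a sequential scan, which would be consistent with the ExecSeqScan frame under SPI_execute_snapshot in the back trace.

```sql
-- Hypothetical schema, invented for illustration only.
CREATE TABLE account (
    account_id integer PRIMARY KEY
);

CREATE TABLE account_event (
    event_id   integer PRIMARY KEY,
    -- Referencing column: PostgreSQL does not index this automatically.
    account_id integer NOT NULL REFERENCES account (account_id)
);

-- With this index, the foreign-key check fired by a delete on "account"
-- can use an index lookup instead of scanning account_event sequentially.
CREATE INDEX account_event_account_id_idx
    ON account_event (account_id);

-- Remove the referencing rows first, then the referenced row. The final
-- delete is the one whose foreign-key check benefits from the index.
DELETE FROM account_event WHERE account_id = 5130175;
DELETE FROM account WHERE account_id = 5130175;
```

One such index would be needed for each referencing table. Whether it pays off depends on how often rows are deleted from the referenced table compared with the extra write cost of maintaining the index.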

Here's a backtrace of the core dump for reference:

#0 0x08079d7f in heap_modifytuple ()
#1 0x08079eb6 in slot_getattr ()
#2 0x0816344d in ExecMakeFunctionResult ()
#3 0x081675b7 in ExecQual ()
#4 0x08167bae in ExecScan ()
#5 0x08175547 in ExecSeqScan ()
#6 0x08161b52 in ExecProcNode ()
#7 0x08160a8e in ExecutorRun ()
#8 0x0817ae0f in spi_printtup ()
#9 0x0817b9b0 in SPI_execute_snapshot ()
#10 0x00000000 in ?? ()
#11 0x00000000 in ?? ()
#12 0x00000000 in ?? ()
#13 0x00000001 in ?? ()
#14 0x083ecc88 in ?? ()
#15 0xbfbfa3e8 in ?? ()
#16 0x0000000a in ?? ()
#17 0x08607018 in ?? ()
#18 0x00000001 in ?? ()
#19 0xbfbfa588 in ?? ()
#20 0x08299fa9 in RI_Initial_Check ()
#21 0x083ecad8 in ?? ()
#22 0xbfbfa458 in ?? ()
#23 0x082e08ca in MemoryContextAllocZeroAligned ()
Previous frame inner to this frame (corrupt stack?)

Any ideas?
