| From: | Richard Neill <rn214(at)cam(dot)ac(dot)uk> | 
|---|---|
| To: | pgsql-bugs(at)postgresql(dot)org | 
| Subject: | Postgresql 8.4.1 segfault, backtrace | 
| Date: | 2009-09-24 06:13:55 | 
| Message-ID: | 4ABB0E23.1010704@cam.ac.uk | 
| Lists: | pgsql-bugs | 
Dear All,
I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4.0, and 
we've found that it still happens repeatedly in 8.4.1. We're in a bit of 
a bind, as this is a production system, and we get segfaults every few 
hours.
[It's a testament to how good the postgres crash recovery is that, with 
a reasonably small value of checkpoint_segments = 4, recovery happens in 
30 seconds, and the warehouse systems seem to continue OK].
The version I'm using is 8.4.1, in the source package provided for 
Ubuntu Karmic, compiled by me on a 64-bit server (running Ubuntu Jaunty).
I'm not sufficiently expert to debug it very far, but I wonder whether 
the following info from GDB would help one of the hackers here (I've 
trimmed out the uninteresting bits):
------------
$ gdb /usr/lib/postgresql/8.4/bin/postgres core.200909030901
GNU gdb 6.8-debian
This GDB was configured as "x86_64-linux-gnu"...
Core was generated by `postgres: fensys fswcs [local] startup'.
Program terminated with signal 11, Segmentation fault.
[New process 14965]
#0  RelationCacheInitializePhase2 () at relcache.c:2654
2654                    if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
(gdb) bt
#0  RelationCacheInitializePhase2 () at relcache.c:2654
#1  0x00007f61355a1021 in InitPostgres (in_dbname=0x7f613788c610 "fswcs", dboid=0, username=0x7f6137889450 "fensys", out_dbname=0x0) at postinit.c:576
#2  0x00007f61354dbcc5 in PostgresMain (argc=4, argv=0x7f6137889480, username=0x7f6137889450 "fensys") at postgres.c:3334
#3  0x00007f61354aefdd in ServerLoop () at postmaster.c:3447
#4  0x00007f61354afecc in PostmasterMain (argc=5, argv=0x7f6137885140) at postmaster.c:1040
#5  0x00007f61354568ce in main (argc=5, argv=0x7f6137885140) at main.c:188
(gdb) quit
-------------
A few more bits of info:
The backtrace points to line 2654 of relcache.c, in
   RelationCacheInitializePhase2().
The crash is a NULL dereference of "relation". At that point:
     needNewCacheFile = false
     criticalRelcachesBuilt = true
so nothing happens before execution enters the failing code block.
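To illustrate what I mean by the dereference, here is a rough sketch of the shape of the condition on line 2654. The struct and function names (SketchRelation, SketchClassForm, needs_rule_build) are made up for illustration and model only the fields named on that line; this is not the actual relcache.c code, nor a proposed fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real PostgreSQL
 * structs; only the fields named on the crashing line are modelled. */
typedef struct SketchClassForm {
    bool relhasrules;
} SketchClassForm;

typedef struct SketchRelation {
    SketchClassForm *rd_rel;   /* catalog row for the relation */
    void            *rd_rules; /* rule info, NULL until built */
} SketchRelation;

/*
 * Same condition as on line 2654, but with explicit NULL checks in
 * front of it. Returns true if the entry's rules still need building.
 * *bad_entry is set when "relation" or its rd_rel pointer is NULL,
 * which is the case where the unguarded expression would segfault.
 */
static bool needs_rule_build(const SketchRelation *relation, bool *bad_entry)
{
    *bad_entry = (relation == NULL || relation->rd_rel == NULL);
    if (*bad_entry)
        return false;

    return relation->rd_rel->relhasrules && relation->rd_rules == NULL;
}
```

Without the first check, evaluating relation->rd_rel with relation == NULL reads through a NULL pointer, which is consistent with the signal 11 above.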
I can give you a core dump if anyone would like to see it, but it's 405 
MB after bzipping.
One last observation: a dump and restore of the DB seems to prevent it 
crashing for about a day.
Thank you for your help,
Richard