BUG #3362: xlog corruption just after initdb on irix

From: "Herve Boulouis" <amon(at)sockar(dot)homeip(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3362: xlog corruption just after initdb on irix
Date: 2007-06-05 14:42:54
Message-ID: 200706051442.l55EgshC014255@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 3362
Logged by: Herve Boulouis
Email address: amon(at)sockar(dot)homeip(dot)net
PostgreSQL version: 8.2.4
Operating system: IRIX 6.5.26
Description: xlog corruption just after initdb on irix
Details:

what I did:

- compile postgres 8.2.4 (MIPSPro 7.4.4m compiler)

export CC=c99
export CXX=CC
./configure --enable-integer-datetimes --with-perl --with-python

- compile and install, create pgsql user, ...

- initialize db

su - pgsql
export PGLIB=/usr/local/pgsql/lib
export PGDATA=$HOME/data
initdb

- trigger the bug

/etc/init.d/pgsql start
/etc/init.d/pgsql stop
/etc/init.d/pgsql start
/etc/init.d/pgsql stop
/etc/init.d/pgsql start
<signal 6 on postmaster startup, see logfile>

from that point, postmaster won't start unless another initdb is done.

- command used by the script to start/stop

su - pgsql -c "exec /usr/local/pgsql/bin/pg_ctl start -w -D
'/usr/local/pgsql/data'"
su - pgsql -c "/usr/local/pgsql/bin/pg_ctl stop -D '/usr/local/pgsql/data'
-s -m smart"

- logfile of the 5 start/stop operations :

* first start
Jun 5 15:22:59 6T:origin300 postgres[2007]: [1-1] LOG: could not create
IPv6 socket: Address family not supported by protocol family
Jun 5 15:22:59 6T:origin300 postgres[2007]: [2-1] LOG: could not create
socket for statistics collector: Address family not supported by protocol
family
Jun 5 15:22:59 6T:origin300 postgres[2007]: [3-1] LOG: trying another
address for the statistics collector
Jun 5 15:22:59 6T:origin300 postgres[2008]: [4-1] LOG: database system was
shut down at 2007-06-05 15:21:59 GMT
Jun 5 15:22:59 6T:origin300 postgres[2008]: [5-1] LOG: checkpoint record
is at 0/4866C8
Jun 5 15:22:59 6T:origin300 postgres[2008]: [6-1] LOG: redo record is at
0/4866C8; undo record is at 0/0; shutdown TRUE
Jun 5 15:22:59 6T:origin300 postgres[2008]: [7-1] LOG: next transaction
ID: 0/593; next OID: 10820
Jun 5 15:22:59 6T:origin300 postgres[2008]: [8-1] LOG: next MultiXactId:
1; next MultiXactOffset: 0
Jun 5 15:22:59 6T:origin300 postgres[2008]: [9-1] LOG: database system is
ready
* first stop
Jun 5 15:23:11 6T:origin300 postgres[2007]: [4-1] LOG: received smart
shutdown request
Jun 5 15:23:11 6T:origin300 postgres[2009]: [4-1] LOG: shutting down
Jun 5 15:23:11 6T:origin300 postgres[2009]: [5-1] LOG: database system is
shut down
* 2d start
Jun 5 15:23:17 6T:origin300 postgres[2027]: [1-1] LOG: could not create
IPv6 socket: Address family not supported by protocol family
Jun 5 15:23:17 6T:origin300 postgres[2027]: [2-1] LOG: could not create
socket for statistics collector: Address family not supported by protocol
family
Jun 5 15:23:17 6T:origin300 postgres[2027]: [3-1] LOG: trying another
address for the statistics collector
Jun 5 15:23:17 6T:origin300 postgres[2028]: [4-1] LOG: database system was
shut down at 2007-06-05 15:23:11 GMT
Jun 5 15:23:17 6T:origin300 postgres[2028]: [5-1] LOG: record with zero
length at 0/486718
* vvv I believe this line is not normal, as there was a clean shutdown
Jun 5 15:23:17 6T:origin300 postgres[2028]: [6-1] LOG: invalid primary
checkpoint record
Jun 5 15:23:17 6T:origin300 postgres[2028]: [7-1] LOG: using previous
checkpoint record at 0/4866C8
Jun 5 15:23:17 6T:origin300 postgres[2028]: [8-1] LOG: redo record is at
0/4866C8; undo record is at 0/0; shutdown TRUE
Jun 5 15:23:17 6T:origin300 postgres[2028]: [9-1] LOG: next transaction
ID: 0/593; next OID: 10820
Jun 5 15:23:17 6T:origin300 postgres[2028]: [10-1] LOG: next MultiXactId:
1; next MultiXactOffset: 0
Jun 5 15:23:17 6T:origin300 postgres[2028]: [11-1] LOG: database system
was not properly shut down; automatic recovery in progress
Jun 5 15:23:17 6T:origin300 postgres[2028]: [12-1] LOG: record with zero
length at 0/486718
Jun 5 15:23:17 6T:origin300 postgres[2028]: [13-1] LOG: redo is not
required
Jun 5 15:23:17 6T:origin300 postgres[2028]: [14-1] LOG: database system is
ready
* 2d stop
Jun 5 15:23:22 6T:origin300 postgres[2027]: [4-1] LOG: received smart
shutdown request
Jun 5 15:23:22 6T:origin300 postgres[2029]: [4-1] LOG: shutting down
Jun 5 15:23:22 6T:origin300 postgres[2029]: [5-1] LOG: database system is
shut down
* 3d start
Jun 5 15:23:25 6T:origin300 postgres[2047]: [1-1] LOG: could not create
IPv6 socket: Address family not supported by protocol family
Jun 5 15:23:25 6T:origin300 postgres[2047]: [2-1] LOG: could not create
socket for statistics collector: Address family not supported by protocol
family
Jun 5 15:23:25 6T:origin300 postgres[2047]: [3-1] LOG: trying another
address for the statistics collector
Jun 5 15:23:25 6T:origin300 postgres[2048]: [4-1] LOG: database system was
shut down at 2007-06-05 15:23:22 GMT
Jun 5 15:23:25 6T:origin300 postgres[2048]: [5-1] LOG: record with zero
length at 0/486768
Jun 5 15:23:25 6T:origin300 postgres[2048]: [6-1] LOG: invalid primary
checkpoint record
Jun 5 15:23:25 6T:origin300 postgres[2048]: [7-1] LOG: record with zero
length at 0/486718
Jun 5 15:23:25 6T:origin300 postgres[2048]: [8-1] LOG: invalid secondary
checkpoint record
* vvv this it gets worse
Jun 5 15:23:25 2T:origin300 postgres[2048]: [9-1] PANIC: could not locate
a valid checkpoint record
Jun 5 15:23:26 6T:origin300 postgres[2047]: [4-1] LOG: startup process
(PID 2048) was terminated by signal 6
Jun 5 15:23:26 6T:origin300 postgres[2047]: [5-1] LOG: aborting startup
due to startup process failure

I had already noticed that problem on this platform with 8.0 or 8.1, but
didn't report it at the time.

I kept the core file and here is the backtrace :

origin300:/usr/local/pgsql/data# dbx /usr/local/pgsql/bin/postgres core
dbx version 7.3.4 (86441_Nov11 MR) Nov 11 2002 11:31:55
Core from signal SIGABRT: Abort (see abort(3c))
(dbx) where
> 0 _kill(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/xlv41/6.5.26m/work/irix/lib/libc/libc_n32_M4/signal/kill.s":15,
0xfa54258]
1 _raise(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/xlv41/6.5.26m/work/irix/lib/libc/libc_n32_M4/signal/raise.c":27,
0xfad16dc]
2 abort(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/xlv41/6.5.26m/work/irix/lib/libc/libc_n32_M4/gen/abort.c":52, 0xfa6ef60]
3 errfinish(0x0, 0x0, 0x3, 0x5, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/utils/error/elog.c":451,
0x102d9678]
4 StartupXLOG(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/access/transam/xlog.c":4776,
0x10088638]
5 BootstrapMain(0x103b7524, 0x2, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/bootstrap/bootstrap.c":423,
0x100a3cd4]
6 StartChildProcess(0x2, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/postmaster/postmaster.c":3691,
0x101e6de8]
7 PostmasterMain(0x800, 0xfa49bf0, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1,
0x0)
["/opt/install/postgresql-8.2.4/src/backend/postmaster/postmaster.c":961,
0x101e267c]
8 main(0x3, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/main/main.c":188, 0x1017e684]
9 __start()
["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177,
0x1003f2a8]

If necessary, I can provide access to an irix machine if some pgsql
developpers need it to test/maintain the irix port.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2007-06-05 16:06:28 Re: BUG #3362: xlog corruption just after initdb on irix
Previous Message Douglas Toltzman 2007-06-05 14:31:25 Re: BUG #3361: connect problem with server