My investigations of the postmaster Bus error

From: Martin Pitt <martin(at)piware(dot)de>
To: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: My investigations of the postmaster Bus error
Date: 2005-10-11 19:13:15
Message-ID: 20051011191315.GB11868@piware.de
Lists: pgsql-bugs pgsql-patches

Hi PostgreSQL developers!

There have already been some reports about the mysterious Bus error
that postmaster dies with on some architectures. Since that bites
pretty hard, I did some investigations and tests on various
architectures with various configurations.

As background, Debian currently builds with gcc 4.0.2 by default, and
I use the latest 7.4.9 and 8.0.4 PostgreSQL versions. The default is
to build with -O2.

Here are the results:

* On i386, PowerPC, AMD 64, S/390, arm, and Alpha all versions work
fine with all tested compiler versions (gcc 3.3.3 and 4.0.2).

* On IA 64, HP PARISC, and sparc postmaster 7.4 and 8.0 fail with a
bus error when run from initdb. It works fine as soon as I

- build with gcc 3.3 or
- build with -O0 or
- run postmaster through initdb under gdb (grumpf) or
- run postmaster through initdb under strace or
- run postmaster directly (not through initdb).

Yay Heisenbugs. :-/

Also, at least 8.1 on sparc works well with gcc 4.0 and -O2.

* And then there is MIPS, which really sucks. It constantly crashes
in all configurations I tried it with:

8.0 with gcc-4.0 -O2
8.0 with gcc-4.0 -O0
8.0 with gcc-3.3 -O2
8.0 with gcc-3.3 -O2 and --disable-spinlocks
7.4 with gcc-4.0 -O2 original without any patches
7.4 with gcc-3.3 -O2 with recent MIPS spinlock patch

This also produces a usable backtrace:

Starting program: /home/mpitt/8.0/postgresql-8.0-8.0.3/debian/tmp/usr/lib/postgresql/8.0/bin/postmaster

Program received signal SIGBUS, Bus error.
0x006e4f80 in InitializeGUCOptions () at guc.c:2360
2360            *conf->variable = conf->reset_val;
(gdb) bt
#0 0x006e4f80 in InitializeGUCOptions () at guc.c:2360
#1 0x005c7f68 in PostmasterMain (argc=1, argv=0x100539e0) at postmaster.c:439
#2 0x0056f874 in main (argc=1, argv=0x100539e0) at main.c:268

Some weeks ago I tracked down the particular variable it fails on
(some float variable; unfortunately I forgot the name, but if it is
important, I can redo the research), but I did not find any
datatype mismatch or similar obvious things.
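
One way to narrow this down further would be to check the address being
written before the store. This is only a sketch under the assumption that
the SIGBUS comes from a misaligned double (not confirmed); real_entry,
is_aligned, and checked_reset are hypothetical names for illustration and
are not taken from guc.c:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if ptr is a multiple of `alignment` bytes. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t) ptr % alignment) == 0;
}

/*
 * Hypothetical stand-in for a "real"-typed config entry: a pointer to the
 * variable plus the value it is reset to. Not the actual struct from guc.c.
 */
struct real_entry
{
    double *variable;
    double  reset_val;
};

/*
 * Perform the same kind of store as the failing line, but only if the
 * target is suitably aligned for a double.
 * Returns 0 on success, -1 if the target address is misaligned.
 */
static int checked_reset(struct real_entry *conf)
{
    assert(conf != NULL);

    if (!is_aligned(conf->variable, alignof(double)))
        return -1;              /* the store could raise SIGBUS on strict-alignment CPUs */

    *conf->variable = conf->reset_val;
    return 0;
}
```

On the strict-alignment architectures a -1 from such a check for the
failing entry would point at alignment; a 0 would rule it out for that
entry and suggest looking elsewhere.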

Does anybody have an idea about these bus errors? Also, if somebody
wants to track down the MIPS bug: I can offer temporary ssh access to
a Debian sid with all required build dependencies, gdb, and the like
for debugging.

Thanks and have a nice day!

Martin

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?
