Re: weird buildfarm failures on arm/mipsel and --with-tcl

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: weird buildfarm failures on arm/mipsel and --with-tcl
Date: 2007-01-09 07:24:19
Message-ID: 45A34323.4080100@kaltenbrunner.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> one of my new buildfarm boxes (a Debian/Etch-based ARM box) is
>> sometimes failing to stop the database during the regression tests:
>
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-01-08%2003:03:03
>
>> this only seems to happen sometimes and only if --with-tcl is enabled on
>> quagga.
>
>> lionfish (my mipsel box) is able to trigger that on every build if I
>> enable --with-tcl but it is nearly impossible to debug it there because
>> of the low amount of memory and disk space it has.
>
> Hm, could pl/tcl somehow be preventing the backend from exiting once
> it's run any pl/tcl stuff? I have no idea why though, and even less
> why it wouldn't be repeatable.
>
>> After the stopdb failure we still have those processes running:
>> pgbuild 3488 0.0 2.4 43640 6300 ? Ss 06:15 0:01 postgres: pgbuild pl_regression [local] idle
>
> Can you get a stack trace from this process?

(gdb) bt
#0 0x406b9d80 in __pthread_sigsuspend () from /lib/libpthread.so.0
#1 0x406b8a7c in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2 0x406b91f8 in pthread_onexit_process () from /lib/libpthread.so.0
#3 0x40438658 in exit () from /lib/libc.so.6
#4 0x40438658 in exit () from /lib/libc.so.6
Previous frame identical to this frame (corrupt stack?)

>
>> pgbuild 3489 0.0 0.0 0 0 ? Z 06:15 0:00 [postgres] <defunct>
>
> This is a bit odd ... if that process is a direct child of the
> postmaster it should have been reaped promptly. Could it be a child
> of the other backend? If so, why was it started? Please try the
> ps again with whatever switch it needs to list parent process ID.

Looks like you are right - the defunct 3489 seems to be a child of 3488:

 PPID   PID  PGID   SID TTY TPGID STAT  UID TIME COMMAND
    1  3389 18341 18341 ?      -1 S    1001 0:03 /home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D data
 3389  3391  3391  3391 ?      -1 Ss   1001 0:00 postgres: writer process
 3389  3392  3392  3392 ?      -1 Ss   1001 0:00 postgres: stats collector process
 3389  3488  3488  3488 ?      -1 Ss   1001 0:01 postgres: pgbuild pl_regression [local] idle
 3488  3489  3488  3488 ?      -1 Z    1001 0:00 [postgres] <defunct>
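
For anyone wondering why the PPID column matters here: a process stays in state Z until its own parent collects its exit status. A zombie whose PPID is 3488 was therefore forked by that backend and is waiting for that backend, not the postmaster, to reap it. Below is a minimal sketch of the mechanism, assuming Linux with /proc mounted. The function names are made up for illustration, and none of this is PostgreSQL or pl/tcl code:

```python
import os
import time


def read_proc_stat(pid):
    """Return (state, ppid) for pid, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces, so split
    # after the last ')' before reading the remaining fields.
    rest = data[data.rindex(")") + 2:].split()
    return rest[0], int(rest[1])


def make_zombie_and_inspect():
    """Fork a child that exits at once and inspect it before reaping it."""
    child = os.fork()
    if child == 0:
        os._exit(0)  # child: exit immediately

    # Parent: deliberately do not wait yet. Poll until the kernel reports
    # the exited child as a zombie (give up after five seconds).
    state, ppid = read_proc_stat(child)
    deadline = time.monotonic() + 5.0
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state, ppid = read_proc_stat(child)

    # Collecting the exit status is what removes the <defunct> entry.
    os.waitpid(child, 0)
    reaped = not os.path.exists(f"/proc/{child}")
    return state, ppid, reaped


if __name__ == "__main__":
    print(make_zombie_and_inspect())
```

The child stays defunct for exactly as long as the parent neither waits for it nor exits itself. That would fit a backend that is stuck inside exit(), but the sketch only illustrates the general rule and does not diagnose this particular hang.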

Stefan
