Re: weird buildfarm failures on arm/mipsel and --with-tcl

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: weird buildfarm failures on arm/mipsel and --with-tcl
Date: 2007-01-24 18:35:44
Message-ID: 45B7A700.3000802@kaltenbrunner.cc
Lists: pgsql-hackers

Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>> one of my new buildfarm boxes (a Debian/Etch based ARM box) is
>>> sometimes failing to stop the database during the regression tests:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-01-08%2003:03:03
>>> this only seems to happen sometimes and only if --with-tcl is enabled on
>>> quagga.
>>> lionfish (my mipsel box) is able to trigger that on every build if I
>>> enable --with-tcl but it is nearly impossible to debug it there because
>>> of the low amount of memory and diskspace it has.
>> Hm, could pl/tcl somehow be preventing the backend from exiting once
>> it's run any pl/tcl stuff? I have no idea why though, and even less
>> why it wouldn't be repeatable.
>>
>>> After the stopdb failure we still have those processes running:
>>> pgbuild 3488 0.0 2.4 43640 6300 ? Ss 06:15 0:01 postgres: pgbuild pl_regression [local] idle
>> Can you get a stack trace from this process?
>
> (gdb) bt
> #0 0x406b9d80 in __pthread_sigsuspend () from /lib/libpthread.so.0
> #1 0x406b8a7c in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
> #2 0x406b91f8 in pthread_onexit_process () from /lib/libpthread.so.0
> #3 0x40438658 in exit () from /lib/libc.so.6
> #4 0x40438658 in exit () from /lib/libc.so.6
> Previous frame identical to this frame (corrupt stack?)
>
>>> pgbuild 3489 0.0 0.0 0 0 ? Z 06:15 0:00 [postgres] <defunct>
>> This is a bit odd ... if that process is a direct child of the
>> postmaster it should have been reaped promptly. Could it be a child
>> of the other backend? If so, why was it started? Please try the
>> ps again with whatever switch it needs to list parent process ID.
>
> looks like you are right - the defunct 3489 seems to be a child of 3488:
>
> PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
> 1 3389 18341 18341 ? -1 S 1001 0:03 /home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D data
> 3389 3391 3391 3391 ? -1 Ss 1001 0:00 postgres: writer process
> 3389 3392 3392 3392 ? -1 Ss 1001 0:00 postgres: stats collector process
> 3389 3488 3488 3488 ? -1 Ss 1001 0:01 postgres: pgbuild pl_regression [local] idle
> 3488 3489 3488 3488 ? -1 Z 1001 0:00 [postgres] <defunct>
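Tom's point about reaping, and the PPID column in the listing above, can be illustrated with a small Python sketch. It is Linux-only (it reads /proc/<pid>/stat directly), and the helper names proc_stat and zombie_demo are hypothetical, not anything from PostgreSQL. A child that has exited stays in state Z, still listing its parent's PID, until that specific parent calls waitpid(); the postmaster cannot reap a grandchild. That is consistent with the defunct 3489 lingering while its parent 3488 is itself stuck inside exit().

```python
from __future__ import annotations

import os
import time


def proc_stat(pid: int) -> tuple[str, int] | None:
    """Return (state, ppid) for pid from /proc/<pid>/stat, or None if gone.

    Linux-only assumption. The command name is parenthesised and may contain
    spaces, so parse only the fields that follow the last ')'.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    fields = data[data.rindex(")") + 2:].split()
    return fields[0], int(fields[1])  # stat field 3 = state, field 4 = ppid


def zombie_demo() -> tuple[tuple[str, int] | None, int, tuple[str, int] | None]:
    """Fork a child that exits at once; inspect it before and after reaping."""
    child = os.fork()
    if child == 0:
        os._exit(7)  # child: exit immediately, skipping Python-level cleanup
    time.sleep(0.5)  # give the child time to exit
    before = proc_stat(child)  # on Linux, typically ('Z', <our pid>)
    _, status = os.waitpid(child, 0)  # only the parent can reap its child
    after = proc_stat(child)  # the /proc entry disappears once reaped
    return before, os.WEXITSTATUS(status), after


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print(zombie_demo())
```

If the parent never gets around to calling waitpid() (for example because it is blocked in exit(), as in the backtrace above), the child simply remains in state Z.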

FWIW - I removed --with-tcl from quagga's configuration about two weeks
ago and it has not failed (for that reason) since. So the issue most
definitely looks pl/tcl related ...

Stefan
