isolation test deadlocking on buildfarm member coypu

From: Rémi Zara <remi_zara(at)mac(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: isolation test deadlocking on buildfarm member coypu
Date: 2011-07-16 15:50:45
Message-ID: 4082D6E2-8965-4D90-A3AC-E1008DFEF23F@mac.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Isolation tests seem to deadlock on buildfarm member coypu (NetBSD/powerpc 5.1).

Here are the process for one deadlock (several days) :

pgbuildfarm 2405 0.0 1.2 26948 7916 ? Is Wed02AM 0:00.25 postgres: pgbuildfarm isolationtest [local] INSERT waiting
pgbuildfarm 6559 0.0 0.4 10288 2684 ? I Wed02AM 0:00.05 ./pg_isolation_regress --psqldir=/home/pgbuildfarm/workdir/HEAD/inst/bin --inputdir=. --schedule=./isolation_schedule (pg_isolation_reg)
pgbuildfarm 8296 0.0 1.8 25924 11880 ? Ss Wed02AM 0:59.79 postgres: writer process
pgbuildfarm 10630 0.0 0.4 16212 2916 ? Ss Wed02AM 1:43.16 postgres: stats collector process
pgbuildfarm 11814 0.0 0.7 9084 4408 ? I Wed01AM 0:00.39 /usr/pkg/bin/perl ./run_branches.pl --run-one
pgbuildfarm 14785 0.0 0.2 3344 1276 ? I Wed02AM 0:00.03 gmake NO_LOCALE=1 installcheck
pgbuildfarm 15353 0.0 0.8 26948 4960 ? Ss Wed02AM 1:10.55 postgres: autovacuum launcher process
pgbuildfarm 16619 0.0 1.8 15992 11512 ? I Wed01AM 0:04.90 /usr/pkg/bin/perl ./run_build.pl --config build-farm.conf HEAD
pgbuildfarm 17532 0.0 0.5 25924 3052 ? Ss Wed02AM 0:34.49 postgres: wal writer process
pgbuildfarm 18073 0.0 1.0 25924 6844 ? I Wed02AM 0:19.83 /home/pgbuildfarm/workdir/HEAD/inst/bin/postgres -D data-C
pgbuildfarm 21310 0.0 1.2 26948 7948 ? Is Wed02AM 0:00.06 postgres: pgbuildfarm isolationtest [local] idle in transaction
pgbuildfarm 24137 0.0 0.2 3456 1064 ? Is Wed01AM 0:00.01 /bin/sh -c cd /home/pgbuildfarm/client-code && ./run_branches.pl --run-one
pgbuildfarm 26932 0.0 0.2 3456 1072 ? I Wed02AM 0:00.01 sh -c cd pgsql.16619/src/test/isolation && gmake NO_LOCALE=1 installcheck 2>&1
pgbuildfarm 28430 0.0 1.3 26948 8816 ? Ss Wed02AM 408:43.91 postgres: pgbuildfarm isolationtest [local] idle
pgbuildfarm 28809 0.0 0.4 10476 2844 ? R Wed02AM 27:48.24 ./isolationtester dbname=isolationtest

I killed one of the postgres process, and it lead to a failure of the buildfarm perl script (probably due to the 2GB+ log file), and persistance of the postgres cluster (which I had to clean manually)
I've kept the source and install directories, if useful for investigation.

Here are the processes for a second lock (several hours):

pgbuildfarm 1423 0.0 0.4 16212 2888 ? Ss 2:11PM 0:04.07 postgres: stats collector process
pgbuildfarm 1604 0.0 1.2 26948 7724 ? Is 2:13PM 0:00.05 postgres: pgbuildfarm isolationtest [local] idle in transaction
pgbuildfarm 2380 0.0 0.4 10288 2624 ? I 2:12PM 0:00.05 ./pg_isolation_regress --psqldir=/home/pgbuildfarm/workdir/HEAD/inst/bin --inputdir=. --schedule=./isolation_schedule (pg_isolation_reg)
pgbuildfarm 3156 0.0 0.8 26948 4920 ? Ss 2:11PM 0:02.71 postgres: autovacuum launcher process
pgbuildfarm 3253 0.0 0.5 25924 3044 ? Ss 2:11PM 0:01.65 postgres: wal writer process
pgbuildfarm 9889 0.0 1.8 25924 11856 ? Ss 2:11PM 0:04.87 postgres: writer process
pgbuildfarm 11359 0.0 1.4 26948 9032 ? Ss 2:13PM 15:16.63 postgres: pgbuildfarm isolationtest [local] idle
pgbuildfarm 13738 0.0 1.2 26948 7860 ? Ss 2:13PM 0:00.06 postgres: pgbuildfarm isolationtest [local] UPDATE waiting
pgbuildfarm 14031 0.0 1.0 25924 6800 ? S 2:11PM 0:01.82 /home/pgbuildfarm/workdir/HEAD/inst/bin/postgres -D data-C
pgbuildfarm 16555 0.0 0.4 10476 2764 ? S 2:13PM 1:01.77 ./isolationtester dbname=isolationtest
pgbuildfarm 18038 0.0 0.2 3456 1008 ? I 2:11PM 0:00.01 sh -c cd pgsql.20707/src/test/isolation && gmake NO_LOCALE=1 installcheck 2>&1
pgbuildfarm 19057 0.0 0.2 3456 1000 ? Is 1:30PM 0:00.01 /bin/sh -c cd /home/pgbuildfarm/client-code && ./run_branches.pl --run-one
pgbuildfarm 20707 0.0 1.8 15992 11460 ? I 1:30PM 0:04.92 /usr/pkg/bin/perl ./run_build.pl --config build-farm.conf HEAD
pgbuildfarm 21749 0.0 0.2 3344 1248 ? I 2:11PM 0:00.04 gmake NO_LOCALE=1 installcheck
pgbuildfarm 23054 0.0 0.7 9084 4352 ? I 1:30PM 0:00.38 /usr/pkg/bin/perl ./run_branches.pl --run-one
pgbuildfarm 2125 0.0 0.0 2652 0 ttyp0 R+ 5:09PM 0:00.00 grep pgbuildf (bash)
pgbuildfarm 9930 0.0 0.3 2652 2072 ttyp0 S 5:03PM 0:00.09 bash
pgbuildfarm 29194 0.0 0.1 3552 872 ttyp0 O+ 5:09PM 0:00.01 ps -aux
pgbuildfarm(at)cube:workdir$date
Sat Jul 16 17:10:13 CEST 2011

The log fills with:

[4e21807d.2c5f:1060074] LOG: execute isolationtester_waiting: SELECT 1 WHERE pg_stat_get_backend_waiting($1)
[4e21807d.2c5f:1060075] DETAIL: parameters: $1 = '5'
[4e21807d.2c5f:1060076] LOG: execute isolationtester_waiting: SELECT 1 WHERE pg_stat_get_backend_waiting($1)
[4e21807d.2c5f:1060077] DETAIL: parameters: $1 = '5'

I killed process 1604 (postgres: pgbuildfarm isolationtest [local] idle in transaction). Again, the buildfarm perlscript kicked into action. See http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=coypu&dt=2011-07-16%2011%3A30%3A02 for the logs

Regards,

Rémi Zara

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2011-07-16 15:51:54 Re: Commitfest Status: Sudden Death Overtime
Previous Message ktm@rice.edu 2011-07-16 15:26:25 Re: Mysterious server crashes