buildfarm instance bichir stuck

From: Robins Tharakan <tharakan(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: buildfarm instance bichir stuck
Date: 2021-04-07 05:43:43
Message-ID: CAEP4nAymAZP1VEBNoWAQca85ZtU5YxuwS95+Vu+XW+-eMfq_vQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Bichir has been stuck for the past month and has been unable to run
regression tests since commit 6a2a70a02018d6362f9841cc2f499cc45405e86b.

It is interesting that the commit is a month old and probably no other
client has complained since. Digging in, I can see that bichir has been
unable to even start the regression tests since that commit went in.

Note that Bichir is running on WSL1 (not WSL2) - i.e. Windows Subsystem for
Linux inside Windows 10 - and so isn't really a production use case. The only
run that actually got submitted to the buildfarm was from a few days back,
when I killed it after a long wait - see [1].

Since yesterday, I have another run that's again stuck on CREATE DATABASE
(see outputs below). pstack does not work, which may be a limitation of the
architecture / installation (unsure), but a gdb backtrace shows that psql is
stuck in poll().

Tracing commits, it seems that commit
6a2a70a02018d6362f9841cc2f499cc45405e86b broke things, and I can confirm
that 'make check' works if I roll back to the preceding commit
(83709a0d5a46559db016c50ded1a95fd3b0d3be6).

Not sure if many agree, but two things stood out here:
1) Buildfarm never got the message that a commit broke an instance. Ideally
I'd have expected the buildfarm to have an optimistic timeout that could have
helped - for example, right now the CREATE DATABASE has been stuck for 18
hours.

2) bichir is clearly not a production use case (it takes 5 hrs to complete
a HEAD run!), so let me know if this change is intentional (I guess I'll
stop maintaining it if so), but I thought I'd still put this out in case
it interests someone.
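
To illustrate the kind of timeout meant in point 1, here is a minimal
sketch of a watchdog around a single build step. It is not how the
buildfarm client works; run_step, the status strings and the timeout values
are hypothetical names chosen for illustration. The step runs in its own
process group, so a stuck child such as psql would be killed along with its
parent when the deadline passes.

```python
import os
import signal
import subprocess
import sys


def run_step(cmd, timeout_s):
    """Run one build step; return 'ok', 'failed' or 'timeout'.

    Hypothetical helper: the step gets its own session (and therefore its
    own process group), so the whole group can be killed if it hangs.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # Kill the step and everything it spawned.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        proc.wait()
        return "timeout"
    return "ok" if returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in for a stuck step: sleeps far longer than the allowed 1 second.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_step(stuck, timeout_s=1))
```

A caller could then report a "timeout" status as a failure instead of
waiting indefinitely; the right limit for a 5 hour animal is a separate
question.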

-
thanks
robins

Reference:
1) Last run that I had to kill -
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&dt=2021-03-31%2012%3A00%3A05

#####################################################
The current run has been running since yesterday.

postgres(at)WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ tail -2 lastcommand.log
running on port 5678 with PID 8715
============== creating database "regression" ==============

postgres(at)WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ date
Wed Apr 7 12:48:26 AEST 2021

postgres(at)WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ ls -la
total 840
drwxrwxr-x 1 postgres postgres 4096 Apr 6 09:00 .
drwxrwxr-x 1 postgres postgres 4096 Apr 6 08:55 ..
-rw-rw-r-- 1 postgres postgres 1358 Apr 6 08:55 SCM-checkout.log
-rw-rw-r-- 1 postgres postgres 91546 Apr 6 08:56 configure.log
-rw-rw-r-- 1 postgres postgres 40 Apr 6 08:55 githead.log
-rw-rw-r-- 1 postgres postgres 2890 Apr 6 09:01 lastcommand.log
-rw-rw-r-- 1 postgres postgres 712306 Apr 6 09:00 make.log

root(at)WSLv1:~# pstack 8729
8729: psql -X -c CREATE DATABASE "regression" TEMPLATE=template0
LC_COLLATE='C' LC_CTYPE='C' postgres
pstack: Bad address
failed to read target.

root(at)WSLv1:~# gdb -batch -ex bt -p 8729
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f41a8ea4c84 in __GI___poll (fds=fds(at)entry=0x7fffe13d7be8,
nfds=nfds(at)entry=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
#0 0x00007f41a8ea4c84 in __GI___poll (fds=fds(at)entry=0x7fffe13d7be8,
nfds=nfds(at)entry=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007f41a9bc8eb1 in poll (__timeout=<optimized out>, __nfds=1,
__fds=0x7fffe13d7be8) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2 pqSocketPoll (end_time=-1, forWrite=0, forRead=1, sock=<optimized out>)
at fe-misc.c:1133
#3 pqSocketCheck (conn=0x7fffd979a0b0, forRead=1, forWrite=0, end_time=-1)
at fe-misc.c:1075
#4 0x00007f41a9bc8ff0 in pqWaitTimed (forRead=<optimized out>,
forWrite=<optimized out>, conn=0x7fffd979a0b0, finish_time=<optimized out>)
at fe-misc.c:1007
#5 0x00007f41a9bc5ac9 in PQgetResult (conn=0x7fffd979a0b0) at
fe-exec.c:1963
#6 0x00007f41a9bc5ea3 in PQexecFinish (conn=0x7fffd979a0b0) at
fe-exec.c:2306
#7 0x00007f41a9bc5ef2 in PQexec (conn=<optimized out>,
query=query(at)entry=0x7fffd9799f70
"CREATE DATABASE \"regression\" TEMPLATE=template0 LC_COLLATE='C'
LC_CTYPE='C'") at fe-exec.c:2148
#8 0x00007f41aa21e7a0 in SendQuery (query=0x7fffd9799f70 "CREATE DATABASE
\"regression\" TEMPLATE=template0 LC_COLLATE='C' LC_CTYPE='C'") at
common.c:1303
#9 0x00007f41aa2160a6 in main (argc=<optimized out>, argv=<optimized out>)
at startup.c:369
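
Frames #0 to #2 above show psql inside poll() with timeout=-1 (pqSocketPoll
was called with end_time=-1), i.e. waiting without any deadline for the
server's reply. The following is only a small sketch of that behaviour using
a local socket pair, not libpq code; wait_readable is a hypothetical helper.
With a finite timeout the wait returns even if the peer never answers, while
None (like -1 in C) would block until data arrives.

```python
import select
import socket


def wait_readable(sock, timeout_ms):
    """Return True if sock becomes readable within timeout_ms milliseconds.

    Passing None waits indefinitely, like poll() with timeout=-1 in C.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    return bool(events)


if __name__ == "__main__":
    client, server = socket.socketpair()

    # The "server" side stays silent: a finite timeout lets the caller give up.
    print(wait_readable(client, 500))   # False

    # Once the peer writes something, the wait returns immediately.
    server.sendall(b"x")
    print(wait_readable(client, 500))   # True

    client.close()
    server.close()
```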

#####################################################

Here we can see that 83709a0d5a46559db016c50ded1a95fd3b0d3be6 goes past
'CREATE DATABASE'
=======================
robins(at)WSLv1:~/proj/postgres/postgres$ git checkout
83709a0d5a46559db016c50ded1a95fd3b0d3be6
Previous HEAD position was 6a2a70a020 Use signalfd(2) for epoll latches.
HEAD is now at 83709a0d5a Use SIGURG rather than SIGUSR1 for latches.

robins(at)WSLv1:~/proj/postgres/postgres$ cd src/test/regress/

robins(at)WSLv1:~/proj/postgres/postgres/src/test/regress$ make -j4
NO_LOCALE=1 check
make -C ../../../src/backend generated-headers
rm -rf ./testtablespace
make[1]: Entering directory
'/home/robins/proj/postgres/postgres/src/backend'
make -C catalog distprep generated-header-symlinks
make -C utils distprep generated-header-symlinks
mkdir ./testtablespace
make[2]: Entering directory
'/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory
'/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Entering directory
'/home/robins/proj/postgres/postgres/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory
'/home/robins/proj/postgres/postgres/src/backend/catalog'
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend'
make -C ../../../src/port all
rm -rf '/home/robins/proj/postgres/postgres'/tmp_install
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/port'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/port'
make -C ../../../src/common all
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/common'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/common'
make -C ../../../contrib/spi
make[1]: Entering directory
'/home/robins/proj/postgres/postgres/contrib/spi'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/contrib/spi'
/bin/mkdir -p '/home/robins/proj/postgres/postgres'/tmp_install/log
make -C '../../..'
DESTDIR='/home/robins/proj/postgres/postgres'/tmp_install install
>'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
make -j1 checkprep
>>'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/bin:$PATH"
LD_LIBRARY_PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/lib"
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir= --no-locale --dlpath=. --max-concurrent-tests=20
--schedule=./parallel_schedule
============== removing existing temp instance ==============
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
running on port 58080 with PID 25879
============== creating database "regression" ==============
CREATE DATABASE
ALTER DATABASE
============== running regression test queries ==============
test tablespace ... ok 1239 ms
parallel group (20 tests): boolean char varchar name text int2 int4 int8
oid float4 float8 bit^CGNUmakefile:132: recipe for target 'check' failed
make: *** [check] Interrupt

But checking out 6a2a70a02018d6362f9841cc2f499cc45405e86b we can see that
it hangs at 'CREATE DATABASE'
=======================================
robins(at)WSLv1:~/proj/postgres/postgres/src/test/regress$ git checkout
6a2a70a02018d6362f9841cc2f499cc45405e86b
Previous HEAD position was 83709a0d5a Use SIGURG rather than SIGUSR1 for
latches.
HEAD is now at 6a2a70a020 Use signalfd(2) for epoll latches.
robins(at)WSLv1:~/proj/postgres/postgres/src/test/regress$ make -j4
NO_LOCALE=1 check
make -C ../../../src/backend generated-headers
rm -rf ./testtablespace
make[1]: Entering directory
'/home/robins/proj/postgres/postgres/src/backend'
make -C catalog distprep generated-header-symlinks
make -C utils distprep generated-header-symlinks
mkdir ./testtablespace
make[2]: Entering directory
'/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory
'/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Entering directory
'/home/robins/proj/postgres/postgres/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory
'/home/robins/proj/postgres/postgres/src/backend/catalog'
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend'
make -C ../../../src/port all
rm -rf '/home/robins/proj/postgres/postgres'/tmp_install
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/port'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/port'
make -C ../../../src/common all
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/common'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/common'
make -C ../../../contrib/spi
make[1]: Entering directory
'/home/robins/proj/postgres/postgres/contrib/spi'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/contrib/spi'
/bin/mkdir -p '/home/robins/proj/postgres/postgres'/tmp_install/log
make -C '../../..'
DESTDIR='/home/robins/proj/postgres/postgres'/tmp_install install
>'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
make -j1 checkprep
>>'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/bin:$PATH"
LD_LIBRARY_PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/lib"
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir= --no-locale --dlpath=. --max-concurrent-tests=20
--schedule=./parallel_schedule
============== removing existing temp instance ==============
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
running on port 58080 with PID 26702
============== creating database "regression" ==============
stuck here ^^^
^CCancel request sent
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
command failed: "psql" -X -c "CREATE DATABASE \"regression\"
TEMPLATE=template0 LC_COLLATE='C' LC_CTYPE='C'" "postgres"
pg_ctl: PID file
"/home/robins/proj/postgres/postgres/src/test/regress/./tmp_check/data/postmaster.pid"
does not exist
Is server running?

pg_regress: could not stop postmaster: exit code was 256
GNUmakefile:132: recipe for target 'check' failed
make: *** [check] Interrupt
