Re: Regression tests versus the buildfarm environment

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Regression tests versus the buildfarm environment
Date: 2010-08-11 10:55:29
Message-ID: 4C6281A1.5030403@dunslane.net
Lists: pgsql-hackers

On 08/11/2010 12:42 AM, Tom Lane wrote:
> There's an interesting buildfarm failure here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-08-10%2023:46:10
> It appears to me that this was caused by the concurrent run of another
> buildfarm animal on the same physical machine, namely:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=colugos&dt=2010-08-11%2000:02:58
> Both animals are trying to test HEAD, which means that pg_regress
> defaults to the same postmaster port number in both builds:
>
> if (temp_install && !port_specified_by_user)
>
>     /*
>      * To reduce chances of interference with parallel installations, use
>      * a port number starting in the private range (49152-65535)
>      * calculated from the version number.
>      */
>     port = 0xC000 | (PG_VERSION_NUM & 0x3FFF);
>
> We observe colugos successfully starting on that port:
>
> ============== starting postmaster ==============
> running on port 57332 with pid 47019
> ============== creating database "regression" ==============
> CREATE DATABASE
> ALTER DATABASE
> ... etc etc ...
>
> polecat comes along what must be only moments later, and tries to use
> the same port for its temp install:
>
> ============== starting postmaster ==============
> running on port 57332 with pid 47022
> ============== creating database "regression" ==============
> ERROR: duplicate key value violates unique constraint "pg_database_datname_index"
> DETAIL: Key (datname)=(regression) already exists.
> command failed: "/usr/local/src/build-farm-3.2/builds/HEAD/pgsql.15278/src/test/regress/./tmp_check/install//usr/local/src/build-farm-3.2/builds/HEAD/inst/bin/psql" -X -c "CREATE DATABASE \"regression\" TEMPLATE=template0 ENCODING='SQL_ASCII' LC_COLLATE='C' LC_CTYPE='C'" "postgres"
> pg_ctl: PID file "/usr/local/src/build-farm-3.2/builds/HEAD/pgsql.15278/src/test/regress/./tmp_check/data/postmaster.pid" does not exist
> Is server running?
>
> pg_regress: could not stop postmaster: exit code was 256
>
> Now the postmaster log shows that the second postmaster correctly
> recognized that the port number was already in use, so it bailed out:
>
> ================== pgsql.15278/src/test/regress/log/postmaster.log ===================
> [4c61f2d2.b7ae:1] FATAL: lock file "/tmp/.s.PGSQL.57332.lock" already exists
> [4c61f2d2.b7ae:2] HINT: Is another postmaster (PID 47019) using socket file "/tmp/.s.PGSQL.57332"?
>
> However, pg_regress failed to have a clue about what had happened,
> and bulled ahead trying to run the regression tests (against the
> postmaster started by the other pg_regress instance). A look at the
> code shows that it is merely trying to run psql, and if psql reports
> that it can connect to the specified port, then pg_regress thinks the
> postmaster started OK. Of course, psql was really reporting that it
> could connect to the other instance's postmaster.
>
>
> I've seen similar multiple-postmaster-interference symptoms before in
> the buildfarm, but never really understood the cause.
>
> I am not sure if there's anything very good we can do about the
> problem of pg_regress misidentifying the postmaster it's managed to
> connect to. A real solution would probably be much more trouble than
> it's worth, anyway. However, it does seem like we ought to be able to
> do something about two buildfarm critters defaulting to the same choice
> of port number. The buildfarm infrastructure goes to great lengths to
> pick nonconflicting port numbers for the "installed" postmasters it
> runs; but we're ignoring all that effort and just using a hardwired
> port number for "make check". This is dumb.
>
> pg_regress does have a --port argument that can be used to override
> that default. I don't know whether the buildfarm script calls
> pg_regress directly or does "make check". If the latter, we'd need to
> twiddle the Makefiles to allow a port number to get passed in. But
> this seems well worthwhile to me.
>
> Comments?
>
>

The buildfarm calls "make check".

Why not just add the configured port (DEF_PGPORT) into the calculation
of the port to run on?

cheers

andrew
