Re: Problem with dblink regression test

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, mail(at)joeconway(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem with dblink regression test
Date: 2005-06-22 16:45:47
Message-ID: 20050622164547.GZ84822@decibel.org
Lists: pgsql-hackers pgsql-patches

On Wed, Jun 22, 2005 at 11:45:09AM -0400, Tom Lane wrote:
> "Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> > Tom Lane said:
> >> There are several buildfarm machines failing like this. I think a
> >> possible solution is for the postmaster to do putenv("PGPORT=nnn") so
> >> that libpq instances running in postmaster children will default to the
> >> local installation's actual port rather than some compiled-in default
> >> port.
>
> > If this diagnosis were correct, wouldn't every buildfarm member be failing
> > at the ContribCheck stage (if they get that far)? They all run on non-standard
> > ports and all run the contrib installcheck suite if they can (this
> > is required, not optional). So if they show OK then they do not exhibit the
> > problem.
>
> Now that I'm a little more awake ...
>
> I think the difference between the working and not-working machines
> probably has to do with dynamic-linker configuration. You have the
> buildfarm builds using "configure --prefix=something
> --with-pgport=something". So, the copy of libpq.so installed into
> the prefix tree has the "right" default port. But on a machine with
> a regular installation of Postgres, there is also going to be a copy
> of libpq.so in /usr/lib or some such place ... and that copy thinks
> the default port is where the regular postmaster lives (eg 5432).
> When dblink.so is loaded into the backend, if the dynamic linker chooses
> to resolve its requirement for libpq.so by loading /usr/lib/libpq.so,
> then the wrong things happen.
>
> In the "make check" case this is masked because pg_regress.sh has set
> PGPORT in the postmaster's environment, and that will override the
> compiled-in default. But of course the contrib tests only work in
> "installcheck" mode.
>
> To believe this, you have to assume that "psql" links to the correct
> version (the test version) of libpq.so but dblink.so fails to do so.
> So it's only an issue on platforms where "rpath" works for executables
> but not for shared libraries. I haven't run down exactly which
> buildfarm machines have shown this symptom --- do you know offhand?
>
> (Thinks some more...) Another possibility is that on the failing
> machines, there is a system-wide PGPORT environment variable; however,
> unless you specify "-p" on the postmaster command line when you start
> the "installed" postmaster, I'd expect that to change where the
> postmaster puts its socket, so that's probably not the right answer.
>
> If this is the correct explanation, then fooling with PGPORT would
> mask this particular symptom, but it wouldn't fix the fundamental
> problem that we're loading the wrong version of libpq.so. Eventually
> that would come back to bite us (whenever dblink.so requires some
> feature that doesn't exist in older libpq.so versions).
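A minimal C sketch of the port lookup order under discussion, assuming it follows libpq's documented behavior: an explicit `port=` in the connection string wins, then `$PGPORT`, then the default compiled into whichever libpq.so was actually loaded. `effective_port()` and `export_postmaster_port()` are hypothetical names, not libpq or postmaster functions, and the sketch uses POSIX `setenv()` instead of the `putenv()` Tom mentions so that no static buffer is needed.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of the lookup order: explicit "port=" from the
 * connection string (0 means absent), then $PGPORT, then the default
 * compiled into the libpq.so that the dynamic linker happened to load. */
static int effective_port(int conninfo_port, int compiled_default)
{
    const char *env;

    if (conninfo_port > 0)
        return conninfo_port;
    env = getenv("PGPORT");
    if (env != NULL && *env != '\0')
        return atoi(env);
    return compiled_default;
}

/* Hypothetical version of the proposed workaround: the postmaster exports
 * its real port so that children, and any libpq they load, inherit it. */
static int export_postmaster_port(int port)
{
    char buf[16];

    assert(port > 0 && port < 65536);
    snprintf(buf, sizeof buf, "%d", port);
    return setenv("PGPORT", buf, 1); /* 1 = overwrite any existing value */
}
```

With PGPORT unset, a backend that picked up a system libpq.so built for 5432 falls through to that compiled-in default; once the postmaster exports its port, or the connection string carries `port=5682`, the same lookup resolves to the test installation. That is why both approaches hide the wrong-library problem rather than fix it.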

Here's the info I have for my two machines (platypus and cuckoo), both
of which are exhibiting this behavior.

I manually ran the dblink regression on platypus to see what was going
on. If I added port=5682 to the connection string, it would properly
connect to the test database. Without that it complained that the
contrib_regression database didn't exist. After adding
contrib_regression to the default postgresql cluster on that machine it
then errored out saying that there was no buildfarm user, which is true
on the default install on that machine. $PGPORT isn't set globally or in
the buildfarm user account.

ISTM there are a couple of other ways a buildfarm machine could pass
besides the one Tom mentioned. If the machine has no default install at
all, dblink may behave differently. It's also possible that the default
install already has both the contrib_regression database and the user
that runs the buildfarm.

Is there a way to confirm which libpq.so psql and/or dblink.so have
linked against? Are there any other tests I could run to shed some light
on this?
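One way to check from inside a running process (for example, from a small test function loaded into the backend alongside dblink.so) is to walk the list of shared objects the runtime linker has actually mapped. This sketch assumes an ELF platform that provides `dl_iterate_phdr()` (glibc and FreeBSD both do); `report_loaded_objects()` is a hypothetical helper, not part of PostgreSQL. For a quick external check, running `ldd` on the psql binary and on dblink.so shows which libpq.so the runtime linker would resolve, although the search path inside the backend could differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* State passed through dl_iterate_phdr() to the callback. */
struct match_ctx {
    const char *needle; /* substring to look for in each object's path */
    int matches;        /* number of loaded objects whose path matched */
};

/* Called once per loaded shared object (the main program has name ""). */
static int report_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct match_ctx *ctx = data;

    (void) size;
    if (info->dlpi_name != NULL && strstr(info->dlpi_name, ctx->needle) != NULL) {
        printf("loaded: %s\n", info->dlpi_name);
        ctx->matches++;
    }
    return 0; /* returning 0 keeps the iteration going */
}

/* Hypothetical helper: print every mapped object whose path contains
 * `needle` and return how many there were. */
static int report_loaded_objects(const char *needle)
{
    struct match_ctx ctx;

    assert(needle != NULL);
    ctx.needle = needle;
    ctx.matches = 0;
    dl_iterate_phdr(report_object, &ctx);
    return ctx.matches;
}
```

Calling `report_loaded_objects("libpq")` from code running in the backend would print the path of every libpq.so that is mapped; on a correctly behaving buildfarm run, that should be the copy under the test installation's prefix rather than one in /usr/lib.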
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
