Re: Yet another failure mode in pg_upgrade

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Yet another failure mode in pg_upgrade
Date: 2012-09-02 05:06:28
Message-ID: CA+TgmoY1XFhJ9WoENiY=-NNvSy4PjDPYmqt4-2NCy8gWUcmyZA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 1, 2012 at 11:45 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Mon, Aug 13, 2012 at 12:46:43PM +0200, Magnus Hagander wrote:
>> On Mon, Aug 13, 2012 at 4:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > I've been experimenting with moving the Unix socket directory to
>> > /var/run/postgresql for the Fedora distribution (don't ask :-().
>> > It's mostly working, but I found out yet another way that pg_upgrade
>> > can crash and burn: it doesn't consider the possibility that the
>> > old or new postmaster is compiled with a different default
>> > unix_socket_directory than what is compiled into the libpq it's using
>> > or that pg_dump is using.
>> >
>> > This is another hazard that we could forget about if we had some way for
>> > pg_upgrade to run standalone backends instead of starting a postmaster.
>>
>> Yeah, that would be nice.
>>
>>
>> > But in the meantime, I suggest it'd be a good idea for pg_upgrade to
>> > explicitly set unix_socket_directory (or unix_socket_directories in
>> > HEAD) when starting the postmasters, and also explicitly set PGHOST
>> > to ensure that the client-side code plays along.
>>
>> That sounds like a good idea for other reasons as well - manual
>> connections attempting to get in during an upgrade will just fail with
>> a "no connection" error, which makes sense...
>>
>> So, +1.
>
> OK, I looked this over, and I have a patch, attached.
>
> Because we are already playing with socket directories, this patch creates
> the socket files in the current directory for upgrades and non-live
> checks, but not live checks. This eliminates the "someone accidentally
> connects" problem, at least on Unix, plus we are using port 50432
> already. This also turns off TCP connections for unix domain socket
> systems.
>
> For "live check" operation, you are checking a running server, so
> assuming the socket is in the current directory is not going to work.
> What the code does is to read the 5th line from the running server's
> postmaster.pid file, which has the socket directory in PG >= 9.1. For
> pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
> If the defaults are different between the two servers, the new binaries,
> e.g. pg_dump, will not work. The fix is for the user to set pg_upgrade
> -O to match the old socket directory, and set PGHOST before running
> pg_upgrade. I could not find a good way to generate a proper error
> message because we are blind to the socket directory in pre-9.1.
> Frankly, this is a problem if the old pre-9.1 server is running in a
> user-configured socket directory too, so a documentation addition seems
> right here.
>
> So, in summary, this patch moves the socket directory to the current
> directory for all but live check operation, and handles different socket
> directories for old clusters >= 9.1. I have added a documentation
> mention of how to make this work for pre-9.1 old servers.
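
For readers following along, the postmaster.pid lookup described in the
quoted text can be sketched roughly as below. This is an illustration
only, not the C code from the patch: the Python rendering, the function
name read_socket_dir, and the choice to return None when the line is
absent are all assumptions. The only fact taken from the thread is that
the 5th line of postmaster.pid holds the socket directory in PG >= 9.1.

```python
from pathlib import Path
from typing import Optional

# 1-based line number of the socket directory in postmaster.pid,
# as stated in the thread for PG >= 9.1.
SOCKET_DIR_LINE = 5


def read_socket_dir(pgdata: str) -> Optional[str]:
    """Return the socket directory recorded in postmaster.pid.

    Returns None when the file is missing (no running server) or when
    it has fewer than five lines (the pre-9.1 case, where the caller
    would have to fall back to a compiled-in default or a user setting).
    The name and the None convention are hypothetical.
    """
    pid_file = Path(pgdata) / "postmaster.pid"
    try:
        lines = pid_file.read_text().splitlines()
    except FileNotFoundError:
        return None
    if len(lines) < SOCKET_DIR_LINE:
        return None
    sock_dir = lines[SOCKET_DIR_LINE - 1].strip()
    # An empty line means no socket directory was recorded.
    return sock_dir or None
```

A None result corresponds to the situation the quoted text calls being
"blind" to the socket directory, which is where the user would need to
supply it by hand.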

I don't think this is reducing the number of failure modes; it's just
changing it from one set of obscure cases to a slightly different set
of obscure cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
