pg_upgrade improvements

From: Harold Giménez <harold(dot)gimenez(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_upgrade improvements
Date: 2012-04-05 02:26:58
Message-ID: CABQCq-Q9dotchbQsCdyYLBCEo1Z-ZnD_3-cgCt8ajeb5wGuLUQ@mail.gmail.com
Lists: pgsql-hackers

Hi all,

I've written a pg_upgrade wrapper for upgrading our users (at Heroku) to
Postgres 9.1. In the process I ran into a specific issue that could
easily be improved. The process has worked consistently for many users,
both internal and external, with the exception of a few for whom it
failed and required manual hand-holding.

Before it performs the upgrade, the pg_upgrade program starts the old
cluster, does various checks, and then attempts to stop it. On occasion
stopping the cluster fails - I've posted command output on a gist [1].
Manually rerunning pg_upgrade shortly afterwards succeeds. We believe
the stop times out because other connections to the cluster are
established in that small window. There could be incoming connections
for a number of reasons: either the user or the user's applications are
reestablishing connections, or something like collectd on localhost
attempts to connect during that window.

Possible workarounds on the current version:

* Add an iptables rule to temporarily reject connections from the outside.
This is not viable because in a multitenant environment a process may write
an iptables rule, and meanwhile another process may permanently save rules,
including the temporary one. We can defend against that, but it does add a
lot of complexity.
* Rewrite pg_hba.conf temporarily while the pg_upgrade script runs to
disallow any other connections.
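
A minimal sketch of the pg_hba.conf workaround, as a wrapper might do it
in Python. The function name, the backup suffix, and the single
restrictive rule are assumptions made for illustration, not anything
pg_upgrade provides. The rule shown allows only local Unix-socket
connections for the "postgres" role; the right rule depends on the
installation.

```python
import contextlib
import os
import shutil

# Hypothetical rule set: only local Unix-socket connections by the
# "postgres" role. This is an assumption; adjust to the actual setup.
RESTRICTIVE_HBA = "local   all   postgres   ident\n"


@contextlib.contextmanager
def restricted_hba(data_dir, rules=RESTRICTIVE_HBA):
    """Swap in a restrictive pg_hba.conf and restore the original on exit."""
    hba_path = os.path.join(data_dir, "pg_hba.conf")
    backup_path = hba_path + ".pre_upgrade"  # hypothetical backup name
    shutil.copy2(hba_path, backup_path)      # keep the original, with metadata
    try:
        with open(hba_path, "w") as f:
            f.write(rules)
        yield hba_path
    finally:
        # Put the original file back even if the upgrade step raised.
        os.replace(backup_path, hba_path)
```

The wrapper would call pg_upgrade inside the `with restricted_hba(...)`
block. Since pg_upgrade starts the old cluster itself, the restrictive
file has to be in place before pg_upgrade is invoked, and it is read
when the server starts.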

A possible fix is for pg_upgrade to use the --force flag when stopping
the cluster, so that existing connections are kicked out. There is no
reason to be polite in this case. Another idea my colleagues and I
kicked around was to start the cluster in single-user mode, or to allow
only unix socket connections somewhere in /tmp. Anything that rejects
other connections would be helpful.
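
To make the "no reason to be polite" idea concrete, here is a sketch of
the stop command a wrapper could build. It assumes that what is meant by
forcing the stop corresponds to pg_ctl's fast shutdown mode (-m fast),
which disconnects clients instead of waiting for them to leave. The
helper name and the default timeout are hypothetical.

```python
def stop_command(pg_ctl, data_dir, timeout_s=60):
    """Build an argv list for a fast, waiting shutdown of a cluster.

    -m fast  : disconnect clients rather than waiting for them
    -w       : wait until the shutdown has completed
    -t SECS  : give up waiting after this many seconds
    """
    return [
        pg_ctl, "stop",
        "-D", data_dir,
        "-m", "fast",
        "-w",
        "-t", str(timeout_s),
    ]
```

For example, `stop_command("pg_ctl", "/path/to/old/data")` gives a list
that can be passed to `subprocess.run`. A smart-mode stop waits for
sessions to end, which would be consistent with the timeout described
above if a new connection arrives at the wrong moment.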

It would also be nice if the invocation of pg_ctl didn't pipe its output to
/dev/null. I'm sure the output would have pointed directly at the root
cause and saved us some debugging and hand-waving time.
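
As an illustration of keeping that output, a wrapper could run each
command with stdout and stderr appended to a log file instead of
discarding them. This is only a sketch of the idea in Python; the
function name and log path are hypothetical, and it says nothing about
how pg_upgrade itself invokes pg_ctl.

```python
import subprocess


def run_logged(argv, log_path):
    """Run a command, appending its combined stdout/stderr to log_path.

    Returns the command's exit status so the caller can decide what to do.
    """
    with open(log_path, "a") as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

When a stop fails, the log then holds whatever pg_ctl reported, rather
than only a nonzero exit status.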

Finally, just a note that while we haven't performed a huge number of
upgrades yet, we have upgraded a few production systems and for the most
part it has worked great.

Regards,

-Harold

[1] https://gist.github.com/112c97378c490d8f70fc
