Invalid pg_upgrade error message during live check

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Invalid pg_upgrade error message during live check
Date: 2018-01-05 21:58:39
Message-ID: 20180105215839.GA25995@momjian.us
Lists: pgsql-hackers

Pg_upgrade is able to run in --check mode when the old server is still
running. Unfortunately, all supported versions of pg_upgrade generate
an incorrect (but harmless) "failure" message when doing this:

$ pg_upgrade -b /u/pgsql.old/bin -B /u/pgsql/bin \
-d /u/pgsql.old/data/ -D /u/pgsql/data --link --check

--> *failure*
--> Consult the last few lines of "pg_upgrade_server.log" for
--> the probable cause of the failure.
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for invalid "unknown" user columns ok
Checking for hash indexes ok
Checking for roles starting with "pg_" ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok

*Clusters are compatible*

Embarrassingly, I found out about this bug when watching a presentation in
Frankfurt:

PostgreSQL-Upgrade: Best Practices
https://www.ittage.informatik-aktuell.de/programm/2017/postgresql-upgrade-best-practices/

and there is a blog entry about it too:

https://blog.dbi-services.com/does-pg_upgrade-in-check-mode-raises-a-failure-when-the-old-cluster-is-running/

(Yeah, it stinks to be me. :-) )

So, I looked into the code and it turns out that start_postmaster()
needs to run exec_prog() in a way that allows it to fail and continue
_without_ generating an error message. There are other places that need
to run exec_prog() and allow it to fail and continue _and_ generate an
error message.

To fix this, I modified the exec_prog() API so that reporting an error
and exiting on an error are controlled separately. The attached patch
does this and generates the proper output:

$ pg_upgrade -b /u/pgsql.old/bin -B /u/pgsql/bin \
-d /u/pgsql.old/data/ -D /u/pgsql/data --link --check

Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for invalid "unknown" user columns ok
Checking for hash indexes ok
Checking for roles starting with "pg_" ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok

*Clusters are compatible*
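
For readers without the attachment, the exec_prog() change described above
(separate control over reporting a failure and exiting on it) could be
sketched roughly as follows. This is a simplified illustration, not the code
from the patch: run_command() and its two flag parameters are hypothetical
names, and it uses plain system() rather than pg_upgrade's own logging and
command-building machinery.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for exec_prog(), for illustration only.
 *
 * Runs a shell command and lets the caller decide independently whether
 * a failure is reported and whether it is fatal.  Returns true if the
 * command exited with status 0, false otherwise.
 */
static bool
run_command(const char *cmd, bool report_error, bool exit_on_error)
{
	int			rc = system(cmd);

	if (rc == 0)
		return true;

	/* Reporting is optional: the live-check case wants silence here. */
	if (report_error)
		fprintf(stderr, "*failure*\ncommand failed: %s\n", cmd);

	/* Exiting is a separate decision from reporting. */
	if (exit_on_error)
		exit(1);

	return false;
}
```

Under these assumptions, the start_postmaster() case during a live check
would correspond to run_command(cmd, false, false), which fails quietly and
continues, while the other callers mentioned above would pass
(cmd, true, false) to report the failure and still continue.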

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

Attachment Content-Type Size
pg_upgrade.diff text/x-diff 17.2 KB
