Bug in point releases 9.3.6 and 9.2.10?

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Bug in point releases 9.3.6 and 9.2.10?
Date: 2015-03-12 23:42:24
Message-ID: CAM3SWZR_fAQVZ+d8oaJV5a62J+te26S4x4Oe0A82z73E_B02JA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heroku Postgres runs provisioning code that performs certain actions
on roles when creating a new "fork" of an existing database. This
often causes the new fork to be on the latest point release, where the
database being forked was not.

We want to create a new role when this happens, for various reasons.
This occurs after recovery ends, but before the database has been
"unfenced". The template code that generates various ALTER ROLE
statements in our internal provisioning system - which has apparently
worked just fine for a long time - is:

db.execute("ALTER ROLE #{old_database_user} RENAME TO #{database_user}")
db.execute("ALTER ROLE #{database_user} PASSWORD '#{database_password}' LOGIN")
db.execute("CREATE ROLE \"#{old_database_user}\" PASSWORD
'#{old_database_password}' IN ROLE \"#{database_user}\" LOGIN")

I've seen multiple reports of apparent corruption, appearing as the
resulting ALTER ROLE statements are executed:

PG::DataCorrupted: ERROR: could not read block 0 in file
"global/12811": read only 0 of 8192 bytes
or:
PG::DataCorrupted: ERROR: could not read block 0 in file
"global/12785": read only 0 of 8192 bytes
or:
PG::DataCorrupted: ERROR: could not read block 0 in file
"global/12811": read only 0 of 8192 bytes

The only common factor is that this occurs on the latest point
releases (either 9.3.6 and 9.2.10, at least so far). In all cases I've
seen so far, the relation in question is the pg_auth_members heap
relation. For example:

redacteddb=# select pg_relation_filenode(oid), oid, relname, relkind
from pg_class where pg_relation_filenode(oid) = 12785;
pg_relation_filenode | oid | relname | relkind
----------------------+------+-----------------+---------
12785 | 1261 | pg_auth_members | r
(1 row)

Running "VACUUM FULL pg_auth_members;" has made the problem go away
(to the extent that the above code doesn't trip up and everything is
at least superficially okay) on at least one occasion. I'm currently
investigating how consistently that works as a short term remediation.

Theories?
--
Peter Geoghegan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2015-03-13 00:09:47 Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Previous Message Josh Berkus 2015-03-12 22:34:59 Re: Moving Pivotal's Greenplum work upstream