Safer auto-initdb for RPM init script

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsqlrpms-hackers(at)pgfoundry(dot)org
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Safer auto-initdb for RPM init script
Date: 2006-08-25 13:19:52
Message-ID: 22918.1156511992@sss.pgh.pa.us
Lists: pgsql-hackers

We've seen more than one report of corruption of PG databases that
seemed to be due to the willingness of the RPM init script to run
initdb if it thinks the data directory isn't there. This is pretty
darn risky on an NFS volume, for instance, which might be offline
at the instant the script looks. The failure case is:

- script doesn't see data directory
- script runs initdb and starts postmaster
- offline volume comes online
- KABOOM

The initdb creates a "local" database that's physically on the root
volume underneath the mountpoint directory for the intended volume.
After the mountable volume comes online, these files are shadowed
by the original database files. The problem is that by this point the
postmaster has a copy of pg_control in memory from the freshly-initdb'd
database, and that pg_control has a WAL end address and XID counter far
less than is correct for the real database. Havoc ensues, very probably
resulting in a hopelessly corrupt database.

I don't really want to remove the auto-initdb feature from the script,
because it's important not to drive away newbies by making Postgres
hard to start for the first time. But I think we'd better think about
ways to make it more bulletproof.

The first thought that comes to mind is to have the RPM install create
the data directory if not present and create a flag file in it showing
that it's safe to initdb. Then the script is allowed to initdb only
if it finds the directory and the flag file but not PG_VERSION.
Something like this (untested, off-the-cuff coding):

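%post server

# Create the data directory and the "safe to initdb" flag only if
# nothing is there at all.
if [ ! -d "$PGDATA" ]; then
    mkdir "$PGDATA"
    touch "$PGDATA/NO_DATABASE_YET"
fi

and in the initscript:

# initdb only if the directory exists, carries the flag, and holds no cluster.
if [ -d "$PGDATA" ] && [ -f "$PGDATA/NO_DATABASE_YET" ] && [ ! -f "$PGDATA/PG_VERSION" ]; then
    # Remove the flag first so that a later start can never re-run initdb.
    rm -f "$PGDATA/NO_DATABASE_YET" && initdb ...
fi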

If the volume holding the data directory is not mounted, the -d test
would fail, unless the data directory is itself the mount point; in
that case the directory would be there but would not contain the
NO_DATABASE_YET file.
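A small, self-contained sketch of the cases described above, assuming POSIX sh and using temporary directories in place of a real $PGDATA. The function name may_initdb and the directory layout are hypothetical; the function only reports the decision the initscript test would make and does not run initdb.

```sh
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only when it is safe to run initdb,
# i.e. the directory exists, carries the NO_DATABASE_YET flag created by
# %post, and does not yet contain PG_VERSION.
may_initdb() {
    [ -d "$1" ] && [ -f "$1/NO_DATABASE_YET" ] && [ ! -f "$1/PG_VERSION" ]
}

root=$(mktemp -d)

# Case 1: volume offline and $PGDATA lies below the mount point,
# so the directory itself does not exist.
missing="$root/mnt/pgdata"

# Case 2: $PGDATA is itself the (empty) mount point of an offline volume.
mountpoint="$root/mountpoint"
mkdir "$mountpoint"

# Case 3: fresh install; %post created the directory and the flag file.
fresh="$root/fresh"
mkdir "$fresh"
touch "$fresh/NO_DATABASE_YET"

# Case 4: existing cluster; PG_VERSION is already present.
existing="$root/existing"
mkdir "$existing"
touch "$existing/PG_VERSION"

for d in "$missing" "$mountpoint" "$fresh" "$existing"; do
    if may_initdb "$d"; then
        echo "initdb: ${d#"$root"/}"
    else
        echo "skip:   ${d#"$root"/}"
    fi
done

rm -rf "$root"
```

Running it should report initdb only for the fresh case: both offline-volume layouts are skipped, one because the directory is missing and the other because the flag file is.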

I can still imagine ways for this to fail, e.g., if you run an RPM
install or upgrade while your mountable data directory is offline.
But it ought to be an order of magnitude safer than things are now.
(Hm, maybe the %post script should only run during an RPM install,
not an upgrade.)
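RPM passes each scriptlet, as $1, the number of instances of the package that will be installed once the transaction completes: in %post that is 1 for a fresh install and 2 or more for an upgrade. A minimal sketch of the parenthetical suggestion, assuming that convention; the function post_server is a hypothetical wrapper so the %post body can be exercised outside rpm, and its second argument stands in for $PGDATA.

```sh
#!/bin/sh
# Hypothetical wrapper around the %post body.
# $1: instance count RPM hands to %post (1 = fresh install, >= 2 = upgrade)
# $2: stands in for $PGDATA
post_server() {
    # Only create the directory and the safe-to-initdb flag on a first install;
    # an upgrade leaves whatever is (or is not) at the data path alone.
    if [ "$1" -eq 1 ] && [ ! -d "$2" ]; then
        mkdir "$2" && touch "$2/NO_DATABASE_YET"
    fi
}

root=$(mktemp -d)

post_server 2 "$root/upgrade"   # upgrade: nothing is created
post_server 1 "$root/install"   # fresh install: directory plus flag

if [ -d "$root/upgrade" ]; then
    echo "upgrade: created a directory"
else
    echo "upgrade: untouched"
fi
if [ -f "$root/install/NO_DATABASE_YET" ]; then
    echo "install: flag created"
fi

rm -rf "$root"
```

With this guard, upgrading the RPM while the data volume is offline no longer plants a fresh flagged directory under the mount point; only a first-time install can do that.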

Comments? Anyone see a better way?

regards, tom lane
