Re: [GENERAL] Restart after power outage: createdb

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: lapham(at)jandr(dot)org
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Restart after power outage: createdb
Date: 2006-09-27 20:13:34
Message-ID: 6199.1159388014@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Jon Lapham <lapham(at)jandr(dot)org> writes in pgsql-general:
> If I run...
> sleep 3; echo starting; createdb bar
> ...and power off the VM while the "createdb bar" is running.

> Upon restart, about 50% of the time I can reproduce the following error
> message:

> [lapham(at)localhost ~]$ psql bar
> psql: FATAL: database "bar" does not exist
> [lapham(at)localhost ~]$ createdb bar
> createdb: database creation failed: ERROR: could not create directory
> "base/65536": File exists

What apparently is happening here is that the same OID has been assigned
to the new database both times. Even though the createdb didn't
complete, the directory it started to build is there and so there's a
filename collision.

> So, running "createdb bar" a second time works.

Yeah, because the OID counter has been advanced, and so the second
createdb uses a nonconflicting OID.

In theory this scenario should not happen, because a crash-and-restart
is supposed to guarantee that the OID counter comes up at or beyond
where it was before the crash.

After thinking about it for awhile, I believe the problem is that
CREATE DATABASE is breaking the "WAL rule": it's allowing a data change
(specifically, creation of the new DB subdirectory) to hit disk without
having guaranteed that associated WAL entries were flushed first.
Specifically, if we generated an XLOG_NEXTOID WAL entry to record the
consumption of an OID for the database, there isn't anything ensuring
that record gets to disk before the mkdir occurs. (ie, the comment in
XLogPutNextOid is correct as far as it goes, but it fails to account
for outside-the-database effects such as creation of a directory named
after the OID.) Hence after restart the OID counter might not get
advanced as far as it should have been.

We could fix this two different ways:

1. Put an XLogFlush into createdb() somewhere between making the
pg_database entry and starting to create subdirectories.

2. Check for conflicting database directories while assigning the OID,
comparable to what GetNewRelFileNode() does for table files.

#2 has some appeal because it could deal with random junk in
$PGDATA/base regardless of how the junk got there. However, to do that
in a really bulletproof way we'd have to check all the tablespace
directories too, and that's starting to get a tad tedious for something
that shouldn't happen anyway.

So I'm leaning to #1 as a suitably low-effort fix. Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gene Wirchenko 2006-09-27 20:25:27 Re: cyclical redundancy checksum algorithm(s)?
Previous Message Casey Duncan 2006-09-27 20:09:02 Re: postgress 8.1.4 deadlocking??

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2006-09-27 20:25:57 Re: [Pgbuildfarm-members] Buildfarm alarms
Previous Message Jim C. Nasby 2006-09-27 20:10:20 Re: [Pgbuildfarm-members] Buildfarm alarms