Re: [bug fix] PITR corrupts the database cluster

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [bug fix] PITR corrupts the database cluster
Date: 2013-07-24 13:54:48
Message-ID: 14015.1374674088@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> That's no different from CREATE TABLE / INDEX and DROP TABLE / INDEX. E.g., if you crash after CREATE TABLE but before COMMIT, the file is leaked. But it's just a waste of space; everything still works.

Well, it is different, because if you crash partway through dropping a
tablespace or database, you have inconsistent state.

> It would be nice to fix that leak, for tables and indexes too...

I'm inclined to think that this wouldn't be a good use of resources,
at least not at the individual table/index level. We'd surely be adding
some significant amount of overhead to normal operating paths, in order
to cover a case that really shouldn't happen in practice.

The only thing here that really bothers me is that a crash during DROP
DATABASE/TABLESPACE could leave us with a partially populated db/ts
that's still accessible through the system catalogs. We could however
do something to ensure that the db/ts is atomically removed from use
before we start dropping individual files. Then, if you get a crash,
there'd still be system catalog entries but they'd be pointing at
nothing, so the behavior would be clean and understandable whereas
right now it's not.

In the case of DROP TABLESPACE this seems relatively easy: drop or
rename the symlink before we start flushing individual files.
I'm not quite sure how to do it for DROP DATABASE though --- I thought
of renaming the database directory, say from "12345" to "12345.dead",
but if there are tablespaces in use then we might have a database
subdirectory in each one, so we couldn't rename them all atomically.
I guess one thing we could do is create a flag file, say
"dead.dont.use", in the database's default-tablespace directory, and
make new backends check for that before being willing to start up in
that database; then make sure that removal of that file is the last
step in DROP DATABASE.
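
To make the flag-file ordering concrete, here is a rough sketch in Python
rather than the backend's C. Everything in it is hypothetical: the function
names, the flat directory layout, and the simulated crash are illustration
only, and nothing here corresponds to existing PostgreSQL code. The only
detail taken from the proposal is the file name "dead.dont.use" and the
ordering: create the flag first, remove it last.

```python
import os
import shutil

FLAG_NAME = "dead.dont.use"  # flag file name suggested in the proposal


def mark_database_dead(default_dir):
    """Create the flag file and fsync it before any data file is removed.

    A real implementation would also have to fsync the containing
    directory; that is omitted here to keep the sketch portable.
    """
    flag_path = os.path.join(default_dir, FLAG_NAME)
    fd = os.open(flag_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def can_start_backend(default_dir):
    """Hypothetical startup check: refuse a database that carries the flag."""
    return not os.path.exists(os.path.join(default_dir, FLAG_NAME))


def drop_database(default_dir, other_tablespace_dirs, crash_midway=False):
    """Remove a database's files, keeping the flag in place until the end.

    default_dir is the database's directory in its default tablespace;
    other_tablespace_dirs are its per-tablespace subdirectories.
    crash_midway is a test hook that aborts partway through.
    """
    # Step 1: take the database out of use before touching any data file.
    mark_database_dead(default_dir)

    # Step 2: remove the subdirectories in the other tablespaces.
    for path in other_tablespace_dirs:
        shutil.rmtree(path, ignore_errors=True)

    if crash_midway:
        raise RuntimeError("simulated crash during DROP DATABASE")

    # Step 3: remove the data files in the default-tablespace directory
    # (assumed here to contain plain files only).
    for name in os.listdir(default_dir):
        if name != FLAG_NAME:
            os.remove(os.path.join(default_dir, name))

    # Step 4: removing the flag file is the last step.
    os.remove(os.path.join(default_dir, FLAG_NAME))
```

If the drop is interrupted anywhere between steps 1 and 4, the flag is
still present, so the startup check keeps new sessions out of the
half-removed database, and the drop can simply be retried.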

regards, tom lane
