Re: [bug fix] PITR corrupts the database cluster

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>,MauMau <maumau307(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [bug fix] PITR corrupts the database cluster
Date:	2013-07-24 12:45:52
Message-ID:	e76864b6-c40f-4593-9eea-c9b9976a8000@email.android.com
Lists:	pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>On 2013-07-24 12:59:43 +0200, Andres Freund wrote:
>> > <Approach 2>
>> > Like the DROP TABLE/INDEX case, piggyback the directory deletion record on
>> > the transaction commit record, and eliminate the directory deletion record
>> > altogether.
>>
>> I don't think burdening commit records with that makes sense. It's just
>> not a common enough case.
>>
>> What we imo could do would be to drop the tablespaces in a *separate*
>> transaction *after* the transaction that removed the pg_tablespace
>> entry. Then an "incomplete actions" logic similar to btree and gin could
>> be used to remove the database directory if we crashed between the two
>> transactions.
>>
>> SO:
>> TXN1 does:
>> * remove catalog entries
>> * drop buffers
>> * XLogInsert(XLOG_DBASE_DROP_BEGIN)
>>
>> TXN2:
>> * remove_dbtablespaces
>> * XLogInsert(XLOG_DBASE_DROP_FINISH)
>>
>> The RM_DBASE_ID resource manager would then grow a rm_cleanup callback
>> (which would perform TXN2 if we failed in between) and a
>> rm_safe_restartpoint which would prevent restartpoints from occurring on
>> standby between both.
>>
>> The same should probably be done for CREATE DATABASE because that currently
>> can result in partially copied databases lying around.
>
>And CREATE/DROP TABLESPACE.
>
>Not really related, but CREATE DATABASE's implementation makes me itch
>every time I read parts of it...

I've been hoping that we could get rid of the rm_cleanup mechanism entirely. I eliminated it for gist a while back, and I've been thinking of doing the same for gin and btree. The way it currently works is buggy: while we have rm_safe_restartpoint to avoid creating a restartpoint at a bad moment, there is nothing to stop a checkpoint from running while incomplete actions are pending. It's possible that page locks or something similar prevent that in practice, but it feels shaky.
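
To make the "incomplete actions" bookkeeping concrete, here is a minimal toy model in C. It is a sketch only, not PostgreSQL source: every `toy_`/`Toy` name is hypothetical, the two record types stand in for the XLOG_DBASE_DROP_BEGIN/XLOG_DBASE_DROP_FINISH records proposed in the quoted mail, and "removing a directory" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of this is PostgreSQL source. */
typedef enum { DROP_BEGIN, DROP_FINISH } ToyRecordType;  /* hypothetical */

typedef struct {
    ToyRecordType type;
    unsigned      db_oid;
} ToyRecord;

#define TOY_MAX_PENDING 8

static unsigned toy_pending[TOY_MAX_PENDING];  /* databases with a drop in flight */
static size_t   toy_npending = 0;
static size_t   toy_dirs_removed = 0;          /* stand-in for directory removal */

/* Replay one record: BEGIN opens an incomplete action, FINISH closes it. */
static void toy_redo(const ToyRecord *rec)
{
    if (rec->type == DROP_BEGIN) {
        assert(toy_npending < TOY_MAX_PENDING);
        toy_pending[toy_npending++] = rec->db_oid;
        return;
    }
    for (size_t i = 0; i < toy_npending; i++) {
        if (toy_pending[i] == rec->db_oid) {
            /* Swap the last entry into the freed slot. */
            toy_pending[i] = toy_pending[--toy_npending];
            toy_dirs_removed++;
            return;
        }
    }
}

/* Analogue of rm_safe_restartpoint: false while an action is half done. */
static bool toy_safe_restartpoint(void)
{
    return toy_npending == 0;
}

/* Analogue of rm_cleanup: at end of recovery, finish whatever is pending. */
static void toy_cleanup(void)
{
    while (toy_npending > 0) {
        toy_npending--;
        toy_dirs_removed++;
    }
}
```

In this toy, only a caller that asks `toy_safe_restartpoint()` learns that an action is pending. That mirrors the concern above: a code path that never consults the check, such as an ordinary checkpoint, gets no protection from it.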

So I'd prefer a solution that doesn't rely on rm_cleanup. Piggybacking on the commit record seems OK to me, though if we're going to have a lot of different things to attach there, maybe we need to generalize it somehow: for example, allow resource managers to attach an arbitrary payload to the commit record, and provide a new rm_redo_commit function to replay it.
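
One possible shape for such a generalized commit payload is sketched below in C. It is an illustration under stated assumptions, not an existing PostgreSQL API: the record layout, the fixed-size limits, and all `toy_`/`Toy` names are hypothetical, and the per-resource-manager callback table merely plays the role the proposed rm_redo_commit function would have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ITEMS 4
#define TOY_MAX_DATA  32

/* Hypothetical resource manager ids for the sketch. */
typedef enum { TOY_RM_DBASE, TOY_RM_TBLSPC, TOY_RM_COUNT } ToyRmgrId;

/* One opaque payload, tagged with the resource manager that owns it. */
typedef struct {
    ToyRmgrId     rmid;
    size_t        len;
    unsigned char data[TOY_MAX_DATA];
} ToyCommitItem;

/* A commit record carrying zero or more attached payloads. */
typedef struct {
    unsigned      xid;
    size_t        nitems;
    ToyCommitItem items[TOY_MAX_ITEMS];
} ToyCommitRecord;

/* Stand-in for the proposed rm_redo_commit callback, one slot per rmgr. */
typedef void (*toy_redo_commit_fn)(const void *data, size_t len);
static toy_redo_commit_fn toy_redo_commit[TOY_RM_COUNT];

/* Attach a payload at commit time; returns -1 if it does not fit. */
static int toy_attach(ToyCommitRecord *rec, ToyRmgrId rmid,
                      const void *data, size_t len)
{
    if (rec->nitems >= TOY_MAX_ITEMS || len > TOY_MAX_DATA)
        return -1;
    ToyCommitItem *item = &rec->items[rec->nitems++];
    item->rmid = rmid;
    item->len = len;
    memcpy(item->data, data, len);
    return 0;
}

/* Replay: hand each payload back to the resource manager that wrote it. */
static void toy_replay_commit(const ToyCommitRecord *rec)
{
    for (size_t i = 0; i < rec->nitems; i++) {
        const ToyCommitItem *item = &rec->items[i];
        toy_redo_commit_fn fn = toy_redo_commit[item->rmid];
        if (fn != NULL)
            fn(item->data, item->len);
    }
}

/* Example callback: remember which database directory would be removed. */
static unsigned toy_last_dropped_db = 0;

static void toy_dbase_redo_commit(const void *data, size_t len)
{
    assert(len == sizeof(unsigned));
    memcpy(&toy_last_dropped_db, data, len);
}
```

Because the deletion rides on the commit record itself, replay in this sketch sees the commit and the cleanup as one unit, so there is no window between two records that a cleanup pass would have to repair.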

- Heikki

In response to

Re: [bug fix] PITR corrupts the database cluster at 2013-07-24 11:21:20 from Andres Freund

Responses

Re: [bug fix] PITR corrupts the database cluster at 2013-07-24 13:05:30 from Andres Freund
