Re: Make drop database safer

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexandra Wang <lewang(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make drop database safer
Date: 2019-03-06 08:58:23
Message-ID: CALfoeis3sjxqbGJbyF86uhH8cyrFTKeXYkZie_SZZV9Fy58eYQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 11, 2019 at 3:55 PM Ashwin Agrawal <aagrawal(at)pivotal(dot)io> wrote:

>
> Thanks for the response and inputs.
>
> On Sat, Feb 9, 2019 at 4:51 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Hi,
>>
>> On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote:
>> > Current sequence of operations for drop database (dropdb())
>> > 1. Start Transaction
>> > 2. Make catalog changes
>> > 3. Drop database buffers
>> > 4. Forget database fsync requests
>> > 5. Checkpoint
>> > 6. Delete database directory
>> > 7. Commit Transaction
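(For orientation, here is a condensed C paraphrase of that flow as it
appears in src/backend/commands/dbcommands.c; the function names are real
PG 11-era ones, but the body is heavily abbreviated, not verbatim:)

```
/*
 * Condensed paraphrase of dropdb() -- abbreviated from
 * src/backend/commands/dbcommands.c (PG 11-era names), not verbatim.
 */
void
dropdb(const char *dbname, bool missing_ok)
{
    Oid         db_id;
    Relation    pgdbrel;

    /* (1) utility statements already run inside a transaction */

    /* (2) catalog changes: delete the database's pg_database tuple */
    pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);
    /* ... look up db_id, CatalogTupleDelete() its tuple, drop shared
     * dependencies ... */

    /* (3) throw away all buffers of the doomed database, dirty or clean */
    DropDatabaseBuffers(db_id);

    /* (4) tell the checkpointer to forget queued fsync requests for it */
    ForgetDatabaseFsyncRequests(db_id);

    /* (5) force an immediate checkpoint */
    RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
                      CHECKPOINT_WAIT);

    /* (6) remove the on-disk directories (WAL-logs XLOG_DBASE_DROP) */
    remove_dbtablespaces(db_id);

    heap_close(pgdbrel, NoLock);
    /* (7) the surrounding transaction commits after the command returns */
}
```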
>> >
>> > Problem
>> > This sequence is unsafe on a couple of fronts. For example, if DROP
>> > DATABASE aborts (i.e. the system crashes or is shut down) right after
>> > the buffers are dropped in step 3 or step 4, the database will still
>> > exist and be fully accessible, but will have lost the data from the
>> > dirty buffers. This seems very bad.
>> >
>> > The operation can also abort after step 5, in which case the entries
>> > remain in the catalog but the database is no longer accessible. That is
>> > bad as well, but not as severe as the case above, where the database
>> > exists but some data goes magically missing.
>> >
>> > Repro:
>> > ```
>> > CREATE DATABASE test;
>> > \c test
>> > CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
>> > \c postgres
>> > DROP DATABASE test; <<====== kill the session after DropDatabaseBuffers()
>> > (make sure to issue a checkpoint before killing the session)
>> > ```
>> >
>> > Proposed ways to fix
>> > 1. CommitTransactionCommand() right after step 2. This makes it fully
>> > safe, as the catalog will have the database dropped. Files may still
>> > exist on disk in some cases, which is okay. This also makes it
>> > consistent with the approach used in movedb().
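(A minimal sketch of what approach 1 would change, reusing the simplified
skeleton above; this is hypothetical, not committed code:)

```
/* HYPOTHETICAL approach 1: make the catalog deletion durable first, the
 * way movedb() commits its pg_database update before touching files. */

/* (2) catalog changes as before, then: */
heap_close(pgdbrel, NoLock);
CommitTransactionCommand();     /* catalog now durably says the DB is gone */
StartTransactionCommand();

/* From here on, a crash can no longer resurrect the database; at worst
 * its files are left orphaned on disk. */
DropDatabaseBuffers(db_id);
ForgetDatabaseFsyncRequests(db_id);
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
remove_dbtablespaces(db_id);
```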
>>
>> To me this seems bad. The current failure mode obviously isn't good, but
>> the data obviously isn't valuable, and just losing track of an entire
>> database's worth of data seems worse.
>>
>
> So, based on that response, it seems that not losing track of the files
> associated with the database is the design choice we wish to achieve.
> Hence, the catalog having an entry while the data directory has been
> deleted is acceptable behavior and doesn't need to be solved.
>
>> > 2. An alternative way to make it safer is to perform the Checkpoint
>> > (step 5) just before dropping the database buffers (step 3), avoiding
>> > the unsafe window. Caveats of this solution:
>> > - It performs IO for data which, in the success case, will get deleted
>> > anyway
>> > - It still doesn't cover the case where the catalog has the database
>> > entry but the files are removed from disk
>>
>> That seems like an unacceptable slowdown.
>>
>
> Given that dropping a database should be an infrequent operation, and
> that the only additional IO cost is for the buffers of that database
> itself (a Checkpoint is performed in a later step anyway), is it really
> an unacceptable slowdown compared to the safety it brings?
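(A sketch of the reordering approach 2 proposes, again hypothetical and
using the same simplified skeleton as above:)

```
/* HYPOTHETICAL approach 2: checkpoint BEFORE dropping buffers, so the
 * dirty pages of the doomed database reach disk first. */

/* (5') flush first: a crash after this point, but before the directory
 * is removed, leaves the database fully intact and consistent */
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

/* (3') only now discard the buffers ... */
DropDatabaseBuffers(db_id);

/* (4') ... and any queued fsync requests */
ForgetDatabaseFsyncRequests(db_id);

/* (6) finally remove the files; the extra IO paid in (5') is only the
 * dirty buffers of this one database */
remove_dbtablespaces(db_id);
```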
>
>
>>
>> > 3. A fancier approach is to use the pending-delete mechanism employed
>> > by relation drops to perform these non-catalog activities at commit.
>> > The pending-delete structure can easily gain a boolean to convey
>> > dropping a database directory instead of a file. Given that DROP
>> > DATABASE can't be performed inside a transaction, it doesn't need to be
>> > done this way, but this makes it one consistent approach for dealing
>> > with on-disk removal.
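(For reference, the pending-delete list entry lives in
src/backend/catalog/storage.c; the PG 11-era struct looks like the sketch
below, where isDbDir is the hypothetical boolean this approach describes:)

```
/* PendingRelDelete as in src/backend/catalog/storage.c (PG 11-era
 * layout), plus one HYPOTHETICAL field for approach 3. */
typedef struct PendingRelDelete
{
    RelFileNode relnode;        /* relation that may need to be deleted */
    BackendId   backend;        /* InvalidBackendId if not a temp rel */
    bool        atCommit;       /* T=delete at commit; F=delete at abort */
    int         nestLevel;      /* xact nesting level of request */
    bool        isDbDir;        /* HYPOTHETICAL: T=remove the whole
                                 * database directory for relnode.dbNode,
                                 * not a single relation */
    struct PendingRelDelete *next;      /* linked-list link */
} PendingRelDelete;
```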
>>
>> ISTM we'd need to do something like this.
>>
>
> Given the design choice above, to retain the link to the database files
> until they are actually deleted, we're not seeing why the pending-delete
> approach is any better than approach 1. It would allow us to track the
> database oid in the commit-transaction xlog record, but any checkpoint
> after that still loses the reference to the database. That is the same
> situation as in approach 1, where a separate XLOG_DBASE_DROP xlog record
> is written just after committing the transaction.
> When we proposed approach 3, we thought it was functionally the same as
> approach 1 and differed only in implementation. But your preference for
> this approach, while calling approach 1 bad, reads as if the pending-delete
> approach is functionally different; we would like to hear more about how.
>
> Considering the design choice we must meet, it seems approach 2, moving
> the Checkpoint from step 5 to before step 3, would give us the desired
> safety and retain the link to the database until we actually delete its
> files.
>

Still awaiting a response on this thread; any inputs would be highly appreciated.
