Re: Make drop database safer

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexandra Wang <lewang(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make drop database safer
Date: 2019-02-11 23:55:30
Message-ID: CALfoeiukQm+EKw=kkwWbT_zv242qHQzQwnQqNa1z24eMnEaO2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the response and inputs.

On Sat, Feb 9, 2019 at 4:51 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote:
> > Current sequence of operations for drop database (dropdb())
> > 1. Start Transaction
> > 2. Make catalog changes
> > 3. Drop database buffers
> > 4. Forget database fsync requests
> > 5. Checkpoint
> > 6. Delete database directory
> > 7. Commit Transaction
> >
> > Problem
> > This sequence is unsafe from couple of fronts. Like if drop database,
> > aborts (means system can crash/shutdown can happen) right after buffers
> are
> > dropped step 3 or step 4. The database will still exist and fully
> > accessible but will loose the data from the dirty buffers. This seems
> very
> > bad.
> >
> > Operation can abort after step 5 as well in which can the entries remain
> in
> > catalog but the database is not accessible. Which is bad as well but not
> as
> > severe as above case mentioned, where it exists but some stuff goes
> > magically missing.
> >
> > Repo:
> > ```
> > CREATE DATABASE test;
> > \c test
> > CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
> > \c postgres
> > DROP DATABASE test; <<====== kill the session after DropDatabaseBuffers()
> > (make sure to issue checkpoint before killing the session)
> > ```
> >
> > Proposed ways to fix
> > 1. CommitTransactionCommand() right after step 2. This makes it fully
> safe
> > as the catalog will have the database dropped. Files may still exist on
> > disk in some cases which is okay. This also makes it consistent with the
> > approach used in movedb().
>
> To me this seems bad. The current failure mode obviously isn't good, but
> the data obviously isn't valuable, and just loosing track of an entire
> database worth of data seems worse.
>

So, based on that response seems not loosing track to the files associated
with the database is design choice we wish to achieve. Hence catalog having
entry but data directory being deleted is fine behavior to have and doesn't
need to be solved.

> 2. Alternative way to make it safer is perform Checkpoint (step 5) just

> > before dropping database buffers, to avoid the unsafe nature. Caveats of
> > this solution is:
> > - Performs IO for data which in success case anyways will get deleted
> > - Still doesn't cover the case where catalog has the database entry but
> > files are removed from disk
>
> That seems like an unacceptable slowdown.
>

Given dropping database should be infrequent operation and only addition IO
cost is for buffers for that database itself as Checkpoint is anyways
performed in later step, is it really unacceptable slowdown, compared to
safety it brings ?

>
> > 3. One more fancier approach is to use pending delete mechanism used by
> > relation drops, to perform these non-catalog related activities at
> commit.
> > Easily, the pending delete structure can be added boolean to convey
> > database directory dropping instead of file. Given drop database can't be
> > performed inside transaction, not needed to be done this way, but this
> > makes it one consistent approach used to deal with on-disk removal.
>
> ISTM we'd need to do something like this.
>

Given the above design choice to retain link to database files till
actually deleted, not seeing why pending delete approach any better than
approach 1. This approach will allow us to track the database oid in commit
transaction xlog record but any checkpoint post the same still looses the
reference to the database. Which is same case in approach 1 where separate
xlog record XLOG_DBASE_DROP is written just after committing the
transaction.
When we proposed approach 3, we thought its functionally same as approach 1
just differs in implementation. But your preference to this approach and
stating approach 1 as bad, reads as pending deletes approach is
functionally different, we would like to hear more how?

Considering the design choice we must meet, seems approach 2, moving
Checkpoint from step 5 before step 3 would give us the safety desired and
retain the desired link to the database till we actually delete the files
for it.

Thanks,
Ashwin and Alex

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-02-11 23:57:51 Re: pg11.1: dsa_area could not attach to segment
Previous Message Michael Paquier 2019-02-11 23:46:04 Re: Logical replication and restore from pg_basebackup