Re: Complete data erasure

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Complete data erasure
Date: 2020-02-05 10:19:14
Message-ID: CA+fd4k5Ymr3XkuHz6vzyqtbbBQA9SFGFKiJFYT5+BB0i92uWMQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, 4 Feb 2020 at 09:53, tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
> From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
> > That's not really what I meant - let me explain. When I said DROP TABLE
> > should do everything as usual, that includes catalog changes. I.e. after
> > the commit there would not be any remaining entries in system catalogs
> > or anything like that.
> >
> > The only thing we'd do differently is that instead of unlinking the
> > relfilenode segments, we'd move the relfilenode to a persistent queue
> > (essentially a regular table used as a queue of relfilenodes). The
> > background worker would watch the queue, and when it gets a new
> > relfilenode it'd "delete" the data and then remove the relfilenode from
> > the queue.
> >
> > So essentially others would not be able to even see the (now dropped)
> > object, and they could create a new object with the same name, etc.
>
> That sounds good. I think we can also follow the way the WAL archiver does its job, instead of using a regular table. That is, when the transaction that performed DROP TABLE commits, it puts the data files in the "trash bin," which is actually a filesystem directory. Or, it just renames the data files in the original directory by appending some suffix such as ".del". Then, the background worker scans the trash bin or the data directory to erase the file content and delete the file.
>
> The trash bin mechanism may also open the door to restoring mistakenly dropped tables, a feature like Oracle's Flashback Drop. The dropping transaction would put the table metadata (system catalog data or DDL) in the trash bin along with the data file.
>
>
> > I imagine we might provide a way to wait for the deletion to actually
> > complete (can't do that as part of the DROP TABLE, though), so that
> > people can be sure when the data is actually gone (for scripts etc.).
> > A simple function waiting for the queue to get empty might be enough, I
> > guess, but maybe not.
>
> Agreed, because the user should expect the disk space to be available after DROP TABLE has been committed. Can't we make COMMIT wait for the erasure to complete? Do we really have to use an asynchronous erasure method with a background worker? For example, COMMIT would perform:
>
> 1. Writes a commit WAL record, finalizing the system catalog change.
> 2. Puts the data files in the trash bin or renames them.
> 3. Erases the file content and deletes the file. This could take a long time.
> 4. Replies success to the client.
>
> One concern is that the need to erase and delete the data file would be forgotten if the server crashes during step 3. If so, the postmaster can do the job at startup, just as it deletes temporary files (although this would delay the startup).

Please note that we need to erase files not only when dropping or
truncating tables but also when aborting a transaction that created a
new table. If users want to be sure the data is actually erased, they
need to wait for rollback as well, which could be a ROLLBACK command
issued by the user or an error during the transaction.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
