RE: Complete data erasure

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Complete data erasure
Date: 2020-02-04 00:53:44
Message-ID: TY2PR01MB50849087BA692135DE58FE87FE030@TY2PR01MB5084.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
> That's not really what I meant - let me explain. When I said DROP TABLE
> should do everything as usual, that includes catalog changes. I.e. after
> the commit there would not be any remaining entries in system catalogs
> or anything like that.
>
> The only thing we'd do differently is that instead of unlinking the
> relfilenode segments, we'd move the relfilenode to a persistent queue
> (essentially a regular table used as a queue relfilenodes). The
> background worker would watch the queue, and when it gets a new
> relfilenode it'd "delete" the data and then remove the relfilenode from
> the queue.
>
> So essentially others would not be able to even see the (now dropped)
> object, they could create new object with the same name etc.

That sounds good. I think we can also follow the way the WAL archiver does its job, instead of using a regular table. That is, when the transaction that performed DROP TABLE commits, it puts the data files in the "trash bin," which is actually a filesystem directory. Or, it just renames the data files in the original directory by appending some suffix such as ".del". Then, the background worker scans the trash bin or the data directory to erase the file content and delete the file.
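As a rough illustration of that rename-then-scan idea (the ".del" suffix follows the convention suggested above; the file names and the erase_file callback are hypothetical, just for the sketch):

```python
import os

DEL_SUFFIX = ".del"

def move_to_trash(path):
    """Rename a dropped relation's data file in place, appending ".del",
    so a background worker can find and erase it later."""
    trashed = path + DEL_SUFFIX
    os.rename(path, trashed)
    return trashed

def background_worker_pass(data_dir, erase_file):
    """One scan of the data directory: erase and unlink every *.del file."""
    erased = []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(DEL_SUFFIX):
            full = os.path.join(data_dir, name)
            erase_file(full)   # overwrite the content before unlinking
            os.unlink(full)
            erased.append(name)
    return erased
```

The rename is the only thing the committing transaction does, so DROP TABLE stays fast; the worker repeats the scan, much like the WAL archiver polling archive_status.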

The trash bin mechanism may also enable a feature for restoring mistakenly dropped tables, like Oracle's Flashback Drop. The dropping transaction would put the table metadata (system catalog data or DDL) in the trash bin along with the data files.

> I imagine we might provide a way to wait for the deletion to actually
> complete (can't do that as part of the DROP TABLE, though), so that
> people can be sure when the data is actually gone (for scripts etc.).
> A simple function waiting for the queue to get empty might be enough, I
> guess, but maybe not.

Agreed, because the user should expect the disk space to be available after DROP TABLE has been committed. But can't we make COMMIT wait for the erasure to complete? Do we really have to use asynchronous erasure with a background worker? For example, COMMIT would perform:

1. Writes a commit WAL record, finalizing the system catalog changes.
2. Puts the data files in the trash bin, or renames them.
3. Erases the file content and deletes the files. (This could take a long time.)
4. Replies success to the client.
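Step 3 could be sketched as a single overwrite-then-unlink pass (a one-pass zero fill is shown for simplicity; whether one overwrite pass is sufficient depends on the storage, which is exactly the COW question discussed below):

```python
import os

def erase_and_delete(path, chunk_size=8192):
    """Overwrite the file content with zeros, force it to disk,
    then remove the file."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)   # zero out the next chunk
            remaining -= n
        f.flush()
        os.fsync(f.fileno())       # make the overwrite durable
    os.unlink(path)
```

Because this runs before the server replies to the client, a large table would make COMMIT visibly slow, which is the main argument for the asynchronous approach.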

My concern is that the need to erase and delete the data files would be forgotten if the server crashes during step 3. If so, the postmaster can do the job at startup, just as it deletes temporary files (although that delays startup).
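Under the file-based scheme, the simple wait function Tomas suggests could just poll until no ".del" files remain (a hypothetical sketch; the function name and parameters are made up for illustration):

```python
import os
import time

def wait_for_erasure(data_dir, timeout=60.0, interval=0.5):
    """Block until no ".del" files remain in data_dir, so scripts can
    be sure the dropped data is physically gone.  Returns True once the
    directory is clean, False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pending = [n for n in os.listdir(data_dir) if n.endswith(".del")]
        if not pending:
            return True
        time.sleep(interval)
    return False
```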

> I think this depends on what our requirements are.
>
> My assumption is that when you perform this "secure data erasure" on the
> primary, you probably also want to erase the data on the replica. But if
> the instances use different file systems (COW vs. non-COW, ...) the
> exact thing that needs to happen may be different. Or maybe the replica
> does not need to do anything, making it noop?

We can recommend in the manual that users run non-COW file systems on both the primary and the standby.

Regards
Takayuki Tsunakawa
