Re: Complete data erasure

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Complete data erasure
Date: 2020-01-28 23:24:56
Message-ID: 20200128232456.yi5yzbpgtaz6sskf@development
Lists: pgsql-hackers

On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
>Greetings,
>
>* asaba(dot)takanori(at)fujitsu(dot)com (asaba(dot)takanori(at)fujitsu(dot)com) wrote:
>> From: Stephen Frost <sfrost(at)snowman(dot)net>
>> > * asaba(dot)takanori(at)fujitsu(dot)com (asaba(dot)takanori(at)fujitsu(dot)com) wrote:
>> > > This feature erases the data area just before it is returned to the OS (“erase”
>> > > here means overwriting the data area to hide its contents)
>> > > because there is a risk that attackers could restore the data if it is returned
>> > > to the OS without being overwritten.
>> > > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
>> >
>> > Looking at this fresh, I wanted to point out that I think Tom's right-
>> > we aren't going to be able to reasonably support this kind of data
>> > erasure on a simple DROP TABLE or TRUNCATE.
>> >
>> > > I want users to be able to customize the erasure method for their security
>> > > policies.
>> >
>> > There's also this- but I think what it means is that we'd probably have
>> > a top-level command that basically is "ERASE TABLE blah;" or similar
>> > which doesn't operate during transaction commit but instead marks the
>> > table as "to be erased" and then perhaps "erasure in progress" and then
>> > "fully erased" (or maybe just back to 'normal' at that point). Making
>> > those updates will require the command to perform its own transaction
>> > management which is why it can't be in a transaction itself but also
>> > means that the data erasure process doesn't need to be done during
>> > commit.
>> >
>> > > My idea is adding a new parameter erase_command to postgresql.conf.
>> >
>> > Yeah, I don't think that's really a sensible option or even approach.
>>
>> I think erase_command can also manage the state of a table.
>> The exit status of the configured command indicates it (0 is "fully erased" or "normal", 1 is "erasure in progress").
>> erase_command is executed not during a transaction but when unlink() is executed.
>
>I really don't see what the advantage of having this be configurable is.
>In addition, an external command's actions wouldn't be put through the
>WAL meaning that replicas would have to be dealt with in some other way
>beyond regular WAL, and that seems like it'd just be ugly.
>
>> (for example, after a transaction that has done DROP TABLE)
>
>We certainly can't run external commands during transaction COMMIT, so
>this can't be part of a regular DROP TABLE.
>
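The ERASE TABLE idea quoted above amounts to a small per-table state machine, and the erase_command proposal maps exit statuses onto the same states. Below is a minimal Python sketch under stated assumptions: the state names are hypothetical, taken from the wording in the thread (nothing here exists in PostgreSQL), and each transition is assumed to be committed in its own transaction by the command itself, never inside a user transaction.

```python
from enum import Enum


class EraseState(Enum):
    """Hypothetical per-table erasure states, named after the thread's wording."""
    NORMAL = "normal"
    TO_BE_ERASED = "to be erased"
    IN_PROGRESS = "erasure in progress"
    FULLY_ERASED = "fully erased"


# One step per (hypothetical) ERASE TABLE transaction; the final state is sticky.
_NEXT = {
    EraseState.NORMAL: EraseState.TO_BE_ERASED,
    EraseState.TO_BE_ERASED: EraseState.IN_PROGRESS,
    EraseState.IN_PROGRESS: EraseState.FULLY_ERASED,
    EraseState.FULLY_ERASED: EraseState.FULLY_ERASED,
}


def advance(state: EraseState) -> EraseState:
    """Return the state the table moves to after the next committed step."""
    return _NEXT[state]


def state_from_exit_status(status: int) -> EraseState:
    """Map an erase_command exit status as suggested in the thread:
    0 means fully erased, 1 means erasure still in progress.
    Treating any other code as 'still to be erased' is an assumption."""
    if status == 0:
        return EraseState.FULLY_ERASED
    if status == 1:
        return EraseState.IN_PROGRESS
    return EraseState.TO_BE_ERASED
```

The thread only defines exit codes 0 and 1, so the fallback for other codes is purely illustrative.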

IMO the best solution would be that the DROP TABLE does everything as
usual, but instead of deleting the relfilenode it moves it to some sort
of queue. And then a background worker would "erase" these relfilenodes
outside the COMMIT.
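A rough sketch of the deferred-erasure queue described above, in Python for brevity. Assumptions, all hypothetical: a dropped relation is represented by a single file path (a real relfilenode can span several forks and 1 GB segment files), the queue lives in memory (a real one would have to survive crashes), and "erasing" means overwriting with zeros and fsyncing before unlink(). The names `defer_unlink`, `zero_fill` and `erase_worker` are illustrative only.

```python
import os
import queue
import threading


def zero_fill(path: str, chunk: int = 8192) -> None:
    """Overwrite every byte of `path` with zeros and fsync the result."""
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:  # opened at offset 0, without truncating
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(b"\0" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def defer_unlink(path: str, pending: queue.Queue) -> None:
    """What DROP TABLE would do at commit instead of unlink(): just enqueue."""
    pending.put(path)


def erase_worker(pending: queue.Queue) -> None:
    """Background loop: erase queued files outside any COMMIT."""
    while True:
        path = pending.get()
        if path is None:  # sentinel used here to stop the worker
            break
        zero_fill(path)
        os.unlink(path)


# Usage: start the worker once, then let "DROP TABLE" enqueue files.
# pending = queue.Queue()
# threading.Thread(target=erase_worker, args=(pending,), daemon=True).start()
```

Overwriting in place is not guaranteed to destroy the old blocks on copy-on-write filesystems or on SSDs with wear leveling, which is one reason the erasure method may have to depend on the filesystem.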

And yes, we need to do this in a way that works with replicas, i.e. we
need to WAL-log it somehow. And it should be done in a way that works
when the replica is on a different type of filesystem.
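One possible reading of the WAL requirement is to log which relation must be erased, rather than the overwrite itself, so that each standby can redo the erasure with a method suited to its own filesystem. The following is a minimal sketch under that assumption. The record layout, `EraseRecord` and `replay_erase` are hypothetical, and only the default-tablespace path layout (base/<database OID>/<relfilenode>) is handled.

```python
import os
import struct
from typing import Callable, NamedTuple


class EraseRecord(NamedTuple):
    """Hypothetical WAL payload naming the relation whose files must be erased."""
    db_oid: int
    relfilenode: int


_FMT = "<II"  # two little-endian unsigned 32-bit values (OIDs are 32-bit)


def encode(rec: EraseRecord) -> bytes:
    return struct.pack(_FMT, rec.db_oid, rec.relfilenode)


def decode(payload: bytes) -> EraseRecord:
    return EraseRecord(*struct.unpack(_FMT, payload))


def relation_path(datadir: str, rec: EraseRecord) -> str:
    # Default-tablespace layout only: base/<database OID>/<relfilenode>.
    return os.path.join(datadir, "base", str(rec.db_oid), str(rec.relfilenode))


def replay_erase(datadir: str, payload: bytes,
                 erase: Callable[[str], None]) -> str:
    """Redo on a standby: resolve the path locally and let a filesystem-
    appropriate `erase` callback (hypothetical) do the actual destruction."""
    path = relation_path(datadir, decode(payload))
    if os.path.exists(path):
        erase(path)
    return path
```

Because the record carries only identifiers, the primary and the standby are free to use different erasure callbacks.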

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
