Re: Anonymized database dumps

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Kiriakos Georgiou <kg(dot)postgresql(at)olympiakos(dot)com>
Cc: pgsql-general Forums <pgsql-general(at)postgresql(dot)org>
Subject: Re: Anonymized database dumps
Date: 2012-03-19 21:55:27
Message-ID: 20120319175527.5eb99af57c91b932f561f97a@potentialtech.com
Lists: pgsql-general

In response to Kiriakos Georgiou <kg(dot)postgresql(at)olympiakos(dot)com>:

> The data anonymizer process is flawed because you are one misstep away from data spillage.

In our case, it's only one layer.

Other layers that exist:
* The systems where this test data is instantiated can't send email
* The systems where this data exists have limited access (i.e., not all
developers can access it, and it's not used for typical testing --
only for specific testing that requires production-like data)

You are correct, however, in that there's always the danger of
spillage if new sensitive data is added and the sanitization script
is not updated to cover it. It's part of the ongoing overhead of
maintaining such a system.
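
One way to reduce that overhead is a check that fails whenever a column
exists without an explicit sanitization decision. The following is only a
minimal sketch, not the script discussed in this thread: the table names,
column names, and rule labels ("keep", "fake", "blank") are hypothetical,
and in practice the schema would presumably be read from the database
catalog rather than written out by hand.

```python
# Hypothetical sketch: all table, column, and rule names are made up
# for illustration and do not come from the thread.

def unclassified_columns(schema, rules):
    """Return sorted (table, column) pairs with no sanitization decision."""
    missing = []
    for table, columns in schema.items():
        decided = rules.get(table, {})
        for column in columns:
            if column not in decided:
                missing.append((table, column))
    return sorted(missing)


# Columns that currently exist (assumed to be loaded from the live schema).
schema = {
    "patients": ["id", "name", "email", "phone"],
    "visits": ["id", "patient_id", "notes"],
}

# Explicit decision for every column the sanitization script knows about.
rules = {
    "patients": {"id": "keep", "name": "fake", "email": "fake"},
    "visits": {"id": "keep", "patient_id": "keep", "notes": "blank"},
}

# "phone" was added to the schema but never classified, so it is reported.
missing = unclassified_columns(schema, rules)
```

Refusing to produce a dump while `missing` is non-empty turns a forgotten
column into a visible failure instead of a silent leak. It does not help
when sensitive data is put into an existing column that was already
classified as safe.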

> Sensitive data should be stored encrypted to begin. For test databases you or your developers can invoke a process that replaces the real encrypted data with fake encrypted data (for which everybody has the key/password.) Or if the overhead is too much (ie billions of rows), you can have different decrypt() routines on your test databases that return fake data without touching the real encrypted columns.

The thing is, this process has the same potential data spillage
issues as sanitizing the data. I find it intriguing, however, and
I'm going to see if there are places where this approach might
have advantages over our current one.
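
The "different decrypt() routine" idea from the quoted text could look
roughly like the sketch below. This is an assumption-laden illustration,
not anything described in the thread: `decrypt_for_test` and `FAKE_NAMES`
are hypothetical names, and the real routine would more likely live in the
database as a replacement function than in application code.

```python
import hashlib

# Hypothetical pool of obviously fake values returned on test systems.
FAKE_NAMES = ["Alice Example", "Bob Sample", "Carol Placeholder", "Dave Testcase"]


def decrypt_for_test(ciphertext: bytes) -> str:
    """Test-environment stand-in for decrypt().

    It never uses a key and never returns real data. The fake value is
    picked from a hash of the ciphertext, so the same stored value always
    maps to the same fake value.
    """
    digest = hashlib.sha256(ciphertext).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]


# The encrypted column is left untouched; only the routine differs.
value = decrypt_for_test(b"\x8f\x02\x19opaque-ciphertext-bytes")
```

The same caveat applies as with sanitizing: a new sensitive column that is
stored unencrypted bypasses this routine entirely.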

Since much of our sensitive data is already de-identified, it
would provide an additional layer of protection there as well.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
