Re: Anonymized database dumps

From: Kiriakos Georgiou <kg(dot)postgresql(at)olympiakos(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: pgsql-general Forums <pgsql-general(at)postgresql(dot)org>
Subject: Re: Anonymized database dumps
Date: 2012-03-20 00:48:44
Message-ID: B573707D-4223-4CFB-9172-935E1322475D@olympiakos.com
Lists: pgsql-general

On Mar 19, 2012, at 5:55 PM, Bill Moran wrote:

>
>> Sensitive data should be stored encrypted to begin with. For test databases, you or your developers can invoke a process that replaces the real encrypted data with fake encrypted data (for which everybody has the key/password). Or, if the overhead of that is too high (i.e., billions of rows), you can have different decrypt() routines on your test databases that return fake data without touching the real encrypted columns.
>
> The thing is, this process has the same potential data spillage
> issues as sanitizing the data.

Not really. In the approach I describe, the sensitive data is always encrypted in the database and therefore "useless": nobody has the private key or knows the password that protects it other than the ops subsystems that require access.
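
Roughly, something like this with pgcrypto (all names here are made up for illustration; :'ops_pubkey' is a psql variable holding the ASCII-armored ops public key):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE customers (
    id   serial PRIMARY KEY,
    name text NOT NULL,
    ssn  bytea NOT NULL   -- always stored encrypted
);

-- Encrypt on the way in with the ops public key; only the ops
-- subsystems hold the matching private key, so a dump of this
-- table is useless on its own.
INSERT INTO customers (name, ssn)
VALUES ('Alice', pgp_pub_encrypt('123-45-6789', dearmor(:'ops_pubkey')));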
So even if you take an ops dump, load it onto a test box, and walk away, you are good. If your developers/testers want to play with the data, they will be forced to overwrite and "stage" test encrypted data they can decrypt, or to call a "fake" decrypt() that gives them test data (e.g., one that joins to a test data table).
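
On the test box, both variants could look something like this (again only a sketch with made-up names, and assuming the test pool ids run 1..N with no gaps):

-- Variant 1: stage fake ciphertext that everybody can decrypt,
-- using a shared test password instead of the production key.
UPDATE customers
SET ssn = pgp_sym_encrypt('000-00-0000', 'shared-test-password');

-- Variant 2: a "fake" decrypt() that never touches the real
-- ciphertext and simply hands back a value from a test pool.
CREATE TABLE test_ssn_pool (
    id  int PRIMARY KEY,   -- assumed consecutive, starting at 1
    ssn text NOT NULL
);

CREATE FUNCTION decrypt_ssn(customer_id int, ciphertext bytea)
RETURNS text AS $$
    SELECT ssn FROM test_ssn_pool
    WHERE id = customer_id % (SELECT count(*) FROM test_ssn_pool) + 1;
$$ LANGUAGE sql STABLE;

The production decrypt_ssn() would instead call pgp_pub_decrypt() with the ops private key; the test version keeps the same signature so application code does not change.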

Kiriakos
