PRIVATE columns

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Subject: PRIVATE columns
Date: 2012-12-12 18:12:27
Message-ID: CA+U5nMJtFsNdm7fp=s2w07nSFSRKt9yrXmU=g040fOP8pDpEiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Currently, ANALYZE collects data on all columns and stores these
samples in pg_statistic where they can be seen via the view pg_stats.

In some cases we have data that is private and we do not wish others
to see it, such as patient names. This becomes more important when we
have row security.

Perhaps that data can be protected, but it would be even better if we
simply didn't store value-revealing statistic data at all. Such
private data is seldom the target of searches, or if it is, it is
mostly evenly distributed anyway.

It would be good if we could collect the overall stats
* NULL fraction
* average width
* ndistinct
yet without storing either the MFVs or histogram.
Doing that would avoid inadvertent leaking of potentially private information.

SET STATISTICS 0
simply skips collection of statistics altogether

To implement this, one way would be to allow

ALTER TABLE foo
ALTER COLUMN foo1 SET STATISTICS PRIVATE;

Or we could use another magic value like -2 to request this case.

(Yes, I am aware we could use a custom datatype with a custom
typanalyze for this, but that breaks other things)

Thoughts?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2012-12-12 19:13:04 Re: PRIVATE columns
Previous Message Jan Urbański 2012-12-12 17:36:54 Re: [PATCH] PL/Python: Add spidata to all spiexceptions