Re: PRIVATE columns

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Subject: Re: PRIVATE columns
Date: 2012-12-13 09:32:30
Message-ID: CA+U5nM+oTDdT4c_KSGLRJdMsgfvH6B0-N-0G98eVxR9cGms6fg@mail.gmail.com
Lists: pgsql-hackers

On 12 December 2012 20:57, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> SET STATISTICS 0 seems like a sufficient solution for people who don't
> trust the have_column_privilege() protection in the pg_stats view.

The point here is that a user may *have* privilege on the column and
have rights to see some, but not all, rows of the table.

But we cannot apply row level security to individual column values, so
neither the row nor column security applies here and it appears there
is a greater level of risk at this point.

> In practice I think this is a waste of time, though. Anyone who can
> bypass the view restriction can probably just read the original table.

Where the row security would apply.

> (I suppose we could consider marking pg_stats as a security_barrier
> view to make this even safer. Not sure it's worth the trouble though;
> the interesting columns are anyarray so it's hard to do much with them
> mechanically.)

I'm trying to respond in useful ways to your statements that row
security might not be very secure.

Please advise.

>> It would be good if we could collect the overall stats
>> * NULL fraction
>> * average width
>> * ndistinct
>> yet without storing either the MFVs or histogram.
>
> Do you have any evidence whatsoever that that's worth the trouble?
> I'd bet against it.

All I can say is that uniformly distributed data that is accessed only
by equality has no need of MFVs or histograms. Much personal data is
so evenly distributed that those stats are not worth storing, though in
some cases it isn't. We don't search for credit cards with a BETWEEN, so
estimating the ends of ranges isn't needed.

Yet knowing the number of distinct values is important to ensure that we
use an index scan. Without stats we tend to do a bitmap index scan,
which seems to be significantly more expensive in practice.
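
As a rough illustration of why ndistinct alone is enough for equality
lookups, here is a minimal sketch of a uniform-distribution row estimate.
It assumes that, with no MFV list stored, the estimate for "col = constant"
is simply the non-NULL rows divided by the number of distinct values. The
function name and the assumption that every card number is distinct are
hypothetical, not taken from the planner source; the row count and NULL
fraction are the credit card figures used later in this message.

```python
def estimate_equality_rows(row_count, null_frac, n_distinct):
    """Estimate rows matching `col = constant` under a uniform assumption.

    Uses only the summary stats discussed here (NULL fraction and
    ndistinct); no MFVs or histogram are consulted.
    """
    if n_distinct <= 0:
        raise ValueError("n_distinct must be positive")
    if not 0.0 <= null_frac <= 1.0:
        raise ValueError("null_frac must be between 0 and 1")
    # Non-NULL rows spread evenly across the distinct values.
    return row_count * (1.0 - null_frac) / n_distinct


# Assumption: every credit card number in the table is distinct.
unique_estimate = estimate_equality_rows(226_768, 0.0, 226_768)

# Hypothetical low-cardinality column on the same table, for contrast.
coarse_estimate = estimate_equality_rows(226_768, 0.0, 4)

print(unique_estimate, coarse_estimate)
```

An estimate near one row per lookup is the kind of number that favours a
plain index scan, while tens of thousands of rows per value would not, and
neither estimate needs any individual column value to be stored.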

> And if we're being paranoid, who's to say that
> those numbers couldn't reveal useful data in themselves?

I'm talking about privacy. Knowing that there are 226,768 credit cards in
a table, that 0% of them are NULL and that they are on average 16 digits
wide tells me nothing about individual credit card numbers. Same with
patient names.

In edge cases we might infer something more when mixed with some
external knowledge, but that's a matter for the military.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
