Re: Multivariate MCV stats can leak data to unprivileged users

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Multivariate MCV stats can leak data to unprivileged users
Date:	2019-05-20 13:32:24
Message-ID:	23581.1558359144@sss.pgh.pa.us
Lists:	pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Oh, right. It still has the disadvantage that it obfuscates the actual
>> data stored in the pg_stats_ext_data (or whatever it would be called),
>> so e.g. functions would have to do additional checks to make sure it
>> actually is the right statistic type. For example pg_mcv_list_items()
>> could not rely on receiving pg_mcv_list values, as per the signature,
>> but would have to check the value.

> Yes. In fact, since the user-accessible view would want to expose
> datatypes specific to the stats kinds rather than bytea or cstring
> values, we would need SQL-callable conversion functions for each kind:

It seems like people are willfully misunderstanding my suggestion.
You'd only need *one* conversion function, which would look at the
embedded ID field and then emit the appropriate text representation.
I don't see a reason why we'd have the separate pg_ndistinct etc. types
any more at all.
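
To illustrate the single-conversion-function idea, here is a minimal sketch in Python rather than backend C. Everything in it is an assumption made for illustration: the one-byte tag at the front of the value, the JSON payload, and the names pack() and stats_to_text() are hypothetical and do not describe any actual PostgreSQL storage format or function. The tag letters mirror the existing statistics kind codes only for readability.

```python
import json

# Illustrative kind tags (assumption: one leading byte identifies the kind).
KIND_NDISTINCT = b"d"
KIND_DEPENDENCIES = b"f"
KIND_MCV = b"m"


def pack(kind: bytes, payload) -> bytes:
    """Build a tagged value: one tag byte followed by a JSON payload (hypothetical format)."""
    return kind + json.dumps(payload, sort_keys=True).encode("utf-8")


def _ndistinct_text(p) -> str:
    # p maps a column-group label to an estimated distinct count.
    return "ndistinct: " + ", ".join(f"({cols}) => {n}" for cols, n in sorted(p.items()))


def _dependencies_text(p) -> str:
    # p maps a "determinant => dependent" label to a degree in [0, 1].
    return "dependencies: " + ", ".join(f"({dep}) : {deg}" for dep, deg in sorted(p.items()))


def _mcv_text(p) -> str:
    # p is a list of items, each with a list of values and a frequency.
    items = (
        "(" + ", ".join(map(str, item["values"])) + f") @ {item['frequency']}"
        for item in p
    )
    return "mcv: " + "; ".join(items)


# One dispatch table instead of one SQL-callable function per kind.
_FORMATTERS = {
    KIND_NDISTINCT: _ndistinct_text,
    KIND_DEPENDENCIES: _dependencies_text,
    KIND_MCV: _mcv_text,
}


def stats_to_text(blob: bytes) -> str:
    """Inspect the embedded tag and emit the matching text representation."""
    kind, body = blob[:1], blob[1:]
    try:
        formatter = _FORMATTERS[kind]
    except KeyError:
        raise ValueError(f"unknown statistics kind tag {kind!r}")
    return formatter(json.loads(body.decode("utf-8")))
```

In this sketch a new statistics kind needs only a new tag and one more entry in the dispatch table; callers keep using the same stats_to_text() entry point, and an unrecognized tag is rejected in one place.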

> Also this model presupposes that all future stats kinds are most
> conveniently represented in a single column, but maybe that won't be
> the case. It's conceivable that a future stats kind would benefit from
> splitting its data across multiple columns.

Hm, that's possible I suppose, but it seems a little far-fetched.
You could equally well argue that pg_ndistinct etc. should have been
broken down into smaller types, but we didn't.

> Yes, I think it is an EAV model. I think EAV models do have their
> place, but I think that's largely where adding new columns is a common
> operation and involves adding little to no extra code. I don't think
> either of those is true for extended stats. What we've seen over the
> last couple of years is that adding each new stats kind is a large
> undertaking, involving lots of new code. That alone is going to limit
> just how many ever get added, and compared to that effort, adding new
> columns to the catalog is small fry.

I can't argue with that --- the make-work is just a small part of the
total. But it's still make-work.

Anyway, it was just a suggestion, and if people don't like it that's
fine. But I don't want it to be rejected on the basis of false
arguments.

regards, tom lane

In response to

Re: Multivariate MCV stats can leak data to unprivileged users at 2019-05-20 07:33:49 from Dean Rasheed

Responses

Re: Multivariate MCV stats can leak data to unprivileged users at 2019-05-20 14:45:17 from Tomas Vondra
Re: Multivariate MCV stats can leak data to unprivileged users at 2019-05-20 15:09:24 from Dean Rasheed
