Re: Multivariate MCV stats can leak data to unprivileged users

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Date: 2019-05-20 07:33:49
Message-ID: CAEZATCWdjDpuDkA8xnGzDAAmDekVONj2ZegqdAkjiQE4honh2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Oh, right. It still has the disadvantage that it obfuscates the actual
> data stored in the pg_stats_ext_data (or whatever would it be called),
> so e.g. functions would have to do additional checks to make sure it
> actually is the right statistic type. For example pg_mcv_list_items()
> could not rely on receiving pg_mcv_list values, as per the signature,
> but would have to check the value.
>

Yes. In fact, since the user-accessible view would want to expose
datatypes specific to the stats kinds rather than bytea or cstring
values, we would need SQL-callable conversion functions for each kind:

* to_pg_ndistinct(pg_extended_stats_ext_data) returns pg_ndistinct
* to_pg_dependencies(pg_extended_stats_ext_data) returns pg_dependencies
* to_pg_mcv(pg_extended_stats_ext_data) returns pg_mcv
* ...

and each of these would throw an error if it weren't given an extended
stats object of the right kind. Then to extract MCV data, you'd have
to do pg_mcv_list_items(to_pg_mcv(ext_data)), and presumably there'd
be something similar for histograms.

IMO, that's not a nice model, compared to just having columns of the
right types in the first place.

Also this model presupposes that all future stats kinds are most
conveniently represented in a single column, but maybe that won't be
the case. It's conceivable that a future stats kind would benefit from
splitting its data across multiple columns.

> Of course, I don't expect to have too many such functions, but overall
> this approach with a single type feels a bit too like EAV to my taste.
>

Yes, I think it is an EAV model. I think EAV models do have their
place, but I think that's largely where adding new columns is a common
operation and involves adding little to no extra code. I don't think
either of those is true for extended stats. What we've seen over the
last couple of years is that adding each new stats kind is a large
undertaking, involving lots of new code. That alone is going to limit
just how many ever get added, and compared to that effort, adding new
columns to the catalog is small fry.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2019-05-20 07:59:05 Re: Statistical aggregate functions are not working with PARTIAL aggregation
Previous Message Peter Eisentraut 2019-05-20 06:58:50 Re: remove doc/bug.template?