| From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Multivariate MCV stats can leak data to unprivileged users | 
| Date: | 2019-05-19 22:44:59 | 
| Message-ID: | 20190519224459.7dsdnv5gsfmnm4em@development | 
| Lists: | pgsql-hackers | 
On Sun, May 19, 2019 at 02:14:54PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> On Sun, May 19, 2019 at 10:28:43AM -0400, Tom Lane wrote:
>>> No, wait, scratch that.  We could fold the three existing types
>>> pg_ndistinct, pg_dependencies, pg_mcv_list into one new type, say
>>> "pg_stats_ext_data", where the actual storage would need to have an
>>> ID field (so we'd waste a byte or two duplicating the externally
>>> visible stxkind field inside stxdata).  The output function for this
>>> type is just a switch over the existing code.  The big advantage of
>>> this way compared to the current approach is that adding a new
>>> ext-stats type requires *zero* work with adding new catalog entries.
>>> Just add another switch case in pg_stats_ext_data_out() and you're
>>> done.
>
>> The annoying thing is that this undoes the protections provided by special
>> data types generated only internally. It's not possible to generate
>> e.g. pg_mcv_list values in user code (except for C code, at which point
>> all bets are off anyway). By abandoning this and reverting to bytea, anyone
>> could craft a bytea and claim it's a statistic value.
>
>That would have been true of the original proposal, but not of this
>modified one.
>
Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in pg_stats_ext_data (or whatever it would be called),
so e.g. functions would have to do additional checks to make sure the
value actually holds the right statistic type. For example, pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per its signature,
but would have to check the value at run time.
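
To make the trade-off concrete, here is a rough, standalone sketch (not
PostgreSQL code) of what a single tagged type might look like. All names
below (StatsExtData, stats_ext_data_kind_name, mcv_items_count) are
hypothetical, and the payload handling is a toy. It only illustrates that
the output side becomes a switch over the kind, while a consumer that wants
MCV data has to verify the kind itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kind tags stored inside the value; the letters are illustrative. */
typedef enum StatsExtKind {
    STATS_KIND_NDISTINCT    = 'd',
    STATS_KIND_DEPENDENCIES = 'f',
    STATS_KIND_MCV          = 'm'
} StatsExtKind;

/* Hypothetical unified container: a kind byte plus an opaque payload. */
typedef struct StatsExtData {
    uint8_t              kind;     /* duplicates the externally visible kind */
    size_t               len;      /* payload length in bytes */
    const unsigned char *payload;  /* serialized statistic (opaque here) */
} StatsExtData;

/*
 * Output side: "just a switch" over the kind. A real output function would
 * call the existing per-kind code; this toy version returns a label, or NULL
 * for an unknown kind.
 */
const char *
stats_ext_data_kind_name(const StatsExtData *d)
{
    assert(d != NULL);

    switch (d->kind)
    {
        case STATS_KIND_NDISTINCT:
            return "ndistinct";
        case STATS_KIND_DEPENDENCIES:
            return "dependencies";
        case STATS_KIND_MCV:
            return "mcv";
        default:
            return NULL;
    }
}

/*
 * Consumer side: a function that only makes sense for MCV lists can no
 * longer rely on its signature and must check the kind at run time.
 * Returns 0 on success and -1 if the value is not an MCV list.
 * Toy assumption: one payload byte per MCV item.
 */
int
mcv_items_count(const StatsExtData *d, size_t *nitems)
{
    if (d == NULL || nitems == NULL || d->kind != STATS_KIND_MCV)
        return -1;

    *nitems = d->len;
    return 0;
}
```

With separate types, the check in mcv_items_count() is done by the type
system before the function is ever called; with one container type, every
such function has to carry it.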

Of course, I don't expect us to have too many such functions, but overall
this single-type approach feels a bit too much like EAV for my taste.

I think Dean is right that we should not expect many more statistic types
beyond what we already have (a histogram, and perhaps one or two more). So
I agree with Dean that the current design with separate statistic types is
not such a big issue.

regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 