Re: Multivariate MCV stats can leak data to unprivileged users

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Date: 2019-05-20 15:09:24
Message-ID: CAEZATCXtSSJmZGNiP8N1=hyHhrE3vM6w53qTpoiJgy6Jtai_3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 20 May 2019 at 14:32, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> > On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> Oh, right. It still has the disadvantage that it obfuscates the actual
> >> data stored in the pg_stats_ext_data (or whatever would it be called),
> >> so e.g. functions would have to do additional checks to make sure it
> >> actually is the right statistic type. For example pg_mcv_list_items()
> >> could not rely on receiving pg_mcv_list values, as per the signature,
> >> but would have to check the value.
>
> > Yes. In fact, since the user-accessible view would want to expose
> > datatypes specific to the stats kinds rather than bytea or cstring
> > values, we would need SQL-callable conversion functions for each kind:
>
> It seems like people are willfully misunderstanding my suggestion.

I'm more than capable of inadvertently misunderstanding, without the
need to willfully do so :-)

> You'd only need *one* conversion function, which would look at the
> embedded ID field and then emit the appropriate text representation.
> I don't see a reason why we'd have the separate pg_ndistinct etc. types
> any more at all.

Hmm, OK. So then would you also make the user-accessible view agnostic
about the kinds of stats supported in the same way, returning zero or
more rows per STATISTICS object, depending on how many kinds of stats
have been built? That would have the advantage of never needing to
change the view definition again, as more stats kinds are supported.

We'd need to change pg_mcv_list_items() to accept a pg_stats_ext_data
value rather than a pg_mcv value, and it would be the user's
responsibility to call it if they wanted to see the contents of the
MCV list (I was originally thinking that we'd include a call to
pg_mcv_list_items() in the view definition, so that it produced
friendlier looking output, since the default textual representation of
an MCV list is completely opaque, unlike the other stats kinds).
Actually, I can see another advantage to not including
pg_mcv_list_items() in the view definition -- in the future, we may
dream up a better version of pg_mcv_list_items(), like say one that
produced JSON, and then we'd regret using the current function.

> Anyway, it was just a suggestion, and if people don't like it that's
> fine. But I don't want it to be rejected on the basis of false
> arguments.

To be clear, I'm not intentionally rejecting your idea. I'm merely
trying to fully understand the implications.

At this stage, perhaps it would be helpful to prototype something for
comparison.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-05-20 15:23:51 Re: Statistical aggregate functions are not working with PARTIAL aggregation
Previous Message Tomas Vondra 2019-05-20 14:45:17 Re: Multivariate MCV stats can leak data to unprivileged users