Re: Multivariate MCV stats can leak data to unprivileged users

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Date: 2019-05-19 09:49:03
Message-ID: CAEZATCUcoAKrn-1NFkDN7md-wt6CQev0JXa+Tv31T9Jjkxgx5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 19 May 2019 at 00:48, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> >
> > > I think we have four options - rework it before beta1, rework it after
> > > beta1, rework it in PG13 and leave it as it is now.
> >
> > Yup, that's about what the options are. I'm just voting against
> > "change it in v13". If we're going to change it, then the fewer
> > major versions that have the bogus definition the better --- and
> > since we're changing that catalog in v12 anyway, users will see
> > fewer distinct behaviors if we do this change too.
> >
> > It's very possibly too late to get it done before beta1,
> > unfortunately. But as Andres noted, post-beta1 catversion
> > bumps are hardly unusual, so I do not think "rework after
> > beta1" is unacceptable.
>
> Agreed.
>

Yes, that makes sense.

I think we shouldn't risk trying to get this into beta1, but let's try
to get it done as soon as possible after that.

Actually, it doesn't appear to be as big a change as I had feared. As
a starter for ten, here's a patch doing the basic split, moving the
extended stats data into a new catalog pg_statistic_ext_data (I'm not
particularly wedded to that name, it's just the first name that came
to mind).

With this patch the catalogs look like this:

\d pg_statistic_ext
Table "pg_catalog.pg_statistic_ext"
Column | Type | Collation | Nullable | Default
--------------+------------+-----------+----------+---------
oid | oid | | not null |
stxrelid | oid | | not null |
stxname | name | | not null |
stxnamespace | oid | | not null |
stxowner | oid | | not null |
stxkeys | int2vector | | not null |
stxkind | "char"[] | | not null |
Indexes:
"pg_statistic_ext_name_index" UNIQUE, btree (stxname, stxnamespace)
"pg_statistic_ext_oid_index" UNIQUE, btree (oid)
"pg_statistic_ext_relid_index" btree (stxrelid)

\d pg_statistic_ext_data
Table "pg_catalog.pg_statistic_ext_data"
Column | Type | Collation | Nullable | Default
-----------------+-----------------+-----------+----------+---------
stxoid | oid | | not null |
stxndistinct | pg_ndistinct | C | |
stxdependencies | pg_dependencies | C | |
stxmcv | pg_mcv_list | C | |
Indexes:
"pg_statistic_ext_data_stxoid_index" UNIQUE, btree (stxoid)

I opted to create/remove pg_statistic_ext_data tuples at the same time
as the pg_statistic_ext tuples, in CreateStatistics() /
RemoveStatisticsById(), so then it's easier to see that they're in a
one-to-one relationship, and other code doesn't need to worry about
the data tuple not existing. The other option would be to defer
inserting the data tuple to ANALYZE.

I couldn't resist moving the code block that declares
pg_statistic_ext's indexes in indexing.h to the right place, assuming
that file is (mostly) sorted alphabetically by catalog name. This puts
the extended stats entries just after the normal stats entries which
seems preferable.

This is only a very rough first draft (e.g., no doc updates), but it
passes all the regression tests.

Regards,
Dean

Attachment Content-Type Size
split-up-pg-statistics-ext.patch text/x-patch 32.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Akim Demaille 2019-05-19 12:47:32 Remove useless associativity/precedence from parsers
Previous Message David Rowley 2019-05-19 08:18:38 Re: Statistical aggregate functions are not working with PARTIAL aggregation