Re: pg_dump dump catalog ACLs

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump dump catalog ACLs
Date: 2016-04-25 04:39:09
Message-ID: 20160425043909.GW10850@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Noah, all,

* Noah Misch (noah(at)leadboat(dot)com) wrote:
> On Fri, Apr 22, 2016 at 12:31:41PM -0400, Stephen Frost wrote:
> > After looking through the code a bit, I realized that there are a lot of
> > object types which don't have ACLs at all but which exist in pg_catalog
> > and were being analyzed because the bitmask for pg_catalog included ACLs
> > and therefore was non-zero.
> >
> > Clearing that bit for object types which don't have ACLs improved the
> > performance for empty databases quite a bit (from about 3s to a bit
> > under 1s on my laptop). That's a 42-line patch, with comment lines
> > being half of that, which I'll push once I've looked into the other
> > concerns which were brought up on this thread.
>
> That's good news.

Attached patch-set includes this change in patch #2.

Patch #1 is the fix for the incorrect WHERE clause.

> > Much of the remaining inefficiancy is how we query for column
> > information one table at a time (that appears to be around 300ms of the
> > 900ms or so total time). I'm certainly interested in improving that but
> > that would require adding more complex data structures to pg_dump than
> > what we use currently (we'd want to grab all of the columns we care
> > about in an entire schema and store it locally and then provide a way to
> > look up that information, etc), so I'm not sure it'd be appropriate to
> > do now.
>
> I'm not sure, either; I'd need to see more to decide. If I were you, I would
> draft a patch to the point where the community can see the complexity and the
> performance change. That should be enough to inform the choice among moving
> forward with the proposed complexity, soliciting other designs, reverting the
> original changes, or accepting for 9.6 the slowdown as it stands at that time.
> Other people may have more definite opinions already.

I'll look at doing this once I'm done with the regression test
improvements (see below).

* Noah Misch (noah(at)leadboat(dot)com) wrote:
> On Fri, Apr 22, 2016 at 08:24:53PM -0400, Stephen Frost wrote:
> > * Noah Misch (noah(at)leadboat(dot)com) wrote:
> > > I wrote the attached test script to verify which types of ACLs dump/reload
> > > covers. Based on the commit message, I expected these results:
> > >
> > > Dumped: type, class, attribute, proc, largeobject_metadata,
> > > foreign_data_wrapper, foreign_server,
> > > language(in-extension), namespace(in-extension)
> > > Not dumped: database, tablespace,
> > > language(from-initdb), namespace(from-initdb)
> > >
> > > Did I interpret the commit message correctly? The script gave these results:
> > >
> > > Dumped: type, class, attribute, namespace(from-initdb)
> > > Not dumped: database, tablespace, proc, language(from-initdb)
> >
> > You interpreted the commit message correctly and in a number of cases
> > the correct results are generated, but there's an issue in the WHERE
> > clause being used for some of the object types.
>
> I'm wondering whether it would be a slightly better design to treat
> language(from-initdb) like language(in-extension). Likewise for namespaces.
> The code apparently already exists to dump from-initdb namespace ACLs. Is
> there a user interface design reason to want to distinguish the from-initdb
> and in-extension cases?

Reviewing the code, we do record from-initdb namespace privileges, and
we also record in-extension namespace privileges (see
ExecGrant_Namespace()).

We also record from-initdb language ACLs (initdb.c:2071) and
in-extension language ACLs (see ExecGrant_Language()), so I'm not
entirely sure what, if any, issue exists here either.

For both, we also grab the ACL info vs. pg_init_privs in pg_dump.

The issue in these cases is a bit of an interesting one- they are not
part of a namespace; this patch was focused on allowing users to modify,
specifically, the ACLs of objects in the 'pg_catalog' namespace.

For languages, I believe that means that we simply need to modify the
selectDumpableProcLang() function to use the same default we use for the
'pg_catalog' namespace- DUMP_COMPONENT_ACL, instead of
DUMP_COMPONENT_NONE.

What's not clear to me is what, if any, issue there is with namespaces.

Certainly, in my testing at least, if you do:

REVOKE CREATE ON SCHEMA public FROM public;

Then you get the appropriate commands from pg_dump to implement the
resulting ACLs on the public schema. If you change the permissions back
to match what is there at initdb-time (or you just don't change them),
then there aren't any GRANT or REVOKE commands from pg_dump, but that's
correct, isn't it?

> > That's relatively straight-forward to fix, but I'm going to put
> > together a series of TAP tests to go with these fixes. While I tested
> > various options and bits and pieces of the code while developing, this
> > really needs a proper test suite that runs through all of these
> > permutations with each change.
>
> Sounds great. Thanks.
>
> > I'm planning to have that done by the end of the weekend.

In the attached patch-set, patch #3 includes support for

src/bin/pg_dump: make check

implemented using the TAP testing system. There are a total of 360
tests, generally covering:

Various invalid sets of command-line options.

Valid command-line options (generally independently used):

(no options- defaults)
--clean
--clean --if-exists
--data-only
--format=c (tested with pg_restore)
--format=d (tested with pg_restore)
--format=t (tested with pg_restore)
--format=d --jobs=2 (tested with pg_restore)
--exclude-schema
--exclude-table
--no-privileges
--no-owner
--schema
--schema-only

Note that this is only including tests for basic schemas, tables, data,
and privileges, so far. I believe it's pretty obvious that we want to
include all object types and include testing of extensions, I just
haven't gotten there yet and figured now would be a good time to solicit
feedback on the approach I've used.

I'm not sure how valuable it is to test all the different combinations
of command-line options, though clearly some should be tested (eg:
include-schema, exclude table in that schema, and similar cases).

I'm planning to work on adding in other object types and, once that's
more-or-less complete, adding in a test for extensions and then adding
tests for GRANT/REVOKE on from-initdb and in-extension objects.

Thoughts?

Thanks!

Stephen

Attachment Content-Type Size
pg_dump_improvements_v1.patch text/x-diff 27.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2016-04-25 04:57:57 Re: checkpoint_flush_after documentation inconsistency
Previous Message Tom Lane 2016-04-25 03:31:22 Re: Why doesn't src/backend/port/win32/socket.c implement bind()?