Re: Possible performance regression with pg_dump of a large number of relations

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Luke Cowell <lcowell(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Possible performance regression with pg_dump of a large number of relations
Date: 2018-01-13 03:39:09
Message-ID: 20180113033909.GZ2416@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings Jeff & Luke,

* Jeff Janes (jeff(dot)janes(at)gmail(dot)com) wrote:
> Sorry, that query reflects some munging I did to it. The real part added
> to the query is:
>
> EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON(c.oid
> = pip.objoid AND pip.classoid = (SELECT oid FROM pg_class WHERE relname =
> 'pg_class') AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND
> ((SELECT array_agg(acl) FROM (SELECT
> unnest(coalesce(at.attacl,acldefault('c',c.relowner))) AS acl EXCEPT SELECT
> unnest(coalesce(pip.initprivs,acldefault('c',c.relowner)))) as foo) IS NOT
> NULL OR (SELECT array_agg(acl) FROM (SELECT
> unnest(coalesce(pip.initprivs,acldefault('c',c.relowner))) AS acl EXCEPT
> SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner)))) as foo) IS
> NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_ac

Yes, this is to check if any of the rights on the table or any of its'
columns has been changed from what it's initial rights are as recorded
in pg_init_privs.

I've been playing around with this a bit tonight trying to think of a
way to avoid doing this work and it occurs to me that we really only
need to worry about initprivs on objects in schemas that are either
there at init time, or from extensions. Not all of the objects in the
system can have init-privs because the only way to get init-privs is
at initdb time or from an extension creating a new object.

As such, I've reworked the query (but not yet put it into pg_dump to
run it through the regression tests) to look like this (for anyone else
who wants to take a look at it and play with it):

EXPLAIN ANALYZE
SELECT c.tableoid, c.oid, c.relname,
(SELECT pg_catalog.array_agg(acl ORDER BY row_n)
FROM (SELECT acl, row_n
FROM pg_catalog.unnest(
coalesce(c.relacl,pg_catalog.acldefault(
CASE WHEN c.relkind = 'S'
THEN 's'
ELSE 'r' END::"char",c.relowner)))
WITH ORDINALITY AS perm(acl,row_n)
WHERE NOT EXISTS (
SELECT 1 FROM
pg_catalog.unnest(
coalesce(pip.initprivs,pg_catalog.acldefault(
CASE WHEN c.relkind = 'S'
THEN 's'
ELSE 'r' END::"char",c.relowner)))
AS init(init_acl)
WHERE acl = init_acl)) as foo
WHERE nsp.nspname IN ('information_schema','pg_catalog')
OR ext.oid IS NOT NULL)
AS relacl,
(SELECT pg_catalog.array_agg(acl ORDER BY row_n)
FROM (SELECT acl, row_n
FROM pg_catalog.unnest(
coalesce(pip.initprivs,pg_catalog.acldefault(
CASE WHEN c.relkind = 'S'
THEN 's'
ELSE 'r' END::"char",c.relowner)))
WITH ORDINALITY AS initp(acl,row_n)
WHERE NOT EXISTS (
SELECT 1 FROM pg_catalog.unnest(
coalesce(c.relacl,pg_catalog.acldefault(
CASE WHEN c.relkind = 'S'
THEN 's'
ELSE 'r' END::"char",c.relowner)))
AS permp(orig_acl)
WHERE acl = orig_acl)) as foo
WHERE nsp.nspname IN ('information_schema','pg_catalog')
OR ext.oid IS NOT NULL)
as rrelacl,
NULL AS initrelacl,
NULL as initrrelacl,
c.relkind,
c.relnamespace,
(SELECT rolname
FROM pg_catalog.pg_roles
WHERE oid = c.relowner) AS rolname,
c.relchecks,
c.relhastriggers,
c.relhasindex,
c.relhasrules,
c.relhasoids,
c.relrowsecurity,
c.relforcerowsecurity,
c.relfrozenxid,
c.relminmxid,
tc.oid AS toid,
tc.relfrozenxid AS tfrozenxid,
tc.relminmxid AS tminmxid,
c.relpersistence,
c.relispopulated,
c.relreplident,
c.relpages,
CASE WHEN c.reloftype <> 0
THEN c.reloftype::pg_catalog.regtype
ELSE NULL END AS reloftype,
d.refobjid AS owning_tab,
d.refobjsubid AS owning_col,
(SELECT spcname
FROM pg_tablespace t
WHERE t.oid = c.reltablespace) AS reltablespace,
array_remove(
array_remove(
c.reloptions,'check_option=local'),
'check_option=cascaded') AS reloptions,
CASE WHEN 'check_option=local' = ANY (c.reloptions)
THEN 'LOCAL'::text
WHEN 'check_option=cascaded' = ANY (c.reloptions)
THEN 'CASCADED'::text ELSE NULL END AS checkoption,
tc.reloptions AS toast_reloptions,
c.relkind = 'S' AND
EXISTS (SELECT 1 FROM pg_depend WHERE classid = 'pg_class'::regclass
AND objid = c.oid AND objsubid = 0
AND refclassid = 'pg_class'::regclass
AND deptype = 'i') AS is_identity_sequence,
EXISTS (SELECT 1 FROM pg_attribute at
LEFT JOIN pg_init_privs pip
ON (c.oid = pip.objoid
AND pip.classoid = 'pg_class'::regclass
AND pip.objsubid = at.attnum)
WHERE
(nsp.nspname IN ('information_schema','pg_catalog')
OR ext.oid IS NOT NULL)
AND
at.attrelid = c.oid AND (
(SELECT pg_catalog.array_agg(acl ORDER BY row_n)
FROM (SELECT acl, row_n
FROM pg_catalog.unnest(
coalesce(at.attacl,
pg_catalog.acldefault('c',c.relowner)))
WITH ORDINALITY AS perm(acl,row_n)
WHERE
NOT EXISTS (
SELECT 1 FROM pg_catalog.unnest(
coalesce(pip.initprivs,
pg_catalog.acldefault('c',c.relowner)))
AS init(init_acl)
WHERE acl = init_acl)
) as foo) IS NOT NULL
OR
(SELECT pg_catalog.array_agg(acl ORDER BY row_n)
FROM (SELECT acl, row_n
FROM pg_catalog.unnest(
coalesce(pip.initprivs,
pg_catalog.acldefault('c',c.relowner)))
WITH ORDINALITY AS initp(acl,row_n)
WHERE NOT EXISTS (
SELECT 1 FROM pg_catalog.unnest(
coalesce(at.attacl,pg_catalog.
acldefault('c',c.relowner)))
AS permp(orig_acl)
WHERE acl = orig_acl)) as foo) IS NOT NULL
OR NULL IS NOT NULL
OR NULL IS NOT NULL)) AS changed_acl,
pg_get_partkeydef(c.oid) AS partkeydef,
c.relispartition AS ispartition,
pg_get_expr(c.relpartbound, c.oid) AS partbound
FROM pg_class c
JOIN pg_namespace nsp ON (c.relnamespace = nsp.oid)
LEFT JOIN pg_extension ext ON (nsp.oid = ext.extnamespace)
LEFT JOIN pg_depend d
ON (c.relkind = 'S'
AND d.classid = c.tableoid
AND d.objid = c.oid
AND d.objsubid = 0
AND d.refclassid = c.tableoid
AND d.deptype IN ('a', 'i'))
LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid)
LEFT JOIN pg_init_privs pip
ON (c.oid = pip.objoid
AND pip.classoid = 'pg_class'::regclass
AND pip.objsubid = 0)
WHERE c.relkind in ('r', 'S', 'v', 'c', 'm', 'f', 'p')
ORDER BY c.oid;

This ends up dropping the query time from around 8.6s on my system to
about 1s, with a test rig of 1000 schemas and 100,000 tables.

Unfortunately, the way pg_dump is structured today, it really wouldn't
be easy to have it only run this query for the tables in the schema(s)
requested because it expects to be able to look for dependencies in the
data structure which results from this query (as I recall, it's been a
while since I looked, but I remember trying to figure out a way to do
that and it certainly didn't look easy to do).

There's another query that's currently taking about 4s on my system that
I believe could benefit from a similar treatment to reduce the work the
query is having to do. Hopefully with those two changes we can get
pg_dump runtime on this use-case back closer to what it was before these
changes went in.

If folks get a chance to take a look at the query and/or test, that'd be
great. I'll try to work up an actual patch to pg_dump this weekend to
run it through the regression tests and see if anything breaks.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-01-13 04:01:38 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Robert Haas 2018-01-13 02:42:49 Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?