Re: Possible performance regression with pg_dump of a large number of relations

From: Felipe Sateler <fsateler(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Luke Cowell <lcowell(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Possible performance regression with pg_dump of a large number of relations
Date: 2020-01-18 22:46:11
Message-ID: 20200118224611.xgwxdlvs7egflxqm@felipedell
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Stephen, everyone,

On Fri, Jan 12, 2018 at 10:39:09PM -0500, Stephen Frost wrote:
> Greetings Jeff & Luke,
>
> * Jeff Janes (jeff(dot)janes(at)gmail(dot)com) wrote:
> > Sorry, that query reflects some munging I did to it. The real part added
> > to the query is:
> >
> > EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON(c.oid
> > = pip.objoid AND pip.classoid = (SELECT oid FROM pg_class WHERE relname =
> > 'pg_class') AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND
> > ((SELECT array_agg(acl) FROM (SELECT
> > unnest(coalesce(at.attacl,acldefault('c',c.relowner))) AS acl EXCEPT SELECT
> > unnest(coalesce(pip.initprivs,acldefault('c',c.relowner)))) as foo) IS NOT
> > NULL OR (SELECT array_agg(acl) FROM (SELECT
> > unnest(coalesce(pip.initprivs,acldefault('c',c.relowner))) AS acl EXCEPT
> > SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner)))) as foo) IS
> > NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_ac
>
> Yes, this is to check if any of the rights on the table or any of its'
> columns has been changed from what it's initial rights are as recorded
> in pg_init_privs.
>
> I've been playing around with this a bit tonight trying to think of a
> way to avoid doing this work and it occurs to me that we really only
> need to worry about initprivs on objects in schemas that are either
> there at init time, or from extensions. Not all of the objects in the
> system can have init-privs because the only way to get init-privs is
> at initdb time or from an extension creating a new object.
>
> As such, I've reworked the query (but not yet put it into pg_dump to
> run it through the regression tests) to look like this (for anyone else
> who wants to take a look at it and play with it):
> <snip sql>
>
> This ends up dropping the query time from around 8.6s on my system to
> about 1s, with a test rig of 1000 schemas and 100,000 tables.

The effect seems quite milder here. It reduces from 8s to 5.7s. Still an
improvement though.

I'm using pg_dump 11.6. Has something been merged since this time and
now?

My use case is similar to the OP: I dump each schema separately, which
causes long delays between dumps.

>
> Unfortunately, the way pg_dump is structured today, it really wouldn't
> be easy to have it only run this query for the tables in the schema(s)
> requested because it expects to be able to look for dependencies in the
> data structure which results from this query (as I recall, it's been a
> while since I looked, but I remember trying to figure out a way to do
> that and it certainly didn't look easy to do).

This would be nice. Such a filter (when applied manually) reduces the query
time significantly.

--
Saludos,
Felipe Sateler

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-01-18 23:05:49 Re: Patch to document base64 encoding
Previous Message Thomas Munro 2020-01-18 20:52:21 Re: should crash recovery ignore checkpoint_flush_after ?