From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Luke Cowell <lcowell(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Possible performance regression with pg_dump of a large number of relations
Date: 2018-01-12 16:01:53
Message-ID: CAMkU=1x7hhZrY5yL9ZdvOoETJN5=3WKNCPD37q+s0wnmAfVKRA@mail.gmail.com

On Thu, Jan 11, 2018 at 5:26 PM, Luke Cowell <lcowell(at)gmail(dot)com> wrote:

> I've been troubleshooting an issue with slow pg_dump times on Postgres
> 9.6.6. I believe something changed between 9.5.10 and 9.6.6 that has made
> dumps significantly slower for databases with a large number of relations.
> I posted this in IRC and someone suggested that I should post it here.
> I'm sorry if this isn't the right place.
>
> To simulate the issue, I generated 150,000 relations spread across 1,000
> schemas (this roughly reflects my production setup).
>
> ```ruby
> # 1000 schemas x 150 tables each = 150,000 relations
> File.write "many_relations.sql", (150000 / 150).times.flat_map {|n|
>   [
>     "create schema s_#{n};",
>     150.times.map do |t|
>       "create table s_#{n}.test_#{t} (id int);"
>     end
>   ]
> }.join("\n")
> ```
>
> I have two identical pieces of hardware. I've installed 9.5 on one and 9.6
> on the other, and I've run the same generated SQL in a fresh database on
> both systems.
>
> On my 9.5.10 system:
> > time pg_dump -n s_10 testing > /dev/null
> real 0m5.492s
> user 0m1.424s
> sys 0m0.184s
>
> On my 9.6.6 system:
> > time pg_dump -n s_10 testing > /dev/null
> real 0m27.342s
> user 0m1.748s
> sys 0m0.248s
>

I don't get quite as large a regression as you do: here it goes from 6s to
19s. It looks like there are multiple regressions involved, but the biggest
is caused by:

commit 5d589993cad212f7d556d52cc1e42fe18f65b057
Author: Stephen Frost <sfrost(at)snowman(dot)net>
Date: Fri May 6 14:06:50 2016 -0400

pg_dump performance and other fixes

That commit covered a few different things, and I don't know which of the
improvements it mentions is the one that motivated this, but the key change
was the addition of this query:

EXISTS (SELECT 1
        FROM pg_attribute at
        LEFT JOIN pg_init_privs pip
               ON (c.oid = pip.objoid
                   AND pip.classoid = (SELECT oid FROM pg_class
                                       WHERE relname = 'pg_class')
                   AND pip.objsubid = at.attnum)
        WHERE at.attrelid = c.oid
          AND at.attnum > 0
          AND ((SELECT count(acl)
                FROM (SELECT unnest(coalesce(at.attacl,
                                             acldefault('c', c.relowner))) AS acl
                      EXCEPT
                      SELECT unnest(coalesce(pip.initprivs,
                                             acldefault('c', c.relowner)))) AS foo) > 1
               OR
               (SELECT count(acl)
                FROM (SELECT unnest(coalesce(pip.initprivs,
                                             acldefault('c', c.relowner))) AS acl
                      EXCEPT
                      SELECT unnest(coalesce(at.attacl,
                                             acldefault('c', c.relowner)))) AS foo) > 0))
    AS changed_acl
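
To see the shape of that per-column check: roughly, a column's ACL differs
from its recorded initial privileges when either EXCEPT direction yields
rows. Here is a toy version of one of the two subqueries, with made-up text
arrays standing in for the aclitem[] values:

```sql
-- Toy stand-in for one of the per-column EXCEPT subqueries above;
-- 'alice=r' and 'bob=w' are hypothetical ACL strings, not real aclitems.
SELECT count(*)
FROM (SELECT unnest(ARRAY['alice=r', 'bob=w'])
      EXCEPT
      SELECT unnest(ARRAY['alice=r'])) AS foo;
-- returns 1: 'bob=w' is in the first set but not the second
```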

Considering that it runs two subqueries for every column (including the six
hidden system columns) of every table, even tables that don't end up getting
dumped out, it is no wonder it is slow.
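
If you want to measure that clause in isolation, here is a rough sketch: not
pg_dump's actual catalog query, just the changed_acl expression evaluated
once per table in the test database above:

```sql
-- Rough sketch: time the changed_acl clause by itself, once per table.
-- This is not pg_dump's full catalog query, only the expensive expression.
EXPLAIN ANALYZE
SELECT c.oid,
       EXISTS (SELECT 1
               FROM pg_attribute at
               LEFT JOIN pg_init_privs pip
                      ON (c.oid = pip.objoid
                          AND pip.classoid = (SELECT oid FROM pg_class
                                              WHERE relname = 'pg_class')
                          AND pip.objsubid = at.attnum)
               WHERE at.attrelid = c.oid
                 AND at.attnum > 0
                 AND ((SELECT count(acl)
                       FROM (SELECT unnest(coalesce(at.attacl,
                                                    acldefault('c', c.relowner))) AS acl
                             EXCEPT
                             SELECT unnest(coalesce(pip.initprivs,
                                                    acldefault('c', c.relowner)))) AS foo) > 1
                      OR
                      (SELECT count(acl)
                       FROM (SELECT unnest(coalesce(pip.initprivs,
                                                    acldefault('c', c.relowner))) AS acl
                             EXCEPT
                             SELECT unnest(coalesce(at.attacl,
                                                    acldefault('c', c.relowner)))) AS foo) > 0))
           AS changed_acl
FROM pg_class c
WHERE c.relkind = 'r';
```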

If you were just dumping the database with 150,000 objects once, I wouldn't
worry about a 20-second regression. But I assume you intend to loop over
every schema and dump each one individually? At roughly 20 extra seconds per
invocation, 1,000 per-schema dumps would add on the order of five and a half
hours.

Cheers,

Jeff
