Skip site navigation (1) Skip section navigation (2)

Re: pg_dump and thousands of schemas

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Craig James <cjames(at)emolecules(dot)com>, Hugo <hugo(dot)tech(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-05-25 16:56:17
Message-ID: 24045.1337964977@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> There is an operation in pg_dump which is O(#_of_schemata_in_db *
> #_of_table_in_db), or something like that.
> The attached very crude patch reduces that to
> O(log_of_#_of_schemata_in_db * #_of_table_in_db)

> I was hoping this would be a general improvement.  It doesn't seem be.
>  But it is a very substantial improvement in the specific case of
> dumping one small schema out of a very large database.

Your test case in
<CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA(at)mail(dot)gmail(dot)com>
shows pretty conclusively that findNamespace is a time sink for large
numbers of schemas, so that seems worth fixing.  I don't like this
patch though: we already have infrastructure for this in pg_dump,
namely buildIndexArray/findObjectByOid, so what we should do is use
that not invent something new.  I will go see about doing that.

> It seems like dumping one schema would be better optimized by not
> loading up the entire database catalog, but rather by restricting to
> just that schema at the catalog stage.

The reason pg_dump is not built that way is that considerations like
dump order dependencies are not going to work at all if it only looks
at a subset of the database.  Of course, dependency chains involving
objects not dumped might be problematic anyway, but I'd still want it
to do the best it could.

> For dumping entire databases, It looks like the biggest problem is
> going to be LockReassignCurrentOwner in the server.  And that doesn't
> seem to be easy to fix, as any change to it to improve pg_dump will
> risk degrading normal use cases.

I didn't try profiling the server side, but pg_dump doesn't use
subtransactions so it's not clear to me why LockReassignCurrentOwner
would get called at all ...

			regards, tom lane

In response to

Responses

pgsql-performance by date

Next:From: Jeff JanesDate: 2012-05-25 17:41:19
Subject: Re: pg_dump and thousands of schemas
Previous:From: Jeff JanesDate: 2012-05-25 15:53:45
Subject: Re: pg_dump and thousands of schemas

pgsql-hackers by date

Next:From: Jeff JanesDate: 2012-05-25 17:41:19
Subject: Re: pg_dump and thousands of schemas
Previous:From: Josh BerkusDate: 2012-05-25 16:54:46
Subject: Re: pg_stat_statements temporary file

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group