Re: pg_dump and thousands of schemas

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Craig James <cjames(at)emolecules(dot)com>, Hugo <hugo(dot)tech(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-05-25 16:56:17
Message-ID: 24045.1337964977@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> There is an operation in pg_dump which is O(#_of_schemata_in_db *
> #_of_table_in_db), or something like that.
> The attached very crude patch reduces that to
> O(log_of_#_of_schemata_in_db * #_of_table_in_db)

> I was hoping this would be a general improvement. It doesn't seem to be.
> But it is a very substantial improvement in the specific case of
> dumping one small schema out of a very large database.

Your test case in
<CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA(at)mail(dot)gmail(dot)com>
shows pretty conclusively that findNamespace is a time sink for large
numbers of schemas, so that seems worth fixing. I don't like this
patch though: we already have infrastructure for this in pg_dump,
namely buildIndexArray/findObjectByOid, so what we should do is use
that rather than invent something new. I will go see about doing that.
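
For readers unfamiliar with the technique, here is a minimal sketch of the
general idea behind an OID-indexed lookup: sort an array of object pointers
by OID once, then answer each lookup with a binary search instead of a linear
scan. This is not pg_dump's actual code. The NamespaceEntry struct and the
names build_oid_index and find_by_oid are hypothetical stand-ins chosen for
illustration, and Oid is typedef'd locally so the example is self-contained.

```c
#include <stdlib.h>

typedef unsigned int Oid;       /* local stand-in for the server's Oid type */

/* Hypothetical, pared-down record; real catalog structs carry far more. */
typedef struct
{
    Oid         oid;
    const char *name;
} NamespaceEntry;

/* qsort comparator: order entry pointers by ascending OID. */
static int
cmp_entry_oid(const void *a, const void *b)
{
    Oid oa = (*(NamespaceEntry *const *) a)->oid;
    Oid ob = (*(NamespaceEntry *const *) b)->oid;

    return (oa > ob) - (oa < ob);
}

/* Build a sorted array of pointers into entries[]; the caller frees it. */
static NamespaceEntry **
build_oid_index(NamespaceEntry *entries, size_t n)
{
    NamespaceEntry **index = malloc((n ? n : 1) * sizeof(*index));
    size_t      i;

    if (index == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        index[i] = &entries[i];
    qsort(index, n, sizeof(*index), cmp_entry_oid);
    return index;
}

/* Binary search: O(log n) per lookup; returns NULL when the OID is absent. */
static NamespaceEntry *
find_by_oid(NamespaceEntry **index, size_t n, Oid oid)
{
    size_t      lo = 0;
    size_t      hi = n;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (index[mid]->oid == oid)
            return index[mid];
        if (index[mid]->oid < oid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

With an index like this, resolving the schema of each table costs a
logarithmic search rather than a walk over every schema, which is the
complexity change discussed above.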

> It seems like dumping one schema would be better optimized by not
> loading up the entire database catalog, but rather by restricting to
> just that schema at the catalog stage.

The reason pg_dump is not built that way is that considerations like
dump order dependencies are not going to work at all if it only looks
at a subset of the database. Of course, dependency chains involving
objects not dumped might be problematic anyway, but I'd still want it
to do the best it could.
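
To illustrate why ordering needs the full set of dependency edges, here is a
small sketch of a topological sort (Kahn's algorithm). It is a generic
illustration, not pg_dump's sorting code. The Dependency struct, the
dump_order function, and the MAX_OBJS limit are all hypothetical, and the
sketch assumes every edge refers to object ids in the range 0..n-1.

```c
#include <stddef.h>

#define MAX_OBJS 16             /* arbitrary limit for this sketch */

/* Hypothetical edge: object `obj` must be emitted after object `dep`. */
typedef struct
{
    int         obj;
    int         dep;
} Dependency;

/*
 * Order object ids 0..n-1 so that each appears after everything it depends
 * on.  Fills order[] and returns the number of objects placed; a result
 * smaller than n means a dependency cycle.  Returns -1 if n is out of range.
 */
static int
dump_order(int n, const Dependency *deps, size_t ndeps, int *order)
{
    int         pending[MAX_OBJS] = {0};    /* unmet dependencies per object */
    int         emitted[MAX_OBJS] = {0};
    int         placed = 0;
    size_t      d;
    int         i;

    if (n < 0 || n > MAX_OBJS)
        return -1;
    for (d = 0; d < ndeps; d++)
        pending[deps[d].obj]++;

    for (;;)
    {
        /* Pick the lowest-numbered object with no unmet dependencies. */
        for (i = 0; i < n; i++)
            if (!emitted[i] && pending[i] == 0)
                break;
        if (i == n)
            break;              /* nothing ready: finished, or a cycle */
        emitted[i] = 1;
        order[placed++] = i;
        /* Everything that was waiting on object i has one fewer blocker. */
        for (d = 0; d < ndeps; d++)
            if (deps[d].dep == i)
                pending[deps[d].obj]--;
    }
    return placed;
}
```

The sort can only honor the edges it is given. If part of the catalog is
never loaded, the edges that pass through the missing objects are never seen,
and the remaining objects fall back to an order that may not be restorable.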

> For dumping entire databases, it looks like the biggest problem is
> going to be LockReassignCurrentOwner in the server. And that doesn't
> seem to be easy to fix, as any change to it to improve pg_dump will
> risk degrading normal use cases.

I didn't try profiling the server side, but pg_dump doesn't use
subtransactions so it's not clear to me why LockReassignCurrentOwner
would get called at all ...

regards, tom lane
