From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Craig James <cjames(at)emolecules(dot)com>, Hugo <hugo(dot)tech(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: pg_dump and thousands of schemas |
Date: | 2012-05-25 16:56:17 |
Message-ID: | 24045.1337964977@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> There is an operation in pg_dump which is O(#_of_schemata_in_db *
> #_of_table_in_db), or something like that.
> The attached very crude patch reduces that to
> O(log_of_#_of_schemata_in_db * #_of_table_in_db)
> I was hoping this would be a general improvement. It doesn't seem be.
> But it is a very substantial improvement in the specific case of
> dumping one small schema out of a very large database.
Your test case in
<CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA(at)mail(dot)gmail(dot)com>
shows pretty conclusively that findNamespace is a time sink for large
numbers of schemas, so that seems worth fixing. I don't like this
patch though: we already have infrastructure for this in pg_dump,
namely buildIndexArray/findObjectByOid, so what we should do is use
that not invent something new. I will go see about doing that.
> It seems like dumping one schema would be better optimized by not
> loading up the entire database catalog, but rather by restricting to
> just that schema at the catalog stage.
The reason pg_dump is not built that way is that considerations like
dump order dependencies are not going to work at all if it only looks
at a subset of the database. Of course, dependency chains involving
objects not dumped might be problematic anyway, but I'd still want it
to do the best it could.
> For dumping entire databases, It looks like the biggest problem is
> going to be LockReassignCurrentOwner in the server. And that doesn't
> seem to be easy to fix, as any change to it to improve pg_dump will
> risk degrading normal use cases.
I didn't try profiling the server side, but pg_dump doesn't use
subtransactions so it's not clear to me why LockReassignCurrentOwner
would get called at all ...
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2012-05-25 17:41:19 | Re: pg_dump and thousands of schemas |
Previous Message | Josh Berkus | 2012-05-25 16:54:46 | Re: pg_stat_statements temporary file |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2012-05-25 17:41:19 | Re: pg_dump and thousands of schemas |
Previous Message | Jeff Janes | 2012-05-25 15:53:45 | Re: pg_dump and thousands of schemas |