Re: pg_dump and thousands of schemas

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Craig James <cjames(at)emolecules(dot)com>, Hugo <hugo(dot)tech(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-05-25 15:40:04
Message-ID: CAMkU=1yjA-xQXnFTQVWa5jq-JPXyRdm5RXQQtADDm4T7vKhHoA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Thu, May 24, 2012 at 8:54 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:

>> pg_dump itself seems to have a lot of quadratic portions (plus another
>> one on the server which it hits pretty heavily), and it's hard to know
>> where to start addressing them.  It seems like addressing the overall
>> quadratic nature might be a globally better option, but addressing
>> just the problem with dumping one schema might be easier to kluge
>> together.
>
> Postgres 9.2 will have some speedups for pg_dump scanning large
> databases --- that might help.

Those speedups don't seem to apply here, though. I get the same
performance in 9.0.7 as in 9.2beta1.

There is an operation in pg_dump which is O(#_of_schemata_in_db *
#_of_tables_in_db), or something like that.

The attached very crude patch reduces that to
O(log_of_#_of_schemata_in_db * #_of_tables_in_db).
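
To give a rough feel for the shape of the change (an illustrative sketch
only, with made-up struct and function names rather than the actual
pg_dump code): the per-table namespace lookup goes from a linear scan of
every namespace to a binary search over an array sorted by OID:

    #include <stdlib.h>

    typedef unsigned int Oid;

    /* Made-up stand-in for pg_dump's namespace metadata struct. */
    typedef struct
    {
        Oid   nsoid;        /* pg_namespace OID */
        char *nspname;
    } NamespaceInfo;

    /* Old shape: O(#namespaces) work for every table dumped. */
    static NamespaceInfo *
    findNamespaceLinear(NamespaceInfo *nsinfo, int numNamespaces, Oid nsoid)
    {
        for (int i = 0; i < numNamespaces; i++)
            if (nsinfo[i].nsoid == nsoid)
                return &nsinfo[i];
        return NULL;
    }

    /* New shape: O(log #namespaces) per table, over an array sorted by OID. */
    static int
    ns_oid_cmp(const void *key, const void *elem)
    {
        Oid                  k = *(const Oid *) key;
        const NamespaceInfo *n = (const NamespaceInfo *) elem;

        return (k < n->nsoid) ? -1 : (k > n->nsoid);
    }

    static NamespaceInfo *
    findNamespaceBinary(NamespaceInfo *nsinfo, int numNamespaces, Oid nsoid)
    {
        return bsearch(&nsoid, nsinfo, numNamespaces,
                       sizeof(NamespaceInfo), ns_oid_cmp);
    }

Done once per table, the linear version costs O(#tables * #namespaces)
overall, while the bsearch version costs O(#tables * log #namespaces),
which is where the figures above come from.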

I was hoping this would be a general improvement. It doesn't seem to be.
But it is a very substantial improvement in the specific case of
dumping one small schema out of a very large database.

It seems like dumping one schema would be better optimized by not
loading up the entire database catalog, but rather by restricting to
just that schema at the catalog stage. But I haven't dug into those
details.
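
For example (a purely hypothetical sketch; this is not how pg_dump
actually assembles its queries), the table-collection query could push
the schema restriction into the catalog query itself, so a --schema dump
never reads the other schemas' catalog rows:

    #include <stdio.h>

    /*
     * Hypothetical sketch only: apply the --schema restriction server-side
     * when building the table-collection query.  Assumes 'schema' has
     * already been escaped/quoted by the caller.
     */
    static void
    build_table_query(char *buf, size_t buflen, const char *schema)
    {
        if (schema != NULL)
            snprintf(buf, buflen,
                     "SELECT c.oid, c.relname, c.relnamespace "
                     "FROM pg_catalog.pg_class c "
                     "  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
                     "WHERE c.relkind = 'r' AND n.nspname = '%s'",
                     schema);
        else
            snprintf(buf, buflen,
                     "SELECT c.oid, c.relname, c.relnamespace "
                     "FROM pg_catalog.pg_class c "
                     "WHERE c.relkind = 'r'");
    }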

For dumping entire databases, it looks like the biggest problem is
going to be LockReassignCurrentOwner in the server. And that doesn't
seem to be easy to fix, as any change made there to help pg_dump risks
degrading normal use cases.
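
The rough shape of that problem, as a hand-written sketch rather than
the real server code: each time a resource owner is released, the
backend walks its entire local lock table to hand that owner's locks to
the parent, no matter how few entries actually belong to it:

    /*
     * Hand-written sketch of the shape of the problem, not the server
     * source: every release pass scans the backend's whole local lock
     * table, even when only a few entries belong to the owner going away.
     */
    typedef struct
    {
        int owner;          /* which resource owner currently holds it */
        /* ... other bookkeeping ... */
    } LocalLockEntry;

    static void
    reassign_locks_to_parent(LocalLockEntry *locks, int nlocks,
                             int old_owner, int parent_owner)
    {
        /* O(total locks held) per call, regardless of how many match. */
        for (int i = 0; i < nlocks; i++)
            if (locks[i].owner == old_owner)
                locks[i].owner = parent_owner;
    }

With one AccessShare lock per table and passes like this happening over
and over during the dump, N tables means on the order of N scans over N
entries.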

If we want to be able to efficiently dump entire databases in a
scalable way, it seems like there should be some way to obtain a
database-wide AccessShare lock, which blocks AccessExclusive locks on
any object in the database, and turns ordinary object-level
AccessShare lock requests into no-ops. I don't think you can get
hundreds of thousands of simultaneously held and individually recorded
AccessShare locks without causing bad things to happen.
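
Purely as pseudocode for the idea (none of these functions or flags
exist in the server), the per-object path would simply short-circuit
once the database-wide lock is held:

    typedef unsigned int Oid;

    /* Hypothetical flag, set once a database-wide AccessShare lock is granted. */
    static int have_db_wide_access_share = 0;

    /* Hypothetical per-object path: record one entry in the local lock table. */
    static void
    record_object_access_share(Oid relid)
    {
        (void) relid;       /* stand-in for the real per-lock bookkeeping */
    }

    static void
    lock_relation_access_share(Oid relid)
    {
        if (have_db_wide_access_share)
            return;         /* already covered by the database-wide lock: no-op */

        record_object_access_share(relid);
    }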

Cheers,

Jeff

Attachment: pg_dump_ns_search.patch (application/octet-stream, 2.2 KB)
