Re: pg_dump and thousands of schemas

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Hugo <hugo(dot)tech(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-05-28 21:24:26
Message-ID: CAMkU=1zxd_vn5DPw9W6cKe==f=pxScbQE=cd7Gq69eO0YjGsTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Sat, May 26, 2012 at 9:12 PM, Hugo <Nabble> <hugo(dot)tech(at)gmail(dot)com> wrote:
> Here is a sample dump that takes a long time to be written by pg_dump:
> http://postgresql.1045698.n5.nabble.com/file/n5710183/test.dump.tar.gz
> test.dump.tar.gz
> (the file above has 2.4Mb, the dump itself has 66Mb)
>
> This database has 2,311 schemas similar to those in my production database.
> All schemas are empty,

This dump does not reload cleanly. It uses many roles which it
doesn't create. Also, the schemata are not empty, they have about 20
tables apiece.

I created the missing roles with all default options.

Doing a default pg_dump took 66 minutes.

> but pg_dump still takes 3 hours to finish it on my
> computer. So now you can imagine my production database with more than
> 20,000 schemas like that. Can you guys take a look and see if the code has
> room for improvements?

There is a quadratic behavior in pg_dump's "mark_create_done". This
should probably be fixed, but in the mean time it can be circumvented
by using -Fc rather than -Fp for the dump format. Doing that removed
17 minutes from the run time.

I'm working on a patch to reduce the LockReassignCurrentOwner problem
in the server when using pg_dump with lots of objects. Using a
preliminary version for this, in conjunction with -Fc, reduced the
dump time to 3.5 minutes.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marti Raudsepp 2012-05-28 22:12:46 Re: Bogus nestloop rows estimate in 8.4.7
Previous Message Tom Lane 2012-05-28 21:20:37 Re: FDW / list of needed columns, WHERE conditions (in PlanForeignScan)

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2012-05-28 22:26:36 Re: pg_dump and thousands of schemas
Previous Message Alejandro Carrillo 2012-05-28 18:24:13 Recover rows deleted