Re: pg_dump and thousands of schemas

From: Craig James <cjames(at)emolecules(dot)com>
To: Hugo <hugo(dot)tech(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-05-24 15:21:36
Message-ID: CAFwQ8rdZ3LjiwdDuzNR3vN=OryQ6fzDb08hzeKzNbOzrQXNPbA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Thu, May 24, 2012 at 12:06 AM, Hugo <Nabble> <hugo(dot)tech(at)gmail(dot)com> wrote:

> Hi everyone,
>
> We have a production database (PostgreSQL 9.0) with more than 20,000
> schemas and 40 GB of data. In the past we had all of that information in
> just one schema, and pg_dump used to work just fine (2-3 hours to dump
> everything). Then we decided to split the database into schemas, which
> makes a lot of sense for the kind of information we store and the plans
> we have for the future. The problem now is that pg_dump takes forever to
> finish (more than 24 hours), and we just can't have consistent daily
> backups like we had in the past. When I try to dump just one schema with
> almost nothing in it, it takes 12 minutes. When I try to dump a big
> schema with lots of information, it takes 14 minutes. So pg_dump is
> clearly getting lost among so many schemas. The load on the machine is
> low (it is a hot standby replica), and we have good settings for memory,
> cache, shared_buffers, and everything else. The performance of the
> database itself is good; it is only pg_dump that is inefficient for this
> task. I have found an old discussion from 2007 that seems to be closely
> related to this problem:
>
> http://postgresql.1045698.n5.nabble.com/5-minutes-to-pg-dump-nothing-tp1888814.html
>
> It seems that pg_dump hasn't been tested with such a huge number of
> schemas. Does anyone have a solution or suggestions? Do you know if
> there are patches specific to this case?
>

How many total relations do you have? I don't know whether there is a
limit on the number of schemas as such, but I suspect that when you went
from one schema to 20,000 schemas, you also went from N relations to
20,000*N relations.
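
If you want a quick way to check, counting everything in pg_class outside
the system schemas should give a rough idea (tables, indexes, sequences,
and so on all count). Something along these lines, run through psql:

    SELECT count(*)
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
      AND n.nspname NOT LIKE 'pg_toast%';

Grouping by n.nspname and counting per schema would also show which
schemas contribute the most relations.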

Somewhere between 100,000 and 1 million total relations, Postgres starts to
have trouble. See this thread:

http://permalink.gmane.org/gmane.comp.db.postgresql.performance/33254

(Why is it that Google can't find these archives on postgresql.org?)
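
If the relation count does turn out to be huge and you want to see where
pg_dump is spending its time, one thing you could try (just a sketch; the
"backup" role name below is only an example) is to log every statement for
the role that runs the dump and watch the server log; pg_dump's catalog
queries will show up there with their durations:

    -- "backup" is a hypothetical role that runs pg_dump
    ALTER ROLE backup SET log_min_duration_statement = 0;
    -- run pg_dump, look at the log, then undo it:
    ALTER ROLE backup RESET log_min_duration_statement;

Running pg_dump with -v also tells you which step it is on.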

Craig

> Thanks in advance,
> Hugo
>
