Re: [HACKERS] pg_dump and thousands of schemas

From: Denis <socsam(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] pg_dump and thousands of schemas
Date: 2012-11-07 10:42:52
Message-ID: 1352284972611-5730998.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Tom Lane-2 wrote
> Denis &lt;

> socsam@

> &gt; writes:
>> Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it
>> takes from 10 to 15 seconds when I am doing backup.
>
>> Sort (cost=853562.04..854020.73 rows=183478 width=219) (actual
>> time=5340.477..5405.604 rows=183924 loops=1)
>
> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
> that you've got 183924 tables. That's going to take some time no matter
> what.
>
> It does seem like we could make some small changes to optimize that
> query a little bit, but they're not going to result in any amazing
> improvement overall, because pg_dump still has to deal with all the
> tables it's getting back. Fundamentally, I would ask whether you really
> need so many tables. It seems pretty likely that you have lots and lots
> of basically-identical tables. Usually it would be better to redesign
> such a structure into fewer tables with more index columns.
>
>> Here is the output of "pg_dump -s" test.dump
>> &lt;http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump&gt;
>
> This dump contains only 1 schema and 43 tables, so I don't think it's
> for the database you're having trouble with ...
>
> regards, tom lane
>
>
> --
> Sent via pgsql-performance mailing list (

> pgsql-performance@

> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

I wonder why pg_dump has to have deal with all these 183924 tables, if I
specified to dump only one scheme: "pg_dump -n schema_name" or even like
this to dump just one table "pg_dump -t 'schema_name.comments' " ?

We have a web application where we create a schema with a number of tables
in it for each customer. This architecture was chosen to ease the process of
backup/restoring data. Sometimes clients ask us to restore data for the last
month or roll back to last week's state. This task is easy to accomplish
then the client's data is isolated in a schema/DB. If we put all the clients
data in one table - operations of this kind will be much harder to perform.
We will have to restore a huge DB with an enormously large tables in it to
find the requested data.
Different clients have different activity rate and we can select different
backup strategies according to it. This would be impossible in case we keep
all the clients data in one table.
Besides all the above mentioned, the probability of massive data corruption
(if an error in our web application occurs) is much higher.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730998.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-11-07 12:01:27 Re: WIP patch for hint bit i/o mitigation
Previous Message Atri Sharma 2012-11-07 10:31:37 Re: WIP patch for hint bit i/o mitigation

Browse pgsql-performance by date

  From Date Subject
Next Message Rodrigo Rosenfeld Rosas 2012-11-07 11:16:09 Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Previous Message Mark Kirkwood 2012-11-07 04:16:20 Re: HT on or off for E5-26xx ?