Skip site navigation (1) Skip section navigation (2)

Re: pg_dump and thousands of schemas

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: jeff(dot)janes(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-06-13 01:45:25
Message-ID: 20120613.104525.2089875055094693929.t-ishii@sraoss.co.jp (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
> On Tue, Jun 12, 2012 at 1:54 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>>>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>>>>>> management in the server.  What I fixed so far on the pg_dump side
>>>>>> should be enough to let partial dumps run at reasonable speed even if
>>>>>> the whole database contains many tables.  But if psql is taking
>>>>>> AccessShareLock on lots of tables, there's still a problem.
>>>>>
>>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>>>> it only takes 3 seconds. Comments?
>>>>
>>>> Could you rebase this?  I tried doing it myself, but must have messed
>>>> it up because it got slower rather than faster.
>>>
>>> OK, I found the problem.  In fixing a merge conflict, I had it execute
>>> the query every time it appended a table, rather than just at the end.
>>>
>>> With my proposed patch in place, I find that for a full default dump
>>> your patch is slightly faster with < 300,000 tables, and slightly
>>> slower with > 300,000.  The differences are generally <2% in either
>>> direction.  When it comes to back-patching and partial dumps, I'm not
>>> really sure what to test.
>>>
>>> For the record, there is still a quadratic performance on the server,
>>> albeit with a much smaller constant factor than the Reassign one.  It
>>> is in get_tabstat_entry.  I don't know if is worth working on that in
>>> isolation--if PG is going to try to accommodate 100s of thousands of
>>> table, there probably needs to be a more general way to limit the
>>> memory used by all aspects of the rel caches.
>>
>> I would like to test your patch and w/without my patch. Could you
>> please give me the patches? Or do you have your own git repository?
> 
> The main patch is in the commit fest as "Resource Owner reassign Locks
> for the sake of pg_dump"
> 
> My re-basing of your patch is attached.

I tested your patches with current master head. The result was pretty
good. Before it took 125 minutes (with 9.2 devel) to dump 100k empty
tables and now it takes only less than 4 minutes!

$ time pg_dump test >/dev/null

real	3m56.412s
user	0m12.059s
sys	0m3.571s

Good job!

Now I applied rebased pg_dump patch.

real	4m1.779s
user	0m11.621s
sys	0m3.052s

Unfortunately I see no improvement. Probably my patch's value is for
dumping against older backend.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

In response to

Responses

pgsql-performance by date

Next:From: Siddharth ShahDate: 2012-06-14 15:15:58
Subject: High CPU Usage
Previous:From: Jeff JanesDate: 2012-06-12 15:33:12
Subject: Re: pg_dump and thousands of schemas

pgsql-hackers by date

Next:From: Tom LaneDate: 2012-06-13 01:52:56
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Previous:From: Craig RingerDate: 2012-06-13 01:18:58
Subject: Re: Minimising windows installer password confusion

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group