Re: pg_dump and thousands of schemas

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: jeff(dot)janes(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-06-12 08:54:25
Message-ID: 20120612.175425.2118167759512265051.t-ishii@sraoss.co.jp
Lists: pgsql-hackers, pgsql-performance
> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>>>> management in the server.  What I fixed so far on the pg_dump side
>>>> should be enough to let partial dumps run at reasonable speed even if
>>>> the whole database contains many tables.  But if psql is taking
>>>> AccessShareLock on lots of tables, there's still a problem.
>>>
>>> Ok, I modified the part of pg_dump where a tremendous number of LOCK
>>> TABLE statements are issued. I replaced them with a single LOCK TABLE
>>> naming multiple tables. With 100k tables, the LOCK statements took 13
>>> minutes in total; now they take only 3 seconds. Comments?
>>
>> Could you rebase this?  I tried doing it myself, but must have messed
>> it up because it got slower rather than faster.
> 
> OK, I found the problem.  In fixing a merge conflict, I had it execute
> the query every time it appended a table, rather than just at the end.
> 
> With my proposed patch in place, I find that for a full default dump
> your patch is slightly faster with < 300,000 tables, and slightly
> slower with > 300,000.  The differences are generally <2% in either
> direction.  When it comes to back-patching and partial dumps, I'm not
> really sure what to test.
> 
> For the record, there is still quadratic behavior on the server,
> albeit with a much smaller constant factor than the Reassign one.  It
> is in get_tabstat_entry.  I don't know if it is worth working on that in
> isolation--if PG is going to try to accommodate hundreds of thousands of
> tables, there probably needs to be a more general way to limit the
> memory used by all aspects of the rel caches.
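
For anyone following the thread, the batching idea discussed above can be
sketched roughly as follows. This is only an illustration in Python, not the
actual patch (pg_dump is written in C); the names quote_ident,
build_lock_statement, and lock_all are hypothetical, and execute stands in
for whatever sends one statement to the server. It also shows the point from
the merge-conflict fix: append every table first, then run the query once at
the end.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_lock_statement(tables, mode="ACCESS SHARE"):
    """Build one LOCK TABLE statement naming every (schema, table) pair.

    Returns None when there is nothing to lock.
    """
    if not tables:
        return None
    targets = ", ".join(
        quote_ident(schema) + "." + quote_ident(table)
        for schema, table in tables
    )
    return "LOCK TABLE " + targets + " IN " + mode + " MODE"


def lock_all(execute, tables):
    """Send a single statement after all tables are collected.

    `execute` is a hypothetical callable taking one SQL string; calling it
    inside the loop that collects tables would bring back the slowdown.
    """
    statement = build_lock_statement(tables)
    if statement is not None:
        execute(statement)


if __name__ == "__main__":
    sent = []
    lock_all(sent.append, [("public", "t1"), ("s2", "t2")])
    print(sent[0])
```

With 100k tables this would produce one very long statement instead of 100k
round trips; whether that is the right trade-off for back branches is exactly
what still needs testing.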

I would like to test your patch with and without my patch. Could you
please send me the patches? Or do you have your own git repository?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
