Re: pg_dump and thousands of schemas

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-06-11 16:32:52
Message-ID: CAMkU=1xK=VLzyEeqSVrOxpEuoD7A0tOBA2-BRjXBsCNPkJB5Dw@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>>> management in the server.  What I fixed so far on the pg_dump side
>>> should be enough to let partial dumps run at reasonable speed even if
>>> the whole database contains many tables.  But if pg_dump is taking
>>> AccessShareLock on lots of tables, there's still a problem.
>>
>> Ok, I modified the part of pg_dump where a tremendous number of LOCK
>> TABLE statements are issued. I replaced them with a single LOCK TABLE
>> naming multiple tables. With 100k tables, the LOCK statements took 13
>> minutes in total; now they take only 3 seconds. Comments?
>
> Could you rebase this?  I tried doing it myself, but must have messed
> it up because it got slower rather than faster.

OK, I found the problem. In fixing a merge conflict, I had made it execute
the query every time it appended a table, rather than just once at the end.
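
To illustrate the batching idea for anyone following along, here is a
minimal sketch in plain libpq C. It is not the actual pg_dump code (which
builds the statement with PQExpBuffer and quotes and schema-qualifies each
name); the function name and buffer handling here are mine. The point is
that the statement is built incrementally but executed only once, at the
end:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <libpq-fe.h>

    /*
     * Illustrative only: lock all listed tables in one round trip by
     * building a single "LOCK TABLE t1, t2, ... IN ACCESS SHARE MODE"
     * statement.  Real code must quote and schema-qualify each name.
     */
    static int
    lock_tables_in_one_statement(PGconn *conn,
                                 const char *const *tables, int ntables)
    {
        size_t    cap = 1024, len = 0;
        char     *sql;
        PGresult *res;
        int       i, ok;

        if (ntables <= 0)
            return 1;                   /* nothing to lock */
        if ((sql = malloc(cap)) == NULL)
            return 0;
        len += snprintf(sql, cap, "LOCK TABLE ");

        for (i = 0; i < ntables; i++)
        {
            size_t need = strlen(tables[i]) + 32;  /* separator + suffix slack */

            if (len + need > cap)
            {
                char *tmp = realloc(sql, cap = 2 * (len + need));

                if (tmp == NULL)
                {
                    free(sql);
                    return 0;
                }
                sql = tmp;
            }
            len += snprintf(sql + len, cap - len, "%s%s",
                            i > 0 ? ", " : "", tables[i]);
        }
        snprintf(sql + len, cap - len, " IN ACCESS SHARE MODE");

        /* Execute once, at the end -- not once per appended table. */
        res = PQexec(conn, sql);
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        if (!ok)
            fprintf(stderr, "LOCK failed: %s", PQerrorMessage(conn));
        PQclear(res);
        free(sql);
        return ok;
    }

One statement means one round trip and one parse instead of one of each
per table.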

With my proposed patch in place, I find that for a full default dump,
your patch is slightly faster with fewer than 300,000 tables and slightly
slower with more than that. The differences are generally under 2% in
either direction. When it comes to back-patching and partial dumps, I'm
not really sure what to test.
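
For anyone who wants to reproduce the timings, a throwaway loop like the
one below (illustrative; not my actual test harness) will populate a
database with enough tables to make the effect visible. Time pg_dump
against it with and without each patch:

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* Connection parameters come from the environment (PGHOST, etc.). */
        PGconn *conn = PQconnectdb("");
        int     i;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }
        for (i = 0; i < 100000; i++)
        {
            char      sql[64];
            PGresult *res;

            snprintf(sql, sizeof(sql), "CREATE TABLE t%d (c int)", i);
            res = PQexec(conn, sql);
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
            {
                fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
                PQclear(res);
                break;
            }
            PQclear(res);
        }
        PQfinish(conn);
        return 0;
    }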

For the record, there is still quadratic behavior on the server, albeit
with a much smaller constant factor than the Reassign one. It is in
get_tabstat_entry. I don't know if it is worth working on that in
isolation--if PG is going to try to accommodate hundreds of thousands of
tables, there probably needs to be a more general way to limit the memory
used by all aspects of the rel caches.
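
To make the shape of that problem concrete: each table a backend touches
gets a per-backend stats entry, and the lookup is a linear scan over the
entries created so far, so touching N tables costs O(N^2) comparisons. A
toy model of the pattern (simplified; not the backend's actual data
structures or code):

    #include <stdio.h>
    #include <stdlib.h>

    /* One stats entry per table touched, found or created by linear scan. */
    typedef struct
    {
        unsigned int t_id;              /* table OID */
        long         tuples_fetched;
    } TabStatEntry;

    static TabStatEntry *entries;
    static int           n_entries, cap_entries;

    static TabStatEntry *
    get_entry(unsigned int rel_id)
    {
        int i;

        /* O(n) scan per lookup; over N distinct tables, O(N^2) in total. */
        for (i = 0; i < n_entries; i++)
            if (entries[i].t_id == rel_id)
                return &entries[i];

        if (n_entries == cap_entries)
        {
            TabStatEntry *tmp;

            cap_entries = cap_entries ? 2 * cap_entries : 64;
            tmp = realloc(entries, cap_entries * sizeof(TabStatEntry));
            if (tmp == NULL)
            {
                fprintf(stderr, "out of memory\n");
                exit(1);
            }
            entries = tmp;
        }
        entries[n_entries].t_id = rel_id;
        entries[n_entries].tuples_fetched = 0;
        return &entries[n_entries++];
    }

    int
    main(void)
    {
        unsigned int rel;

        /* Touching 100,000 tables does ~5 billion comparisons here. */
        for (rel = 1; rel <= 100000; rel++)
            get_entry(rel)->tuples_fetched++;
        printf("created %d entries\n", n_entries);
        return 0;
    }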

Cheers,

Jeff
