Re: Further pg_upgrade analysis for many tables

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Further pg_upgrade analysis for many tables
Date: 2012-11-26 22:26:42
Message-ID: 20121126222642.GG23214@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 14, 2012 at 10:08:15AM -0500, Bruce Momjian wrote:
> > I agree that parallel restore for schemas is a hard problem. But I
> > didn't mean parallelism within the restore, I meant that we could
> > start both postmasters and pipe the output from dump directly to
> > restore. This way the times for dumping and restoring can overlap.
>
> Wow, that is a very creative idea. The current code doesn't do that,
> but this has the potential of doubling pg_upgrade's speed, without
> adding a lot of complexity. Here are the challenges of this approach:
>
> * I would need to log the output of pg_dumpall as it is passed to psql
> so users can debug problems
>
> * pg_upgrade never runs the old and new clusters at the same time for
> fear that it will run out of resources, e.g. shared memory, or if they
> are using the same port number. We can make this optional and force
> different port numbers.
>
> Let me work up a prototype in the next few days and see how it performs.
> Thanks for the great idea.

I have developed the attached proof-of-concept patch to test this idea.
Unfortunately, I got poor results:

---- pg_upgrade ----
dump restore dmp|res git dmp/res
1 0.12 0.07 0.13 11.16 13.03
1000 3.80 2.83 5.46 18.78 20.27
2000 5.39 5.65 13.99 26.78 28.54
4000 16.08 12.40 28.34 41.90 44.03
8000 32.77 25.70 57.97 78.61 80.09
16000 57.67 63.42 134.43 158.49 165.78
32000 131.84 176.27 302.85 380.11 389.48
64000 270.37 708.30 1004.39 1085.39 1094.70

The last two columns show the patch didn't help at all, and the third
column shows it is just executing the pg_dump, then the restore, not in
parallel, i.e. column 1 + column 2 ~= column 3.

Testing pg_dump for 4k tables (16 seconds) shows the first row is not
output by pg_dump until 15 seconds, meaning there can't be any
parallelism with a pipe. (Test script attached.) Does anyone know how
to get pg_dump to send some output earlier? In summary, it doesn't seem
pg_dump makes any attempt to output its data early. pg_dump.c has some
details:

/*
* And finally we can do the actual output.
*
* Note: for non-plain-text output formats, the output file is written
* inside CloseArchive(). This is, um, bizarre; but not worth changing
* right now.
*/
if (plainText)
RestoreArchive(fout);

CloseArchive(fout);

FYI, log_min_duration_statement shows queries taking 11.2 seconds, even
without the network overhead --- not sure how that can be optimized.

I will now test using PRIMARY KEY and custom dump format with pg_restore
--jobs to see if I can get parallelism that way.

A further parallelism would be to allow multiple database to be
dump/restored at the same time. I will test for that once this is done.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pipe.diff text/x-diff 9.3 KB
test_many_tables text/plain 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-11-26 22:42:21 Re: New statistics for WAL buffer dirty writes
Previous Message Jeff Davis 2012-11-26 21:57:09 Re: Removing PD_ALL_VISIBLE