Re: multi-worker pg_restore was: 8.3 / 8.2.6 restore comparison

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Luke Lonergan <llonergan(at)greenplum(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multi-worker pg_restore was: 8.3 / 8.2.6 restore comparison
Date: 2008-02-26 23:50:28
Message-ID: 20080226155028.6e82be92@commandprompt.com
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 26 Feb 2008 18:39:53 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> > However one observation that I am going to (try) to test is that we
> > are spending a lot of time waiting for the last thread to finish.
>
> IOW you haven't balanced the work given to each thread very well?
> Or is there something else happening?
>
> How exactly are you allocating tasks to threads in this prototype,
> anyway?
>

Right, there is no balance here. Let me explain what I did. I ran
pg_restore -l to get the TOC file, then broke that file up into five
other files:

prefix = schema (minus indexes, constraints)
data = data
pk = primary keys
index = indexes
triggers_constraints = triggers and constraints (foreign keys in
this instance)
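
For illustration, the five-way split above could be sketched roughly as
follows. This is not the script I used; it assumes each TOC entry from
pg_restore -l looks like "dumpId; tableoid oid TYPE schema name owner",
that primary keys show up as CONSTRAINT and foreign keys as
FK CONSTRAINT, and that everything not matched belongs in prefix. The
names classify and split_toc are hypothetical.

```python
import re

# Assumed TOC entry shape: "dumpId; tableoid oid TYPE schema name owner".
TOC_ENTRY = re.compile(r"^\d+;\s+\d+\s+\d+\s+(.*)$")

BUCKETS = ("prefix", "data", "pk", "index", "triggers_constraints")


def classify(line):
    """Return the bucket name for one TOC line, or None for comments/blanks."""
    match = TOC_ENTRY.match(line)
    if match is None:
        return None  # lines starting with ';' are comments in the TOC
    rest = match.group(1)
    if rest.startswith("TABLE DATA "):
        return "data"
    if rest.startswith(("FK CONSTRAINT ", "TRIGGER ")):
        return "triggers_constraints"
    if rest.startswith("CONSTRAINT "):
        return "pk"  # assumption: plain CONSTRAINT entries are the pk builds
    if rest.startswith("INDEX "):
        return "index"
    return "prefix"  # schema objects: tables, sequences, functions, ...


def split_toc(lines):
    """Group TOC lines into the five buckets, preserving their order."""
    buckets = {name: [] for name in BUCKETS}
    for line in lines:
        bucket = classify(line)
        if bucket is not None:
            buckets[bucket].append(line)
    return buckets
```

Each bucket can then be written to its own file and handed to
pg_restore with -L.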

The first step of the script loads prefix. It then splits the data
file into -n- number of files and launches -n- number of
pg_restore processes with -L.
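
A rough sketch of that step, under stated assumptions: the list is cut
into n contiguous chunks (the post does not say how the split was
done), each chunk is written to a temporary list file, and one
pg_restore per chunk is started with -d and -L. The function names and
the archive/dbname arguments are hypothetical.

```python
import subprocess
import tempfile


def split_chunks(entries, n):
    """Cut entries into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(entries), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(entries[start:end])
        start = end
    return chunks


def restore_command(archive, dbname, list_path):
    """Build a pg_restore invocation restricted to the entries in list_path."""
    return ["pg_restore", "-d", dbname, "-L", list_path, archive]


def run_phase(archive, dbname, entries, n):
    """Start one pg_restore per chunk and wait for all of them (a barrier)."""
    procs = []
    for chunk in split_chunks(entries, n):
        if not chunk:
            continue  # fewer entries than workers
        with tempfile.NamedTemporaryFile("w", suffix=".toc", delete=False) as f:
            f.writelines(e if e.endswith("\n") else e + "\n" for e in chunk)
        procs.append(subprocess.Popen(restore_command(archive, dbname, f.name)))
    return [p.wait() for p in procs]
```

Note that run_phase only returns once the slowest chunk is done, which
is exactly where the waiting described below comes from.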

It runs through all of data, then starts on pk in the exact same
manner, then index, and so on.

Everything moves along very quickly in each step until the last thread.
So I could burn through all of data in 60 minutes except for the last
three tables in the 24th file, which will take 30 minutes on their own
(arbitrary numbers). While I am waiting on those last 3 tables, nothing
else is happening. We are in a holding pattern until the 24-connection
pk load can start. What should happen is that, as each TABLE DATA entry
finishes, the appropriate CONSTRAINT (pk) and INDEX are built as well.
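
To make the difference concrete, here is a toy simulation (not a
measurement, and not part of the prototype). It compares the phased
approach, with a barrier after each step, against workers that pull the
next ready job and start a table's pk and index as soon as its data is
loaded. It assumes each table is a (data, pk, index) tuple of made-up
durations and that pk must precede index for a given table; both
function names are hypothetical.

```python
import heapq
from collections import deque


def phased_makespan(tables, workers):
    """Total time with a barrier per phase and static contiguous chunks."""
    total = 0
    for phase in range(3):  # 0 = data, 1 = pk, 2 = index
        durations = [t[phase] for t in tables]
        size = -(-len(durations) // workers)  # ceiling division
        chunks = [durations[i:i + size] for i in range(0, len(durations), size)]
        total += max(sum(chunk) for chunk in chunks)  # wait for slowest chunk
    return total


def pipelined_makespan(tables, workers):
    """Total time when idle workers pull the next ready job from a queue."""
    ready = deque((i, 0) for i in range(len(tables)))  # (table, step)
    running = []  # min-heap of (finish_time, table, step)
    now, idle = 0, workers
    while ready or running:
        while ready and idle:
            table, step = ready.popleft()
            heapq.heappush(running, (now + tables[table][step], table, step))
            idle -= 1
        now, table, step = heapq.heappop(running)
        idle += 1
        if step < 2:
            ready.append((table, step + 1))  # data -> pk -> index
    return now


if __name__ == "__main__":
    # Three quick tables and one slow one, two workers (made-up numbers).
    tables = [(10, 5, 5)] * 3 + [(40, 5, 5)]
    print(phased_makespan(tables, 2), pipelined_makespan(tables, 2))
```

In the pipelined version the second worker keeps building pk and index
entries while the slow table is still loading, instead of sitting idle
at the barrier.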

The two single-threaded steps, although you could probably make them
multi-threaded, are prefix and triggers_constraints.

Sincerely,

Joshua D. Drake

- --
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHxKXGATb/zqfZUUQRAjePAJ9xx6ea+Vo4J5T3CxLYRfKj2Cm1gQCeOjbU
DHcAzEsVpedyUnJjUuL7DI8=
=ynM0
-----END PGP SIGNATURE-----
