Re: 8.3 / 8.2.6 restore comparison

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Luke Lonergan <llonergan(at)greenplum(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 8.3 / 8.2.6 restore comparison
Date: 2008-02-24 17:47:15
Message-ID: 47C1ADA3.60104@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> At some point, I think we have to bite the bullet and find a way to use
>> multiple CPUs for a single load. I don't have any good ideas or plans
>> for that, but hopefully someone does.
>
> As already mentioned upthread, we could do that today, with zero backend
> changes, by making pg_restore drive multiple sessions. Now there are
> scenarios where this wouldn't help too much --- eg, a database with only
> one enormous table. We couldn't parallelize the loading of that table,
> although we could parallelize creation of its indexes. But for an
> example such as JD is complaining about, we ought to have no trouble
> thrashing his disks into the ground ;-)

Bring it on ! but I would note that with the current pg_restore I was
not able to trash my disks. I only used four backends (I could have used
8) but even with that, I was only doing ~ 45M a second. So if I double,
I still have bandwidth. It would still be an huge improvement though.

>
> What you would need for this is an -Fc or -Ft dump, because a plain
> script output from pg_dump doesn't carry any dependency information,

I would focus on -Fc. With the limitations of -Ft this would be a good
way to start phasing -Ft out.

> much less any index of where in the file different bits of data are.
> Just armwaving, I envision a multiprocess pg_restore like this:
>
> * one controller process that tracks the not-yet-loaded
> TOC items in the dump
> * N child processes that just transmit a selected TOC item
> to a connected backend, and then ask the controller
> what to do next

> Most likely, the bottleneck with this sort of thing would be multiple
> parallel reads from the pg_dump archive file. Possibly the controller
> process could be taught to schedule COPY and CREATE INDEX operations
> so that not too many processes are trying to read lots of archive
> data at the same time.

A less hacker and more DBA bottleneck will be to limit the number of
backends being created for restore. We don't really want to have more
than one backend per CPU, otherwise we just start switching.

Sincerely,

Joshua D. Drake

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-02-24 17:57:30 Re: Behaviour of rows containg not-null domains in plpgsql
Previous Message Joshua D. Drake 2008-02-24 17:40:11 Re: 8.3 / 8.2.6 restore comparison