Re: Idea for improving speed of pg_restore

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Idea for improving speed of pg_restore
Date: 2003-09-16 02:55:28
Message-ID: m3wuc9saqn.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

A long time ago, in a galaxy far, far away, ron(dot)l(dot)johnson(at)cox(dot)net (Ron Johnson) wrote:
> Pardon if this has already been suggested and shot down as impossible
> or too difficult, etc.

None of this is forcibly the problem.

The _big_ problem is that this isn't necessarily going to improve the
speed of the would-be restore.

I have brought servers to their knees by trying to restore multiple
databases at once.

The reason why this isn't likely to work is that the "multitasking"
that you need to maximize is not of the CPUs, but rather of the disk
drives.

Restoring a database involves, for each table:
1. Reading table data from the source file;
2. Writing data to the database file for the table;
3. After that, reading the database file data, and
4. Writing the sorted bits to the index file.
5. Along with all of this, HEFTY amounts of updates to WAL.

Ideal performance for this involves having 1. and 2. take place
concurrently, where a sequential set of reads from one disk lead to
more-or-less sequential writes to the "database drive" and to the "WAL
drive."

Multitask that and it is easy for the I/O patterns to go bad, where
one process is trying to sequence writes to one table whilst another
is writing to another, so you thereby have the disk seeking back and
forth between the one and the other.

There might well be some merit to having the capability to work on
several indices on the same table all at once, so that after reading
through the table once, _three_ indices are generated, mostly in
memory, and then consecutively spilled to disk. But it may be just as
well to let the shared memory cache do the work so that the table gets
read in once, to generate the first index, and then is already in
cache, which will surely help performance for generating the
subsequent indices.

That's a longwinded way of saying that doing multiple concurrent mass
data loads sucks pretty bad.
--
"aa454","@","freenet.carleton.ca"
http://www3.sympatico.ca/cbbrowne/spiritual.html
"If it can't be abused, it's not freedom. A man may be in as just
possession of truth as of a city, and yet be forced to surrender."
-- Thomas Browne

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Benoît Costes 2003-09-16 07:35:04 Question.
Previous Message Christopher Browne 2003-09-16 02:23:01 Re: Odd behaviour -- Index scan vs. seq. scan