Re: Importing Large Amounts of Data

From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Importing Large Amounts of Data
Date: 2002-04-16 01:32:14
Message-ID: Pine.NEB.4.43.0204161017210.498-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 15 Apr 2002, Tom Lane wrote:

> > I'm not looking for "runs a bit faster;" five percent either way
> > makes little difference to me. I'm looking for a five-fold performance
> > increase.
>
> You are not going to get it from this; where in the world did you get
> the notion that data integrity costs that much?

Um...the fact that MySQL imports the same data five times as fast? :-)

Note that this is *only* related to bulk-importing huge amounts of
data. Postgres seems a little bit slower than MySQL at building
the indicies afterwards, but this would be expected since (probably
due to higher tuple overhead) the size of the data once in postgres
is about 75% larger than in MySQL: 742 MB vs 420 MB. I've not done
any serious testing of query speed, but the bit of toying I've done
with it shows no major difference.

> Have you tried all the usual speedup hacks? Turn off fsync, if you
> really think you do not care about crash integrity; use COPY FROM STDIN
> to bulk-load data, not retail INSERTs; possibly drop and recreate
> indexes rather than updating them piecemeal; etc. You should also
> consider not declaring foreign keys, as the runtime checks for reference
> validity are pretty expensive.

Yes, I did all of the above. (This was all mentioned in my initial
message, except for turning off foreign key constraints--but the
table has no foreign keys.)

What I'm thinking would be really cool would be to have an "offline"
way of creating tables using a stand-alone program that would write
the files at, one hopes, near disk speed. Maybe it could work by
creating the tables in a detached tablespace, and then you'd attach
the tablespace when you're done. It might even be extended to be
able to do foreign key checks, create indicies, and so on. (Foreign
key checks would be useful; I'm not sure that creating indicies
would be any faster than just doing it after the tablespace is
attached.)

This would be particularly useful for fast restores of backups.
Downtime while doing a restore is always a huge pain for large
databases.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2002-04-16 01:37:12 Re: YADP - Yet another Dependency Patch
Previous Message Tatsuo Ishii 2002-04-16 01:20:33 multibyte support by default