Re: Importing Large Amounts of Data

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Importing Large Amounts of Data
Date: 2002-04-15 14:26:50
Message-ID: 22794.1018880810@sss.pgh.pa.us
Lists: pgsql-hackers

Curt Sampson <cjs(at)cynic(dot)net> writes:
> On Mon, 15 Apr 2002, Christopher Kings-Lynne wrote:
>> CREATE TABLE WITHOUT OIDS ...

> As you can see from the schema I gave later in my message, that's
> exactly what I did. But does this actually avoid allocating the
> space in the on-disk tuples? What part of the code deals with this?
> It looks to me like the four bytes for the OID are still allocated
> in the tuple, but not used.

Curt is correct: WITHOUT OIDS does not save any storage. Having two
different formats for the on-disk tuple header seemed more pain than
the feature was worth. Also, because of alignment considerations it
would save no storage on machines where MAXALIGN is 8. (Possibly my
thinking is colored somewhat by the fact that that's so on all my
favorite platforms ;-).)
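
To illustrate the alignment point, here is a rough sketch of the arithmetic.
The header length of 32 bytes is a made-up figure chosen for illustration,
not the real tuple header size, and align_up is a hypothetical helper that
only mimics rounding up to a MAXALIGN boundary.

```python
def align_up(length: int, alignment: int) -> int:
    """Round length up to the next multiple of alignment (a power of two)."""
    return (length + alignment - 1) & ~(alignment - 1)


def bytes_saved_by_dropping_oid(header_len: int, alignment: int, oid_len: int = 4) -> int:
    """Bytes actually saved once both header variants are padded to the alignment."""
    with_oid = align_up(header_len, alignment)
    without_oid = align_up(header_len - oid_len, alignment)
    return with_oid - without_oid


# Assumption: a 32-byte header that includes a 4-byte OID (illustrative only).
HYPOTHETICAL_HEADER_LEN = 32

for alignment in (4, 8):
    saved = bytes_saved_by_dropping_oid(HYPOTHETICAL_HEADER_LEN, alignment)
    print(f"alignment {alignment}: {saved} bytes saved per tuple")
```

Under these assumed numbers, removing the four bytes helps only with 4-byte
alignment; with 8-byte alignment the padding absorbs the difference.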

However, as for the NULL values bitmap: that's already compacted out
when not used, and always has been AFAIK.
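
As a sketch of what "compacted out" means for space: the bitmap costs one
bit per column, rounded up to whole bytes, and nothing at all when the tuple
has no NULLs. The function name below is hypothetical and this ignores any
alignment padding that follows the bitmap.

```python
def null_bitmap_bytes(column_count: int, has_nulls: bool) -> int:
    """Bytes taken by the NULL bitmap: zero if the tuple has no NULLs,
    otherwise one bit per column rounded up to a whole byte."""
    if not has_nulls:
        return 0
    return (column_count + 7) // 8


print(null_bitmap_bytes(10, has_nulls=False))
print(null_bitmap_bytes(10, has_nulls=True))
```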

>> It's a bit hard to say "just turn off all the things that ensure your data
>> integrity so it runs a bit faster", if you actually need data integrity.

> I'm not looking for "runs a bit faster;" five percent either way
> makes little difference to me. I'm looking for a five-fold performance
> increase.

You are not going to get it from this; where in the world did you get
the notion that data integrity costs that much? When the WAL stuff
was added in 7.1, we certainly did not see any five-fold slowdown.
If anything, testing seemed to indicate that WAL sped things up.
A lot would depend on your particular scenario of course.

Have you tried all the usual speedup hacks? Turn off fsync, if you
really think you do not care about crash integrity; use COPY FROM STDIN
to bulk-load data, not retail INSERTs; possibly drop and recreate
indexes rather than updating them piecemeal; etc. You should also
consider not declaring foreign keys, as the runtime checks for reference
validity are pretty expensive.
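
For the COPY suggestion, a minimal sketch of turning rows into COPY's default
text format (tab-separated columns, \N for NULL, backslash escapes) so that
they can be streamed in one COPY ... FROM STDIN rather than sent as many
individual INSERTs. The helper names are hypothetical, and how the resulting
text is delivered to the server (psql, a client library) is left out.

```python
import io


def copy_escape(value) -> str:
    """Render one column value in COPY text format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    text = str(value)
    # Escape backslash first so later escapes are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_copy_text(rows) -> str:
    """Join rows into a single payload: one line per row, tabs between columns."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(v) for v in row))
        buf.write("\n")
    return buf.getvalue()


# Illustrative data only.
payload = rows_to_copy_text([(1, "plain"), (2, "has\ttab"), (3, None)])
print(payload, end="")
```

The usual pattern around such a load would be to drop the indexes first,
run the single COPY, then recreate the indexes afterwards.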

regards, tom lane
