Large import via temporary table: all at once, or split into chunks?

From: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
To: pgsql-general(at)postgresql(dot)org, pgsql-jdbc(at)postgresql(dot)org
Subject: Large import via temporary table: all at once, or split into chunks?
Date: 2002-12-10 16:38:52
Message-ID: 5.1.0.14.2.20021210112440.032f3ad8@pop.pexicom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello all,

(We're using JDBC to do this, and server 7.2.1-2woody2 on Debian 3.0)

We regularly handle large imports of data. (Questions at the end.)

The data goes into a table with these salient features:
* About 100 million rows at this time
* About 25 columns: mostly VARCHAR, two TIMESTAMP (one DEFAULT now()), some
INTEGER, some boolean
* One index on a SERIAL column (the primary key)
* One index on two int columns (a secondary key and the primary key,
combined index)
* One index on a VARCHAR column
* WITHOUT OIDs

The way we currently import the new records is:

1) Create a temporary table - WITHOUT OIDS and without any indices

2) INSERT INTO temptable VALUES (...); for each piece of data, and usually
500 of these in one JDBC statement, for efficiency

3) INSERT INTO finaltable (secondaryKey, columns...) SELECT ###, * FROM
temptable; - where ### is the salient secondary key from above, and the
serial column is populated by the DEFAULT

4) Drop the temporary table

With 7.2 JDBC, we can't use the COPY functionality to speed it up.

My questions are these: When doing the above process for, say, a million to
ten million rows (our typical size), how expensive is step #3? It takes
four hours to complete on a 1 million row table, and I have no idea how
badly it is harming concurrency, since it's moving so much data in one
transaction. Would I be better off importing some lesser number of rows to
the temporary table in step 2 (how many?) and then doing step 3, then
truncating the temporary table and starting over with step 2 again? How do
these choices affect the sizing of the transaction log parameters?

Thanks,

Doug

Browse pgsql-general by date

  From Date Subject
Next Message Greg Copeland 2002-12-10 16:56:32 Re: 7.4 Wishlist
Previous Message Hegyvari Krisztian 2002-12-10 16:12:37 Re: ExecEvalExpr: unknown expression type 108