> I've a table with ~2M records. Very infrequently records are
> added/modified in this table. Very regularly I must load data from
> another source into this table (overwriting all existing records that
> are included in the new data).
> My current approach is taking way too long; in fact, I've not been able
> to complete the import job (it's been running for days now).
Here's what I would do:
1. import data into holding table
2. index PK on holding table
3. insert into swap table all records from old table which are not in holding
4. insert into swap table all records from holding table
5. drop old table
6. rename swap table to old table; index.
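The steps above could be sketched roughly like this in PostgreSQL. Table and column names here are assumptions based on your quoted script (existing_data keyed on service_account), and the file path is hypothetical:

```sql
-- 1. Bulk-load the new data into a holding table.
CREATE TABLE holding (LIKE existing_data);
COPY holding FROM '/path/to/new_data.csv' WITH CSV;  -- path is hypothetical

-- 2. Index the key on the holding table.
CREATE INDEX holding_service_account ON holding (service_account);

-- 3. Copy into the swap table the old rows NOT superseded by new data.
CREATE TABLE swap AS
  SELECT e.*
    FROM existing_data e
   WHERE NOT EXISTS (SELECT 1
                       FROM holding h
                      WHERE h.service_account = e.service_account);

-- 4. Add all the new rows.
INSERT INTO swap SELECT * FROM holding;

-- 5. Drop the old table.
DROP TABLE existing_data;

-- 6. Swap the merged table into place and re-index.
ALTER TABLE swap RENAME TO existing_data;
CREATE INDEX existing_data_service_account
  ON existing_data (service_account);
```

The point of the swap is that the expensive anti-join runs against tables nobody is querying, and the final rename is nearly instantaneous.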
I think the approach you're taking now is both more complex and more
labor-intensive. Other comments:
> INSERT INTO new_data VALUES ('2', 'n');
> INSERT INTO new_data VALUES ('3', 'n');
> INSERT INTO new_data VALUES ('4', 'n');
> INSERT INTO new_data VALUES ('6', 'n');
You're not really using INSERTs, are you?
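For a load anywhere near 2M rows, per-row INSERT statements are the slow path; COPY loads the whole file in a single command. A sketch, with file path and column names assumed for illustration:

```sql
-- Server-side COPY (file must be readable by the postgres server):
COPY new_data (service_account, status)
  FROM '/path/to/new_data.csv' WITH CSV;

-- Or from the client side via psql's \copy, which reads a local file:
-- \copy new_data FROM 'new_data.csv' WITH CSV
```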
> -- Copy existing records to a temporary table, 'foo'
> CREATE TEMP TABLE foo AS SELECT * FROM existing_data;
> CREATE INDEX foo_service_account
> ON foo (service_account);
> -- delete existing records
> DELETE FROM existing_data;
> -- copy existing data that are not in new data
> -- ***** This is where my psql script churns for days *****
> INSERT INTO existing_data
> SELECT * FROM foo
> WHERE service_account NOT IN
> (SELECT service_account FROM existing_data);
Again with the "foo". I think that in genericizing your example you've made
it a bit nonsensical: the NOT IN subquery selects from existing_data, which
you emptied two statements earlier, so as written it excludes nothing.
Presumably your real script checks against the new data instead.
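Assuming the real script compares foo against the new data, an unindexed NOT IN over ~2M rows is quadratic, which would explain the days of churning. A NOT EXISTS anti-join with an index on each side is the usual fix (it also sidesteps NOT IN's surprising NULL semantics); the index on foo is the one your quoted script already creates:

```sql
-- Index both sides of the join key so the planner can do a fast anti-join.
CREATE INDEX new_data_service_account ON new_data (service_account);
ANALYZE foo;  -- make sure the planner has statistics on the temp table

INSERT INTO existing_data
SELECT f.*
  FROM foo f
 WHERE NOT EXISTS (SELECT 1
                     FROM new_data n
                    WHERE n.service_account = f.service_account);
```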
PostgreSQL @ Sun