Quick Links

Post-mature optimization...

From:	Eric Walstad <eric(at)ericwalstad(dot)com>
To:	sfpug(at)postgresql(dot)org
Subject:	Post-mature optimization...
Date:	2007-01-18 01:23:52
Message-ID:	45AECC28.4080000@ericwalstad.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	sfpug

Hey gang,

Goal:
I've a table with ~2M records. Very infrequently records are
added/modified in this table. Very regularly I must load data from
another source into this table (overwriting all existing records that
are included in the new data).

Problem:
My current approach is taking way too long; in fact, I've not been able
to complete the import job (it's been running for days now).

Question:
Do you have any suggestions for how to improve the performance of the job?

Details:
Below is some example SQL that describes what I'm trying to do and how
I'm currently trying to do it. Assume that data in the
'service_account' field is unique and that the goal is to keep anything
that's NOT in the import data ('new_data') and overwrite the rest with
comparison based on the 'service_account' field only.

Thanks in advance for any pointers and/or suggestions,

Eric.

-- Create the sample data
DROP TABLE existing_data;
CREATE TEMP TABLE existing_data (
service_account CHARACTER VARYING(255) NOT NULL,
status CHAR(1)
);
CREATE INDEX existing_data_service_account
ON existing_data (service_account);
INSERT INTO existing_data VALUES ('1', 'e');
INSERT INTO existing_data VALUES ('2', 'e');
INSERT INTO existing_data VALUES ('3', 'e');
INSERT INTO existing_data VALUES ('4', 'e');
INSERT INTO existing_data VALUES ('5', 'e');

DROP TABLE new_data;
CREATE TEMP TABLE new_data (
service_account CHARACTER VARYING(255) NOT NULL,
status CHAR(1)
);
INSERT INTO new_data VALUES ('2', 'n');
INSERT INTO new_data VALUES ('3', 'n');
INSERT INTO new_data VALUES ('4', 'n');
INSERT INTO new_data VALUES ('6', 'n');

-- Copy existing records to a temporary table, 'foo'
CREATE TEMP TABLE foo AS SELECT * FROM existing_data;
CREATE INDEX foo_service_account
ON foo (service_account);

-- delete existing records
DELETE FROM existing_data;

-- Here the current script deletes existing constraints
-- and indexes and then VACUUM ANALYZE existing_data;

-- import new data
-- this is done with a COPY dest_table (<field list>)
-- FROM stdin; in the existing script
INSERT INTO existing_data
SELECT * FROM new_data;

-- All is good up to this point...

-- copy existing data that are not in new data
-- ***** This is where my psql script churns for days *****
INSERT INTO existing_data
SELECT * FROM foo
WHERE service_account NOT IN
(SELECT service_account FROM existing_data);

-- Verify the process worked.
-- The existing_data table should now have new data for
-- service_accounts 2-4 and 6; it should have old data
-- for service_accounts 1 and 5:
SELECT * FROM existing_data ORDER BY service_account;
-- service_account | status
-- -----------------+--------
-- 1 | e
-- 2 | n
-- 3 | n
-- 4 | n
-- 5 | e
-- 6 | n
-- (6 rows)

Responses

Re: Post-mature optimization... at 2007-01-18 01:33:06 from Jeff Frost
Re: Post-mature optimization... at 2007-01-18 02:53:56 from Josh Berkus

Browse sfpug by date

	From	Date	Subject
Next Message	Jeff Frost	2007-01-18 01:33:06	Re: Post-mature optimization...
Previous Message	Eric Walstad	2007-01-18 00:16:37	Re: [sf-perl] Trustworthy laptop repair?