Re: Post-mature optimization...

From: Jeff Frost <jeff(at)frostconsultingllc(dot)com>
To: Eric Walstad <eric(at)ericwalstad(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: Post-mature optimization...
Date: 2007-01-18 01:33:06
Message-ID: Pine.LNX.4.64.0701171732220.14094@discord.home.frostconsultingllc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

Eric,

What does the explain output look like for this select:

SELECT * FROM foo WHERE service_account NOT IN
(SELECT service_account FROM existing_data);

Also, how many service accounts are there in your real data?

On Wed, 17 Jan 2007, Eric Walstad wrote:

> Hey gang,
>
> Goal:
> I've a table with ~2M records. Very infrequently records are
> added/modified in this table. Very regularly I must load data from
> another source into this table (overwriting all existing records that
> are included in the new data).
>
> Problem:
> My current approach is taking way too long; in fact, I've not been able
> to complete the import job (it's been running for days now).
>
> Question:
> Do you have any suggestions for how to improve the performance of the job?
>
> Details:
> Below is some example SQL that describes what I'm trying to do and how
> I'm currently trying to do it. Assume that data in the
> 'service_account' field is unique and that the goal is to keep anything
> that's NOT in the import data ('new_data') and overwrite the rest with
> comparison based on the 'service_account' field only.
>
> Thanks in advance for any pointers and/or suggestions,
>
> Eric.
>
>
> -- Create the sample data
> DROP TABLE existing_data;
> CREATE TEMP TABLE existing_data (
> service_account CHARACTER VARYING(255) NOT NULL,
> status CHAR(1)
> );
> CREATE INDEX existing_data_service_account
> ON existing_data (service_account);
> INSERT INTO existing_data VALUES ('1', 'e');
> INSERT INTO existing_data VALUES ('2', 'e');
> INSERT INTO existing_data VALUES ('3', 'e');
> INSERT INTO existing_data VALUES ('4', 'e');
> INSERT INTO existing_data VALUES ('5', 'e');
>
> DROP TABLE new_data;
> CREATE TEMP TABLE new_data (
> service_account CHARACTER VARYING(255) NOT NULL,
> status CHAR(1)
> );
> INSERT INTO new_data VALUES ('2', 'n');
> INSERT INTO new_data VALUES ('3', 'n');
> INSERT INTO new_data VALUES ('4', 'n');
> INSERT INTO new_data VALUES ('6', 'n');
>
>
> -- Copy existing records to a temporary table, 'foo'
> CREATE TEMP TABLE foo AS SELECT * FROM existing_data;
> CREATE INDEX foo_service_account
> ON foo (service_account);
>
> -- delete existing records
> DELETE FROM existing_data;
>
> -- Here the current script deletes existing constraints
> -- and indexes and then VACUUM ANALYZE existing_data;
>
> -- import new data
> -- this is done with a COPY dest_table (<field list>)
> -- FROM stdin; in the existing script
> INSERT INTO existing_data
> SELECT * FROM new_data;
>
> -- All is good up to this point...
>
> -- copy existing data that are not in new data
> -- ***** This is where my psql script churns for days *****
> INSERT INTO existing_data
> SELECT * FROM foo
> WHERE service_account NOT IN
> (SELECT service_account FROM existing_data);
>
> -- Verify the process worked.
> -- The existing_data table should now have new data for
> -- service_accounts 2-4 and 6; it should have old data
> -- for service_accounts 1 and 5:
> SELECT * FROM existing_data ORDER BY service_account;
> -- service_account | status
> -- -----------------+--------
> -- 1 | e
> -- 2 | n
> -- 3 | n
> -- 4 | n
> -- 5 | e
> -- 6 | n
> -- (6 rows)
>
>
>

--
Jeff Frost, Owner <jeff(at)frostconsultingllc(dot)com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

In response to

Responses

Browse sfpug by date

  From Date Subject
Next Message Josh Berkus 2007-01-18 02:53:56 Re: Post-mature optimization...
Previous Message Eric Walstad 2007-01-18 01:23:52 Post-mature optimization...