Re: ETL optimization

From: Bricklen Anderson <BAnderson(at)PresiNET(dot)com>
To: Jacques Caron <jc(at)directinfos(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: ETL optimization
Date: 2005-06-23 22:04:33
Message-ID: 42BB31F1.7000303@PresiNET.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Jacques Caron wrote:
>
> I have a similar situation, and the solution I use (though I haven't
> really tested many different situations):
> - have a trigger ON INSERT which does:
> UPDATE set whatever_value=NEW.whatever_value,... WHERE
> whatever_key=NEW.whatever.key AND...
> IF FOUND THEN
> RETURN NULL;
> ELSE
> RETURN NEW;
> END IF;
> - use COPY
>
> For optimal performance, a different trigger function is created for
> each table, which allows the query plan of the UPDATE to be cached.
>
> Let us know how that works out for you and if you find a better solution!
>
> Jacques.
>
Hi Jacques, thanks for the suggestion. I've previously tested triggers under a
variety of situations and there was no way that they would work under the load
we currently have, and the much greater load that we will be expecting soon
(~40x increase in data).

I'm in the process of testing the delete scenario right now, and at first blush
seems to perform fairly well. 2.5 million rows before aggregation, and 171000
after, in a little under 7 minutes.

Currently testing again with about 18.5 million rows. A drawback by using the
delete method is that we cannot do any of the aggregation incrementally, but so
far that hasn't been a big help anyways. I still need to test the performance of
concurrent querying against the destination table whilst the aggregation is
occurring.

--
_______________________________

This e-mail may be privileged and/or confidential, and the sender does
not waive any related rights and obligations. Any distribution, use or
copying of this e-mail or the information it contains by other than an
intended recipient is unauthorized. If you received this e-mail in
error, please advise me (by return e-mail or otherwise) immediately.
_______________________________

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dennis Bjorklund 2005-06-24 04:27:21 Re: ETL optimization
Previous Message Jacques Caron 2005-06-23 21:56:27 Re: ETL optimization