Re: Performace Optimization for Dummies

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performace Optimization for Dummies
Date: 2006-09-28 20:15:03
Message-ID: efhag4$1n9d$1@news.hub.org
Lists: pgsql-performance
The deduplication process requires so many programmed procedures that it 
runs on the client. Most of the de-dupe lookups are not "straight" lookups, 
but calculated ones employing fuzzy logic. This is because we cannot dictate 
the format of our input data and must deduplicate with what we get.
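
To illustrate what a "calculated" lookup of this kind might look like, here is a minimal sketch in Python. It is not the actual import program: the normalisation rules, the function names and the 0.85 similarity threshold are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace.

    These rules are illustrative assumptions, not the real import logic.
    """
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True when two names are similar enough after normalisation.

    The threshold is a hypothetical tuning value.
    """
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold
```

Because the comparison is computed rather than looked up by exact value, a plain equality index cannot answer it directly, which is why these lookups tend to be expensive.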

This was one of the reasons why I went with PostgreSQL in the first place, 
because of the server-side programming options. However, I saw incredible 
performance hits when running processes on the server and I partially 
abandoned the idea (some custom-built name-comparison functions still run 
on the server).

I am using Tcl on both the server and the client. I'm not a fan of Tcl, but 
it appears to be quite well implemented and feature-rich in PostgreSQL. I 
find PL/pgSQL awkward - even compared to Tcl. (After all, I'm just a 
programmer... we do tend to be a little limited.)

The import program actually runs on the server box as a db client and 
involves about 3000 lines of code (and it will certainly grow steadily as we 
add compatibility with more import formats). Could a process involving that 
much logic run on the db server, and would there really be a benefit?

Carlo


"Jim C. Nasby" <jim(at)nasby(dot)net> wrote in message 
news:20060928184538(dot)GV34238(at)nasby(dot)net(dot)(dot)(dot)
> On Thu, Sep 28, 2006 at 01:53:22PM -0400, Carlo Stonebanks wrote:
>> > are you using the 'copy' interface?
>>
>> Straightforward inserts - the import data has to be transformed,
>> normalised and de-duped by the import program. I imagine the copy
>> interface is for more straightforward data importing. These are - by
>> necessity - single row inserts.
>
> BTW, stuff like de-duping is something you really want the database -
> not an external program - to be doing. Think about loading the data into
> a temporary table and then working on it from there.
> -- 
> Jim Nasby                                            jim(at)nasby(dot)net
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 
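
For reference, the staging-table approach suggested in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the `person` and `staging` tables, the `name_key` column and the exact-match key are all hypothetical. In PostgreSQL the staging table would more likely be filled with COPY, and the fuzzy comparisons would need server-side functions.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, name_key TEXT)")
    return conn


def import_with_staging(conn: sqlite3.Connection, names: list) -> int:
    """Bulk-load raw names into a temp table, then de-dupe in one statement.

    Returns the number of rows added to person.
    """
    cur = conn.cursor()
    before = cur.execute("SELECT COUNT(*) FROM person").fetchone()[0]

    # Step 1: load everything into a staging table without any checks.
    cur.execute("CREATE TEMP TABLE staging (name TEXT, name_key TEXT)")
    cur.executemany(
        "INSERT INTO staging (name, name_key) VALUES (?, ?)",
        [(n, " ".join(n.lower().split())) for n in names],
    )

    # Step 2: one set-based insert that skips keys already present and
    # collapses duplicates within the batch itself.
    cur.execute(
        """
        INSERT INTO person (name, name_key)
        SELECT MIN(s.name), s.name_key
        FROM staging AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM person AS p WHERE p.name_key = s.name_key
        )
        GROUP BY s.name_key
        """
    )

    cur.execute("DROP TABLE staging")
    conn.commit()
    after = cur.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    return after - before
```

The point of the design is that the database checks the whole batch for duplicates in a single statement, instead of the client issuing one lookup and one insert per row.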