Re: Performace Optimization for Dummies

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performace Optimization for Dummies
Date: 2006-09-28 19:47:15
Message-ID: efh8s6$1dlc$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> So your program first transforms the data and then inserts it? And it is
> the transforming process which is running select statements that is slow?

There are cross-referencing and deduplication processes. Does this person
have an office at this exact address? In a similarily named building in the
same zip code? City? What is the similarity of the building or enterprise
names? Is there a person with a similar name with the same type of
professional license nearby? We basically look for the statistical
likelyhood that they already exist to decide whether to update their data,
or insert a new data element.

These are all extremely soft queries and require left outer joins with all
of the related tables that would contain this data (the left outer join
tells us whether the related element satisfied the search condition). As I
mentioned, as the data comes in, we examine what we have to work with and
modify the tables and columns we can check - which is what I meant by " the
SQL queries are mutable, every imported row can change the structure of a
SQL query as the program adds and subtracts search conditions to the SQL
command text before execution."

> If that is the case you could use duration logging to find the slow select
> statement, and then you could post an EXPLAIN ANALYZE of the select.

I'm pretty sure I know who the culprit is, and - like I said, it comes from
a section of code that creates a mutable statement. However, everyone is
being so helpful and I should post this data as soon as I can.

> One question off the top of my head is are you using regular expressions
> for
> your fuzzy logic if so do your indexes have the right operator classes?
> (see http://www.postgresql.org/docs/8.1/static/indexes-opclass.html)

I am using regular expressions and fuzzy logic, but mostly on the client
side (I have a Tcl implementation of levenshtein, for example). I don't
think you can use indexes on functions such as levenshtein, because it
requires a parameter only available at execution time. The link you sent me
was very interesting - I will definitely reconsider my query logic if I can
optimise regular expression searches on the server. Thanks!

Carlo

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2006-09-28 20:06:56 Re: Performace Optimization for Dummies
Previous Message Cedric Boudin 2006-09-28 19:41:38 archive wal's failure and load increase.