Performace Optimization for Dummies

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performace Optimization for Dummies
Date: 2006-09-28 16:44:10
Message-ID: efgu4v$276m$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I am a software developer who is acting in a (temporary) dba role for a
project. I had recommended PostgreSQL be brought in to replace the proposed
MySQL DB - I chose PostgreSQL because of its reputation as a more stable
solution than MySQL.

At this early stage in the project, we are initializing our portal's
database with millions of rows of imported data in over 50 different
flattened tables; each table's structure is unique to the data provider.
This requires a pretty complex import program, because the data must be
matched semantically, not literally. Even with all of the expression
matching and fuzzy logic in the code,our performance statistics show that
the program spends over 75% of its time in SQL queries looking for matching
and/or duplicate data.

The import is slow - and degrades as the tables grow. With even more
millions of rows in dozens of import tables to come, the imports will take
forever. My ability to analyse the queries is limited; because of the nature
of the import process, the SQL queries are mutable, every imported row can
change the structure of a SQL query as the program adds and subtracts search
conditions to the SQL command text before execution. The import program is
scripted in Tcl. An attempt to convert our queries to prepared queries
(curiousy) did not bring any performance improvements, and we converted back
to simplify the code.

We urgently need a major performance improvement. We are running the
PostgreSQL 8.1.4 on a Windows 2003 x64 Server on a dual processor, dual core
3.2Ghz Xeon box with 4gb RAM and a RAID (sorry, I don't know what type) disc
subsystem. Sorry about the long intro, but here are my questions:

1) Are we paying any big penalties by running Windows vs LINUX (or any other
OS)?

2) Has the debate over PostgreSQL and Xeon processors been settled? Is this
a factor?

3) Are there any easy-to-use performance analysis/optimisation tools that we
can use? I am dreaming of one that could point out problems and suggest
and.or effect solutions.

4) Can anyone recommend any commercial PostgreSQL service providers that may
be able to swiftly come in and assist us with our performance issues?

Below, please find what I believe are the configuration settings of interest
in our system

Any help and advice will be much appreciated. TIA,

Carlo

max_connections = 100
shared_buffers = 50000
work_mem = 32768
maintenance_work_mem = 32768
checkpoint_segments = 128
effective_cache_size = 10000
random_page_cost = 3
stats_start_collector = on
stats_command_string = on
stats_row_level = on
autovacuum = on

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2006-09-28 17:11:31 Re: Performace Optimization for Dummies
Previous Message Jim C. Nasby 2006-09-28 16:15:49 Re: Problems with inconsistant query performance.