Re: Performace Optimization for Dummies

From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performace Optimization for Dummies
Date: 2006-09-28 21:04:21
Message-ID: 6B702C4D-C102-44E3-841C-BF176EAF1CAB@blighty.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Sep 28, 2006, at 12:10 PM, Carlo Stonebanks wrote:

>
>> Are you wrapping all this in a transaction?
>
> Yes, the transactions can typically wrap 1 to 10 single-table,
> single-row
> inserts and updates.
>
>
>> You're doing some dynamically generated selects as part of the
>> "de-duping" process? They're probably the expensive bit. What
>> do those queries tend to look like?
>
> Without a doubt, this is the expensive bit.

If you could give some samples of those queries here I suspect
people could be a lot more helpful with some optimisations, or
at least pinpoint where the performance issues are likely to be.

>
>> Are you analysing the table periodically? If not, then you might
>> have statistics based on an empty table, or default statistics, which
>> might cause the planner to choose bad plans for those selects.
>
> Now there's something I didn't know - I thought that analysis and
> planning
> was done with every select, and the performance benefit of prepared
> statements was to plan-once, execute many. I can easily put in a
> periodic
> analyse statement. I obviously missed how to use analyze properluy, I
> thought it was just human-readable output - do I understand
> correctly, that
> it can be used to get the SQL server to revaluate its plan based on
> newer
> statistics - even on non-prepared queries?

I think you're confusing "explain" and "analyze". "Explain" gives you
human readable output as to what the planner decided to do with the
query you give it.

"Analyze" samples the data in tables and stores the statistical
distribution
of the data, and estimates of table size and that sort of thing for the
planner to use to decide on a good query plan. You need to run
analyze when the statistics or size of a table has changed
significantly,
so as to give the planner the best chance of choosing an appropriate
plan.

If you're not running analyze occasionally then the planner will be
working on default stats or empty table stats and will tend to avoid
indexes. I don't know whether autovacuum will also analyze tables
for you automagically, but it would be a good idea to analyze the table
every so often, especially early on in the load - as the stats
gathered for
a small table will likely give painful performance once the table has
grown a lot.

>
>> Talking of which, are there indexes on the table? Normally you
>> wouldn't have indexes in place during a bulk import, but if you're
>> doing selects as part of the data load process then you'd be forcing
>> sequential scans for every query, which would explain why it gets
>> slower as the table gets bigger.
>
> There are indexes for every obvious "where this = that" clauses. I
> don't
> believe that they will work for ilike expressions.

If you're doing a lot of "where foo ilike 'bar%'" queries, with the
pattern
anchored to the left you might want to look at using a functional index
on lower(foo) and rewriting the query to look like "where lower(foo)
like
lower('bar%')".

Similarly if you have many queries where the pattern is anchored
at the right of the string then a functional index on the reverse of the
string can be useful.

Cheers,
Steve

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Carlo Stonebanks 2006-09-28 21:04:31 Re: Performace Optimization for Dummies
Previous Message Merlin Moncure 2006-09-28 20:55:57 Re: Performace Optimization for Dummies