Re: Performance Tuning Large PL/PGSQL Stored Procedure

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance Tuning Large PL/PGSQL Stored Procedure
Date: 2010-03-26 12:06:49
Message-ID: b42b73151003260506m1393f7d1t5a2f3b0d1a7c0dc3@mail.gmail.com
Lists: pgsql-performance

On Thu, Mar 25, 2010 at 11:56 PM, Eliot Gable
<egable+pgsql-performance(at)gmail(dot)com> wrote:
>>
>> How many rows min/max/avg are coming back in your refcursors?  Are you
>> using cursors in order to return multiple complex data structures
>> (sets, etc) in a single function call?
>>
>
> I think the largest number of rows is around 40. Most are substantially
> smaller. However, most of them have about two dozen or more columns, and I
> have already shortened the list of columns to the minimum possible. The
> average number of rows is around 10, but the largest sets of rows also have
> the most columns. I'm using the cursors in order to obtain multiple complex
> data structures in a single function call.

ok, small sets. yes, passing them back to the client as arrays is
probably going to be faster, and it's a trivial change to your proc.
you have to define a type for your array element, suited to the way we
are going to use it; you can use a composite type or a table (I prefer
a table).

create table mystuff_t
(
a text,
b int,
c timestamptz
);

create function myproc([...], mystuffs out mystuff_t[])
[inside proc]

replace your cursor declaration with this:

select array
(
select (a,b,c)::mystuff_t from [...]
) into mystuffs;
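Put together, a minimal sketch of the converted function might look
like the following (the source table, its columns, and the filter are
placeholders, not your actual schema):

create or replace function myproc(p_id int, mystuffs out mystuff_t[])
as $$
begin
  -- was: open a refcursor over "select a, b, c from source_table ..."
  select array
  (
    select (s.a, s.b, s.c)::mystuff_t
    from source_table s
    where s.parent_id = p_id
  ) into mystuffs;
end;
$$ language plpgsql;

-- the client then pulls the whole set back in a single call:
-- select mystuffs from myproc(1);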

code an alternate version of the function, then from libpq execute the
query in binary format and discard the results, timing it and comparing
against how you run your query now (also discarding the results). we
want to time it this way because timing it from psql includes the time
to print out the array in text format, which we can avoid with
libpqtypes (which we are not going to mess with until we know there is
a reason to go in this direction). We do need to include the time to
turn around and fetch the data from the refcursors. If you see at least
a 10-20% improvement, it warrants further effort IMO (and say goodbye
to refcursors forever).

>> WITH clauses can make your queries much easier to read and yield great
>> speedups if you need to access the table expression multiple times
>> from other parts of the query.  however, in some cases you can get
>> into trouble because a standard set of joins is going to give the
>> planner the most flexibility in terms of query optimization.
>>
>
> So far, every case I have converted to WITH clauses has resulted in more
> than double the speed (half the time required to perform the query). The
> main reason appears to be from avoiding calculating JOIN conditions multiple
> times in different parts of the query due to the UNION and EXCEPT clauses.

I have a hard time believing that unless there are other factors
compromising the planner, like bad statistics or a non-optimal query,
or you are dealing with a relatively special case.

'EXCEPT' btw is also an optimization target. maybe think about
converting to 'left join ... where rightcol is null' or something like
that. not 100% sure; I think some work was done recently on except, so
this advice may not be as true as it used to be, and it's possibly moot
if the number of rows being considered by except is very small.
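As an illustration only (the table and column names here are made up,
and the two forms are not strictly equivalent if either side contains
duplicates or NULLs), the rewrite looks roughly like this:

-- EXCEPT form:
select id from candidate_rows
except
select id from excluded_rows;

-- anti-join form:
select c.id
from candidate_rows c
left join excluded_rows e on e.id = c.id
where e.id is null;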

> So, you are saying that I can return a complex type as a result which
> contains arrays of other complex types and just use my single SELECT command
> to retrieve the whole data set? That would be much simpler and I imagine
> much faster.

yes, however you will want to nest complex types as little as
possible, meaning your result set should still have multiple columns;
reducing the number of columns is not an optimization target. in
other words, do the minimal amount of stacking necessary to allow
single-query extraction of the data.
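For example, a sketch of a function returning two independent arrays as
separate columns, reusing mystuff_t above and adding a second
hypothetical element type, could look like this:

create type otherstuff_t as
(
  x int,
  y numeric
);

create or replace function myproc2(
  p_id int,
  mystuffs out mystuff_t[],
  otherstuffs out otherstuff_t[]
) as $$
begin
  mystuffs := array(select (s.a, s.b, s.c)::mystuff_t
                    from source_table s
                    where s.parent_id = p_id);
  otherstuffs := array(select (o.x, o.y)::otherstuff_t
                       from other_source o
                       where o.parent_id = p_id);
end;
$$ language plpgsql;

-- one round trip, two columns, each holding an array of composites:
-- select * from myproc2(1);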

> I really am chasing milliseconds here, and I appreciate all your feedback.
> You've given me a relatively large number of possible optimizations I can
> try out. I will definitely try out the libpqtypes. That sounds like a
> promising way to further cut down on execution time. I think most of my
> performance penalty is in transfering the results back to the C++
> application.

yes. I've suggested libpqtypes to a number of people on the lists,
and you are what i'd consider the ideal candidate. libpqtypes will
completely transform the way you think about postgresql and libpq.
good luck. if you need help setting it up you can email me privately
or on the libpqtypes list.

merlin
