Re: Replacing Cursors with Temporary Tables

From: Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Replacing Cursors with Temporary Tables
Date: 2010-04-23 20:42:57
Message-ID: n2qbf6923ed1004231342l61f766fax8fb2a482aa3b0faf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

To answer the question of whether calling a stored procedure adds any
significant overhead, I built a test case and the short answer is that it
seems that it does:

CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS
$BODY$
DECLARE
temp INTEGER;
BEGIN
FOR i IN 1..1000 LOOP
SELECT 1 AS id INTO temp;
END LOOP;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2A() RETURNS SETOF INTEGER AS
$BODY$
DECLARE
BEGIN
RETURN QUERY SELECT 1 AS id;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2B() RETURNS INTEGER AS
$BODY$
DECLARE
temp INTEGER;
BEGIN
FOR i IN 1..1000 LOOP
temp := Test2A();
END LOOP;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

EXPLAIN ANALYZE SELECT * FROM Test1();
"Function Scan on test1 (cost=0.00..0.26 rows=1 width=4) (actual
time=6.568..6.569 rows=1 loops=1)"
"Total runtime: 6.585 ms"

EXPLAIN ANALYZE SELECT * FROM Test2B();
"Function Scan on test2b (cost=0.00..0.26 rows=1 width=4) (actual
time=29.006..29.007 rows=1 loops=1)"
"Total runtime: 29.020 ms"

So, when chasing milliseconds, don't call sub functions if it can
realistically and easily be avoided. I only have one operation/algorithm
broken out into another stored procedure because I call it in about 8
different places and it is 900+ lines long. While everything else could be
broken out into different stored procedures to make it easier to analyze the
whole set of code and probably make it easier to maintain, it does not make
sense from a performance perspective. Each different logical group of
actions that would be in its own stored procedure is only ever used once in
the whole algorithm, so there is no good code re-use going on. Further,
since the containing stored procedure gets called by itself hundreds or even
thousands of times per second on a production system, the nested calls to
individual sub-stored procedures would just add extra overhead for no real
gain. And, from these tests, it would be significant overhead.

On Thu, Apr 22, 2010 at 4:57 PM, Eliot Gable <
egable+pgsql-performance(at)gmail(dot)com <egable%2Bpgsql-performance(at)gmail(dot)com>>wrote:

> I appreciate all the comments.
>
> I will perform some benchmarking before doing the rewrite to be certain of
> how it will impact performance. At the very least, I think can say for
> near-certain now that the indexes are not going to help me given the
> particular queries I am dealing with and limited number of records the temp
> tables will have combined with the limited number of times I will re-use
> them.
>
>
> On Thu, Apr 22, 2010 at 10:42 AM, Merlin Moncure <mmoncure(at)gmail(dot)com>wrote:
>
>> On Thu, Apr 22, 2010 at 10:11 AM, Merlin Moncure <mmoncure(at)gmail(dot)com>
>> wrote:
>> > The timings are similar, but the array returning case:
>> > *) runs in a single statement. If this is executed from the client
>> > that means less round trips
>> > *) can be passed around as a variable between functions. temp table
>> > requires re-query
>> > *) make some things easier/cheap such as counting the array -- you get
>> > to call the basically free array_upper()
>> > *) makes some things harder. specifically dealing with arrays on the
>> > client is a pain UNLESS you expand the array w/unnest() or use
>> > libpqtypes
>> > *) can nest. you can trivially nest complicated sets w/arrays
>> > *) does not require explicit transaction mgmt
>>
>>
>> I neglected to mention perhaps the most important point about the array
>> method:
>> *) does not rely on any temporary resources.
>>
>> If you write a lot of plpsql, you will start to appreciate the
>> difference in execution time between planned and unplanned functions.
>> The first time you run a function in a database session, it has to be
>> parsed and planned. The planning time in particular for large-ish
>> functions that touch a lot of objects can exceed the execution time of
>> the function. Depending on _any_ temporary resources causes plan mgmt
>> issues because the database detects that a table in the old plan is
>> gone ('on commit drop') and has to re-plan. If your functions are
>> complex/long and you are counting milliseconds, then that alone should
>> be enough to dump any approach that depends on temp tables.
>>
>> merlin
>>
>
>
>
> --
> Eliot Gable
>
> "We do not inherit the Earth from our ancestors: we borrow it from our
> children." ~David Brower
>
> "I decided the words were too conservative for me. We're not borrowing from
> our children, we're stealing from them--and it's not even considered to be a
> crime." ~David Brower
>
> "Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live;
> not live to eat.) ~Marcus Tullius Cicero
>

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2010-04-23 21:01:21 Re: Replacing Cursors with Temporary Tables
Previous Message Cédric Villemain 2010-04-23 19:22:08 Re: Optimization idea