Replacing Cursors with Temporary Tables

From: Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Replacing Cursors with Temporary Tables
Date: 2010-04-21 20:16:18
Message-ID: n2ybf6923ed1004211316ib844d82bn7a63bd3292ca80a0@mail.gmail.com
Lists: pgsql-performance

I have previously discussed my very long PL/PGSQL stored procedure on this
list. However, without getting into too many details, I have another
performance-related question.

The procedure currently uses cursors to return multiple result sets to the
program executing the procedure. Basically, I do this:

BEGIN;
SELECT * FROM stored_proc();
FETCH ALL FROM cursor1;
FETCH ALL FROM cursor2;
FETCH ALL FROM cursor3;
etc.
COMMIT;
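
For context, the procedure is shaped roughly like this (the real one is far
longer, and the table names and queries here are just placeholders):

CREATE OR REPLACE FUNCTION stored_proc() RETURNS SETOF refcursor AS $$
DECLARE
    cursor1 refcursor := 'cursor1';
    cursor2 refcursor := 'cursor2';
BEGIN
    -- Each cursor is opened on one of the result-set queries and handed
    -- back by name, so the caller can FETCH ALL from it within the same
    -- transaction.
    OPEN cursor1 FOR SELECT * FROM some_table;
    RETURN NEXT cursor1;
    OPEN cursor2 FOR SELECT * FROM other_table;
    RETURN NEXT cursor2;
    -- ... and so on for cursor3, etc.
END;
$$ LANGUAGE plpgsql;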

However, there are some cases in the stored procedure where some of the
result sets returned by these cursors are also needed as inputs to
additional queries. To use them, I am currently doing:

FOR temp IN cursorX LOOP
    -- Push the current record onto the end of the array
    result_array := array_append(result_array, temp);
END LOOP;
OPEN cursorX;
MOVE FIRST FROM cursorX;

Then, when I need to use the results in a query, I do something like:

SELECT * FROM table1 INNER JOIN (SELECT * FROM unnest(result_array)) AS
table2 ON ( blah blah ) WHERE blah

This seems extremely inefficient to me. First, I'm not sure of the penalty
for unnesting an array into a set of records. Second, the set of records
returned from unnesting would not be indexed for the join, which means a
sequential scan. Third, building the array in the first place using
array_append seems extremely inefficient. Fourth, opening the cursor twice
seems like it would execute the query twice, though given the placement and
context, the results are probably cached somewhere (hopefully). I'm sure
there are other things I am overlooking.
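
For what it's worth, the kind of check I could run to see whether the
unnest() join really ends up with a sequential scan looks something like this
(the literal array and the join column are just stand-ins for the real record
array and join condition):

EXPLAIN ANALYZE
SELECT *
FROM table1 t1
INNER JOIN unnest(ARRAY[1, 2, 3]) AS t2(id) ON (t1.id = t2.id);

But even if that plan looks fine for a toy array, it says nothing about the
cost of building the array with array_append in the first place.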

Instead of doing things this way, I think using temporary tables is really
what I want. I am thinking that instead of doing this cursor BS, I can do
something like:

CREATE TEMPORARY TABLE table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
SELECT * FROM blah blah blah -- whatever the cursor is defined as doing
);
ALTER TABLE table2 ADD PRIMARY KEY (id);
CREATE INDEX table2_blah_idx ON table2 USING btree (blah);
ANALYZE table2;
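
Inside the procedure itself, each cursor definition would then be replaced by
one of these blocks, roughly like this (names are placeholders):

CREATE OR REPLACE FUNCTION stored_procedure() RETURNS void AS $$
BEGIN
    -- Replaces what is currently OPEN cursorX FOR SELECT ...
    CREATE TEMPORARY TABLE temp_table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
        SELECT id, blah FROM source_table -- whatever the cursor does now
    );
    ALTER TABLE temp_table2 ADD PRIMARY KEY (id);
    CREATE INDEX temp_table2_blah_idx ON temp_table2 USING btree (blah);
    ANALYZE temp_table2;
    -- ... same pattern for the other result sets ...
END;
$$ LANGUAGE plpgsql;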

Then, when I need to use the results in another query, I could do:

SELECT * FROM table1 INNER JOIN table2 ON ( blah blah ) WHERE blah

This would use the indexes and the primary key appropriately. I could also
ensure that the rows in the temporary table are ordered in a way that
facilitates the joins, WHERE clauses, and ORDER BY clauses in the additional
queries. Finally, to get results into my application, I would then do:

BEGIN;
SELECT * FROM stored_procedure();
SELECT * FROM temp_table1;
SELECT * FROM temp_table2;
SELECT * FROM temp_table3;
etc.
COMMIT;

However, this is a fairly major re-write of how things are done. Before I
spend the time to do all that re-writing, can anyone share some insight on
where / how I might expect to gain performance from this conversion and also
speak to some of the overhead (if any) in using temporary tables like this
(building them, creating indexes on them, analyzing them, then dropping them
on commit)? It is worth mentioning that the data being stored in these
temporary tables is probably <1,000 records for all tables involved. Most
will probably be <100 records. Some of these temporary tables will be joined
to other tables up to 4 more times throughout the rest of the stored
procedure. Most will be generated and then retrieved only from outside the
stored procedure. Obviously, I would not create indexes on or analyze the
temporary tables being retrieved only from outside the stored procedure.
Indexes and primary keys will only be created on the tables that are joined
to other tables and have WHERE conditions applied to them.

I have done a lot of Googling on temporary tables and cursors in PostgreSQL,
but I have found only very limited discussion of the performance differences
with respect to how I'm planning on using them. I am also unsure about the
quality of the information I did find, given that most of it is 4+ years out
of date and posted on various expert exchanges rather than on this
pgsql-performance list.

One final question:

In this conversion to temporary table use, there are a couple of cases where
I would prefer to do something like:

PREPARE blah(blah blah) AS SELECT blah;

Then, I want to call this prepared statement multiple times, passing a
different argument value each time. The only reason to do this would be to
save writing code and to ensure that updating the SELECT statement in one
place covers all places where it is used. However, I am concerned it might
incur a performance hit by re-preparing the query, since I assume that having
it inside the PL/PGSQL procedure means it is already prepared once. Can
anyone speak to this? I know that I could put it in a separate stored
procedure, but then the question becomes: does that add extra overhead? Or,
in other words, is it similar to the difference between an inlined function
and a non-inlined function in C?
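
Concretely, the separate-function alternative I have in mind would look
something like this (the function, table, and column names are made up):

CREATE OR REPLACE FUNCTION lookup_blah(integer)
RETURNS SETOF some_table AS $$
    -- The same SELECT the prepared statement would wrap, parameterized
    -- on $1, so updating it in one place covers every call site.
    SELECT * FROM some_table WHERE some_id = $1;
$$ LANGUAGE sql STABLE;

Inside the main procedure it would then just be called repeatedly with
different arguments, e.g. SELECT * FROM lookup_blah(42); and
SELECT * FROM lookup_blah(43);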

I would greatly appreciate any insights into these questions/issues.

Thanks in advance for any assistance anyone can provide.

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero
