Avoid long-running transactions in a long-running stored procedure?

From: "David Crane" <davidc(at)donorschoose(dot)org>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Avoid long-running transactions in a long-running stored procedure?
Date: 2008-02-15 01:15:21
Message-ID: 41ED0E73B2268F4D9E4081FAB5ED05FD04420FC7@midas.utopiasystems.net
Lists: pgsql-performance

Once per quarter, we need to load a lot of data, which causes many
updates across the database. We have an online transaction
processing-style application, which we really want to stay up during the
update job.

The programmer coded a stored procedure which does the job well ...
logically. But as a single PL/pgSQL stored procedure, it is one
long-running transaction. At least, that is my interpretation of
http://www.postgresql.org/docs/8.0/interactive/plpgsql-porting.html#CO.PLPGSQL-PORTING-COMMIT
- and in fact, we do get errors when we try little BEGIN-COMMIT blocks
inside a stored procedure.

A single long-running transaction would be bad in production. A long
run time = OK, but long-running transaction = site outage.

So I'm asking for advice on whether I can break this into small
transactions without too much of a rewrite. Roughly, the algorithm is:

(1) One job dumps the data from the external source into a load table.

(2) Another job calls the stored procedure, which uses a cursor to
traverse the load table. A loop for each record:

a. Processes a lot of special cases, with inserts and/or updates to
many tables.

Unless this can be done within PL/pgSQL, I will have the programmer
refactor job (2) so that the loop is in a Java program, and the
"normalization" logic in (a) - the guts of the loop - remains in a
smaller stored procedure. The Java loop will call that stored procedure
once per row of the load table, each call in a separate transaction.
That would both preserve the bulk of the PL/pgSQL code and keep the
normalization logic close to the data. So the runtime will be
reasonable, probably somewhat longer than his single monolithic stored
procedure, but the transactions will be short.
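
A minimal sketch of the Java loop described above, under stated
assumptions: the per-row stored procedure is called normalize_load_row
and takes the load-table row id (both hypothetical), and the caller has
already read the list of row ids from the load table. The class and
method names are also made up for illustration. It only shows the
commit-per-row shape, not the real job.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class LoadDriver {
    /**
     * Calls the (hypothetical) normalize_load_row(id) function once per
     * load-table row and commits after each call, so every transaction
     * covers exactly one row. Returns the number of rows committed.
     */
    public static int processRows(Connection conn, List<Long> rowIds)
            throws SQLException {
        // Turn off autocommit so transaction boundaries are explicit.
        conn.setAutoCommit(false);
        int committed = 0;
        // Assumed function name; substitute the real per-row procedure.
        String sql = "SELECT normalize_load_row(?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (long id : rowIds) {
                try {
                    ps.setLong(1, id);
                    ps.execute();
                    conn.commit();   // ends this row's short transaction
                    committed++;
                } catch (SQLException e) {
                    // Undo only the failed row; earlier rows stay committed.
                    conn.rollback();
                    System.err.println("row " + id + " failed: " + e.getMessage());
                }
            }
        }
        return committed;
    }
}
```

Logging a failed row and moving on is one possible policy; stopping at
the first failure would be an equally reasonable choice, depending on
whether the load can tolerate skipped rows.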

We don't need anything like SERIALIZABLE transaction isolation of the
online system from the entire load job.

Thanks for any ideas,

David Crane

DonorsChoose.org
