configuring queries for concurrent updates

From: Robert Poor <rdpoor(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: configuring queries for concurrent updates
Date: 2012-06-23 06:33:02
Message-ID: CAGHqdqWJdw0cG6+-HjpX+v2wE4-4+udm5aCMUMFS5rSaGpQ1uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

[std_disclaimer]I'm not a DBA and I'm running PostgreSQL on a
quad-core Intel laptop.  You may read on after you stop
laughing...[/std_disclaimer]

I've written a version of UPSERT designed to import a large number of
records. It works in two steps: it UPDATEs incumbent records that
share common keys with the new records, then INSERTs new records that
do not share common keys with the incumbent records. The gist of it
is:

SAVEPOINT ...;
-- update records from subquery that share common keys with tbl
UPDATE tbl
  SET f1 = X.f1, f2 = X.f2, ...
  FROM (subquery) AS X
 WHERE tbl.k1 = X.k1 AND tbl.k2 = X.k2 AND ...;
-- insert records from subquery that do NOT share common keys with tbl
    INSERT INTO tbl (f1, f2, ...)
         SELECT Y.f1, Y.f2, ...
           FROM (subquery) AS Y
LEFT OUTER JOIN tbl
             ON tbl.k1 = Y.k1 AND tbl.k2 = Y.k2 AND ...
          WHERE tbl.id IS NULL;
RELEASE SAVEPOINT ...;

The (subquery) frequently generates 5000 records for update/insert.
Three Unix processes are running the import process concurrently, all
updating/inserting records into a common table.

Most of the time this works, but I've observed examples where an
import process will
* get a "PG::Error: ERROR: current transaction is aborted, commands
ignored until end of transaction block"
* get a "PG::Error: ERROR: deadlock detected"
* go catatonic for several minutes

This leads me to believe that my SAVEPOINT / RELEASE SAVEPOINT is not
the correct way to protect the system from multiprocessing mischief.
I've read Tom Lane's presentation on concurrency
(http://www.postgresql.org/files/developer/concurrency.pdf), but I
haven't been able to figure out the right solution.

[For what it's worth, with the data I'm importing, INSERTs will far
outnumber the UPDATES.]

Since I someday plan to spin this code up on a big system with >> 3
import processes, I'd like to get this right.

Hints and solutions warmly accepted. Thanks.

- rdp

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jasen Betts 2012-06-23 08:51:33 Re: efficiency of wildcards at both ends
Previous Message Jeff Davis 2012-06-23 00:11:53 Re: retrieving function raise messages in ecpg embedded sql code