Quick Links

configuring queries for concurrent updates

From:	Robert Poor <rdpoor(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	configuring queries for concurrent updates
Date:	2012-06-23 06:33:02
Message-ID:	CAGHqdqWJdw0cG6+-HjpX+v2wE4-4+udm5aCMUMFS5rSaGpQ1uw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

[std_disclaimer]I'm not a DBA and I'm running PostgreSQL on a
quad-core Intel laptop. You may read on after you stop
laughing...[/std_disclaimer]

I've written a version of UPSERT designed to import a large number of
records. It works in two steps: it UPDATEs incumbent records that
share common keys with the new records, then INSERTs new records that
do not share common keys with the incumbent records. The gist of it
is:

SAVEPOINT ...;
-- update records from subquery that share common keys with tbl
UPDATE tbl
SET f1 = X.f1, f2 = X.f2, ...
FROM (subquery) AS X
WHERE tbl.k1 = X.k1 AND tbl.k2 = X.k2 AND ...;
-- insert records from subquery that do NOT share common keys with tbl
INSERT INTO tbl (f1, f2, ...)
SELECT Y.f1, Y.f2, ...
FROM (subquery) AS Y
LEFT OUTER JOIN tbl
ON tbl.k1 = Y.k1 AND tbl.k2 = Y.k2 AND ...
WHERE tbl.id IS NULL;
RELEASE SAVEPOINT ...;

The (subquery) frequently generates 5000 records for update/insert.
Three Unix processes are running the import process concurrently, all
updating/inserting records into a common table.

Most of the time this works, but I've observed examples where an
import process will
* get a "PG::Error: ERROR: current transaction is aborted, commands
ignored until end of transaction block"
* get a "PG::Error: ERROR: deadlock detected"
* go catatonic for several minutes

This leads me to believe that my SAVEPOINT / RELEASE SAVEPOINT is not
the correct way to protect the system from multiprocessing mischief.
I've read Tom Lane's presentation on concurrency
(http://www.postgresql.org/files/developer/concurrency.pdf) but I
haven't been able to figure out the right solution.

[For what it's worth, with the data I'm importing, INSERTs will far
outnumber the UPDATES.]

Since I someday plan to spin this code up on a big system with >> 3
import processes, I'd like to get this right.

Hints and solutions warmly accepted. Thanks.

- rdp

Responses

Re: configuring queries for concurrent updates at 2012-06-24 07:06:44 from Craig Ringer

Browse pgsql-general by date

	From	Date	Subject
Next Message	Jasen Betts	2012-06-23 08:51:33	Re: efficiency of wildcards at both ends
Previous Message	Jeff Davis	2012-06-23 00:11:53	Re: retrieving function raise messages in ecpg embedded sql code