diff between 6.5.3 and 7.0.2 with SELECT/UPDATE

From: Jim Mercer <jim(at)reptiles(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: diff between 6.5.3 and 7.0.2 with SELECT/UPDATE
Date: 2000-06-23 14:03:39
Message-ID: 20000623100338.F24735@reptiles.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


i have some code, running production, on 6.5.3.

i have one process, which does a SELECT within a transaction, using a cursor.

this process then passes the read records, through stdout, and a pipeline,
into other processes which examine and deal with the data.

currently, i have:

$ read-rec | adjust-rec | summarize-rec

read-rec does a:
BEGIN WORK;
DECLARE records CURSOR FOR SELECT ... FROM table1 WHERE ...;
loop
FETCH FORWARD n IN records
print to stdout
END WORK;

adjust-rec does a:
BEGIN WORK;
loop
read stdin
verify some values
if changed
UPDATE table1 SET f1 = value .... WHERE key = value;
print to stdout
END WORK;

summarize-rec does a:
loop
read stdin
tabulate various columns
BEGIN WORK;
INSERT summary values INTO table2;
END WORK;

the reason i do this as a pipeline, is that there are various procedures
i need to do on that set of data that read-rec reads.

the pipeline allows me to add processing, without changing any of the existing
code. i simply add another element to the pipeline, or remove it.
(i can also test the pipeline elements by saving the output of read-rec
to a file and doing "newmodule < file.dat", the data sets i'm dealing with
here have millions of records, and as such, a full test can take some time
and a simple test using redirected data may not even need to touch the
database.)

in any case, that code is working wonderfully on a FreeBSD machine running
6.5.3.

i built a new FreeBSD machine and put 7.0.2 on it.

when i run the same pipeline on the 7.0.2 machine, i am finding that on
the first update performed by adjust-rec, the backend for that query
jumps to 90% CPU and stays there. also, setproctitle of that process shows
that it is doing "UPDATE".

now, in a sample run, the read-rec process will read 400,000 records.
in that batch, there are a known 100 records, randomly dispersed, that will
need adjustment.

the backend of the adjust-rec process will have a proctitle of idle, until
the first update, then the read-rec process gets bogged down and the whole
pipeline slows down.

i've tried a couple things like changing from TRANSACTION ISOLATION from
READ COMMITTED to SERIALIZED for one and for all elements of the pipeline.

i'm sure there is something i'm missing here, probably to do with something
in the locking or concurrency changes between 6.5.3 and 7.0.2.

in short, if you have one process doing a long select on a table, and a second
process jumps in and updates a record of that table, why does it slow things
down so much?

--
[ Jim Mercer jim(at)reptiles(dot)org +1 416 410-5633 ]
[ Reptilian Research -- Longer Life through Colder Blood ]
[ Don't be fooled by cheap Finnish imitations; BSD is the One True Code. ]

Browse pgsql-general by date

  From Date Subject
Next Message kaiq 2000-06-23 14:20:41 app-index in www.pgsql.com
Previous Message Joseph 2000-06-23 09:56:55 RE: Postgres with php3