From: | "Claus Guttesen" <kometen(at)gmail(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | seemingly slow for-loop in plpgsql |
Date: | 2008-09-02 22:27:38 |
Message-ID: | b41c75520809021527g593177e1sa3c5716c0199a44c@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi.
I have a table with a hashcode-field which is a md5-checksum of a
file. I updated all null-entries using a rails-script by calling
'/sbin/md5 -q' (on FreeBSD). When all null-entries were updated I
found out that '\n' was added to the md5-checksum. :-)
So I wanted to update the table using plpgsql. As I understand it from
the docs (http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html)
in section 38.6.1.2. RETURN NEXT and RETURN QUERY that ' ... if a
PL/pgSQL function produces a very large result set, performance might
be poor: ...'.
select count(*) from duplicates;
count
----------
134673
select count(*) from duplicates where length(hashcode) = 33;
count
--------
31731
\d duplicates
Table "public.duplicates"
Column | Type | Modifiers
--------------+----------+---------------------------------------------------------
id | integer | not null default
nextval('duplicates_id_seq'::regclass)
uid | integer |
filename | text |
hashcode | text |
Indexes:
"duplicates_hashcode_idx" btree (hashcode)
"duplicates_uid_idx" btree (uid)
create or replace function update_hashcode() returns setof duplicates as
$body$
declare
d duplicates%rowtype;
h text;
begin
for d in select * from duplicates where length(hashcode) = 33 loop
h := rtrim(d.hashcode, E'\n');
update duplicates set hashcode = h where id = d.id;
return next d;
end loop;
end
$body$
language 'plpgsql' ;
select count(*) from update_hashcode();
Postgres is 8.3.3 on FreeBSD current on a test-server with an opteron
at 2 GHz and 4 GB ram. The server is not the fastest around but I have
another table with 85 mill. entries where 12 mill. have '\n' as part
of the hashcode. The prod.server is a HP DL360 with a p800-controller
so it's much faster but the script will still be too slow to make this
solution viable.
How can I tune the plpgsql-script? Using cursors? I tried with a
cursor-based script and ended up with this skeleton-script:
create or replace function update_hashcode(refcursor) returns refcursor as '
declare
d duplicates%rowtype;
h text;
begin
open $1 for select * from duplicates;
return $1;
end;
' language plpgsql;
begin;
select update_hashcode('funccursor');
fetch next in funccursor;
commit;
which fetches the next row. But how can I iterate over the rows using cursors?
--
regards
Claus
When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.
Shakespeare
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-09-02 22:47:29 | Re: seemingly slow for-loop in plpgsql |
Previous Message | pw | 2008-09-02 21:14:47 | How do I get min and max from an array of floating point values |