Poor Performance for large queries in functions

From: John Meinel <john(at)johnmeinel(dot)com>
To: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Poor Performance for large queries in functions
Date: 2004-09-29 03:14:23
Message-ID: 415A288F.2030002@johnmeinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers-win32 pgsql-performance

I've been using postgres 8.0 beta for win32 for quite a while now, and I
am very happy with it. However, I am having an odd problem. Basically, I
have a large query which is a bunch of UNION SELECTs from a bunch of
different tables. I have all the necessary columns indexed, and when I
run the query by hand, it runs very fast (20ms). However, if I try to
bundle this query up into a server side function, it runs very slow (10
seconds). I'm trying to figure out why, but since I can't run EXPLAIN
ANALYZE inside a function, I don't really know what else to do.

The layout of my database is a bunch of tables that all have an object
id associated with them. There is a main object table that defines per
object permissions, and then all of the tables refer to eachother by the
unique id. What I'm trying to do is get a list of objects that might
refer to a given id.

Here is the query. 48542 is just one of the object ids. Some of these
tables have 500,000 rows, but most are quite small, and the result is
only 3 rows.

SELECT * FROM object WHERE id in (
SELECT id FROM data_t WHERE project_id = 48542
UNION SELECT id FROM analyzeset_t
WHERE subject_id = 48542
OR img_id = 48542
OR hdr_id = 48542
UNION SELECT id FROM bdi_t WHERE dcmstudy_id = 48542
UNION SELECT id FROM crq_t WHERE dcmstudy_id = 48542
UNION SELECT id FROM dcmfile_t WHERE dcmseries_id = 48542
UNION SELECT id FROM dcmseries_t WHERE dcmstudy_id = 48542
UNION SELECT id FROM dcmstudy_t
WHERE dcmsub_id = 48542
OR consent_id = 48542
UNION SELECT id FROM hsq_t WHERE dcmstudy_id = 48542
UNION SELECT id FROM job_t WHERE claimed_id = 48542
UNION SELECT id FROM loc_t WHERE contact_id = 48542
UNION SELECT id FROM pathslide_t WHERE plane_id = 48542
UNION SELECT id FROM pft_t WHERE dcmstudy_id = 48542
UNION SELECT id FROM pftblood_t WHERE pft_id = 48542
UNION SELECT id FROM pftdata_t WHERE pft_id = 48542
UNION SELECT id FROM pftpred_t WHERE pft_id = 48542
UNION SELECT id FROM run_t WHERE job_id = 48542
UNION SELECT id FROM scanread_t
WHERE readby_id = 48542
OR dcmstudy_id = 48542
UNION SELECT id FROM service_t WHERE comp_id = 48542
UNION SELECT id FROM sliceplane_t WHERE tissue_id = 48542
UNION SELECT id FROM store_t WHERE loc_id = 48542
UNION SELECT id FROM subject_t WHERE supersub_id = 48542
UNION SELECT id FROM vc_t WHERE dcmstudy_id = 48542
UNION SELECT id FROM vcdata_t WHERE vc_id = 48542
UNION SELECT id FROM vcdyn_t WHERE vc_id = 48542
UNION SELECT id FROM vcstatic_t WHERE vc_id = 48542
UNION SELECT child_id as id FROM datapar_t WHERE par_id = 48542
UNION SELECT par_id as id FROM datapar_t WHERE child_id = 48542
UNION SELECT store_id as id FROM finst_t WHERE file_id = 48542
UNION SELECT file_id as id FROM finst_t WHERE store_id = 48542
UNION SELECT from_id as id FROM link_t WHERE to_id = 48542
UNION SELECT to_id as id FROM link_t WHERE from_id = 48542
UNION SELECT data_id as id FROM proc_t WHERE filter_id = 48542
UNION SELECT filter_id as id FROM proc_t WHERE data_id = 48542
UNION SELECT subject_id as id FROM subdata_t WHERE data_id = 48542
UNION SELECT data_id as id FROM subdata_t WHERE subject_id = 48542
)
;

If I run this exact query, it takes 21 ms.

I tried to wrap it into a function with:

create function getrefs(int) returns setof object as '
...
' language sql;

Where the ... is just the same query with 48542 replaced with $1.
select getrefs(48542);
takes 10356.000ms

I have also tried some other things such as:

CREATE OR REPLACE FUNCTION mf_getrefobjs(int) RETURNS boolean AS '
DECLARE
oldid alias for $1;
BEGIN
DROP TABLE refobjs;
CREATE TEMPORARY TABLE refobjs AS
SELECT * FROM object WHERE id in (
SELECT id FROM data_t WHERE project_id = oldid
...
);
RETURN 1;
end;
' LANGUAGE plpgSQL;

I have tried returning cursors (they return fast, but the first FETCH
NEXT, is very slow.)

Does anyone know why this would be? Is it a problem that in a function
it doesn't notice that all of the '=' are the same number, and it cannot
optimize the query? Is there some way I could force an EXPLAIN ANALYZE?
(if i run it on SELECT getrefs() I just get that it made 1 function call.)

I've tried adding oldid::int in case it was a casting problem.

Actually, I've also tried stripping it all the way down to one query:
create or replace function getrefs(int4) returns setof object as '
SELECT * FROM object WHERE id in (
SELECT id FROM data_t WHERE project_id = $1::int
);
' language sql;

And that takes 3ms to return 0 rows. It actually seems like it is
ignoring the index on project_id in this case.

It is true that project_id can be Null. It seems that if I make this
query on other tables that have "not null" the don't have the same
performance hit.

Any help would be much appreciated, even if it is just giving me a
better place to ask.

John
=:->

In response to

Responses

Browse pgsql-hackers-win32 by date

  From Date Subject
Next Message Tom Lane 2004-09-29 05:10:47 Re: Poor Performance for large queries in functions
Previous Message Bruce Momjian 2004-09-28 16:19:13 Re: VC++ psql build broken

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Creager 2004-09-29 03:44:24 Re: This query is still running after 10 hours...
Previous Message Kevin Barnard 2004-09-29 02:41:50 Re: This query is still running after 10 hours...