Skip site navigation (1) Skip section navigation (2)

Poor Performance for large queries in functions

From: John Meinel <john(at)johnmeinel(dot)com>
To: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Poor Performance for large queries in functions
Date: 2004-09-29 03:14:23
Message-ID: 415A288F.2030002@johnmeinel.com (view raw or flat)
Thread:
Lists: pgsql-hackers-win32pgsql-performance
I've been using postgres 8.0 beta for win32 for quite a while now, and I 
am very happy with it. However, I am having an odd problem. Basically, I 
have a large query which is a bunch of UNION SELECTs from a bunch of 
different tables. I have all the necessary columns indexed, and when I 
run the query by hand, it runs very fast (20ms). However, if I try to 
bundle this query up into a server side function, it runs very slow (10 
seconds). I'm trying to figure out why, but since I can't run EXPLAIN 
ANALYZE inside a function, I don't really know what else to do.

The layout of my database is a bunch of tables that all have an object 
id associated with them. There is a main object table that defines per 
object permissions, and then all of the tables refer to eachother by the 
unique id. What I'm trying to do is get a list of objects that might 
refer to a given id.

Here is the query. 48542 is just one of the object ids. Some of these 
tables have 500,000 rows, but most are quite small, and the result is 
only 3 rows.

SELECT * FROM object WHERE id in (
	SELECT id FROM data_t WHERE project_id = 48542
	UNION SELECT id FROM analyzeset_t
		WHERE subject_id = 48542
		   OR img_id = 48542
		   OR hdr_id = 48542
	UNION SELECT id FROM bdi_t WHERE dcmstudy_id = 48542
	UNION SELECT id FROM crq_t WHERE dcmstudy_id = 48542
	UNION SELECT id FROM dcmfile_t WHERE dcmseries_id = 48542
	UNION SELECT id FROM dcmseries_t WHERE dcmstudy_id = 48542
	UNION SELECT id FROM dcmstudy_t
		WHERE dcmsub_id = 48542
		   OR consent_id = 48542
	UNION SELECT id FROM hsq_t WHERE dcmstudy_id = 48542
	UNION SELECT id FROM job_t WHERE claimed_id = 48542
	UNION SELECT id FROM loc_t WHERE contact_id = 48542
	UNION SELECT id FROM pathslide_t WHERE plane_id = 48542
	UNION SELECT id FROM pft_t WHERE dcmstudy_id = 48542
	UNION SELECT id FROM pftblood_t WHERE pft_id = 48542
	UNION SELECT id FROM pftdata_t WHERE pft_id = 48542
	UNION SELECT id FROM pftpred_t WHERE pft_id = 48542
	UNION SELECT id FROM run_t WHERE job_id = 48542
	UNION SELECT id FROM scanread_t
		WHERE readby_id = 48542
		   OR dcmstudy_id = 48542
	UNION SELECT id FROM service_t WHERE comp_id = 48542
	UNION SELECT id FROM sliceplane_t WHERE tissue_id = 48542
	UNION SELECT id FROM store_t WHERE loc_id = 48542
	UNION SELECT id FROM subject_t WHERE supersub_id = 48542
	UNION SELECT id FROM vc_t WHERE dcmstudy_id = 48542
	UNION SELECT id FROM vcdata_t WHERE vc_id = 48542
	UNION SELECT id FROM vcdyn_t WHERE vc_id = 48542
	UNION SELECT id FROM vcstatic_t WHERE vc_id = 48542
	UNION SELECT child_id as id FROM datapar_t WHERE par_id = 48542
	UNION SELECT par_id as id FROM datapar_t WHERE child_id = 48542
	UNION SELECT store_id as id FROM finst_t WHERE file_id = 48542
	UNION SELECT file_id as id FROM finst_t WHERE store_id = 48542
	UNION SELECT from_id as id FROM link_t WHERE to_id = 48542
	UNION SELECT to_id as id FROM link_t WHERE from_id = 48542
	UNION SELECT data_id as id FROM proc_t WHERE filter_id = 48542
	UNION SELECT filter_id as id FROM proc_t WHERE data_id = 48542
	UNION SELECT subject_id as id FROM subdata_t WHERE data_id = 48542
	UNION SELECT data_id as id FROM subdata_t WHERE subject_id = 48542
)
;

If I run this exact query, it takes 21 ms.

I tried to wrap it into a function with:

create function getrefs(int) returns setof object as '
...
' language sql;

Where the ... is just the same query with 48542 replaced with $1.
select getrefs(48542);
takes 10356.000ms

I have also tried some other things such as:

CREATE OR REPLACE FUNCTION mf_getrefobjs(int) RETURNS boolean AS '
DECLARE
	oldid alias for $1;
BEGIN
	DROP TABLE refobjs;
     CREATE TEMPORARY TABLE refobjs AS
     SELECT * FROM object WHERE id in (
         SELECT id FROM data_t WHERE project_id = oldid
...
     );
	RETURN 1;
end;
' LANGUAGE plpgSQL;

I have tried returning cursors (they return fast, but the first FETCH 
NEXT, is very slow.)

Does anyone know why this would be? Is it a problem that in a function 
it doesn't notice that all of the '=' are the same number, and it cannot 
optimize the query? Is there some way I could force an EXPLAIN ANALYZE? 
(if i run it on SELECT getrefs() I just get that it made 1 function call.)

I've tried adding oldid::int in case it was a casting problem.

Actually, I've also tried stripping it all the way down to one query:
create or replace function getrefs(int4) returns setof object as '
     SELECT * FROM object WHERE id in (
         SELECT id FROM data_t WHERE project_id = $1::int
     );
' language sql;

And that takes 3ms to return 0 rows. It actually seems like it is 
ignoring the index on project_id in this case.

It is true that project_id can be Null. It seems that if I make this 
query on other tables that have "not null" the don't have the same 
performance hit.

Any help would be much appreciated, even if it is just giving me a 
better place to ask.

John
=:->

In response to

Responses

pgsql-performance by date

Next:From: Robert CreagerDate: 2004-09-29 03:44:24
Subject: Re: This query is still running after 10 hours...
Previous:From: Kevin BarnardDate: 2004-09-29 02:41:50
Subject: Re: This query is still running after 10 hours...

pgsql-hackers-win32 by date

Next:From: Tom LaneDate: 2004-09-29 05:10:47
Subject: Re: Poor Performance for large queries in functions
Previous:From: Bruce MomjianDate: 2004-09-28 16:19:13
Subject: Re: VC++ psql build broken

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group