Skip site navigation (1) Skip section navigation (2)

Re: Poor Performance for large queries in functions

From: John Meinel <john(at)johnmeinel(dot)com>
To: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Poor Performance for large queries in functions
Date: 2004-09-29 06:34:07
Message-ID: 415A575F.2060802@johnmeinel.com (view raw or flat)
Thread:
Lists: pgsql-hackers-win32pgsql-performance
Tom Lane wrote:
> John Meinel <john(at)johnmeinel(dot)com> writes:
> 
>>... However, if I try to 
>>bundle this query up into a server side function, it runs very slow (10 
>>seconds). I'm trying to figure out why, but since I can't run EXPLAIN 
>>ANALYZE inside a function, I don't really know what else to do.
> 
> 
> A parameterized query inside a function is basically the same as a
> PREPARE'd query with parameters at the SQL level.  So you can
> investigate what's happening here with
> 
> 	PREPARE foo(int) AS
> 		SELECT * FROM object WHERE id in (
> 			SELECT id FROM data_t WHERE project_id = $1
> 		UNION SELECT ... ;
> 
> 	EXPLAIN ANALYZE EXECUTE foo(48542);
> 
> I'm not sure where the problem is either, so please do send along the
> results.
> 
> 			regards, tom lane
> 
> PS: pgsql-performance would be a more appropriate venue for this
> discussion.

Well, I think I tracked the problem down to the fact that the column 
does not have a "not null" constraint on it. Here is a demonstration. 
Basically, I have 3 tables, tobjects, tdata, and tproject. tdata 
basically just links between tobjects and tproject, but isn't required 
to link to tproject. Yes, the real data has more columns, but this shows 
the problem.

jfmeinel=> \d tobjects
    Table "public.tobjects"
  Column |  Type   | Modifiers
--------+---------+-----------
  id     | integer | not null
Indexes:
     "tobjects_pkey" primary key, btree (id)

jfmeinel=> \d tproject
    Table "public.tproject"
  Column |  Type   | Modifiers
--------+---------+-----------
  id     | integer | not null
Indexes:
     "tproject_pkey" primary key, btree (id)

jfmeinel=> \d tdata
        Table "public.tdata"
    Column   |  Type   | Modifiers
------------+---------+-----------
  id         | integer | not null
  project_id | integer |
Indexes:
     "tdata_pkey" primary key, btree (id)
     "tdata_project_id_idx" btree (project_id)
Foreign-key constraints:
     "tdata_id_fkey" FOREIGN KEY (id) REFERENCES tobjects(id) ON UPDATE 
CASCADE ON DELETE CASCADE
     "tdata_project_id_fkey" FOREIGN KEY (project_id) REFERENCES 		
		tproject(id) ON UPDATE CASCADE ON DELETE SET DEFAULT

jfmeinel=> select count(*) from tdata;
  count
--------
  545768

jfmeinel=> select count(*) - count(project_id) from tdata;
  ?column?
----------
       240

So tdata(project_id) is almost completely full, of the 540000+ entries, 
only 240 are null.


jfmeinel=> prepare myget(int) as select id from tdata
jfmeinel->     where project_id = $1;
PREPARE

jfmeinel=> explain analyze execute myget(30000);
                                                  QUERY PLAN
--------------------------------------------------------------------
  Seq Scan on tdata  (cost=0.00..9773.10 rows=181923 width=4)
	(actual time=1047.000..1047.000 rows=0 loops=1)
    Filter: (project_id = $1)
  Total runtime: 1047.000 ms

jfmeinel=> explain analyze select id from tdata where project_id = 30000;
                                                          QUERY PLAN

-------------------------------------------------------------------------
  Index Scan using tdata_project_id_idx on tdata  (cost=0.00..4.20 
rows=1 width=4) (actual time=0.000..0.000 rows=0 loops =1)
    Index Cond: (project_id = 30000)
  Total runtime: 0.000 ms

So notice that when doing the actual select it is able to do the index 
query. But for some reason with a prepared statement, it is not able to 
do it.

Any ideas?

Since I only have the integers now, I can send the data to someone if 
they care to investigate it. It comes to 2.2M as a .tar.bz2, so 
obviously I'm not going to spam the list.

If I rewrite myget as:
prepare myget(int) as select id from tdata where project_id = 30000; it 
does the right thing again. So it's something about how a variable 
interacts with an indexed column with null values.

Note: I've tried creating a script that generates dummy data to show 
this problem and I have failed (it always performed the query correctly.)

But this test data definitely shows the problem. And yes, I've vacuum 
analyzed all over the place.

John
=:->

PS> I tested this on PostgreSQL 7.4.3, and it did not demonstrate this 
problem. I am using PostgreSQL 8.0.0beta2 (probably -dev1)

In response to

Responses

pgsql-performance by date

Next:From: Richard HuxtonDate: 2004-09-29 08:40:11
Subject: Re: [PERFORM] Poor Performance for large queries
Previous:From: Tom LaneDate: 2004-09-29 05:10:47
Subject: Re: Poor Performance for large queries in functions

pgsql-hackers-win32 by date

Next:From: Richard HuxtonDate: 2004-09-29 08:40:11
Subject: Re: [PERFORM] Poor Performance for large queries
Previous:From: Tom LaneDate: 2004-09-29 05:10:47
Subject: Re: Poor Performance for large queries in functions

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group