Skip site navigation (1) Skip section navigation (2)

Re: [pgsql-hackers-win32] Poor Performance for large queries

From: John Meinel <john(at)johnmeinel(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [pgsql-hackers-win32] Poor Performance for large queries
Date: 2004-09-29 16:44:14
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers-win32pgsql-performance
Tom Lane wrote:
> [ enlarging on Richard's response a bit ]
> John Meinel <john(at)johnmeinel(dot)com> writes:
>>jfmeinel=> explain analyze execute myget(30000);
>>                                                  QUERY PLAN
>>  Seq Scan on tdata  (cost=0.00..9773.10 rows=181923 width=4)
>>	(actual time=1047.000..1047.000 rows=0 loops=1)
>>    Filter: (project_id = $1)
>>  Total runtime: 1047.000 ms
>>jfmeinel=> explain analyze select id from tdata where project_id = 30000;
>>                                                          QUERY PLAN
>>  Index Scan using tdata_project_id_idx on tdata  (cost=0.00..4.20 
>>rows=1 width=4) (actual time=0.000..0.000 rows=0 loops =1)
>>    Index Cond: (project_id = 30000)
>>  Total runtime: 0.000 ms
>>So notice that when doing the actual select it is able to do the index 
>>query. But for some reason with a prepared statement, it is not able to 
>>do it.
> This isn't a "can't do it" situation, it's a "doesn't want to do it"
> situation, and it's got nothing whatever to do with null or not null.
> The issue is the estimated row count, which in the first case is so high
> as to make the seqscan approach look cheaper.  So the real question here
> is what are the statistics on the column that are making the planner
> guess such a large number when it has no specific information about the
> compared-to value.  Do you have one extremely common value in the column?
> Have you done an ANALYZE recently on the table, and if so can you show
> us the pg_stats row for the column?
> 			regards, tom lane

The answer is "yes" that particular column has very common numbers in 
it. Project id is a number from 1->21. I ended up modifying my query 
such that I do the bulk of the work in a regular UNION SELECT so that 
all that can be optimized, and then I later do another query for this 
row in an 'EXECUTE ...' so that unless I'm actually requesting a small 
number, the query planner can notice that it can do an indexed query.

I'm pretty sure this is just avoiding worst case scenario. Because it is 
true that if I use the number 18, it will return 500,000 rows. Getting 
those with an indexed lookup would be very bad. But typically, I'm doing 
numbers in a very different range, and so the planner was able to know 
that it would not likely find that number.

Thanks for pointing out what the query planner was thinking, I was able 
to work around it.


In response to

pgsql-performance by date

Next:From: SZŰCS GáborDate: 2004-09-29 17:44:13
Subject: stubborn query confuses two different servers
Previous:From: Tom LaneDate: 2004-09-29 16:18:13
Subject: Re: Poor Performance for large queries in functions

pgsql-hackers-win32 by date

Next:From: Steve GolleryDate: 2004-09-30 18:26:25
Subject: problem getting started with PostgreSQL 8 beta 3 on XP
Previous:From: Tom LaneDate: 2004-09-29 16:18:13
Subject: Re: Poor Performance for large queries in functions

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group