Re: [pgsql-hackers-win32] Poor Performance for large queries

From: John Meinel <john(at)johnmeinel(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [pgsql-hackers-win32] Poor Performance for large queries
Date: 2004-09-29 16:44:14
Message-ID: 415AE65E.3030200@johnmeinel.com
Lists: pgsql-hackers-win32 pgsql-performance

Tom Lane wrote:
> [ enlarging on Richard's response a bit ]
>
> John Meinel <john(at)johnmeinel(dot)com> writes:
>
>>jfmeinel=> explain analyze execute myget(30000);
>> QUERY PLAN
>>--------------------------------------------------------------------
>> Seq Scan on tdata (cost=0.00..9773.10 rows=181923 width=4)
>> (actual time=1047.000..1047.000 rows=0 loops=1)
>> Filter: (project_id = $1)
>> Total runtime: 1047.000 ms
>
>
>>jfmeinel=> explain analyze select id from tdata where project_id = 30000;
>> QUERY PLAN
>
>
>>-------------------------------------------------------------------------
>> Index Scan using tdata_project_id_idx on tdata (cost=0.00..4.20
>>rows=1 width=4) (actual time=0.000..0.000 rows=0 loops =1)
>> Index Cond: (project_id = 30000)
>> Total runtime: 0.000 ms
>
>
>>So notice that when doing the actual select it is able to do the index
>>query. But for some reason with a prepared statement, it is not able to
>>do it.
>
>
> This isn't a "can't do it" situation, it's a "doesn't want to do it"
> situation, and it's got nothing whatever to do with null or not null.
> The issue is the estimated row count, which in the first case is so high
> as to make the seqscan approach look cheaper. So the real question here
> is what are the statistics on the column that are making the planner
> guess such a large number when it has no specific information about the
> compared-to value. Do you have one extremely common value in the column?
> Have you done an ANALYZE recently on the table, and if so can you show
> us the pg_stats row for the column?
>
> regards, tom lane
>

The answer is "yes" that particular column has very common numbers in
it. Project id is a number from 1->21. I ended up modifying my query
such that I do the bulk of the work in a regular UNION SELECT so that
all that can be optimized, and then I later do another query for this
row in an 'EXECUTE ...' so that unless I'm actually requesting a small
number, the query planner can notice that it can do an indexed query.
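
In case it helps anyone else, the shape of the workaround is roughly
this (just a sketch -- the function name get_project_ids is made up;
tdata and project_id are the real table and column from above):

  CREATE FUNCTION get_project_ids(integer) RETURNS SETOF integer AS '
  DECLARE
      r RECORD;
  BEGIN
      -- dynamic SQL: the query is planned with the actual value,
      -- so the planner can pick the index scan when the value is rare
      FOR r IN EXECUTE ''SELECT id FROM tdata WHERE project_id = ''
                       || $1::text LOOP
          RETURN NEXT r.id;
      END LOOP;
      RETURN;
  END;
  ' LANGUAGE plpgsql;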

I'm pretty sure this is just avoiding the worst-case scenario. It is
true that if I use the number 18 it returns 500,000 rows, and fetching
those through the index would be very bad. But typically I'm querying
numbers in a very different range, so the planner, seeing the actual
value, could tell that it was unlikely to find many matching rows.
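
For reference, the stats row Tom asked about can be pulled with
something like this (after a fresh ANALYZE; column list trimmed to the
interesting parts):

  ANALYZE tdata;
  SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
   WHERE tablename = 'tdata' AND attname = 'project_id';

With only ~21 distinct values in the column, the planner's fallback
estimate for an unknown parameter (roughly total rows divided by the
number of distinct values) comes out in the hundreds of thousands,
which is why the prepared plan went for the seqscan.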

Thanks for pointing out what the query planner was thinking; with that
I was able to work around it.

John
=:->
