Re: Query using SeqScan instead of IndexScan

From: Brendan Duddridge <brendan(at)clickspace(dot)com>
To: chris smith <dmagick(at)gmail(dot)com>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "PostgreSQL Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query using SeqScan instead of IndexScan
Date: 2006-04-01 01:31:47
Message-ID: A4EB422F-8A4E-44E8-BB22-E18C5EA7A7A0@clickspace.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Ah I see. Ok, well we have a very wide variety here...

category_id | count
-------------+-------
1000521 | 31145
1001211 | 22991
1001490 | 22019
1001628 | 12472
1000046 | 10480
1000087 | 10338
1001223 | 10020
1001560 | 9532
1000954 | 8633
1001314 | 8191
1001482 | 8140
1001556 | 7959
1001481 | 7850
[snip...]
1001133 | 1
1000532 | 1
1000691 | 1
1000817 | 1
1000783 | 1
1000689 | 1

(1157 rows)

So what's the best kind of query to handle this kind of data to make
it fast in all cases? I'd like get down to sub-second response times.

currently we have:

select cp.product_id
from category_product cp, product_attribute_value pav
where cp.category_id = 1001082 and cp.product_id =
pav.product_id;

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 6:23 PM, chris smith wrote:

> On 4/1/06, Brendan Duddridge <brendan(at)clickspace(dot)com> wrote:
>> Hi Jim,
>>
>> I'm not quite sure what you mean by the correlation of category_id?
>
> It means how many distinct values does it have (at least that's my
> understanding of it ;) ).
>
> select category_id, count(*) from category_product group by
> category_id;
>
> will show you how many category_id's there are and how many products
> are in each category.
>
> Having a lot of products in one category (or having a small amount of
> categories) can slow things down because the db can't use the index
> effectively.. which might be what you're seeing (hence why it's fast
> for some categories, slow for others).
>
>
>> On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:
>>
>>> What's the correlation of category_id? The current index scan cost
>>> estimator places a heavy penalty on anything with a correlation much
>>> below about 90%.
>>>
>>> On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
>>>> Hi,
>>>>
>>>> I have a query that is using a sequential scan instead of an index
>>>> scan. I've turned off sequential scans and it is in fact faster
>>>> with
>>>> the index scan.
>>>>
>>>> Here's my before and after.
>>>>
>>>> Before:
>>>>
>>>> ssdev=# SET enable_seqscan TO DEFAULT;
>>>> ssdev=# explain analyze select cp.product_id
>>>> from category_product cp, product_attribute_value pav
>>>> where cp.category_id = 1001082 and cp.product_id =
>>>> pav.product_id;
>>>>
>>>>
>>>> QUERY PLAN
>>>> -------------------------------------------------------------------
>>>> --
>>>> ---
>>>> -------------------------------------------------------------------
>>>> --
>>>> ---
>>>> ------------------------------
>>>> Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual
>>>> time=4.521..2580.520 rows=19695 loops=1)
>>>> Hash Cond: ("outer".product_id = "inner".product_id)
>>>> -> Seq Scan on product_attribute_value pav
>>>> (cost=0.00..40127.12
>>>> rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
>>>> loops=1)
>>>> -> Hash (cost=23.10..23.10 rows=970 width=4) (actual
>>>> time=2.267..2.267 rows=1140 loops=1)
>>>> -> Index Scan using
>>>> x_category_product__category_id_fk_idx
>>>> on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual
>>>> time=0.122..1.395 rows=1140 loops=1)
>>>> Index Cond: (category_id = 1001082)
>>>> Total runtime: 2584.221 ms
>>>> (7 rows)
>>>>
>>>>
>>>> After:
>>>>
>>>> ssdev=# SET enable_seqscan TO false;
>>>> ssdev=# explain analyze select cp.product_id
>>>> from category_product cp, product_attribute_value pav
>>>> where cp.category_id = 1001082 and cp.product_id =
>>>> pav.product_id;
>>>>
>>>>
>>>> QUERY PLAN
>>>> -------------------------------------------------------------------
>>>> --
>>>> ---
>>>> -------------------------------------------------------------------
>>>> --
>>>> ---
>>>> -------------------------------------
>>>> Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual
>>>> time=0.373..71.177 rows=19695 loops=1)
>>>> -> Index Scan using x_category_product__category_id_fk_idx on
>>>> category_product cp (cost=0.00..23.10 rows=970 width=4) (actual
>>>> time=0.129..1.438 rows=1140 loops=1)
>>>> Index Cond: (category_id = 1001082)
>>>> -> Index Scan using product_attribute_value__product_id_fk_idx
>>>> on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4)
>>>> (actual time=0.016..0.053 rows=17 loops=1140)
>>>> Index Cond: ("outer".product_id = pav.product_id)
>>>> Total runtime: 74.747 ms
>>>> (6 rows)
>>>>
>>>> There's quite a big difference in speed there. 2584.221 ms vs.
>>>> 74.747
>>>> ms.
>>>>
>>>> Any ideas what I can do to improve this without turning sequential
>>>> scanning off?
>>>>
>>>> Thanks,
>>>>
>>>> ___________________________________________________________________
>>>> _
>>>> Brendan Duddridge | CTO | 403-277-5591 x24 |
>>>> brendan(at)clickspace(dot)com
>>>>
>>>> ClickSpace Interactive Inc.
>>>> Suite L100, 239 - 10th Ave. SE
>>>> Calgary, AB T2G 0V9
>>>>
>>>> http://www.clickspace.com
>>>>
>>>
>>>
>>>
>>> --
>>> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
>>> Pervasive Software http://pervasive.com work: 512-231-6117
>>> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that
>>> your
>>> message can get through to the mailing list cleanly
>>>
>>
>>
>>
>>
>
>
> --
> Postgresql & php tutorials
> http://www.designmagick.com/
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Antoine 2006-04-01 09:54:00 Re: [Solved] Slow performance on Windows .NET and OleDb
Previous Message chris smith 2006-04-01 01:23:37 Re: Query using SeqScan instead of IndexScan