Skip site navigation (1) Skip section navigation (2)

Re: Query using SeqScan instead of IndexScan

From: Brendan Duddridge <brendan(at)clickspace(dot)com>
To: chris smith <dmagick(at)gmail(dot)com>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "PostgreSQL Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query using SeqScan instead of IndexScan
Date: 2006-04-01 01:31:47
Message-ID: A4EB422F-8A4E-44E8-BB22-E18C5EA7A7A0@clickspace.com (view raw or flat)
Thread:
Lists: pgsql-performance
Ah I see. Ok, well we have a very wide variety here...

category_id | count
-------------+-------
      1000521 | 31145
      1001211 | 22991
      1001490 | 22019
      1001628 | 12472
      1000046 | 10480
      1000087 | 10338
      1001223 | 10020
      1001560 |  9532
      1000954 |  8633
      1001314 |  8191
      1001482 |  8140
      1001556 |  7959
      1001481 |  7850
[snip...]
      1001133 |     1
      1000532 |     1
      1000691 |     1
      1000817 |     1
      1000783 |     1
      1000689 |     1

(1157 rows)

So what's the best kind of query to handle this kind of data to make  
it fast in all cases? I'd like get down to sub-second response times.

currently we have:

select cp.product_id
              from category_product cp, product_attribute_value pav
              where cp.category_id = 1001082 and cp.product_id =
              pav.product_id;


Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan(at)clickspace(dot)com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 6:23 PM, chris smith wrote:

> On 4/1/06, Brendan Duddridge <brendan(at)clickspace(dot)com> wrote:
>> Hi Jim,
>>
>> I'm not quite sure what you mean by the correlation of category_id?
>
> It means how many distinct values does it have (at least that's my
> understanding of it ;) ).
>
> select category_id, count(*) from category_product group by  
> category_id;
>
> will show you how many category_id's there are and how many products
> are in each category.
>
> Having a lot of products in one category (or having a small amount of
> categories) can slow things down because the db can't use the index
> effectively.. which might be what you're seeing (hence why it's fast
> for some categories, slow for others).
>
>
>> On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:
>>
>>> What's the correlation of category_id? The current index scan cost
>>> estimator places a heavy penalty on anything with a correlation much
>>> below about 90%.
>>>
>>> On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
>>>> Hi,
>>>>
>>>> I have a query that is using a sequential scan instead of an index
>>>> scan. I've turned off sequential scans and it is in fact faster  
>>>> with
>>>> the index scan.
>>>>
>>>> Here's my before and after.
>>>>
>>>> Before:
>>>>
>>>> ssdev=# SET enable_seqscan TO DEFAULT;
>>>> ssdev=# explain analyze select cp.product_id
>>>>              from category_product cp, product_attribute_value pav
>>>>              where cp.category_id = 1001082 and cp.product_id =
>>>>              pav.product_id;
>>>>
>>>>
>>>>            QUERY PLAN
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> ---
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> ---
>>>> ------------------------------
>>>> Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual
>>>> time=4.521..2580.520 rows=19695 loops=1)
>>>>    Hash Cond: ("outer".product_id = "inner".product_id)
>>>>    ->  Seq Scan on product_attribute_value pav   
>>>> (cost=0.00..40127.12
>>>> rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
>>>> loops=1)
>>>>    ->  Hash  (cost=23.10..23.10 rows=970 width=4) (actual
>>>> time=2.267..2.267 rows=1140 loops=1)
>>>>          ->  Index Scan using  
>>>> x_category_product__category_id_fk_idx
>>>> on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
>>>> time=0.122..1.395 rows=1140 loops=1)
>>>>                Index Cond: (category_id = 1001082)
>>>> Total runtime: 2584.221 ms
>>>> (7 rows)
>>>>
>>>>
>>>> After:
>>>>
>>>> ssdev=# SET enable_seqscan TO false;
>>>> ssdev=# explain analyze select cp.product_id
>>>>              from category_product cp, product_attribute_value pav
>>>>              where cp.category_id = 1001082 and cp.product_id =
>>>>              pav.product_id;
>>>>
>>>>
>>>>               QUERY PLAN
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> ---
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> ---
>>>> -------------------------------------
>>>> Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual
>>>> time=0.373..71.177 rows=19695 loops=1)
>>>>    ->  Index Scan using x_category_product__category_id_fk_idx on
>>>> category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
>>>> time=0.129..1.438 rows=1140 loops=1)
>>>>          Index Cond: (category_id = 1001082)
>>>>    ->  Index Scan using product_attribute_value__product_id_fk_idx
>>>> on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4)
>>>> (actual time=0.016..0.053 rows=17 loops=1140)
>>>>          Index Cond: ("outer".product_id = pav.product_id)
>>>> Total runtime: 74.747 ms
>>>> (6 rows)
>>>>
>>>> There's quite a big difference in speed there. 2584.221 ms vs.  
>>>> 74.747
>>>> ms.
>>>>
>>>> Any ideas what I can do to improve this without turning sequential
>>>> scanning off?
>>>>
>>>> Thanks,
>>>>
>>>> ___________________________________________________________________ 
>>>> _
>>>> Brendan Duddridge | CTO | 403-277-5591 x24 |   
>>>> brendan(at)clickspace(dot)com
>>>>
>>>> ClickSpace Interactive Inc.
>>>> Suite L100, 239 - 10th Ave. SE
>>>> Calgary, AB  T2G 0V9
>>>>
>>>> http://www.clickspace.com
>>>>
>>>
>>>
>>>
>>> --
>>> Jim C. Nasby, Sr. Engineering Consultant      jnasby(at)pervasive(dot)com
>>> Pervasive Software      http://pervasive.com    work: 512-231-6117
>>> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>>        subscribe-nomail command to majordomo(at)postgresql(dot)org so that
>>> your
>>>        message can get through to the mailing list cleanly
>>>
>>
>>
>>
>>
>
>
> --
> Postgresql & php tutorials
> http://www.designmagick.com/
>

In response to

pgsql-performance by date

Next:From: AntoineDate: 2006-04-01 09:54:00
Subject: Re: [Solved] Slow performance on Windows .NET and OleDb
Previous:From: chris smithDate: 2006-04-01 01:23:37
Subject: Re: Query using SeqScan instead of IndexScan

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group