Re: Query using SeqScan instead of IndexScan

From: Brendan Duddridge <brendan(at)clickspace(dot)com>
To: Jim C(dot) Nasby <jnasby(at)pervasive(dot)com>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query using SeqScan instead of IndexScan
Date: 2006-04-01 01:09:18
Message-ID: EBEB7ACF-84A8-4FCF-A481-8C087FF372D1@clickspace.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?
The category_id is part of a compound primary key in the
category_product
table. The primary key on category_product is (category_id, product_id).

Here's the definitions of the two tables involved in the join:

Table "public.category_product"
Column | Type | Modifiers
---------------------+----------------------+-----------
category_id | integer | not null
product_id | integer | not null
en_name_sort_order | integer |
fr_name_sort_order | integer |
merchant_sort_order | integer |
price_sort_order | integer |
merchant_count | integer |
is_active | character varying(5) |
Indexes:
"x_category_product_pk" PRIMARY KEY, btree (category_id,
product_id)
"category_product__is_active_idx" btree (is_active)
"category_product__merchant_sort_order_idx" btree
(merchant_sort_order)
"x_category_product__category_id_fk_idx" btree (category_id)
CLUSTER
"x_category_product__product_id_fk_idx" btree (product_id)
Foreign-key constraints:
"x_category_product_category_fk" FOREIGN KEY (category_id)
REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED
"x_category_product_product_fk" FOREIGN KEY (product_id)
REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED

Table "public.product_attribute_value"
Column | Type | Modifiers
----------------------------+-----------------------+-----------
attribute_id | integer | not null
attribute_unit_id | integer |
attribute_value_id | integer |
boolean_value | character varying(5) |
decimal_value | numeric(30,10) |
product_attribute_value_id | integer | not null
product_id | integer | not null
product_reference_id | integer |
status_code | character varying(32) |
Indexes:
"product_attribute_value_pk" PRIMARY KEY, btree
(product_attribute_value_id)
"product_attribute_value__attribute_id_fk_idx" btree (attribute_id)
"product_attribute_value__attribute_unit_id_fk_idx" btree
(attribute_unit_id)
"product_attribute_value__attribute_value_id_fk_idx" btree
(attribute_value_id)
"product_attribute_value__product_id_fk_idx" btree (product_id)
"product_attribute_value__product_reference_id_fk_idx" btree
(product_reference_id)
Foreign-key constraints:
"product_attribute_value_attribute_fk" FOREIGN KEY
(attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE
INITIALLY DEFERRED
"product_attribute_value_attributeunit_fk" FOREIGN KEY
(attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id)
DEFERRABLE INITIALLY DEFERRED
"product_attribute_value_attributevalue_fk" FOREIGN KEY
(attribute_value_id) REFERENCES attribute_value(attribute_value_id)
DEFERRABLE INITIALLY DEFERRED
"product_attribute_value_product_fk" FOREIGN KEY (product_id)
REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED
"product_attribute_value_productreference_fk" FOREIGN KEY
(product_reference_id) REFERENCES product(product_id) DEFERRABLE
INITIALLY DEFERRED

Not sure if that helps answer your question, but the query is pretty
slow. Sometimes it takes 5 - 15 seconds depending on the category_id
specified.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:

> What's the correlation of category_id? The current index scan cost
> estimator places a heavy penalty on anything with a correlation much
> below about 90%.
>
> On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
>> Hi,
>>
>> I have a query that is using a sequential scan instead of an index
>> scan. I've turned off sequential scans and it is in fact faster with
>> the index scan.
>>
>> Here's my before and after.
>>
>> Before:
>>
>> ssdev=# SET enable_seqscan TO DEFAULT;
>> ssdev=# explain analyze select cp.product_id
>> from category_product cp, product_attribute_value pav
>> where cp.category_id = 1001082 and cp.product_id =
>> pav.product_id;
>>
>>
>> QUERY PLAN
>> ---------------------------------------------------------------------
>> ---
>> ---------------------------------------------------------------------
>> ---
>> ------------------------------
>> Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual
>> time=4.521..2580.520 rows=19695 loops=1)
>> Hash Cond: ("outer".product_id = "inner".product_id)
>> -> Seq Scan on product_attribute_value pav (cost=0.00..40127.12
>> rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
>> loops=1)
>> -> Hash (cost=23.10..23.10 rows=970 width=4) (actual
>> time=2.267..2.267 rows=1140 loops=1)
>> -> Index Scan using x_category_product__category_id_fk_idx
>> on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual
>> time=0.122..1.395 rows=1140 loops=1)
>> Index Cond: (category_id = 1001082)
>> Total runtime: 2584.221 ms
>> (7 rows)
>>
>>
>> After:
>>
>> ssdev=# SET enable_seqscan TO false;
>> ssdev=# explain analyze select cp.product_id
>> from category_product cp, product_attribute_value pav
>> where cp.category_id = 1001082 and cp.product_id =
>> pav.product_id;
>>
>>
>> QUERY PLAN
>> ---------------------------------------------------------------------
>> ---
>> ---------------------------------------------------------------------
>> ---
>> -------------------------------------
>> Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual
>> time=0.373..71.177 rows=19695 loops=1)
>> -> Index Scan using x_category_product__category_id_fk_idx on
>> category_product cp (cost=0.00..23.10 rows=970 width=4) (actual
>> time=0.129..1.438 rows=1140 loops=1)
>> Index Cond: (category_id = 1001082)
>> -> Index Scan using product_attribute_value__product_id_fk_idx
>> on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4)
>> (actual time=0.016..0.053 rows=17 loops=1140)
>> Index Cond: ("outer".product_id = pav.product_id)
>> Total runtime: 74.747 ms
>> (6 rows)
>>
>> There's quite a big difference in speed there. 2584.221 ms vs. 74.747
>> ms.
>>
>> Any ideas what I can do to improve this without turning sequential
>> scanning off?
>>
>> Thanks,
>>
>> ____________________________________________________________________
>> Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com
>>
>> ClickSpace Interactive Inc.
>> Suite L100, 239 - 10th Ave. SE
>> Calgary, AB T2G 0V9
>>
>> http://www.clickspace.com
>>
>
>
>
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that
> your
> message can get through to the mailing list cleanly
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message chris smith 2006-04-01 01:23:37 Re: Query using SeqScan instead of IndexScan
Previous Message Magnus Hagander 2006-03-31 23:27:15 Re: [Solved] Slow performance on Windows .NET and OleDb