Re: Query using SeqScan instead of IndexScan

From: "chris smith" <dmagick(at)gmail(dot)com>
To: "Brendan Duddridge" <brendan(at)clickspace(dot)com>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "PostgreSQL Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query using SeqScan instead of IndexScan
Date: 2006-04-01 01:23:37
Message-ID: 3c1395330603311723i368092f2jf0c2291ab033ae2a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 4/1/06, Brendan Duddridge <brendan(at)clickspace(dot)com> wrote:
> Hi Jim,
>
> I'm not quite sure what you mean by the correlation of category_id?

It means how many distinct values does it have (at least that's my
understanding of it ;) ).

select category_id, count(*) from category_product group by category_id;

will show you how many category_id's there are and how many products
are in each category.

Having a lot of products in one category (or having a small amount of
categories) can slow things down because the db can't use the index
effectively.. which might be what you're seeing (hence why it's fast
for some categories, slow for others).

> On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:
>
> > What's the correlation of category_id? The current index scan cost
> > estimator places a heavy penalty on anything with a correlation much
> > below about 90%.
> >
> > On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
> >> Hi,
> >>
> >> I have a query that is using a sequential scan instead of an index
> >> scan. I've turned off sequential scans and it is in fact faster with
> >> the index scan.
> >>
> >> Here's my before and after.
> >>
> >> Before:
> >>
> >> ssdev=# SET enable_seqscan TO DEFAULT;
> >> ssdev=# explain analyze select cp.product_id
> >> from category_product cp, product_attribute_value pav
> >> where cp.category_id = 1001082 and cp.product_id =
> >> pav.product_id;
> >>
> >>
> >> QUERY PLAN
> >> ---------------------------------------------------------------------
> >> ---
> >> ---------------------------------------------------------------------
> >> ---
> >> ------------------------------
> >> Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual
> >> time=4.521..2580.520 rows=19695 loops=1)
> >> Hash Cond: ("outer".product_id = "inner".product_id)
> >> -> Seq Scan on product_attribute_value pav (cost=0.00..40127.12
> >> rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
> >> loops=1)
> >> -> Hash (cost=23.10..23.10 rows=970 width=4) (actual
> >> time=2.267..2.267 rows=1140 loops=1)
> >> -> Index Scan using x_category_product__category_id_fk_idx
> >> on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual
> >> time=0.122..1.395 rows=1140 loops=1)
> >> Index Cond: (category_id = 1001082)
> >> Total runtime: 2584.221 ms
> >> (7 rows)
> >>
> >>
> >> After:
> >>
> >> ssdev=# SET enable_seqscan TO false;
> >> ssdev=# explain analyze select cp.product_id
> >> from category_product cp, product_attribute_value pav
> >> where cp.category_id = 1001082 and cp.product_id =
> >> pav.product_id;
> >>
> >>
> >> QUERY PLAN
> >> ---------------------------------------------------------------------
> >> ---
> >> ---------------------------------------------------------------------
> >> ---
> >> -------------------------------------
> >> Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual
> >> time=0.373..71.177 rows=19695 loops=1)
> >> -> Index Scan using x_category_product__category_id_fk_idx on
> >> category_product cp (cost=0.00..23.10 rows=970 width=4) (actual
> >> time=0.129..1.438 rows=1140 loops=1)
> >> Index Cond: (category_id = 1001082)
> >> -> Index Scan using product_attribute_value__product_id_fk_idx
> >> on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4)
> >> (actual time=0.016..0.053 rows=17 loops=1140)
> >> Index Cond: ("outer".product_id = pav.product_id)
> >> Total runtime: 74.747 ms
> >> (6 rows)
> >>
> >> There's quite a big difference in speed there. 2584.221 ms vs. 74.747
> >> ms.
> >>
> >> Any ideas what I can do to improve this without turning sequential
> >> scanning off?
> >>
> >> Thanks,
> >>
> >> ____________________________________________________________________
> >> Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com
> >>
> >> ClickSpace Interactive Inc.
> >> Suite L100, 239 - 10th Ave. SE
> >> Calgary, AB T2G 0V9
> >>
> >> http://www.clickspace.com
> >>
> >
> >
> >
> > --
> > Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> > Pervasive Software http://pervasive.com work: 512-231-6117
> > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that
> > your
> > message can get through to the mailing list cleanly
> >
>
>
>
>

--
Postgresql & php tutorials
http://www.designmagick.com/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Brendan Duddridge 2006-04-01 01:31:47 Re: Query using SeqScan instead of IndexScan
Previous Message Brendan Duddridge 2006-04-01 01:09:18 Re: Query using SeqScan instead of IndexScan