Re: Query using SeqScan instead of IndexScan

From: Brendan Duddridge <brendan(at)clickspace(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: chris smith <dmagick(at)gmail(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query using SeqScan instead of IndexScan
Date: 2006-04-01 17:51:12
Message-ID: 95C1A48B-A76F-4F9B-B5EE-0BFAF8873067@clickspace.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Jim,

from SELECT * FROM pg_stats WHERE tablename='table' AND
attname='category_id'

I find correlation on category_product for category_id is 0.643703

Would setting the index on category_id to be clustered help with this?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9

http://www.clickspace.com

On Apr 1, 2006, at 8:32 AM, Jim C. Nasby wrote:

> On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
>> On 4/1/06, Brendan Duddridge <brendan(at)clickspace(dot)com> wrote:
>>> Hi Jim,
>>>
>>> I'm not quite sure what you mean by the correlation of category_id?
>>
>> It means how many distinct values does it have (at least that's my
>> understanding of it ;) ).
>
> Your understanding is wrong. :) What you're discussing is n_distinct.
>
> http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html
>
> correlation: "Statistical correlation between physical row ordering
> and
> logical ordering of the column values. This ranges from -1 to +1. When
> the value is near -1 or +1, an index scan on the column will be
> estimated to be cheaper than when it is near zero, due to reduction of
> random access to the disk. (This column is NULL if the column data
> type
> does not have a < operator.)"
>
> In other words, the following will have a correlation of 1:
>
> 1
> 2
> 3
> ...
> 998
> 999
> 1000
>
> And this is -1...
>
> 1000
> 999
> ...
> 2
> 1
>
> While this would have a very low correlation:
>
> 1
> 1000
> 2
> 999
> ...
>
> The lower the correlation, the more expensive an index scan is,
> because
> it's more random. As I mentioned, I believe that the current index
> scan
> cost estimator is flawed though, because it will bias heavily against
> correlations that aren't close to 1 or -1.
>
> So, what does
>
> SELECT * FROM pg_stats WHERE tablename='table' AND
> attname='category_id';
>
> show?
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2006-04-01 19:23:20 Re: [Solved] Slow performance on Windows .NET and OleDb
Previous Message Jim C. Nasby 2006-04-01 15:32:47 Re: Query using SeqScan instead of IndexScan