Skip site navigation (1) Skip section navigation (2)

Re: An "obvious" index not being used

From: Daniele Varrazzo <piro(at)develer(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: An "obvious" index not being used
Date: 2008-06-19 02:19:59
Message-ID: 4859C24F.6000503@develer.com (view raw or flat)
Thread:
Lists: pgsql-performance
Tom Lane ha scritto:
> Daniele Varrazzo <piro(at)develer(dot)com> writes:
>> There is an index in the field "foo.account_id" but is not used. The resulting 
>> query plan is:
> 
>>   Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual
>> time=13412.088..13412.089 rows=1 loops=1)
>>     ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual
>> time=0.678..13307.074 rows=92790 loops=1)
>>           Hash Cond: (foo.account_id = accounts.id)
>>           ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4)
>> (actual time=0.014..7163.538 rows=5313514 loops=1)
> 
> Well, if the estimate of 432953 rows selected were correct, it'd be
> right not to use the index.  Fetching one row in ten is not a chore
> for an indexscan.  (I'm not sure it'd prefer an indexscan even with an
> accurate 92K-row estimate, but at least you'd be in the realm where
> tweaking random_page_cost would make a difference.)

Let me guess: because the account tables has an estimated (and correct) guess 
of 22 records fetched out from 270 =~ 8%, it assumes that it will need to 
fetch the 8% of 5.3M records (which... yes, it matches the estimate of 433K). 
Well, this seems terribly wrong for this data set :(

> I'm not sure why that estimate is so bad, given that you said you
> increased the stats target on the table.  Is there anything particularly
> skewed about the distribution of the account IDs?

Probably there is, in the sense that the relatively many accounts of 'abc' 
type are referred by relatively few records. In the plan for the hardcoded 
query the estimate is:

->  Bitmap Index Scan on ifoo_x1  (cost=0.00..4115.67 rows=178308
width=0) (actual time=89.766..89.766 rows=92790 loops=1)

which is actually more accurate.

I suspect the foo.account_id statistical data are not used at all in query: 
the query planner can only estimate the number of accounts to look for, not 
how they are distributed in the referencing tables. It seems the only way to 
get the proper plan is to add a load of fake accounts! Well, I'd rather have 
the query executed in 2 times, in order to have the stats correctly used: this 
is the first time it happens to me.

-- 
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

In response to

Responses

pgsql-performance by date

Next:From: Albe LaurenzDate: 2008-06-19 07:13:42
Subject: Re: WAL DUDAS
Previous:From: Tom LaneDate: 2008-06-19 01:43:34
Subject: Re: An "obvious" index not being used

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group