Re: An "obvious" index not being used

From: Daniele Varrazzo <piro(at)develer(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: An "obvious" index not being used
Date: 2008-06-19 02:19:59
Message-ID: 4859C24F.6000503@develer.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane ha scritto:
> Daniele Varrazzo <piro(at)develer(dot)com> writes:
>> There is an index in the field "foo.account_id" but is not used. The resulting
>> query plan is:
>
>> Aggregate (cost=300940.70..300940.71 rows=1 width=0) (actual
>> time=13412.088..13412.089 rows=1 loops=1)
>> -> Hash IN Join (cost=11.97..299858.32 rows=432953 width=0) (actual
>> time=0.678..13307.074 rows=92790 loops=1)
>> Hash Cond: (foo.account_id = accounts.id)
>> -> Seq Scan on foo (cost=0.00..275591.14 rows=5313514 width=4)
>> (actual time=0.014..7163.538 rows=5313514 loops=1)
>
> Well, if the estimate of 432953 rows selected were correct, it'd be
> right not to use the index. Fetching one row in ten is not a chore
> for an indexscan. (I'm not sure it'd prefer an indexscan even with an
> accurate 92K-row estimate, but at least you'd be in the realm where
> tweaking random_page_cost would make a difference.)

Let me guess: because the account tables has an estimated (and correct) guess
of 22 records fetched out from 270 =~ 8%, it assumes that it will need to
fetch the 8% of 5.3M records (which... yes, it matches the estimate of 433K).
Well, this seems terribly wrong for this data set :(

> I'm not sure why that estimate is so bad, given that you said you
> increased the stats target on the table. Is there anything particularly
> skewed about the distribution of the account IDs?

Probably there is, in the sense that the relatively many accounts of 'abc'
type are referred by relatively few records. In the plan for the hardcoded
query the estimate is:

-> Bitmap Index Scan on ifoo_x1 (cost=0.00..4115.67 rows=178308
width=0) (actual time=89.766..89.766 rows=92790 loops=1)

which is actually more accurate.

I suspect the foo.account_id statistical data are not used at all in query:
the query planner can only estimate the number of accounts to look for, not
how they are distributed in the referencing tables. It seems the only way to
get the proper plan is to add a load of fake accounts! Well, I'd rather have
the query executed in 2 times, in order to have the stats correctly used: this
is the first time it happens to me.

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Albe Laurenz 2008-06-19 07:13:42 Re: WAL DUDAS
Previous Message Tom Lane 2008-06-19 01:43:34 Re: An "obvious" index not being used