Re: An "obvious" index not being used

From: Daniele Varrazzo <piro(at)develer(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Francisco Reyes <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: An "obvious" index not being used
Date: 2008-07-20 23:07:04
Message-ID: 4883C518.2090604@develer.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane ha scritto:
> Daniele Varrazzo <piro(at)develer(dot)com> writes:
>> In my problem I had 2 tables: a small one (accounts), a large one (foo). The
>> way the query is written doesn't allow the stats from the large table to be
>> used at all, unless the records from the small table are fetched. This is
>> independent from the stats accuracy.
>
>> What the planner does is to assume an even distribution in the data in the
>> joined fields.
>
> Sir, you don't know what you're talking about.

This is probably correct, I am not into the PG internals.

I was just reporting the analysis I proposed in my previous message in this
thread
(http://archives.postgresql.org/pgsql-performance/2008-06/msg00095.php). You
gave me an hint of where the backend was missing to correctly estimate, and I
deduced a guess of the strategy the backend could have used to reach that
result - not matching the reality of my data set but I think matching the
picture it could have using the stats data but not performing any further fetch.

Nobody confuted that message, of course that may have happened because it was
laughable:

Daniele Varrazzo ha scritto:
> Tom Lane ha scritto:
>> Daniele Varrazzo <piro(at)develer(dot)com> writes:
>>> There is an index in the field "foo.account_id" but is not used. The
>>> resulting query plan is:
>>
>>> Aggregate (cost=300940.70..300940.71 rows=1 width=0) (actual
>>> time=13412.088..13412.089 rows=1 loops=1)
>>> -> Hash IN Join (cost=11.97..299858.32 rows=432953 width=0)
>>> (actual
>>> time=0.678..13307.074 rows=92790 loops=1)
>>> Hash Cond: (foo.account_id = accounts.id)
>>> -> Seq Scan on foo (cost=0.00..275591.14 rows=5313514
>>> width=4)
>>> (actual time=0.014..7163.538 rows=5313514 loops=1)
>>
>> Well, if the estimate of 432953 rows selected were correct, it'd be
>> right not to use the index. Fetching one row in ten is not a chore
>> for an indexscan. (I'm not sure it'd prefer an indexscan even with an
>> accurate 92K-row estimate, but at least you'd be in the realm where
>> tweaking random_page_cost would make a difference.)
>
> Let me guess: because the account tables has an estimated (and correct)
> guess of 22 records fetched out from 270 =~ 8%, it assumes that it will
> need to fetch the 8% of 5.3M records (which... yes, it matches the
> estimate of 433K).

This is the idea I had about how the query planner behaved in that query, and
why the query performs as I expect when the joined items are explicit. Was it
wrong?

Thank you very much. Again, the only reason for which I think I was right is
because nobody confuted my previous email.

Regards,

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2008-07-20 23:10:18 Re: 3ware vs Areca
Previous Message Tom Lane 2008-07-20 04:44:47 Re: An "obvious" index not being used