Re: Optimizer seems to be way off, why?

From: Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimizer seems to be way off, why?
Date: 2005-07-20 19:16:24
Message-ID: 42DEA308.6030608@aeccom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Richard Huxton wrote:
> Dirk Lutzebäck wrote:
>
>> Hi,
>>
>> I do not under stand the following explain output (pgsql 8.0.3):
>>
>> explain analyze
>> select b.e from b, d
>> where b.r=516081780 and b.c=513652057 and b.e=d.e;
>>
>> QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------
>>
>> Nested Loop (cost=0.00..1220.09 rows=1 width=4) (actual
>> time=0.213..2926.845 rows=324503 loops=1)
>> -> Index Scan using b_index on b (cost=0.00..1199.12 rows=1
>> width=4) (actual time=0.104..17.418 rows=3293 loops=1)
>> Index Cond: (r = 516081780::oid)
>> Filter: (c = 513652057::oid)
>> -> Index Scan using d_e_index on d (cost=0.00..19.22 rows=140
>> width=4) (actual time=0.009..0.380 rows=99 loops=3293)
>> Index Cond: ("outer".e = d.e)
>> Total runtime: 3638.783 ms
>> (7 rows)
>>
>> Why is the rows estimate for b_index and the nested loop 1? It is
>> actually 3293 and 324503.
>
>
> I'm guessing (and that's all it is) that b.r and b.c have a higher
> correlation than the planner is expecting. That is, it expects the
> b.c=... to reduce the number of matching rows much more than it is.
>
> Try a query just on WHERE b.r=516081780 and see if it gets the estimate
> right for that.
>
> If it's a common query, it might be worth an index on (r,c)
>
> --
> Richard Huxton
> Archonet Ltd
>

Thanks Richard, dropping the join for b.c now gives better estimates (it
also uses a different index now) although not accurate (off by factor
10). This query is embedded in a larger query which now got a 1000 times
speed up (!) because I can drop b.c because it is redundant.

Though, why can't the planner see this correlation? I think somebody
said the planner does not know about multiple column correlations, does it?

Regards,

Dirk

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message John Mendenhall 2005-07-20 20:28:32 Re: performance decrease after reboot
Previous Message Richard Huxton 2005-07-20 17:01:41 Re: Optimizer seems to be way off, why?