Re: Planner not using column limit specified for one column for another column equal to first

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner not using column limit specified for one column for another column equal to first
Date: 2010-04-16 12:49:45
Message-ID: k2u331e40661004160549td6bf86d5pb6fde70e6d1ab5d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

16 квітня 2010 р. 11:31 Yeb Havinga <yebhavinga(at)gmail(dot)com> написав:

> Віталій Тимчишин wrote:
>
>> Hello.
>>
>> I have a query that performs very poor because there is a limit on join
>> column that is not applied to other columns:
>>
>> select * from company this_ left outer join company_tag this_1_ on
>> this_.id=this_1_.company_id left outer join company_measures companymea2_ on
>> this_.id=companymea2_.company_id left outer join company_descr ces3_ on
>> this_.id=ces3_.company_id where this_1_.tag_id = 7 and this_.id>50000000 and
>> this_1_.company_id>50000000
>> order by this_.id asc limit 1000;
>>
>> (plan1.txt)
>> Total runtime: 7794.692 ms
>>
>> At the same time if I apply the limit (>50000000) to other columns in
>> query itself it works like a charm:
>>
>> select * from company this_ left outer join company_tag this_1_ on
>> this_.id=this_1_.company_id left outer join company_measures companymea2_ on
>> this_.id=companymea2_.company_id left outer join company_descr ces3_ on
>> this_.id=ces3_.company_id where this_1_.tag_id = 7 and this_.id>50000000 and
>> this_1_.company_id>50000000
>> and companymea2_.company_id>50000000 and ces3_.company_id>50000000
>> order by this_.id asc limit 1000;
>>
>> (plan2.txt)
>> Total runtime: 27.547 ms
>>
>> I've thought and someone in this list've told me that this should be done
>> automatically.
>>
> Yes, if you have in a query a=b and b=c, then the optimizer figures out
> that a=c as well. (a,b and c are then member of the same equivalence class).
>
> However both queries are not the same, since the joins you're using are
> outer joins. In the first it's possible that records are returned for
> company records with no matching ces3_ records, the ces3_ records is null in
> that case. In the second query no NULL ces3_ information may be returned.
>

OK, but when I move limit to join condition the query is still fast:

select * from company this_ left outer join company_tag this_1_ on
this_.id=this_1_.company_id
left outer join company_measures companymea2_ on
this_.id=companymea2_.company_id and companymea2_.company_id>50000000
left outer join company_descr ces3_ on this_.id=ces3_.company_id and
ces3_.company_id>50000000
where this_1_.tag_id = 7 and this_.id>50000000
and this_1_.company_id>50000000
order by this_.id asc limit 1000;

(plan3.txt),
Total runtime: 26.327 ms
BTW: Changing slow query to inner joins do not make it fast

>
> Another thing is it seems that the number of rows guessed is far off from
> the actual number of rows, is the number 5000000 artificial or are you're
> statistics old or too small histogram/mcv's?
>

Nope, I suppose this is because of limit. If I remove the limit, the
estimations are quite correct. There are ~6 millions of row in each table.

Attachment Content-Type Size
plan3.txt text/plain 1.7 KB

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Віталій Тимчишин 2010-04-16 12:59:50 Re: Planner not using column limit specified for one column for another column equal to first
Previous Message Yeb Havinga 2010-04-16 08:31:06 Re: Planner not using column limit specified for one column for another column equal to first