Re: Planner not using column limit specified for one column for another column equal to first

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner not using column limit specified for one column for another column equal to first
Date: 2010-04-16 12:59:50
Message-ID: k2y331e40661004160559zb22b8ac2i3d1909c0c8ff0704@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

16 квітня 2010 р. 11:25 Hannu Krosing <hannu(at)2ndquadrant(dot)com> написав:

> On Fri, 2010-04-16 at 11:02 +0300, Віталій Тимчишин wrote:
> > Hello.
> >
> >
> > I have a query that performs very poor because there is a limit on
> > join column that is not applied to other columns:
> >
> >
> > select * from company this_ left outer join company_tag this_1_ on
> > this_.id=this_1_.company_id left outer join company_measures
> > companymea2_ on this_.id=companymea2_.company_id left outer join
> > company_descr ces3_ on this_.id=ces3_.company_id where this_1_.tag_id
> > = 7 and this_.id>50000000
> > and this_1_.company_id>50000000
> > order by this_.id asc limit 1000;
> >
> >
> > (plan1.txt)
> > Total runtime: 7794.692 ms
> >
> >
> > At the same time if I apply the limit (>50000000) to other columns in
> > query itself it works like a charm:
> >
> >
> > select * from company this_ left outer join company_tag this_1_ on
> > this_.id=this_1_.company_id left outer join company_measures
> > companymea2_ on this_.id=companymea2_.company_id left outer join
> > company_descr ces3_ on this_.id=ces3_.company_id where this_1_.tag_id
> > = 7 and this_.id>50000000
> > and this_1_.company_id>50000000
> > and companymea2_.company_id>50000000 and ces3_.company_id>50000000
> > order by this_.id asc limit 1000;
>
> The queries are not the same.
>
> 2nd variant will not return the rows where there are no matching rows
> inthis_1_ , companymea2_ or ces3_.company_id
>
> A query equivalent to first one would be:
>
>
> select * from company this_
> left outer join company_tag this_1_
> on (this_.id=this_1_.company_id
> and this_1_.company_id>50000000)
> left outer join company_measures companymea2_
> on (this_.id=companymea2_.company_id
> and companymea2_.company_id>50000000)
> left outer join company_descr ces3_
> on (this_.id=ces3_.company_id
> and ces3_.company_id>50000000)
> where this_1_.tag_id = 7
> and this_.id>50000000
> order by this_.id asc
> limit 1000;
>

And it's still fast (see plan in another mail), while "inner join" variant
of original query is still slow.

>
>
> I'm not sure that planner considers the above form of plan rewrite, nor
> that it would make much sense to do so unless there was a really small
> number of rows where x_.company_id>50000000
>
> Actually no,
select id > 50000000, count(*) from company group by 1
f,1096042
t,5725630

I don't know why the planner wishes to perform few merges of 1000 to a
million of records (and the merges is the thing that takes time) instead of
taking a 1000 of records from main table and then doing a nested loop. And
it must read all the records that DO NOT match the criteria for secondary
tables before getting to correct records if it do not filter secondary
tables with index on retrieve.

set enable_mergejoin=false helps original query, but this is another problem
and first solution is simpler and can be used by planner automatically,
while second requires rethinking/rewrite of LIMIT estimation logic
(Plan of nested loop attached)

Attachment Content-Type Size
nestplan.txt text/plain 1.4 KB

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Yeb Havinga 2010-04-16 13:21:30 Re: Planner not using column limit specified for one column for another column equal to first
Previous Message Віталій Тимчишин 2010-04-16 12:49:45 Re: Planner not using column limit specified for one column for another column equal to first