Re: BUG #15618: Duplicating a join clause breaks index use

From: Alex <cdalxndr(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: BUG #15618: Duplicating a join clause breaks index use
Date: 2019-02-02 15:53:30
Message-ID: 602024371.2009661.1549122810083@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Indeed, increasing those two parameter values results in a fast query.

With default values, the planning time is ~2ms but execution time is ~4600ms.With increased values, the planning time increases by 0.5ms (2.5ms) but execution time drops to 0.7ms (aprox x6000 faster).
As the impact on the execution time is very high, shouldn't another mechanism be used instead of parameters join_collapse_limit or from_collapse_limit, such as some maximum planning time?
With a maximum planning time, faster machines will benefit from better query planning.

Another suggestion is to run a benchmark on the machine to automatically determine some good values, and attach hardware info to the results, to recompute in case hardware changes (although hdd performance may degrade with fragmentation and ssd with space occupied).
On Saturday, February 2, 2019, 5:08:15 PM GMT+2, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

Alex <cdalxndr(at)yahoo(dot)com> writes:
> select
> ...
> from
>     product p
>     left outer join product_property pp on p.id=pp.product_id
>     left outer join product_property_value ppv on pp.value_id=ppv.id
>     left outer join description_element de on pp.description_source=de.id
>     left outer join description_value dev on de.description_value_id=dev.id
>     left outer join description_element dep on de.parent=dep.id
>     left outer join description_value depv on dep.description_value_id=depv.id
>     left outer join shop_category sc_dup on p.shop_category=sc_dup.id           --duplicate join
>     left outer join shop_category sc on p.shop_category=sc.id
>     left outer join category c on sc.category=c.id
> where
>     c.id=8511

I suspect the real issue here has little to do with the "duplicate" nature
of the extra join, and a lot to do with the fact that your query exceeds
join_collapse_limit.  That causes the planner to break it up in a
semi-arbitrary way to limit planning time.  Adding one more join moves
where the break happens, and evidently is preventing the planner from
discovering a good join order.

You want join_collapse_limit to be at least 10 to handle this query well,
since 10 tables are mentioned.  The default value, 8, is kinda tuned for
older and slower machines than most of us use today.  I wouldn't make it
enormous, because it does prevent exponential planning time growth for
many tables, but you could likely get away with 15 or so.  You should
likely keep from_collapse_limit in sync with that too.

            regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Evgeniy Efimkin 2019-02-02 17:41:29 Re: BUG #15617: walsender hang if sync replica disconnected from network
Previous Message Tom Lane 2019-02-02 15:08:05 Re: BUG #15618: Duplicating a join clause breaks index use