Re: Why could GEQO produce plans with lower costs than the standard_join_search?

From: Donald Dong <xdong(at)csumb(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why could GEQO produce plans with lower costs than the standard_join_search?
Date: 2019-05-22 18:35:07
Message-ID: 93416A2A-A16C-4CE8-8FD5-9C55FFBB25D7@csumb.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thank you very much for the explanation! I think the join order
benchmark I used [1] is somewhat representative, however, I probably
didn't use the most accurate cost estimation.

I find the cost from cheapest_total_path->total_cost is different
from the cost from queryDesc->planstate->total_cost. What I saw was
that GEQO tends to form paths with lower
cheapest_total_path->total_cost (aka the fitness of the children).
However, standard_join_search is more likely to produce a lower
queryDesc->planstate->total_cost, which is the cost we get using
explain.

I wonder why those two total costs are different? If the total_cost
from the planstate is more accurate, could we use that instead as the
fitness in geqo_eval?

[1] https://github.com/gregrahn/join-order-benchmark

Regards,
Donald Dong

> On May 7, 2019, at 4:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Donald Dong <xdong(at)csumb(dot)edu> writes:
>> I was expecting the plans generated by standard_join_search to have lower costs
>> than the plans from GEQO. But after the results I have from a join order
>> benchmark show that GEQO produces plans with lower costs most of the time!
>
>> I wonder what is causing this observation? From my understanding,
>> standard_join_search is doing a complete search. So I'm not sure how the GEQO
>> managed to do better than that.
>
> standard_join_search is *not* exhaustive; there's a heuristic that causes
> it not to consider clauseless joins unless it has to.
>
> For the most part, GEQO uses the same heuristic (cf desirable_join()),
> but given the right sort of query shape you can probably trick it into
> situations where it will be forced to use a clauseless join when the
> core code wouldn't. It'd still be surprising for that to come out with
> a lower cost estimate than a join order that obeys the heuristic,
> though. Clauseless joins are generally pretty awful.
>
> I'm a tad suspicious about the representativeness of your benchmark
> queries if you find this is happening "most of the time".
>
> regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2019-05-22 18:35:31 Re: ACL dump ordering broken as well for tablespaces
Previous Message Tom Lane 2019-05-22 18:27:51 Re: Teach pg_upgrade test to honor NO_TEMP_INSTALL