Re: Improve OR conditions on joined columns (common star schema problem)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve OR conditions on joined columns (common star schema problem)
Date: 2017-02-12 17:32:36
Message-ID: 10353.1486920756@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> This is very interesting. Couldn't this be even more generic and
> instead of looking at just the jointree quals, also look at the join
> conditions too, as I think you can use this to also transforms queries
> such as:

The hard part of that is to remember exactly where you need to do surgery
on the querytree to replace the OR clause, keeping in mind that a tree
copy step is going to happen in between first searching for the OR and
doing the replacement. I'm sure it's soluble but it would've involved
more complexity than I felt justified for a POC.

> I imagine you'd also want an enable_unionor GUC which can be used to
> disable this for when the planner makes a poor choice.

It's not so much poor choices as the cost of the optimization attempt ---
if there's a K-relation OR clause, this will increase the cost of planning
by a factor approaching K+1, whether or not you get a better plan out of
it. I ran the regression tests with some instrumentation and determined
that this logic fires a dozen or two times, and fails to produce a plan
that looks cheaper than the standard plan in any of those cases. So if we
go down this road, not only do we need a GUC but I suspect it had better
default to off; only people using star schemas are really likely to get a
win out of it.

Maybe we could improve that situation by applying some heuristics that
prevent the optimization from being attempted unless it's more likely to
produce a win. But I'm not sure what that would look like. We have to
make the decision pretty early and without a lot of information. As an
example, Jim's case fails completely if we don't do the transformation
before reduce_outer_joins, because one of the things that *has* to happen
to get a desirable plan is to recognize that the extracted restriction
clause allows the left join to the dimension table to be simplified to an
inner join.

Having written the POC, I find myself not terribly satisfied with it.
It seems brute-force and not at all integrated with the rest of the
planner. Still, asking for integration with the existing UNION planning
is probably silly, since that really needs to be thrown away and rewritten.
Maybe this is a reasonable way forward until we get a clearer view of
what that area needs to look like.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Rusinov 2017-02-12 17:55:56 Re: WIP: About CMake v2
Previous Message Magnus Hagander 2017-02-12 15:48:24 RADIUS fallback servers