Re: Improve OR conditions on joined columns.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve OR conditions on joined columns.
Date: 2017-02-08 23:54:43
Message-ID: 1397.1486598083@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:
> AFAICT this can be transformed into a UNION (not all) if dim.id is
> unique. Does the upper planner pathification make this any easier?

What I did in 9.6 is a first step. The next step, I think, is to
replace prepunion.c with something that can consider more than one
implementation path for a union.

Although ... actually, that may not be the bottleneck for what you're
after. The issue here is not so much discovering a clever plan for
a union as realizing that the query could be cast as a union in the
first place.

Maybe it'd be better to imagine this as something closer to planagg.c,
that is it knows how to apply a specific high-level optimization that
might or might not be a win, so it builds a path describing that and sees
if it looks cheaper than a path done the normal way. The fact that
we could even build a path describing a union is something that wasn't
there before 9.6, but maybe there's enough infrastructure for that now.

> There's another transform using arrays that's possible as well (see
> attached example); I believe that would work regardless of uniqueness.

That doesn't look terribly promising for automated application.
And I think it's really dependent on the exact shape of the OR
clause, which is an unpleasant limitation. Considering the amount
of work this will take to do at all, you'd want it to be pretty
general I think. I'm imagining something that would look for an
OR in which each clause referenced only one rel, then if it can
identify a way to re-unique-ify the result, split into a UNION
with an arm for each rel used in the OR. The nature of the
conditions in each OR arm don't seem to matter ... though probably
you'd have to insist on there not being volatile conditions anywhere
in the query, because they'd get evaluated too many times.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-02-09 00:33:12 WIP: Faster Expression Processing v4
Previous Message Kyle Gearhart 2017-02-08 23:11:20 Re: libpq Alternate Row Processor