Re: updated join removal patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-17 20:59:13
Message-ID: 3835.1253221153@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 15, 2009 at 9:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Here we go again. Following Tom's advice to not insert crocks in
>>> add_path(), I've instead introduced a noopjoin path type which ignores
>>> its inner side.

I've committed this with some revisions. As to the points discussed
earlier:

>> * I'm not really happy with the "NoopJoinPath" representation. ...
>> A possible compromise is to use T_SubqueryScan as the pathtype, with
>> the idea that we're pretending a SubqueryScan is to be inserted and then
>> immediately optimized away. But I don't want the inner path in there
>> in any case.

> I don't have strong feelings about it. I thought about making just a
> Noop path type, but I couldn't see any clear reason to prefer it one
> way or the other.

I ended up using a NoOpPath struct with pathtype = T_Join, which is a
dummy superclass nodetag that never really gets instantiated. The
SubqueryScan idea looked too messy after I remembered that createplan.c
likes to switch() on the pathtype, so having two different pathtype
uses of T_SubqueryScan seemed pretty ugly. But T_Join isn't really used
anywhere, despite being nominally a executable plan type. It's pretty
much a judgment call whether that's really cleaner than using the path's
own NodeTag...

>> * I'm quite unimpressed with the refactoring you did to make a single
>> function deal with merge clauses, hash clauses, *and* join removal
>> clauses. I think that makes it close to unmaintainable and is buying
>> little if anything speedwise.

> You're the committer; I'm not. But I completely disagree. There
> isn't any reason at all to duplicate this logic in two separate
> places, let alone three. I'd actually be in favor of merging the
> existing two cases even if we weren't adding join removal.

No, I still think this was a bad idea. There are *parts* of those
tests that are similar, but combining them all into one function is
just a recipe for bugs.

> As for a GUC, I think it would be useful to have for debugging, or in
> case of bugs. It's really another join strategy, and we have enable_*
> flags for all the others. But if you don't want it for some reason,
> I'm not in a position to twist your arm.

I left it out for the moment. If anyone can point to a case where the
join removal logic slows planning noticeably without gaining anything,
we can reconsider.

>> * Not entirely sure where to put the code that does the hard work
>> (relation_is_distinct_for).

> Yeah, it seemed a little weird to me. For a while I was reusing some
> of the support functions that query_is_distinct_for() calls, but the
> final version doesn't. I wonder if it should just be moved to
> joinpath.c; it seems to fit in better with what is going on there.

I ended up putting the index-specific logic in indxpath.c, which aside
from seeming reasonable made it easier to deal with expression indexes.
If we ever add another test method that doesn't depend on indexes, we
can put it in a reasonable place and have joinpath.c call that too.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Emmanuel Cecchet 2009-09-17 21:00:23 Re: generic copy options
Previous Message Robert Haas 2009-09-17 20:52:45 Re: Linux LSB init script