Re: UNION ALL has higher cost than inheritance

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UNION ALL has higher cost than inheritance
Date: 2010-11-08 19:56:31
Message-ID: AANLkTi=-jCsqj3v-pTZfOg+KbEr1S-AzSiOSdkU=mbvX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I did some hacking on this and came up with the attached patch, which
>> could use a bit more work on the comments but passes regression tests.
>> However, this just solves the issue of being smart about top-level
>> UNION ALL cases.  It might be worth looking into using MergeAppend
>> for the sorting required for other types of set operations.  That would
>> involve quite a different patch, and I'm not sure if it'd remove the
>> need for this one or not.
>
> I spent some more time thinking about this.  It seems difficult to make
> any real further progress without essentially throwing out the current
> approach to planning set-operation trees and starting over.  What we'd
> want is to generate Paths representing the different ways a set-op could
> be implemented and then pick the best one.  I can see the general shape
> of how to do that: a set-op should be thought of as generating a
> constrained join relation in a fashion similar to an OUTER JOIN
> construct, and then the different Paths are possible implementations for
> the join rel.  We'd need new Path types for hashed or sorted UNION,
> INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo
> struct (or extend that struct to cover these cases).

It strikes me that INTERSECT ALL and EXCEPT ALL are pretty much JUST a
semi- or anti-join (on the entire set of output columns). But
without ALL it's completely different.

> It strikes me that maybe top-level DISTINCT could be married into this
> too, since it's basically the same as a one-input UNION operator.  But
> not sure what to do with DISTINCT ON.
>
> Another thought here is that the current definition of the SetOp
> execution node type could be improved.  Instead of taking a single
> pre-merged input tuple stream, it'd be better if it had two inputs
> like a JOIN node.  Then we could eliminate the "flag" column, which
> would simplify matters for planning and reduce the amount of data
> that has to be passed through sorting.  SetOp itself would need to
> borrow some of the capability of MergeAppend so that it could read
> two sorted streams in parallel and keep them in sync.
>
> But this all looks like a pretty substantial amount of work, and
> given the low level of user demand for improving the performance of
> set operations, it seems to belong fairly far down the to-do list.
> So I'm not going to tackle it now.  Barring objection, I'll clean up
> yesterday's patch a bit more and commit it.

I agree. If we had infinite resources it would be nice to tackle
this, but I think we have bigger fish to fry. In particular, I wonder
if you've thought any more about the generalized inner-indexscan
machinery, or taken a look at any of the issues around KNNGIST. I'd
like to see our limited supply of planner-fu invested in those areas,
or perhaps in making inner join removal work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-11-08 19:59:48 Re: How to share the result data of separated plan
Previous Message Peter Eisentraut 2010-11-08 19:49:25 Re: Should we use make -k on the buildfarm?