Quick Links

Re: UNION ALL has higher cost than inheritance

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: UNION ALL has higher cost than inheritance
Date:	2010-11-08 19:56:31
Message-ID:	AANLkTi=-jCsqj3v-pTZfOg+KbEr1S-AzSiOSdkU=mbvX@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I did some hacking on this and came up with the attached patch, which
>> could use a bit more work on the comments but passes regression tests.
>> However, this just solves the issue of being smart about top-level
>> UNION ALL cases. It might be worth looking into using MergeAppend
>> for the sorting required for other types of set operations. That would
>> involve quite a different patch, and I'm not sure if it'd remove the
>> need for this one or not.
>
> I spent some more time thinking about this. It seems difficult to make
> any real further progress without essentially throwing out the current
> approach to planning set-operation trees and starting over. What we'd
> want is to generate Paths representing the different ways a set-op could
> be implemented and then pick the best one. I can see the general shape
> of how to do that: a set-op should be thought of as generating a
> constrained join relation in a fashion similar to an OUTER JOIN
> construct, and then the different Paths are possible implementations for
> the join rel. We'd need new Path types for hashed or sorted UNION,
> INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo
> struct (or extend that struct to cover these cases).

It strikes me that INTERSECT ALL and EXCEPT ALL are pretty much JUST a
semi- or anti-join (on the entire set of output columns). But
without ALL it's completely different.

> It strikes me that maybe top-level DISTINCT could be married into this
> too, since it's basically the same as a one-input UNION operator. But
> not sure what to do with DISTINCT ON.
>
> Another thought here is that the current definition of the SetOp
> execution node type could be improved. Instead of taking a single
> pre-merged input tuple stream, it'd be better if it had two inputs
> like a JOIN node. Then we could eliminate the "flag" column, which
> would simplify matters for planning and reduce the amount of data
> that has to be passed through sorting. SetOp itself would need to
> borrow some of the capability of MergeAppend so that it could read
> two sorted streams in parallel and keep them in sync.
>
> But this all looks like a pretty substantial amount of work, and
> given the low level of user demand for improving the performance of
> set operations, it seems to belong fairly far down the to-do list.
> So I'm not going to tackle it now. Barring objection, I'll clean up
> yesterday's patch a bit more and commit it.

I agree. If we had infinite resources it would be nice to tackle
this, but I think we have bigger fish to fry. In particular, I wonder
if you've thought any more about the generalized inner-indexscan
machinery, or taken a look at any of the issues around KNNGIST. I'd
like to see our limited supply of planner-fu invested in those areas,
or perhaps in making inner join removal work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: UNION ALL has higher cost than inheritance at 2010-11-08 18:19:54 from Tom Lane

Responses

Re: UNION ALL has higher cost than inheritance at 2010-11-08 20:05:16 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2010-11-08 19:59:48	Re: How to share the result data of separated plan
Previous Message	Peter Eisentraut	2010-11-08 19:49:25	Re: Should we use make -k on the buildfarm?