Re: PostgreSQL runs a query much slower than BDE and

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Hardman <peter(at)ssbg(dot)zetnet(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL runs a query much slower than BDE and
Date: 2006-10-02 13:06:50
Message-ID: 1159794410.2659.112.camel@holly
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 2006-08-17 at 14:33 -0400, Tom Lane wrote:

> There's a more interesting issue, which I'm afraid we do not have time
> to fix for PG 8.2. The crux of the matter is that given
>
> SELECT ...
> FROM SHEEP_FLOCK f1 JOIN
> (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
> FROM SHEEP_FLOCK f
> GROUP BY f.regn_no) f2
> ON f1.regn_no = f2.regn_no
> AND f1.transfer_date = f2.last_xfer_date
>
> if there is an index on (regn_no, transfer_date) then the planner could
> in principle do a double-column merge join between an indexscan on this
> index and the output of a GroupAggregate implementation of the subquery.
> The GroupAggregate plan would in fact only be sorting on regn_no, so
> it's not immediately obvious why this is OK. The reason is that there
> is only one MAX() value for any particular regn_no, and so the sort
> condition that the last_xfer_date values be in order for any one value
> of regn_no is vacuous. We could consider the subquery's output to be
> sorted by *any* list of the form "regn_no, other-stuff".
>
> The planner's notion of matching pathkey lists to determine sortedness
> is not at all capable of dealing with this. After a little bit of
> thought I'm tempted to propose that we add a concept that a particular
> pathkey list is "unique", meaning that it is known to include a unique
> key for the data. Such a key would match, for sortedness purposes,
> any requested sort ordering consisting of its columns followed by
> others. In the above example, we would know that a GROUP BY implemented
> by GroupAggregate yields an output for which the grouping columns
> are a unique sort key.
>
> I imagine this concept is already known in the database research
> literature; anyone recognize it and know a standard name for it?

(catching up on some earlier mails....)

Not seen any particular name for that around. There are quite a few
places in the optimizer, IIRC, that could use the concept of uniqueness
if it existed.

I would note that the above query plan is similar-ish to the one you'd
get if you tried to push down the GROUP BY from the top of a join. So
the uniqueness information sounds like an important precursor to that.

I've just rechecked out the lit I was reading on this earlier this year:
http://portal.acm.org/ft_gateway.cfm?id=233320&type=pdf&coll=&dl=acm&CFID=15151515&CFTOKEN=6184618#search=%22db2%20order%20optimization%20tpc-d%22
"Fundamental Techniques for Order Optimization" Simmen et al

Also, IIRC, there was some work talking about extending the Interesting
Order concept to allow groupings to be noted also.

>From our work on sorting earlier, we had it that a Merge Join will
always require a Mark/Restore operation on its sorted inputs. If the
Outer input is unique then a Restore operation will never be required,
so the Mark can be avoided also and thus the materialization of the sort
can also be avoided. So some way of telling the MJ node that the sort
order is also unique would be very useful.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2006-10-02 15:32:35 Re: any hope for my big query?
Previous Message Tom Lane 2006-10-02 03:01:19 Re: selecting data from information_schema.columns performance.