Re: Proposed Query Planner TODO items

From: Dennis Haney <davh(at)diku(dot)dk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposed Query Planner TODO items
Date: 2004-02-13 08:27:31
Message-ID: 402C8A73.20804@diku.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

You are refering to:

@inproceedings{ hellerstein93predicate,
author = "Joseph M. Hellerstein and Michael Stonebraker",
title = "Predicate migration: optimizing queries with expensive
predicates",
pages = "267--276",
year = "1993",
abstract = "The traditional focus of relational query optimization
schemes has been on the choice of join methods and join orders.
Restrictions have typically been handled in query optimizers by
"predicate pushdown" rules, which apply restrictions in some random
order before as many joins as possible. These rules work under the
assumption that restriction is essentially a zero-time operation.
However, today's extensible and object-oriented database systems allow
users to define time-consuming functions,...",
url = "citeseer.nj.nec.com/article/hellerstein92predicate.html" }

Tom Lane wrote:

>I think the key issue here is that the two EXISTS tests depend only on
>l1.l_orderkey and l1.l_suppkey of the outer query. Therefore they get
>"pushed down" in the plan tree to be evaluated during the initial scan
>of l1. This is normally a good heuristic choice, but because the EXISTS
>tests are relatively expensive, that ends up forcing the planner to use
>a nestloop-with-inner-index-scan join between nation/supplier and l1.
>Any other join technique will involve a seqscan of l1 causing the EXISTS
>tests to be evaluated at every row of lineitem; the planner correctly
>ranks those alternatives as even worse than this.
>
>The trouble is that the nestloop is hugely expensive: you can see that
>the repeated indexscans on l1 take up 1912.454*760 - 0.066*277343 -
>0.812*287821 or 1201449.750 msec, about 80% of the total.
>
>It seems that the correct way to plan this query would require
>postponing evaluation of the EXISTS clauses. If those were further up
>the tree, the planner would have chosen a merge or hash join at this
>step, which would probably take a tenth as much time. The cost to run
>the EXISTS clauses themselves wouldn't change; they'd not be executed
>any more frequently in this case.
>
>I recall seeing traces in the code of logic that would attempt to delay
>the evaluation of expensive WHERE tests, but that's been gone since
>Berkeley days. Perhaps we should think about resurrecting it, or at
>least putting in some kind of heuristic to try to cope better with this
>case.
>
>It would be interesting to see what the runtime looks like if you add
>the following to the WHERE clauses of both inner EXISTS:
> AND s_nationkey = n_nationkey AND o_orderkey = l1.l_orderkey
>This would not change the results AFAICS, but it would force the
>evaluation of the EXISTS clauses up to the top level of the outer plan
>(since the planner would then see 'em as join constraints).
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
>

--
Dennis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Gibson 2004-02-13 09:19:05 dblink - custom datatypes NOW work :)
Previous Message Richard Huxton 2004-02-13 07:22:17 Re: 7.4 - FK constraint performance