Re: two seperate queries run faster than queries ORed

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: Joseph Shraibman <jks(at)selectacast(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: two seperate queries run faster than queries ORed
Date: 2004-03-22 20:44:09
Message-ID: 7925.1079988249@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> Well, you have to be careful on the combination to not give the wrong
> answers if there's a row with u.status=3 that matches a row d.status=3.

We could in theory handle that using something similar to the method
currently used for "OR" indexscans (that is, rather than doing either
UNION- or UNION-ALL-like processing, we drop tuples from later scans
that meet the qual tests of the earlier scans). However I don't see any
clean way in the current planner to cost out both approaches and pick
the cheaper one. It looks to me like we'd have to do over the *entire*
join planning process each way, which is ugly as well as unreasonably
expensive. The problem is that the OR approach only wins when the
component clauses of the OR can drop down to lower levels of the plan
tree if they are considered separately. But a plan tree with a
restriction at a low level and one without it are two different things,
and the dynamic-programming approach we use to build up join plans
doesn't yield the same solutions. (As indeed it shouldn't, since the
whole point of Joseph's example is to get fundamentally different plans
for the two parts of the OR.)

We could possibly approach it heuristically, that is examine the clauses
and try to guess whether it's better to split them apart or not. But
even assuming that we punt on that part of the problem, it seems like a
mess. For instance suppose that there are additional relations in the
query that aren't mentioned in the OR clause. The planner may want to
join some of those relations in advance of forming the join that the OR
itself describes. Pushing down different parts of the OR might cause
the best join path to change. How could you merge multiple scans if
some include extra relations and some don't?

In short, I see how such a plan could be executed, but I don't see any
effective approach for generating the plan ...

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Subbiah, Stalin 2004-03-22 21:00:21 Databases Vs. Schemas
Previous Message Joseph Shraibman 2004-03-22 19:40:04 Re: two seperate queries run faster than queries ORed together