Optimisation of INTERSECT expressions

From: "Phil Endecott" <spam_from_postgresql_lists(at)chezphil(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Optimisation of INTERSECT expressions
Date: 2004-03-23 14:12:01
Message-ID: 20040323141200.34D86D1EC02@svr1.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear PostgresQL Experts,

I am trying to get to the bottom of some efficiency problems and hope that
you can help. The difficulty seems to be with INTERSECT expressions.

I have a query of the form
select A from T where C1 intersect select A from T where C2;
It runs in about 100 ms.

But it is equivalent to this query
select A from T where C1 and C2;
which runs in less than 10 ms.

Looking at the output of "explain analyse" on the first query, it seems
that PostgresQL always computes the two sub-expressions and then computes
an explicit intersection on the results. I had hoped that it would notice
that both subexpressions are scanning the same input table T and convert
the expression to the second form.

Is there a reason why it can't do this transformation?

(Of course, I could just re-write my code to use the second form, but my
application is generating these bits of SQL programmatically, and it is not
trivial as in some cases the two tables are not the same so an intersection
really is needed; if PostgresQL can do it for me, that would be much
better. I don't want to write an SQL parser!)

While studying the same code I found another case where my INTERSECT
expressions don't seem to be optimised as much as I'd like. In this case,
one of the subexpressions being intersected is empty much of the time. But
even when it is empty, PostgresQL computes the other (expensive)
subexpression and does an intersect. Could PostgresQL do something like this:

- guess which subexpression is likely to produce fewest rows
- compute this subexpression
- if empty, return now with an empty result
- compute other subexpression
- compute intersection
- return intersection

Alternatively, it could be defined that the left subexpression is always
computed first and the second not computed if it is empty, like the
behaviour of logical AND and OR operators in C.

Thanks in advance for any suggestions.

--Phil.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Szabo 2004-03-23 14:50:53 Re: Optimisation of INTERSECT expressions
Previous Message Adam Ruth 2004-03-23 13:54:02 Re: Databases Vs. Schemas