Re: [SQL] Subselect performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Lopez <ridruejo(at)atm9(dot)com(dot)dtu(dot)dk>
Cc: pgsql-sql(at)hub(dot)org
Subject: Re: [SQL] Subselect performance
Date: 1999-09-21 14:15:32
Message-ID: 21555.937923332@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Daniel Lopez <ridruejo(at)atm9(dot)com(dot)dtu(dot)dk> writes:
> $list = select d from c
> select b from a where b in ( $list )
> is 5 seconds

> select b from a where b in (select d from c)
> is 3 minutes!! (although it should be at least as fast as a)!

Not necessarily. Your first example is depending on the fact that
the "list" (number of values selected from c) is short. Try it with
10000 or 100000 values from c, if you want to see the backend crash ;-)

The second case works OK even if the sub-select result is large, because
it re-executes the sub-select for each row from a (essentially,
rescanning c to see if b is matched). However, this means the runtime
is proportional to the product of the number of rows in a and c. Ugh.

Try rewriting your query as a join:

select a.b from a, c where a.b = c.d

(you might want "select distinct" here, if b can match many rows from d).
If the system can't figure out anything better than a nested-loop join,
then it'll probably end up taking about the same amount of time, but
this form at least allows the possibility of a smarter join method.

I believe we have a TODO list item to perform this sort of
transformation automatically when the sub-select is of a form that
allows it. We need to get the left/outer join stuff working first
in order to have an exact match of the behavior.

regards, tom lane

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Jrg R. Rudnick 1999-09-21 14:48:57 SQL Functions on Composite Types - what's the purpose??
Previous Message Mike Field 1999-09-21 13:23:47 Search Engine with PHP