Evaluating Subselects

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: "Kapil Tilwani" <karan_pg_2(at)yahoo(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Evaluating Subselects
Date: 2001-06-29 15:02:50
Message-ID: web-79070@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-sql

Kapil,

> I have a query like
> select * from purchasemaster where transactionid in (select
> transactionid from itemmaster where itempriority = 2)

That's a fine query. You could even take it to meet your relatives.
:-)

> ****I need this query for MS-Access, MS-SQL and Postgres. Because
> the application is supposed to be such that for upto 2-3 users, the
> application would be running on MS-Access or MSDE, while for heavier
> databases i.e., greater than 4 , the ideal database would be Postgres

1. Why? In what way is MS Access or MSDE better for a 2-3 user
database? You may wish to re-evaluate your development strategy.

2. You're into a world of hurt, my friend. MSDE has a number of SQL
compliance lapses; MS Access is not SQL92 compliant at all, except for
the simplest SELECT queries. (I say this as someone who gets paid to
develop MS Access/VB)

> Only the problem is ... (correct me if i am wrong, PLEEEEEEEEEEEASE)
> that for each tuple in purchasemaster the sub-query is re-evaluated
> (True or False????). What I am looking at is whether it is possible
> to execute the query in the sub-query just once, get its value as in
> "(1,2,5,23,7,90)", etc. in place of the sub-query and accordingly,
> execute the main query.

Depends on which database you're talking about. PostgreSQL, and, in
theory, MSDE, will only evaluate the IN() expression once (it's
correlated sub-selects that get evaluated multiple times -- and you
can't do these in Access at all).

Access, however, lacks a temporary table space. SInce it has to
evaluate the sub-select entirely in RAM, it evaluates it for every row
in the main table, unless both tables are quite small. As a result,
IN() queries run like molasses in Access. And Access doesn't support
any alternate subselect structures at all.

This does bring up an interesting question for Tom and Stephan:

Which is more efficient, under what circumstances?

1. SELECT a.* FROM a WHERE a.1 IN (SELECT b.1 FROM b WHERE b.2 = 0);

2. SELECT a.* FROM a WHERE EXISTS (
SELECT b.1 FROM b WHERE b.2 = 0 AND b.1 = a.1);

3. SELECT a.* FROM a JOIN
(SELECT b.1 FROM b WHERE b.2 = 0) sub_b
ON a.1 = sub_b.1;

I do a lot of #3 because it's easier to return calculated expressions
from the sub-query that way, but am I shooting myself in the foot,
performance-wise?

-Josh Berkus

P.S. Kapil, you might want to consider buying O'Reilly's "SQL in a
Nutshell" for cross-platform SQL comparisons.

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

In response to

  • at 2001-05-28 11:26:37 from Kapil Tilwani

Browse pgsql-general by date

  From Date Subject
Next Message Janning Vygen 2001-06-29 15:14:46 Re: function cache??
Previous Message Fabrizio Mazzoni 2001-06-29 14:59:05 R: I: Help with indexes/queries/msaccess

Browse pgsql-sql by date

  From Date Subject
Next Message Craig Longman 2001-06-29 15:36:45 create datatype
Previous Message Tom Lane 2001-06-29 14:46:53 Re: NOTIFY "string" from rule