Re: [SQL] optimizing 2-table join w/millions of rows

From: Herouth Maoz <herouth(at)oumail(dot)openu(dot)ac(dot)il>
To: Michael Olivier <molivier(at)yahoo(dot)com>
Cc: pgsql-sql(at)postgreSQL(dot)org
Subject: Re: [SQL] optimizing 2-table join w/millions of rows
Date: 1998-11-23 13:09:09
Message-ID: l03110700b27f0a10d780@[147.233.159.109]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

At 5:05 +0200 on 23/11/98, Michael Olivier wrote:

>
> > How about:
> >
> > SELECT U.acctname
> > FROM usertest U, bgndtest B
> > WHERE B.acctname = U.acctname
> > AND B.part_needed=3 AND B.loc_needed=5
> > AND U.part=2 AND U.loc=3;
>
> Yes, that looks equivalent. My problem is this is too slow an
> operation as I'm benchmarking it right now. And if I add less-than or
> greater-than comparisons, the performance goes _way_ down from there.
> How can I get the best performance out of this kind of operation?
>
> Is there any way to force postgres to hold certain tables in memory
> all the time? As I said, cost of memory isn't an issue, but
> performance is.

This is very strange. As a general rule, a simple join should always be
faster than a query with a subquery. Especially when that subquery is "in"
and not "exists".

In principle, if there is an index on the part, loc and acctname (separate
indices) in usertest, and on part_needed, loc_needed and acctname in
bgndtest, the most optimal query should be: restrict each of the tables by
the literal comparisons, and then match the acctname.

I don't know who wrote the optimiser, and I'm not sure he is on this list.
There are optimizer settings which can be changed, such as SET R_PLANS ON.

In any case, if you see in "real life" that a subquery actually works
better for you, then avoid the join. No reason to do both:

SELECT acctname
FROM bgndtest
WHERE part_needed=3 and loc_needed=5
AND acctname IN (
SELECT acctname
FROM usertest
WHERE part=2 AND loc=3
);

Or better yet:

SELECT acctname
FROM bgndtest B
WHERE part_needed=3 and loc_needed=5
AND EXISTS (
SELECT *
FROM usertest U
WHERE part=2 AND loc=3
AND U.acctname = B.acctname
);

Of course, you may do it the other way around. It depends on which of your
tables is supposed to have more records, and which of the comparisons is
actually a "=" comparison, which is very important, because less-than and
greater-than don't use the indices.

What I mean by "the other way around" is:

SELECT acctname
FROM usertest U
WHERE part=2 and loc=3
AND EXISTS (
SELECT *
FROM bgndtest B
WHERE part_needed=3 and loc_needed=5
AND U.acctname = B.acctname
);

As for your question - there is no way to keep an entire table in memory,
unless you have a ram disk.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Sferacarta Software 1998-11-23 14:11:29 Re[2]: [SQL] abs value, power, sqrt
Previous Message Michael Olivier 1998-11-23 03:05:14 Re: [SQL] optimizing 2-table join w/millions of rows