Re: Multiple selects versus join

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: "Adrian 'Dagurashibanipal' von Bidder" <avbidder(at)fortytwo(dot)ch>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Multiple selects versus join
Date: 2002-08-20 08:21:48
Message-ID: Pine.LNX.4.33.0208200215410.2333-100000@css120.ihs.com
Lists: pgsql-general

On 20 Aug 2002, Adrian 'Dagurashibanipal' von Bidder wrote:

> On Tue, 2002-08-20 at 04:00, Jean-Christian Imbeault wrote:
> > Is it more efficient to do multiple selects on multiple tables to get
> > data or do a join of those tables and extract the data from the
> > resulting temp table?
> >
> > For example:
> >
> > select name from a where id=1;
> > select pub_date from b where id=1;
> > pub_id = select publisher_id from c where id=1;
> > select pub_name from d where id = pub_id;
> >
> > or (I don't know the syntax for join so this is just pseudo-sql);
> >
> > select name, pub_date, pub_name from join((join(A, B, C) on id), D) on
> > pub_id where id = 1;
> >
> > All my tables are related to each other so I am wondering which is more
> > efficient: doing multiple selects or joining the tables. (Joining the
> > tables would return just one row).
>
> Yo!
>
> If you *know* that you will get just one row, you know more than the
> database system can know. Generally: a join of tables with n and m rows
> can return at most n*m rows, so joins are not very efficient compared to
> multiple selects. A join is basically a select on both tables with
> additional processing done. If you do multiple selects and can do the
> join 'by hand' because you know much about the data you expect, this
> will almost always be faster.

Not true. What if you're joining on fields that have a unique constraint
on them? There can be only one (to paraphrase Highlander), so you know how
many rows you're going to get back. It's quite possible that you're
searching for a single employee record, and all the tables joined to it
(even if there are dozens) are joined on a unique field that references the
main employee table. In that case, it WILL be faster to let the database
do it in one query, unless it has a brain-dead planner.
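
To make that concrete, here is a minimal sketch using the a/b/c/d tables
from the original question. The column types, the sample rows, and the use
of Python's sqlite3 module (standing in for PostgreSQL so it runs anywhere)
are all assumptions for illustration. Every join key is a primary key, so
the join can only return one row for a given id.

```python
import sqlite3


def build_db():
    # Hypothetical schema: every table is keyed on a unique id.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE b (id INTEGER PRIMARY KEY, pub_date TEXT);
        CREATE TABLE c (id INTEGER PRIMARY KEY, publisher_id INTEGER);
        CREATE TABLE d (id INTEGER PRIMARY KEY, pub_name TEXT);
        INSERT INTO a VALUES (1, 'Book One'), (2, 'Book Two');
        INSERT INTO b VALUES (1, '2002-08-20'), (2, '2001-01-01');
        INSERT INTO c VALUES (1, 10), (2, 20);
        INSERT INTO d VALUES (10, 'Acme Press'), (20, 'Other House');
    """)
    return conn


def fetch_with_selects(conn, item_id):
    # The "by hand" version: four round trips to the database.
    key = (item_id,)
    name = conn.execute("SELECT name FROM a WHERE id = ?", key).fetchone()[0]
    pub_date = conn.execute("SELECT pub_date FROM b WHERE id = ?", key).fetchone()[0]
    pub_id = conn.execute("SELECT publisher_id FROM c WHERE id = ?", key).fetchone()[0]
    pub_name = conn.execute("SELECT pub_name FROM d WHERE id = ?", (pub_id,)).fetchone()[0]
    return (name, pub_date, pub_name)


def fetch_with_join(conn, item_id):
    # The same lookup as one query; unique keys mean at most one row.
    return conn.execute("""
        SELECT a.name, b.pub_date, d.pub_name
        FROM a
        JOIN b ON b.id = a.id
        JOIN c ON c.id = a.id
        JOIN d ON d.id = c.publisher_id
        WHERE a.id = ?
    """, (item_id,)).fetchall()
```

Both functions return the same data; the join version does it in a single
query and leaves the lookup strategy to the planner.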

The real issue is how many rows you are tossing around versus how many the
database is tossing around. So, just as you'd pick the table that would
(most likely) return the fewest rows to process first by hand, you should
build your query using join syntax that forces PostgreSQL to use that table
first. Indexes also make all the difference when less than 10% of a table
is being returned.

Basically, if you're going to return many of the rows in a table, then it
may well be faster to do it in code, but if you know it will be less than
10%, it's much faster to let the database do it.
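
A rough way to see the effect of an index on a selective lookup is to
compare query plans before and after creating one. The sketch below is an
assumption-laden illustration: the employee table, its columns, and the
index name are made up, and it uses SQLite's EXPLAIN QUERY PLAN through
Python's sqlite3 module rather than PostgreSQL's EXPLAIN, whose output
looks different.

```python
import sqlite3


def plan_for(conn, sql):
    # The human-readable plan text is the last column of each plan row.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


def compare_plans():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: 1000 employees spread over 50 departments,
    # so one department is 2% of the table.
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, dept INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee (dept, name) VALUES (?, ?)",
        [(i % 50, f"emp{i}") for i in range(1000)],
    )
    query = "SELECT name FROM employee WHERE dept = 7"
    before = plan_for(conn, query)  # no index on dept yet
    conn.execute("CREATE INDEX employee_dept_idx ON employee (dept)")
    after = plan_for(conn, query)   # the planner can now use the index
    return before, after
```

Printing the two strings returned by compare_plans() shows a full table
scan first and an index search second.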

Of course, the real reason to let the database do it is to make sure it
gets done right. It's far likelier that you or I will write code with some
fatal mistake we don't see than that the database will. If you make a
mistake in your join syntax you'll know right up front, when you get 0 or
1,000,000,000,000 rows :-)
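
Here is a small sketch of how loudly a join mistake announces itself. The
two 100-row tables and their contents are hypothetical, and SQLite (via
Python's sqlite3 module) stands in for PostgreSQL.

```python
import sqlite3


def join_row_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE b (id INTEGER PRIMARY KEY, pub_date TEXT);
    """)
    conn.executemany(
        "INSERT INTO a VALUES (?, ?)", [(i, f"name{i}") for i in range(1, 101)]
    )
    conn.executemany(
        "INSERT INTO b VALUES (?, ?)", [(i, "2002-08-20") for i in range(1, 101)]
    )

    # Correct join on the unique key: one output row per id.
    correct = conn.execute(
        "SELECT count(*) FROM a JOIN b ON b.id = a.id"
    ).fetchone()[0]

    # Join condition forgotten: every row of a pairs with every row of b.
    forgotten = conn.execute("SELECT count(*) FROM a, b").fetchone()[0]

    # Join condition wrong: nothing matches at all.
    wrong = conn.execute(
        "SELECT count(*) FROM a JOIN b ON b.id = a.id + 1000"
    ).fetchone()[0]

    return correct, forgotten, wrong
```

With 100 rows in each table, the three counts are 100, 10000, and 0, so
either mistake is obvious on the first run.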
