Re: need some help understanding sloq query

From: "Esger Abbink" <pgsql(at)bumblebeast(dot)com>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: need some help understanding sloq query
Date: 2001-12-06 13:34:08
Message-ID: 200112061334.OAA03446@fikkie.vesc.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> Esger,
>
> > select some_other_fields from ents e, qtys q where e.set_id =
> > q.set_id and
> > e.e_id = q.e_id and e.set_id in (select set_id from datasets
> > where update_id in (select cur_update_id from current)) and
> > q.other_field = some_const ;
> >
> > this query takes ages :(
>
> I'm not surprised. You're doing a nested subselect with the IN
> operator; frankly, you couldn't make a relatively simple query any
> slower than the above. It looks like the query parser is doing its best
> to optimize, but you've forced it to compare every row in ents JOIN qtys
> against the whole datasets table. What's wrong with:
>
> SELECT some_fields
> FROM ents JOIN qtys USING (set_id)
> JOIN datasets USING (set_id)
> JOIN current ON datasets.update_id = current.cur_update_id
> WHERE other_field = some_constant;
>

i figured i wasnt doing things "right" ;)

unfortunately the query above takes down my postgres (7.0.3 tried on 2 servers). 1 join no problem, 2 joins results in:

DEBUG: Data Base System is in production state at Thu Dec 6 11:42:18 2001
Server process (pid 1095) exited with status 11 at Thu Dec 6 11:45:08 2001
Terminating any active server processes...
Server processes were terminated at Thu Dec 6 11:45:08 2001
Reinitializing shared memory and semaphores
The Data Base System is starting up

this is directly related to executing the query (or just the part with the first 2 joins) and is 100% reproducible sofar.

secondly, from the costs explain gives for the query part i can get through (1st join: merge join costs 7800, index scan costs 2000, index scan costs 4200) it seems that doing a simpler variant of my query (see the other post -followup-) with IN replaced by '=' is a lot faster. the resulting query would be something like:

select some_other_fields from ents e, qtys q where
e.set_id = q.set_id and
e.e_id = q.e_id and
e.set_id = X and
q.other_field = some_const ;

this query has negligable costs (nested loop cost 200, index scan cost 50, index scan cost 4 on test db) but should be enveloped in some for-loop construct to feed it the values for X after which the answers should be "glued" together. the set of X's has no theoritical maximum that i can recall atm but in practice it currently is always size 1 and i dont expect much if any growth. i cant however assume its size 1.

(the above does sound like a complex answer to a simple problem again though..)

> That allows Tom's magic query engine to do its work optimizing.
> Assuming, of course, that there *are* indexes on update_id and
> datasets.set_id ...
>
> I heartily recommend that you read Celko's "SQL for Smarties" (book
> reviews: http://techdocs.postgresql.org/bookreviews.php ). You seem to
> have fallen into the trap of using complex queries to answer simple
> questions, and your database performance is suffering because of it.

will investigate.

>
> -Josh Berkus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>
>
>

--
NeoMail - Webmail that doesn't suck... as much.
http://neomail.sourceforge.net

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Roberto Mello 2001-12-06 15:48:57 Re: Inline IF condition
Previous Message Stephan Szabo 2001-12-06 04:18:51 Re: Select into