From: | "Esger Abbink" <pgsql(at)bumblebeast(dot)com> |
---|---|
To: | "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-sql(at)postgresql(dot)org |
Subject: | Re: need some help understanding sloq query |
Date: | 2001-12-06 13:34:08 |
Message-ID: | 200112061334.OAA03446@fikkie.vesc.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
> Esger,
>
> > select some_other_fields from ents e, qtys q where e.set_id =
> > q.set_id and
> > e.e_id = q.e_id and e.set_id in (select set_id from datasets
> > where update_id in (select cur_update_id from current)) and
> > q.other_field = some_const ;
> >
> > this query takes ages :(
>
> I'm not surprised. You're doing a nested subselect with the IN
> operator; frankly, you couldn't make a relatively simple query any
> slower than the above. It looks like the query parser is doing its best
> to optimize, but you've forced it to compare every row in ents JOIN qtys
> against the whole datasets table. What's wrong with:
>
> SELECT some_fields
> FROM ents JOIN qtys USING (set_id)
> JOIN datasets USING (set_id)
> JOIN current ON datasets.update_id = current.cur_update_id
> WHERE other_field = some_constant;
>
i figured i wasnt doing things "right" ;)
unfortunately the query above takes down my postgres (7.0.3 tried on 2 servers). 1 join no problem, 2 joins results in:
DEBUG: Data Base System is in production state at Thu Dec 6 11:42:18 2001
Server process (pid 1095) exited with status 11 at Thu Dec 6 11:45:08 2001
Terminating any active server processes...
Server processes were terminated at Thu Dec 6 11:45:08 2001
Reinitializing shared memory and semaphores
The Data Base System is starting up
this is directly related to executing the query (or just the part with the first 2 joins) and is 100% reproducible sofar.
secondly, from the costs explain gives for the query part i can get through (1st join: merge join costs 7800, index scan costs 2000, index scan costs 4200) it seems that doing a simpler variant of my query (see the other post -followup-) with IN replaced by '=' is a lot faster. the resulting query would be something like:
select some_other_fields from ents e, qtys q where
e.set_id = q.set_id and
e.e_id = q.e_id and
e.set_id = X and
q.other_field = some_const ;
this query has negligable costs (nested loop cost 200, index scan cost 50, index scan cost 4 on test db) but should be enveloped in some for-loop construct to feed it the values for X after which the answers should be "glued" together. the set of X's has no theoritical maximum that i can recall atm but in practice it currently is always size 1 and i dont expect much if any growth. i cant however assume its size 1.
(the above does sound like a complex answer to a simple problem again though..)
> That allows Tom's magic query engine to do its work optimizing.
> Assuming, of course, that there *are* indexes on update_id and
> datasets.set_id ...
>
> I heartily recommend that you read Celko's "SQL for Smarties" (book
> reviews: http://techdocs.postgresql.org/bookreviews.php ). You seem to
> have fallen into the trap of using complex queries to answer simple
> questions, and your database performance is suffering because of it.
will investigate.
>
> -Josh Berkus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>
>
>
--
NeoMail - Webmail that doesn't suck... as much.
http://neomail.sourceforge.net
From | Date | Subject | |
---|---|---|---|
Next Message | Roberto Mello | 2001-12-06 15:48:57 | Re: Inline IF condition |
Previous Message | Stephan Szabo | 2001-12-06 04:18:51 | Re: Select into |