Re: SELECT too complex?

From: Rory Campbell-Lange <rory(at)campbell-lange(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Postgresql Novice List <pgsql-novice(at)postgresql(dot)org>
Subject: Re: SELECT too complex?
Date: 2003-06-27 22:21:12
Message-ID: 20030627222112.GA553@campbell-lange.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hi Josh. Thanks very much for your input.

On 27/06/03, Josh Berkus (josh(at)agliodbs(dot)com) wrote:

> As for simplifying, I'm afraid that I don't really understand what
> you're trying to get with this query. If I'm clueless, send me a
> version of the query with each clause commented.

The aim of the select is to find all objects which a person has joined.
Objects are shared on boards, recorded in board_objects. People join
objects as board_object_members, which point to instances of
board_objects. So, an object can show up on one or more boards. One can
be a member of the object on one or more boards. One may also not have
the ability to "see" a board_object record of an object because one
isn't joined to that particular board.

So I am also an object and my membership of a board depends on whether
or not a board_objects record exists for that board. (See ***)

Mmm. Clear as mud.

> > FOR resulter IN
> > SELECT
> > o.t_text_id as itemid,
> > pers.t_nickname as itemauthor,
> > o.n_type as itemtype,
> > o.t_name as itemtitle,
> > o.t_description as itembody,
> > to_char(o.dt_created, 'DD Mon'') as itemcreated,
> > to_char(botimer, ''DD on'') as itemupdated,
> > COALESCE (c_count, 0) as itemcommentcount,
> > p.n_id as imageid,
> > COALESCE(p.t_path, ''none'') as imagesrc,
> > p.n_width as imagewidth,
> > p.n_height as imageheight
> > FROM
> > people pers,
> > objects o
> > LEFT OUTER JOIN photo p ON o.n_id_photo = p.n_id
-- join photo on objects. There may or may
-- not be a photo.
> > LEFT OUTER JOIN (
> > SELECT
> > bo.n_object_id as boider,
> > max(bo.dt_modified) as botimer
> > FROM
> > board_objects bo, board_object_members bom
> > WHERE
> > bom.n_person_id = personid ***
> > AND
> > bom.n_board_object_id = bo.n_id
> > GROUP BY
> > bo.n_object_id
> > ) as timer

-- here I grab the most recently updated instance of a
-- board_ object to which I am attached. It is important to
-- grab the latest as I may be attached to the same object
-- on several boards, each which may have different
-- update times.
-- personid is my object id.
>
> You have a lot of left outer joins on aggregate subselects in this query.
> try putting the subselect in the SELECT clause, and using ORDER BY ____ DESC
> LIMIT 1 instead of MAX.

OK. Sounds sensible.

> Also, I fail to see the reason you need to left outer join on Board_Objects,
> when as subselect to board_objects is in your WHERE criteria. Presumably, if
> there are no records in board_objects, you'll lose the row from person
> anyway.
> What about doing a straight join to board_objects and simplifing your outer
> joins thereby?

Good point! Excellent analysis; thanks. Presumably reducing outer
joins will in theory reduce the query time?
>
> > ON o.n_id = timer.boider
> > LEFT OUTER JOIN (
> > SELECT
> > bo.n_object_id as c_bo_id,
> > count(comm.n_id) as c_count
> > FROM
> > board_objects bo, comments comm
> > WHERE
> > comm.n_board_object_id = bo.n_id
> > AND
> > bo.n_board_id IN
>
> You *definitely* don't want to use IN in this subquery, unless you're on 7.4.
> Use EXISTS instead. I'm also unclear on why you need this IN subquery at all.
> You're checking board_objects against itself; why not just put it in the main
> subselect?

Mmm. This bit of the query grabs the count of
all comments on all versions of the object that
I'm attached to. It disregards comments on
board_objects I'm not joined to.

To paraphrase:
Get count of comments on board_objects where
board_objects.object_id is this object id BUT
only get objects on boards where I'm a member.

I haven't really used EXISTS. Makes sense,
thanks.

>
> > (SELECT
> > n_board_id
> > FROM
> > board_objects
> > WHERE
> > n_object_id = persobjid
> > )
> > GROUP BY
> > bo.n_object_id, bo.n_object_id
> > ) as counter
> > ON o.n_id = counter.c_bo_id
> > WHERE
> > o.n_creator = pers.n_id
> > AND
> > o.n_id IN (
>
> Once again, through 7.3, use EXISTS, not IN.

Yup.
>
> > SELECT DISTINCT ON (bo.dt_modified, bo.n_object_id)
> > bo.n_object_id as bo_obid
>
> Why do you need a SELECT DISTINCT if you're just checking for existence?

I found that a normal query was finding more copies of an object because
an object can be reflected several times in board_objects.
>
> > FROM
> > board_objects bo, board_object_members bom
> > WHERE
> > bo.n_id = bom.n_board_object_id
> > AND
> > bom.n_person_id = personid
> > )
> > ORDER BY
> > botimer DESC
> > LIMIT
> > locallimit
> > OFFSET
> > localoffset LOOP
>
> Overall, yeah, I think you're making the query too complex. You have 3
> subselects consulting the same table; surely at least 2 of these could be
> consolidated.

Going through it I'm not sure the aims of the query subselects are all
that wrong (i.e. most recent object I'm attached to; sum of comments on
this object that I'm attached to but only in the context of the boards I
belong to; ensuring that I'm WHEREing only objects that I can access).
Although I could be horribly wrong :-).

However your analysis has made me think much more deeply about how the
joins work and whether or not I need left outer joins. More than that,
the difficulty I've had explaining the database structure to you means
that I should carefully reconsider it.

Many thanks for the time you've taken to answer my question.

Rory
--
Rory Campbell-Lange
<rory(at)campbell-lange(dot)net>
<www.campbell-lange.net>

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Joe Conway 2003-06-27 23:21:14 Re: ERROR: tuplestore_gettuple: invalid state
Previous Message Nabil Sayegh 2003-06-27 20:21:17 Re: connectby_reverselookup()