Skip site navigation (1) Skip section navigation (2)

Re: SELECT too complex?

From: Rory Campbell-Lange <rory(at)campbell-lange(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Postgresql Novice List <pgsql-novice(at)postgresql(dot)org>
Subject: Re: SELECT too complex?
Date: 2003-06-27 22:21:12
Message-ID: 20030627222112.GA553@campbell-lange.net (view raw or flat)
Thread:
Lists: pgsql-novice
Hi Josh. Thanks very much for your input.

On 27/06/03, Josh Berkus (josh(at)agliodbs(dot)com) wrote:

> As for simplifying, I'm afraid that I don't really understand what
> you're trying to get with this query.  If I'm clueless, send me a
> version of the query with each clause commented.

The aim of the select is to find all objects which a person has joined.
Objects are shared on boards, recorded in board_objects. People join
objects as board_object_members, which point to instances of
board_objects. So, an object can show up on one or more boards. One can
be a member of the object on one or more boards. One may also not have
the ability to "see" a board_object record of an object because one
isn't joined to that particular board.

So I am also an object and my membership of a board depends on whether
or not a board_objects record exists for that board. (See ***)

Mmm. Clear as mud.

> >     FOR resulter IN
> >         SELECT
> >             o.t_text_id                            as itemid,
> >             pers.t_nickname                     as itemauthor,
> >             o.n_type                            as itemtype,
> >             o.t_name                            as itemtitle,
> >             o.t_description                     as itembody,
> >             to_char(o.dt_created,  'DD Mon'')   as itemcreated,
> >             to_char(botimer, ''DD  on'')        as itemupdated,
> >             COALESCE (c_count, 0)               as itemcommentcount,
> >             p.n_id                              as imageid,
> >             COALESCE(p.t_path, ''none'')        as imagesrc,
> >             p.n_width                           as imagewidth,
> >             p.n_height                          as imageheight
> >         FROM
> >             people pers,
> >             objects o
> >             LEFT OUTER JOIN photo p ON o.n_id_photo = p.n_id
                            -- join photo on objects. There may or may
                            -- not be a photo.
> >             LEFT OUTER JOIN (
> >                 SELECT
> >                     bo.n_object_id as boider,
> >                     max(bo.dt_modified) as botimer
> >                 FROM
> >                     board_objects bo, board_object_members bom
> >                 WHERE
> >                     bom.n_person_id = personid ***
> >                     AND
> >                     bom.n_board_object_id = bo.n_id
> >                 GROUP BY
> >                     bo.n_object_id
> >             ) as timer

                -- here I grab the most recently updated instance of a
                -- board_ object to which I am attached. It is important to
                -- grab the latest as I may be attached to the same object
                -- on several boards, each which may have different
                -- update times.
                -- personid is my object id.
> 
> You have a lot of left outer joins on aggregate subselects in this query.
> try putting the subselect in the SELECT clause, and using ORDER BY ____ DESC 
> LIMIT 1 instead of MAX.

    OK. Sounds sensible.

> Also, I fail to see the reason you need to left outer join on Board_Objects, 
> when as subselect to board_objects is in your WHERE criteria.  Presumably, if 
> there are no records in board_objects, you'll lose the row from person 
> anyway.
> What about doing a straight join to board_objects and simplifing your outer 
> joins thereby?

    Good point! Excellent analysis; thanks. Presumably reducing outer
    joins will in theory reduce the query time?
> 
> >             ON o.n_id = timer.boider
> >             LEFT OUTER JOIN (
> >                 SELECT
> >                     bo.n_object_id as c_bo_id,
> >                     count(comm.n_id) as c_count
> >                 FROM
> >                     board_objects bo, comments comm
> >                 WHERE
> >                     comm.n_board_object_id = bo.n_id
> >                     AND
> >                     bo.n_board_id IN
> 
> You *definitely* don't want to use IN in this subquery, unless you're on 7.4.
> Use EXISTS instead.  I'm also unclear on why you need this IN subquery at all.  
> You're checking board_objects against itself; why not just put it in the main 
> subselect?
                         
                         Mmm. This bit of the query grabs the count of
                         all comments on all versions of the object that
                         I'm attached to. It disregards comments on
                         board_objects I'm not joined to.

                         To paraphrase:
                         Get count of comments on board_objects where
                         board_objects.object_id is this object id BUT
                         only get objects on boards where I'm a member.

                         I haven't really used EXISTS. Makes sense,
                         thanks.

> 
> >                         (SELECT
> >                             n_board_id
> >                         FROM
> >                             board_objects
> >                         WHERE
> >                             n_object_id = persobjid
> >                         )
> >                 GROUP BY
> >                     bo.n_object_id, bo.n_object_id
> >                 ) as counter
> >                 ON o.n_id = counter.c_bo_id
> >         WHERE
> >             o.n_creator = pers.n_id
> >             AND
> >             o.n_id IN (
> 
> Once again, through 7.3, use EXISTS, not IN.

Yup.
> 
> >                 SELECT DISTINCT ON (bo.dt_modified, bo.n_object_id)
> >                     bo.n_object_id as bo_obid
> 
> Why do you need a SELECT DISTINCT if you're just checking for existence?

I found that a normal query was finding more copies of an object because
an object can be reflected several times in board_objects.
> 
> >                 FROM
> >                     board_objects bo, board_object_members bom
> >                 WHERE
> >                     bo.n_id = bom.n_board_object_id
> >                     AND
> >                     bom.n_person_id = personid
> >             )
> >         ORDER BY
> >             botimer DESC
> >         LIMIT
> >             locallimit
> >         OFFSET
> >             localoffset LOOP
> 
> Overall, yeah, I think you're making the query too complex.  You have 3 
> subselects consulting the same table; surely at least 2 of these could be 
> consolidated.

Going through it I'm not sure the aims of the query subselects are all
that wrong (i.e. most recent object I'm attached to; sum of comments on
this object that I'm attached to but only in the context of the boards I
belong to; ensuring that I'm WHEREing only objects that I can access).
Although I could be horribly wrong :-).

However your analysis has made me think much more deeply about how the
joins work and whether or not I need left outer joins. More than that,
the difficulty I've had explaining the database structure to you means
that I should carefully reconsider it.

Many thanks for the time you've taken to answer my question.

Rory
-- 
Rory Campbell-Lange 
<rory(at)campbell-lange(dot)net>
<www.campbell-lange.net>

In response to

Responses

pgsql-novice by date

Next:From: Joe ConwayDate: 2003-06-27 23:21:14
Subject: Re: ERROR: tuplestore_gettuple: invalid state
Previous:From: Nabil SayeghDate: 2003-06-27 20:21:17
Subject: Re: connectby_reverselookup()

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group