Re: Query Question (one group by vs two separate)

From: "Jonathon Suggs" <jsuggs(at)murmp(dot)com>
To: "Joris Dobbelsteen" <joris(at)familiedobbelsteen(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query Question (one group by vs two separate)
Date: 2008-12-15 16:40:34
Message-ID: ec7191c70812150840u556af81aq3ab5018cce2d88ae@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Joris.

I'm familiar with explain and I do use it quite frequently. My site is just
starting out so I don't think performance is that crucial of an issue at the
onset (premature optimization). Also, there is not that much data yet so
its difficult to have a very accurate picture of what a full production load
would look like.

So I guess I was looking for some type of rule of thumb for when the
different options would be better. I guess I'll just have to keep an eye
out for my trouble queries and work with my indexes and techniques as I
progress. Thanks again.

On Sat, Dec 13, 2008 at 6:51 PM, Joris Dobbelsteen <
joris(at)familiedobbelsteen(dot)nl> wrote:

> Jonathon Suggs wrote, On 10-12-08 20:12:
>
>> I'm asking this as a more general question on which will perform better.
>> I'm trying to get a set of comments and their score/rankings from two
>> tables.
>>
>> *comments*
>> cid (integer, primary key)
>> title
>> body
>>
>> *comment_ratings*
>> cid (integer, primary key)
>> uid (integer, primary key)
>> score
>>
>> *Option 1* (Single group by query)
>> select c.cid,c.title,c.body,coalsece(sum(r.score),0) as score
>> from comments c left outer join comment_ratings r on c.cid = r.cid
>> group by c.cid,c.title,c.body;
>>
>> *Option 2* (Two simple queries)
>> select cid,title,body from comments;
>> select cid,sum(score) from comment_ratings;
>>
>> I know that in general you always want to keep the number of queries to a
>> minimum and that Option 1 is the most logical/preferred way to proceed.
>> However, I've greatly simplified Option 1. In practice I'm going to be
>> pulling a lot more columns and will be inner joining to additional tables.
>> /So my real question is whether having a lot of items in the group by
>> clause will have an adverse effect on performance./
>>
>
> You can answer that yourself... Create the database and fill it with a
> decent set of sample data (if that doesn't already exists). Run analyze to
> get your statistics. Use explain/explain analyze on the queries you are
> interested in.
>
> The manual has very good references. If you like, pgadmin can even create
> graphical representation (which is what I prefer).
>
> The obvious advantage depends on what you want to achieve? If its just a
> dump of a table and an aggregate, you second option might be better.
> If you need both data to transform it into something the user will see,
> probably the first option is better, as you leave combining to the highly
> specialized and optimized software and don't have to do it yourself. The
> choice has to do with index scans and table scans, that depend on the
> situation.
>
> In fact, if the first one doesn't come out nice (first join than aggregate,
> do aggeration in a subquery). So: learn explain!!! Every (decent) RDBMS has
> such a facility.
>
> - Joris
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-12-15 16:45:09 Re: Setting NEW in AFTER trigger
Previous Message Robert Fitzpatrick 2008-12-15 16:32:39 Setting NEW in AFTER trigger