Re: Optimising SELECT on a table with one million rows

From: "Bryan Murphy" <bryan(dot)murphy(at)gmail(dot)com>
To: "Cultural Sublimation" <cultural_sublimation(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimising SELECT on a table with one million rows
Date: 2007-07-30 17:14:05
Message-ID: bd8531800707301014m6ca15207meb9efe684725b2d1@mail.gmail.com
Lists: pgsql-general

First question... did you create indexes on the appropriate columns of these
tables? Foreign keys do not implicitly create indexes in Postgres.
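
For this query the important one is an index on comments.comment_story; an
index on comments.comment_author is worth having as well. Something along
these lines should do it (the index names here are just examples):

-- Neither index is created automatically by the REFERENCES
-- constraints; they have to be added by hand.
CREATE INDEX comments_story_idx  ON comments (comment_story);
CREATE INDEX comments_author_idx ON comments (comment_author);

-- Refresh the planner's statistics so the new indexes get considered.
ANALYZE comments;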

Bryan

On 7/30/07, Cultural Sublimation <cultural_sublimation(at)yahoo(dot)com> wrote:
>
> Hi,
>
> I'm fairly new to PostgreSQL, so I am not sure whether the performance
> problems I'm having are due to poorly constructed queries/indices, or
> whether I have bumped into a more fundamental problem requiring a
> redesign of my database structure. That's why I'm asking for your help.
>
> Here's the situation: I have three tables: Users, Stories, and Comments.
> Stories have an author (a user), and a comment is associated with a
> story and with the user who posted it. The structure of the database
> is therefore fairly simple (there are also some sequences, which I've
> omitted for clarity):
>
>
> CREATE TABLE users
> (
>     user_id    int UNIQUE NOT NULL,
>     user_name  text,
>     PRIMARY KEY (user_id)
> );
>
>
> CREATE TABLE stories
> (
>     story_id         int UNIQUE NOT NULL,
>     story_title      text,
>     story_body       text,
>     story_timestamp  timestamptz,
>     story_author     int REFERENCES users (user_id) NOT NULL,
>     PRIMARY KEY (story_id)
> );
>
>
> CREATE TABLE comments
> (
>     comment_id         int UNIQUE NOT NULL,
>     comment_title      text,
>     comment_body       text,
>     comment_timestamp  timestamptz,
>     comment_story      int REFERENCES stories (story_id) NOT NULL,
>     comment_author     int REFERENCES users (user_id) NOT NULL,
>     PRIMARY KEY (comment_id)
> );
>
>
> I've also populated the database with some test data, comprising 1,000
> users, 1,000 stories (one per user), and 1,000,000 comments (one comment
> per user per story).
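>
> For reference, something along these lines generates a data set of this
> shape (just a sketch; the names come from the schema above):
>
> INSERT INTO users (user_id, user_name)
> SELECT i, 'user_' || i
> FROM generate_series(0, 999) AS i;
>
> INSERT INTO stories (story_id, story_title, story_body,
>                      story_timestamp, story_author)
> SELECT i, 'title_' || i, 'body', now(), i
> FROM generate_series(0, 999) AS i;
>
> INSERT INTO comments (comment_id, comment_title, comment_body,
>                       comment_timestamp, comment_story, comment_author)
> SELECT s.i * 1000 + u.i, 'comment', 'body', now(), s.i, u.i
> FROM generate_series(0, 999) AS s(i),
>      generate_series(0, 999) AS u(i);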
>
> Now, the query I wish to optimise is also simple: get me all comments
> (the comment_id suffices) and the corresponding user *names* for a given
> story. If, for example, the story_id is 100, the query looks like this:
>
> SELECT comments.comment_id, users.user_name
> FROM comments, users
> WHERE comments.comment_story = 100
>   AND comments.comment_author = users.user_id;
>
> The problem is that this query takes a *very* long time. With the
> aforementioned 1,000,000 comments it needs at least 1100 ms on my
> system. EXPLAIN ANALYZE tells me that a sequential scan is being
> performed on both users and comments:
>
> Hash Join  (cost=28.50..21889.09 rows=988 width=14) (actual time=3.674..1144.779 rows=1000 loops=1)
>   Hash Cond: ((comments.comment_author)::integer = (users.user_id)::integer)
>   ->  Seq Scan on comments  (cost=0.00..21847.00 rows=988 width=8) (actual time=0.185..1136.067 rows=1000 loops=1)
>         Filter: ((comment_story)::integer = 100)
>   ->  Hash  (cost=16.00..16.00 rows=1000 width=14) (actual time=3.425..3.425 rows=1000 loops=1)
>         ->  Seq Scan on users  (cost=0.00..16.00 rows=1000 width=14) (actual time=0.068..1.845 rows=1000 loops=1)
> Total runtime: 1146.424 ms
>
> In the long run, I guess one possible solution to this problem will be
> to partition the comments table into a number of sub-tables, most
> likely based on the timestamp attribute (by having current versus
> historic data); something like the sketch below is what I have in mind.
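>
> (Just to illustrate, the inheritance-based layout I mean would look
> roughly like this; the child table name and date range are made up:)
>
> -- One child table per slice of the data; with constraint_exclusion
> -- enabled, the planner can skip child tables whose CHECK constraint
> -- rules them out for a given query.
> CREATE TABLE comments_2007 (
>     CHECK (comment_timestamp >= '2007-01-01'
>        AND comment_timestamp <  '2008-01-01')
> ) INHERITS (comments);
>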
> Nevertheless, I am wondering if there are other more straightforward ways
> to optimise this query. Some clever use of indices, perhaps? Or is
> the way I am now constructing the select non-optimal? Or do I need
> some pixie magic in the PostgreSQL settings? Anyway, any suggestions
> are welcome! (and thanks in advance)
>
> Regards,
> C.S.
>
