Re: index for group by

From: Steve Midgley <public(at)misuse(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: index for group by
Date: 2008-07-22 21:17:20
Message-ID: 20080722211730.865606502BA@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

At 09:20 AM 7/22/2008, pgsql-sql-owner(at)postgresql(dot)org wrote:
>Date: Tue, 22 Jul 2008 13:27:24 +0200
>From: "A. Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
>To: pgsql-sql(at)postgresql(dot)org
>Subject: Re: index for group by
>Message-ID: <20080722112724(dot)GD2742(at)a-kretschmer(dot)de>
>
>am Tue, dem 22.07.2008, um 13:18:30 +0200 mailte Patrick Scharrenberg
>folgendes:
> > Hi,
> >
> > is there a way to speedup "group by" queries with an index?
> >
> > In particular if I have a table like this:
> >
> > CREATE TABLE data
> > (
> > id1 integer,
> > id2 integer,
> > somedata character varying,
> > ts timestamp with time zone
> > );
> >
> > where continously data is logged about "id1" and "id2" into
> "somedata",
> > together with the timestamp when it was logged.
> >
> > So I have multiple rows with the same id1 and id2 but different
> > timestamp (and data maybe).
> >
> > At the moment I have ~40.000.000 rows in that table so doing a
> >
> > SELECT id1, id2 FROM data GROUP BY id1, id2;
>
>
>without a where-clause every select forces a seq-scan.

First, why are you doing a group by when you aren't doing an
aggregation (like COUNT, SUM, etc)? It seems like you can get way
better performance by doing this:

SELECT DISTINCT ON (id1, id2) id1, id2 FROM data ORDER BY id1, id2

(Assuming your compound index is in "id1,id2" order). Am I missing
something?

A different more cumbersome idea I have for you (if you really do need
a GROUP BY) is to build a warehouse table that precalculates the data
you want. You can build some recurring process that runs every NN
minutes or hours and fires off a stored procedure which grabs all the
data from this "data" table, aggregates it and saves it to warehouse
table. You could aggregate against your datetime stamp by N hours or
days as well. If this idea is of interest you can write back to the
list or off-list to me for more info.

Steve

Browse pgsql-sql by date

  From Date Subject
Next Message Jeff Williams 2008-07-22 23:16:11 Re: PERSISTANT PREPARE (another point of view)
Previous Message Tom Lane 2008-07-22 20:46:13 Re: Strange query duration