Re: Optimizing sum() operations

From: Harold A(dot) Giménez Ch(dot) <harold(dot)gimenez(at)gmail(dot)com>
To: "Dobes Vandermeer" <dobesv(at)gmail(dot)com>
Cc: "Sean Davis" <sdavis2(at)mail(dot)nih(dot)gov>, pgsql-novice(at)postgresql(dot)org
Subject: Re: Optimizing sum() operations
Date: 2008-10-04 01:10:34
Message-ID: c807ef1a0810031810y2f802b71s7e19e2f8cdc67463@mail.gmail.com
Lists: pgsql-novice

Your best bet is probably to do some bookkeeping in a separate table and
keep it up to date with the aggregations you will require (i.e., cache
them). Maintain this table on inserts to the main table via triggers, or
via whichever mechanism inserts the data in the first place (a web app, a
script, etc.).
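
A minimal sketch of that approach, assuming a `records(who, date, amount)`
table like the one described downthread (all names here are hypothetical,
and concurrent inserts for the same (who, date) may need retry handling):

```sql
-- Hypothetical summary table: one running total per (who, date).
CREATE TABLE records_daily_totals (
    who    integer NOT NULL,
    date   date    NOT NULL,
    total  numeric NOT NULL DEFAULT 0,
    PRIMARY KEY (who, date)
);

-- Trigger function that keeps the totals in step with inserts on records.
CREATE OR REPLACE FUNCTION records_maintain_totals() RETURNS trigger AS $$
BEGIN
    UPDATE records_daily_totals
       SET total = total + NEW.amount
     WHERE who = NEW.who AND date = NEW.date;
    IF NOT FOUND THEN
        INSERT INTO records_daily_totals (who, date, total)
        VALUES (NEW.who, NEW.date, NEW.amount);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_totals_trg
    AFTER INSERT ON records
    FOR EACH ROW EXECUTE PROCEDURE records_maintain_totals();

-- A historical report then sums over one row per day instead of
-- one row per transaction:
SELECT sum(total)
  FROM records_daily_totals
 WHERE who = 42 AND date BETWEEN '2008-01-01' AND '2008-09-30';
```

If deletes or updates to `records` are possible, the trigger would also need
to fire on those and subtract the old amount accordingly.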

This would definitely scale, although the indexing strategy proposed earlier
seems exotic ;)
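
For comparison, the plain-index route Sean suggests below is just a
composite index matching the WHERE clause, verified with EXPLAIN ANALYZE
(again, names are hypothetical):

```sql
-- Composite index matching the report's predicates.
CREATE INDEX records_who_date_idx ON records (who, date);

-- Check the plan: with the index in place this should show an index
-- scan on records_who_date_idx rather than a sequential scan, though
-- the sum() itself still has to visit every matching row.
EXPLAIN ANALYZE
SELECT sum(amount)
  FROM records
 WHERE who = 42
   AND date >= '2008-01-01'
   AND date <= '2008-09-30';
```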

On Fri, Oct 3, 2008 at 6:38 PM, Dobes Vandermeer <dobesv(at)gmail(dot)com> wrote:

> On Fri, Oct 3, 2008 at 12:23 PM, Sean Davis <sdavis2(at)mail(dot)nih(dot)gov> wrote:
> > On Fri, Oct 3, 2008 at 3:13 PM, Dobes Vandermeer <dobesv(at)gmail(dot)com>
> wrote:
> >> On Fri, Oct 3, 2008 at 4:51 AM, Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
> wrote:
> >>> On Fri, Oct 3, 2008 at 4:51 AM, Dobes Vandermeer <dobesv(at)gmail(dot)com>
> wrote:
> >>>> I'm currently using sum() to compute historical values in reports;
> >>>> basically select sum(amount) on records where date <= '...' and
> >>>> date >= '...' and who = X.
> >>>>
> >>>> Second, if this is a concern, is there a best practice for optimizing
> >>>> these kinds of queries?
> >>>
> >>> You'll need to test to see what performance you get. That said,
> >>> indexing is a good place to start. You can always run explain and
> >>> explain analyze on the queries to double-check the planner.
> >>
> >> Could I create an index that includes a sum() function - like:
> >>
> >> create index sumidx on records (who, date, sum(amount)) ?
> >>
> >> I'm sure that theoretically this is possible, but does postgres support
> it?
> >
> > I'm not sure what you want to do. Trying to make an index on a sum()
> > doesn't make any sense because the sum() depends on the rows used in a
> > query; i.e., sum() is an aggregate and cannot be used in an index.
>
> Well, if you think about the structure of a B-Tree, each page of the
> index contains a list of references to other subtrees, and the key
> value ranges for those subtrees. In the case of sum() you could cache
> the sum total of that column for an entire subtree and, if the range
> of the query includes the whole subtree, you could skip descending
> into the subtree and take the sum straight from that page in the
> index.
>
> This is just a general theory that occurred to me; it's probably a
> pretty specialized kind of indexing that isn't supported by any RDBMS,
> but it's possible there could be a postgres plugin which does this.
> Has anyone heard of something like that?
>
> > What is wrong with an index on who and date and then doing the sum?
>
> I think that if there are a lot of rows that match the query, it'll
> take a long time, so I thought I'd start inquiring about whether
> anyone has a good algorithm for accelerating these kinds of queries.
>
> --
>
> Dobes Vandermeer
> Director, Habitsoft Inc.
> dobesv(at)habitsoft(dot)com
> 778-891-2922
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice
>
