Re: Optimizing sum() operations

From: "Dobes Vandermeer" <dobesv(at)gmail(dot)com>
To: "Sean Davis" <sdavis2(at)mail(dot)nih(dot)gov>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Optimizing sum() operations
Date: 2008-10-03 22:38:50
Message-ID: 7324d9a20810031538q406daf89g94f7d57375c90c69@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Fri, Oct 3, 2008 at 12:23 PM, Sean Davis <sdavis2(at)mail(dot)nih(dot)gov> wrote:
> On Fri, Oct 3, 2008 at 3:13 PM, Dobes Vandermeer <dobesv(at)gmail(dot)com> wrote:
>> On Fri, Oct 3, 2008 at 4:51 AM, Sean Davis <sdavis2(at)mail(dot)nih(dot)gov> wrote:
>>> On Fri, Oct 3, 2008 at 4:51 AM, Dobes Vandermeer <dobesv(at)gmail(dot)com> wrote:
>>>> I'm currently using sum() to compute historical values in reports;
>>>> basically select sum(amount) on records where date <= '...' and date
>>>>>= '...' who = X.
>>>>
>>>> Second, if this is a concern, is there a best practice for optimizing
>>>> these kinds of queries?
>>>
>>> You'll need to test to see what performance you get. That said,
>>> indexing is a good place to start. You can always run explain and
>>> explain analyze on the queries to double-check the planner.
>>
>> Could I create an index that includes a sum() function - like:
>>
>> create index sumidx on records (who, date, sum(amount)) ?
>>
>> I'm sure that theoretically this is possible, but does postgres support it?
>
> I'm not sure what you want to do. Trying to make an index on a sum()
> doesn't make any sense because the sum() depends on the rows used in a
> query; i.e., sum() is an aggregate and cannot be used in an index.

Well, if you think about the structure of a B-Tree, each page of the
index contains a list of references to other subtrees, and the key
value ranges for those subtrees. In the case of sum() you could cache
the sum total of that column for an entire subtree and, if the range
of the query includes the whole subtree, you could skip descending
into the subtree and take the sum straight from that page in the
index.

This is a just a general theory that occurred to me, it's probably a
pretty specialized kind of indexing that isn't supported by any RDBMS,
but it's possible there could be a postgres plugin which did this,
though. Has anyone heard of something like that?

> What is wrong with an index on who and date and then doing the sum?

I think that if there are a lot of rows that match the query, it'll
take a long time, so I thought I'd start inquiring about whether
anyone has a good algorithm for accelerating these kinds of queries.

--

Dobes Vandermeer
Director, Habitsoft Inc.
dobesv(at)habitsoft(dot)com
778-891-2922

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Mark Roberts 2008-10-03 23:06:43 Re: Optimizing sum() operations
Previous Message Sean Davis 2008-10-03 19:23:42 Re: Optimizing sum() operations