Re: need to speed up query

From: Justin <justin(at)emproshunts(dot)com>
To: PFC <lists(at)peufeu(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: need to speed up query
Date: 2008-05-06 17:41:55
Message-ID: 48209863.9070709@emproshunts.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

PFC wrote:
>
>>> What is a "period" ? Is it a month, or something more "custom" ?
>>> Can periods overlap ?
>
>> No periods can never overlap. If the periods did you would be in
>> violation of many tax laws around the world. Plus it you would not
>> know how much money you are making or losing.
>
> I was wondering if you'd be using the same query to compute how
> much was gained every month and every week, which would have
> complicated things.
> But now it's clear.
>
>> To make this really funky you can have a Fiscal Calendar year start
>> June 15 2008 and end on June 14 2009
>
> Don't you just love those guys ? Always trying new tricks to make
> your life more interesting.

Thats been around been around a long time. You can go back a few
hundreds years

>>> Note that here you are scanning the entire table multiple times,
>>> the complexity of this is basically (rows in gltrans)^2 which is
>>> something you'd like to avoid.
>>>
>> For accounting purposes you need to know the Beginning Balances,
>> Debits, Credits, Difference between Debits to Credits and the
>> Ending Balance for each account. We have 133 accounts with
>> presently 12 periods defined so we end up 1596 rows returned for this
>> query.
>
> Alright, I propose a solution which only works when periods don't
> overlap.
> It will scan the entire table, but only once, not many times as
> your current query does.
>
>> So period 1 should have for the most part have Zero for Beginning
>> Balances for most types of Accounts. Period 2 is Beginning Balance
>> is Period 1 Ending Balance, Period 3 is Period 2 ending balance so
>> and so on forever.
>
> Precisely. So, it is not necessary to recompute everything for
> each period.
> Use the previous period's ending balance as the current period's
> starting balance...
>
> There are several ways to do this.
> First, you could use your current query, but only compute the sum
> of what happened during a period, for each period, and store that in a
> temporary table.
> Then, you use a plpgsql function, or you do that in your client,
> you take the rows in chronological order, you sum them as they come,
> and you get your balances. Use a NUMERIC type, not a FLOAT, to avoid
> rounding errors.
>
> The other solution does the same thing but optimizes the first
> step like this :
> INSERT INTO temp_table SELECT period, sum(...) GROUP BY period
>
> To do this you must be able to compute the period from the date
> and not the other way around. You could store a period_id in your
> table, or use a function.
>
> Another much more efficient solution would be to have a summary
> table which keeps the summary data for each period, with beginning
> balance and end balance. This table will only need to be updated when
> someone finds an old receipt in their pocket or something.
>

As i posted earlier the software did do this but it has so many bugs
else where in the code it allows it get out of balance to what really is
happening. I spent a several weeks trying to get this working and find
all the places it went wrong. I gave up and did this query which took
a day write and balance to a point that i turned it over to the
accountant. I redid the front end and i'm off to the races and Fixing
other critical problems.

All i need to do is take Shanun Thomas code and replace the View this
select statement creates

>> This falls under the stupid question and i'm just curious what other
>> people think what makes a query complex?
>
> I have some rather complex queries which postgres burns in a few
> milliseconds.
> You could define complexity as the amount of brain sweat that went
> into writing that query.
> You could also define complexity as O(n) or O(n^2) etc, for
> instance your query (as written) is O(n^2) which is something you
> don't want, I've seen stuff that was O(2^n) or worse, O(n!) in
> software written by drunk students, in this case getting rid of it is
> an emergency...
>

Thanks for your help and ideas i really appreciate it.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2008-05-06 17:59:09 Re: multiple joins + Order by + LIMIT query performance issue
Previous Message Greg Smith 2008-05-06 17:39:27 Re: Possible Redundancy/Performance Solution