Re: Denormalizing during select

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Rod Taylor <rbt(at)rbt(dot)ca>
Cc: Edmund Lian <no(dot)spam(at)address(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-03-01 17:53:41
Message-ID: 87d6lbkm6y.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


Rod Taylor <rbt(at)rbt(dot)ca> writes:

> I've been trying to figure out how to give a running total (similar
> issue I think).

Running totals are a "hard problem". They certainly cannot be solved using
aggregates. They're similar to the ranking problem of assigning a sequential
number to each item within each group.

The problem is they share certain properties of aggregate functions, namely
that they require persistent state storage and a state transition function.
But they definitely aren't aggregate functions in that they return a value for
every record, not one for the whole group.

I'm not even clear how to write an embedded (plpgsql or perl or python)
function, since I'm not clear how to allocate space for the state that will be
available for each call on each record but independent from other calls in the
same query. You have to be able to handle two running totals at the same time.

Note that running totals are not very sql-ish. Since sql deals in unordered
sets a running total is pretty ill-defined. It would have to be calculated
after the sort operation or else require you to sort the input tables in a
subquery or something.

To write a proper well-defined sql-ish query for running totals you would have
to do the very inefficient:

select employee_id, salary,
(select count(*) from employees x where x.salary < employee.salary) as salary_rank,
(select sum(salary) from employees x where x.salary < employee.salary) as running_total
from employees
order by salary desc

Finding a way to transform that into the single-scan plan that's obviously the
right way to execute it would be really cool but seems pretty far-fetched. I
don't think any database is capable of it today.

--
greg

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Nicolas Fertig 2003-03-01 18:53:27 OUTER JOIN with filter
Previous Message Tom Lane 2003-03-01 17:13:32 Re: Denormalizing during select