Re: TODO items for window functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TODO items for window functions
Date: 2008-12-29 17:35:47
Message-ID: 9399.1230572147@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> * Investigate whether we should prohibit window functions in recursive
> terms; check whether any of the committed prohibitions are unnecessary.

I looked into these questions a bit. As for the first, there doesn't
appear to be a compelling implementation reason to forbid it, and I
can't find anything in the spec that says to disallow it. SQL:2008's
prohibition on aggregates in recursive terms is in 7.13 <query
expression> syntax rule 2) g) iii) 4), and there's nothing about window
functions in the vicinity.

The primary reason to forbid aggregates, so far as I can divine the
intent of the SQL committee, is that incremental evaluation of an
aggregate would give implementation-dependent results, ie you'd get a
different aggregate result depending on how many and which rows the
implementation chose to push through the recursion at a time. It seems
like the same charge could be leveled against window functions. On the
other hand it's at least possible to construct recursive queries in
which all the rows of a given window partition should get pushed through
together, so that you should get consistent answers despite the overall
incremental evaluation. So I can't tell for sure if the committee
thought about that and intentionally decided to allow window functions
in recursive terms, or if their failure to forbid it was an oversight.
(My confidence in the unerring accuracy of the spec is not high at
the moment ;-).) Nonetheless, the spec does not forbid it, so I feel
we shouldn't either.

As for the second point, I looked at every place that the committed
patch throws an error for queries or expressions containing window
functions. Most are demonstrably per spec, or are necessary
implementation restrictions arising from the fact that we know an
expression isn't going to get fed through the full planner/executor
machinery (an example of the latter is ALTER COLUMN TYPE USING).
The only case that I think is debatable is that we are throwing
error for window functions used in a SELECT FOR UPDATE/FOR SHARE
query. The corresponding error for aggregate functions is necessary
because the executor top level doesn't "see" the individual rows that
went into the aggregate, so there's no way to lock them. In the
case of window functions no aggregation occurs, and so in principle
we could lock the rows. However, consider something like this:

select x, lead(x) over() from table for update limit 1;

Because of the LIMIT, we'd only lock the first-returned row ...
but the values returned would also depend on the second row of the
table, which wouldn't get locked. In general the results could
depend on any or all rows of the table but we might lock only some.
This seems to me to be at variance with how you'd expect SELECT FOR
UPDATE to behave, so I'm inclined to leave the prohibition in there
--- at least until someone comes up with a convincing use-case for
SELECT FOR UPDATE together with a window function, and explains why
he doesn't care about relevant rows possibly not getting locked.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2008-12-29 17:40:35 Re: pg_start_backup without checkpoint patch (a part of Synch Rep)
Previous Message Jaime Casanova 2008-12-29 17:30:59 Re: TODO items for window functions