Re: Windowing Function Patch Review -> Standard Conformance

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-12-01 16:13:16
Message-ID: e08cc0400812010813u10eb395ewcc55881bce633ca5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2008/12/1 David Rowley <dgrowley(at)gmail(dot)com>:
> I wrote:
>> I was also reading over the standard tonight. I've discovered that the
>> OFFSET in LEAD() and LAG() is optional. It should default to 1 if it is
>> not present. Oracle seems to support this.
>>
>> SQL2008 says:
>> > If <lead or lag function> is specified, then:
>> > i) Let VE1 be <lead or lag extent> and let DT be the declared type of
>> VE1.
>> > ii) Case:
>> > Scalar expressions 209
>> > WD 9075-2:200w(E)
>> > 6.10 <window function>
>> > If <offset> is specified, then let OFF be <offset>. The declared type of
>> >OFF shall be an
>> > exact numeric type with scale 0 (zero).
>> > 1)
>> > 2) Otherwise, let OFF be 1 (one).
>>
>> Yet another variant of LEAD() and LAG() but I think well worth it for both
>> compliance to the standard and compatibility to Oracle.
>
> I figured this was quite simple so I've created a patch to implement this.
> Can probably put this down to the fact that I'm starting to feel bad about
> pointing out the mistakes and having someone else fix them. Figured it was
> time to make some changes myself.
>
> I've got limited experience with diff so please let me know if there is
> something wrong with the patch. Same goes for my changes to the code.
>
> I re-sequenced the OIDs of other window functions so it will require initdb.
>
> Also I made some updates to the documentation. Wasn't 100% sure on the
> syntax for the optional arguments there. Hitoshi had: arg1 [,optional1].
> I've changed this to arg, [optional1], [optional2].
>
> One thing I didn't do was update the regression test:
> SELECT oid, proname FROM pg_proc WHERE proiswfunc;
>
> Hopefully this patch will apply after applying Heikki's latest patch
> (version 3).
>
> If you're happy with this Heikki can you merge to your patch?

I merged Heikki's patch with your lead/lag, then added a few tests for
those new comer functions. Also it contains some of those Tom pointed
including:

- Remove sgml keyword modifications as it will be generated automatically.
- Remove PartitionClause and OrderClause since they can be replaced
with SortGroupClause.
- Parallel test schedule changed to fit the parallel limit.
- equalfuncs/nodefuncs are now consistent in Query/WFunc
- Fix error code, which is now 42P36. But I'm still not sure it is appropriate.

And the patch is against HEAD. The git repository now points "spool"
branch for his approach, which I suppose will be merged to the master
(trunk) of the window functions repository.

I tested the spool performance with David's earlier bigtable:

CREATE TABLE bigtable (
id SERIAL NOT NULL PRIMARY KEY,
timestamp TIMESTAMP NOT NULL
);

-- about 383MB of data
INSERT INTO bigtable (timestamp)
SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,10000000);

CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);

VACUUM ANALYZE bigtable;

sample=# SELECT COUNT(*) FROM bigtable;
count
----------
10000000
(1 row)

sample=# SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1;
lead
----------------------------
2008-12-02 00:15:10.288461
(1 row)

sample=# EXPLAIN ANALYZE SELECT LEAD(timestamp) OVER (ORDER BY id)
FROM bigtable LIMIT 1;

QUERY PLAN

----------------------------------------------------------------------------------------------
---------------------------------------------------
Limit (cost=0.00..0.04 rows=1 width=12) (actual time=0.038..0.039
rows=1 loops=1)
-> Window (cost=0.00..386612.13 rows=10000000 width=12) (actual
time=0.036..0.036 rows=1
loops=1)
-> Index Scan using bigtable_pkey on bigtable
(cost=0.00..286612.13 rows=10000000 w
idth=12) (actual time=0.018..0.021 rows=2 loops=1)
Total runtime: 0.071 ms
(4 rows)

shows quite good result. Great work.

The following query works on my build:

> SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary GROUP
> BY depname;
> ERROR: variable not found in subplan target list

Now, I am thinking about refactoring around aggregate common code, and
renaming WFunc to WinFunc, which leads pg_proc.proiswfunc be
pg_proc.proiswinfunc and so on if no objections come.

P.S. seems hit attachment limitation. Sorry if you receive duplicated mail.

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081202-1.gz application/x-gzip 42.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2008-12-01 16:13:50 Re: Windowing Function Patch Review -> Standard Conformance
Previous Message Robert Haas 2008-12-01 16:05:59 Re: New to_timestamp implementation is pretty strict