Re: Windowing Function Patch Review -> Standard Conformance

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-12-01 16:13:50
Message-ID: e08cc0400812010813v64f65555t1c31858ebb4d8118@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

patch-2

2008/12/2 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2008/12/1 David Rowley <dgrowley(at)gmail(dot)com>:
>> I wrote:
>>> I was also reading over the standard tonight. I've discovered that the
>>> OFFSET in LEAD() and LAG() is optional. It should default to 1 if it is
>>> not present. Oracle seems to support this.
>>>
>>> SQL2008 says:
>>> > If <lead or lag function> is specified, then:
>>> > i) Let VE1 be <lead or lag extent> and let DT be the declared type of
>>> VE1.
>>> > ii) Case:
>>> > Scalar expressions 209
>>> > WD 9075-2:200w(E)
>>> > 6.10 <window function>
>>> > If <offset> is specified, then let OFF be <offset>. The declared type of
>>> >OFF shall be an
>>> > exact numeric type with scale 0 (zero).
>>> > 1)
>>> > 2) Otherwise, let OFF be 1 (one).
>>>
>>> Yet another variant of LEAD() and LAG() but I think well worth it for both
>>> compliance to the standard and compatibility to Oracle.
>>
>> I figured this was quite simple so I've created a patch to implement this.
>> Can probably put this down to the fact that I'm starting to feel bad about
>> pointing out the mistakes and having someone else fix them. Figured it was
>> time to make some changes myself.
>>
>> I've got limited experience with diff so please let me know if there is
>> something wrong with the patch. Same goes for my changes to the code.
>>
>> I re-sequenced the OIDs of other window functions so it will require initdb.
>>
>> Also I made some updates to the documentation. Wasn't 100% sure on the
>> syntax for the optional arguments there. Hitoshi had: arg1 [,optional1].
>> I've changed this to arg, [optional1], [optional2].
>>
>> One thing I didn't do was update the regression test:
>> SELECT oid, proname FROM pg_proc WHERE proiswfunc;
>>
>> Hopefully this patch will apply after applying Heikki's latest patch
>> (version 3).
>>
>> If you're happy with this Heikki can you merge to your patch?
>
> I merged Heikki's patch with your lead/lag, then added a few tests for
> those new comer functions. Also it contains some of those Tom pointed
> including:
>
> - Remove sgml keyword modifications as it will be generated automatically.
> - Remove PartitionClause and OrderClause since they can be replaced
> with SortGroupClause.
> - Parallel test schedule changed to fit the parallel limit.
> - equalfuncs/nodefuncs are now consistent in Query/WFunc
> - Fix error code, which is now 42P36. But I'm still not sure it is appropriate.
>
> And the patch is against HEAD. The git repository now points "spool"
> branch for his approach, which I suppose will be merged to the master
> (trunk) of the window functions repository.
>
> I tested the spool performance with David's earlier bigtable:
>
> CREATE TABLE bigtable (
> id SERIAL NOT NULL PRIMARY KEY,
> timestamp TIMESTAMP NOT NULL
> );
>
> -- about 383MB of data
> INSERT INTO bigtable (timestamp)
> SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
> FROM generate_series(1,10000000);
>
> CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);
>
> VACUUM ANALYZE bigtable;
>
> sample=# SELECT COUNT(*) FROM bigtable;
> count
> ----------
> 10000000
> (1 row)
>
> sample=# SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1;
> lead
> ----------------------------
> 2008-12-02 00:15:10.288461
> (1 row)
>
> sample=# EXPLAIN ANALYZE SELECT LEAD(timestamp) OVER (ORDER BY id)
> FROM bigtable LIMIT 1;
>
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------
> ---------------------------------------------------
> Limit (cost=0.00..0.04 rows=1 width=12) (actual time=0.038..0.039
> rows=1 loops=1)
> -> Window (cost=0.00..386612.13 rows=10000000 width=12) (actual
> time=0.036..0.036 rows=1
> loops=1)
> -> Index Scan using bigtable_pkey on bigtable
> (cost=0.00..286612.13 rows=10000000 w
> idth=12) (actual time=0.018..0.021 rows=2 loops=1)
> Total runtime: 0.071 ms
> (4 rows)
>
>
> shows quite good result. Great work.
>
> The following query works on my build:
>
>> SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary GROUP
>> BY depname;
>> ERROR: variable not found in subplan target list
>
>
> Now, I am thinking about refactoring around aggregate common code, and
> renaming WFunc to WinFunc, which leads pg_proc.proiswfunc be
> pg_proc.proiswinfunc and so on if no objections come.
>
> P.S. seems hit attachment limitation. Sorry if you receive duplicated mail.
>
> Regards,
>
>
>
> --
> Hitoshi Harada
>

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081202-2.gz application/x-gzip 93.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-12-01 17:09:28 Re: Statement-level triggers and inheritance
Previous Message Hitoshi Harada 2008-12-01 16:13:16 Re: Windowing Function Patch Review -> Standard Conformance