Re: wip: functions median and percentile

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-09-20 03:47:09
Message-ID: AANLkTi=P1-S7zF4ed544LrE8-L+C2qa9T6pSd8nU5h_a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-rrreviewers

2010/8/19 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> Hello
>
> I am sending a prototype implementation of functions median and
> percentile. This implementation is very simple and I moved it to
> contrib for this moment - it is more easy maintainable. Later I'll
> move it to core.

I've reviewed this patch.

* The patch can apply cleanly and make doesn't print any errors nor
warnings. But it doesn't touch contrib/Makefile so I had to make by
changing dir to contrib/median.

* Cosmetic coding style should be more appropriate, including trailing
white spaces and indentation positions.

* Since these two aggregates use tuplesort inside it, there're
possible risk to cause out of memory in normal use case. I don't think
this fact is critical, but at least some notation should be referred
in docs.

* It doesn't contain any document nor regression tests.

* They should be callable in window function context; for example

contrib_regression=# select median(a) over (order by a) from t1;
ERROR: invalid tuplesort state

or at least user-friend message should be printed.

* The returned type is controversy. median(int) returns float8 as the
patch intended, but avg(int) returns numeric. AFAIK only avg(float8)
returns float8.

* percentile() is more problematic; first, the second argument for the
aggregate takes N of "N%ile" as int, like 50 if you want 50%ile, but
it doesn't care about negative values or more than 100. In addition,
the second argument is taken at the first non-NULL value of the first
argument, but the second argument is semantically constant. For
example, for (1.. 10) value of a in table t1,

contrib_regression=# select percentile(a, a * 10 order by a) from t1;
percentile
------------
1
(1 row)

contrib_regression=# select percentile(a, a * 10 order by a desc) from t1;
percentile
------------
10
(1 row)

and if the argument comes from the subquery which doesn't contain
ORDER BY clause, you cannot predict the result.

That's all of my quick review up to now.

Regards,

--
Hitoshi Harada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2010-09-20 03:47:35 Re: So where did that $Id$ come from?
Previous Message Josh Berkus 2010-09-20 02:31:52 Re: Configuring synchronous replication

Browse pgsql-rrreviewers by date

  From Date Subject
Next Message David Fetter 2010-09-20 05:11:02 Re: [HACKERS] wip: functions median and percentile
Previous Message David Fetter 2010-09-17 16:53:47 client authentication hook