Skip site navigation (1)
Skip section navigation (2)
## Re: wip: functions median and percentile

**Attachment: median3.diff**

Description: text/x-patch (21.3 KB)
### In response to

### Responses

### pgsql-hackers by date

### pgsql-rrreviewers by date

On 5 October 2010 13:14, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote: > 2010/10/5 Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>: >> On 5 October 2010 07:04, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote: >> Extrapolating from few quick timing tests, even in the best case, on >> my machine, it would take 7 days for the running median to use up >> 100MB, and 8 years for it to use 2GB. So setting the tuplesort's >> workMem to 2GB (only in the running median case) would actually be >> safe in practice, and would prevent the temp file leak (for a few >> years at least!). I feel dirty even suggesting that. Better ideas >> anyone? > > So, I suggested to implement median as a *pure* window function aside > from Pavel's aggregate function, and Greg suggested insertion > capability of tuplesort. By this approach, we keep tuplesort to hold > all the values in the current frame and can release it on the last of > a partition (it's possible by window function API.) This is > incremental addition of values and is far better than O(n^2 log(n)) > although I didn't estimate the order. Only when the frame head is > moving down, we should re-initialize tuplesort and it is as slow as > calling aggregate version per each row (but I think we can solve it > somehow if looking precisely). > Possibly, but that sounds like a lot of work to get an efficient algorithm. The 3 cases I see are: 1). Simple aggregate. Current algorithm is O(n log(n)) which is OK. It could be better because a full sort is not strictly needed. As already mentioned, a quickselect would be O(n). 2). Window without ORDER BY. This is actually basically the same as (1), but called once per partition. 3). Window with ORDER BY (running median). The simplest algorithm is O(n^2 log(n)). It could be tweaked to use an insertion sort, but that would still be O(n^2), which is not a lot better for all the effort that would be involved. In theory (perhaps with some kind of tree) it ought to be possible to come up with an O(n log(n)) algorithm, but that would be a lot of work. In the meantime, the attached variation of the patch fixes the temp file issue and will support all 3 cases. It gives OK performance for (1) and (2), and poor performance for (3). That could be viewed as a future development task, which perhaps the window function API would help with. I think it would be a shame to drop support for (2) just because we can't do (3) efficiently yet. Regards, Dean > Regards, > > -- > Hitoshi Harada >

Description: text/x-patch (21.3 KB)

- Re: wip: functions median and percentile at 2010-10-05 12:14:24 from Hitoshi Harada

- Re: wip: functions median and percentile at 2010-10-10 21:16:59 from Tom Lane

Next: From:Robert HaasDate:2010-10-05 13:12:05Subject: Re: Insertion of geometric type column with column[0], column[1] and etc.Previous: From: Stephen FrostDate: 2010-10-05 13:04:55Subject: Re: Insertion of geometric type column with column[0],column[1] and etc.

Next: From:Tom LaneDate:2010-10-10 21:16:59Subject: Re: wip: functions median and percentilePrevious: From: Hitoshi HaradaDate: 2010-10-05 12:14:24Subject: Re: wip: functions median and percentile