Skip site navigation (1) Skip section navigation (2)

Re: wip: functions median and percentile

From: David Fetter <david(at)fetter(dot)org>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-09-23 18:48:47
Message-ID: 20100923184847.GC19104@fetter.org (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-rrreviewers
On Thu, Sep 23, 2010 at 08:27:38PM +0200, Pavel Stehule wrote:
> 2010/9/23 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> > 2010/9/23 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> >> Hello
> >>
> >> 2010/9/22 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> >>> 2010/9/22 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> >>>> Hello
> >>>>
> >>>> I found probably hard problem in cooperation with window functions :(
> >>
> >> maybe I was confused. I found a other possible problems.
> >>
> >> The problem with median function is probably inside a final function
> >> implementation. Actually we request possibility of repetitive call of
> >> final function. But final function call tuplesort_end function and
> >> tuplesort_performsort. These function changes a state of tuplesort.
> >> The most basic question is "who has to call tuplesort_end function and
> >> when?
> >
> > Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't
> > clean up its internal state at all and tells it's the executor's
> > responsibility to clear memory. It is allowed since ArrayBuildState is
> > only in-memory state. In the other hand, TupleSort should be cleared
> > by calling tuplesort_end() if it has tapeset member (on file based
> > sort) to close physical files.
> >
> > So 2 or 3 ways to go in my mind:
> 
> it is little bit worse - we cannot to call tuplesort_performsort repetitive.
> 
> >
> > 1. call tuplesort_begin_datum with INT_MAX workMem rather than the
> > global work_mem, to avoid it spills out sort state to files. It may
> > sounds dangerous, but actually memory exhausting can happen in
> > array_agg() as well.
> >
> > 2. add TupleSort an argument that tells not to use file at all. This
> > results in the same as #1 but more generic approach.
> >
> > 3. don't use tuplesort in median() but implement its original sort
> > management. This looks quite redundant and like maintenance problem.
> >
> > #2 sounds like the best in generic and consistent way. The only point
> > is whether the change is worth for implementing median() as it's very
> > system-wide common fundamentals.
> >
> > Other options?
> 
> #4 block median under window clause
> 
> #5 use a C array instead tuplesort under window clause. It is very
> unpractical to use a windows clauses with large datasets, so it should
> not be a problem. More, this can be very quick, because for C array we
> can use a qsort function.
> 
> Now I prefer #5 - it can be fast for using inside windows clause and
> safe when window clause will not be used.

If there's some way to do this using the same code in the windowing
and non-windowing case, that would be much, much better from an
architectural point of view.  Single Point of Truth and all that.

Cheers,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

pgsql-hackers by date

Next:From: Dimitri FontaineDate: 2010-09-23 18:52:28
Subject: Re: Easy way to verify gitignore files?
Previous:From: Robert HaasDate: 2010-09-23 18:39:53
Subject: Re: security label support, revised

pgsql-rrreviewers by date

Next:From: Pavel StehuleDate: 2010-09-23 18:56:53
Subject: Re: wip: functions median and percentile
Previous:From: Pavel StehuleDate: 2010-09-23 18:27:38
Subject: Re: wip: functions median and percentile

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group