Re: [HACKERS] Implementing STDDEV and VARIANCE

From: Jeroen van Vianen <jeroen(at)design(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Don Baccus <dhogaza(at)pacifier(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Implementing STDDEV and VARIANCE
Date: 2000-01-24 10:25:38
Message-ID: 4.2.2.20000124111157.00aa2530@mail.design.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 11:27 PM 1/23/00 -0500, Tom Lane wrote:
>Jeroen van Vianen <jeroen(at)design(dot)nl> writes:
> > aggname
> > aggowner
> > aggtype
> > aggtranstype [ n, sx, sx2 ]
> > agginitfunction function that does ( n = 0, sx = 0.0, sx2 = 0.0 )
> > aggtransfunction function that does ( n = n + 1, sx = sx + x,
> > sx2 = sx2 + x * x )
> > aggfinalizefunction function that returns (sx2 - (1/n) * sx * sx ) /
> > n
>
>Right, that's pretty much what I'm visualizing. One minor detail: there
>is not an "agginitfunction", there is an "agginitvalue". So your
>special datatype to hold n/sx/sx2 must have at least a typinput function
>that can convert the text string held in pg_aggregate into the desired
>internal form of the initial state. (At least, that's how it's done
>now. Do you want to argue to change it? As long as we're opening up
>the AGG design for reconsideration, we could revisit that choice too.)

I would suggest supplying an initfunction that initializes the datatype
that holds n/sx/sx2 so you're able to set individual members to NULL if so
desired. I also won't need to implement typinput for all required aggregate
types, one small headache less ;-)

count --> int4
min/max --> basetype
sum --> basetype
avg --> basetype, n
stddev, variance --> n, basetype, basetype

> > Might it be better for me to wait for 7.1 before implementing stddev and
> > variance?
>
>Well, you will need to be pretty fast on the draw if you want to get it
>into 7.0 release, since we will be freezing features for beta in a week.

True.

>But I see no reason that you couldn't implement STDDEV within the
>existing framework; just ignore transfn2 and do it as above. You might
>have some problems with getting the desired response for zero or one
>tuples, but there isn't any way to fix that within the current
>framework :-(. We have to do the function manager rewrite before you
>can have control over when to return a NULL. As long as you are willing
>to live with that, you can have useful functionality now.

The problem with zero or one rows is pretty important IMO if you want to
implement stddev and variance for both population and sample. You won't be
able to explain the difference in outcome if you don't do it right.

Let me wait for you to overhaul the fmgr code and do all the aggregate
stuff right in one sweep. Only thing is: how do we deal with current
user-defined aggregates?

At 09:02 PM 1/23/00 -0800, Don Baccus wrote:
>At 11:27 PM 1/23/00 -0500, Tom Lane wrote:
> >Right, that's pretty much what I'm visualizing. One minor detail: there
> >is not an "agginitfunction", there is an "agginitvalue". So your
> >special datatype to hold n/sx/sx2 must have at least a typinput function
> >that can convert the text string held in pg_aggregate into the desired
> >internal form of the initial state. (At least, that's how it's done
> >now. Do you want to argue to change it? As long as we're opening up
> >the AGG design for reconsideration, we could revisit that choice too.)
>
>At the moment I have a hard time visualizing an aggregate function where
>a constant initializer wouldn't serve, but ... what would be the cost of
>the generalization? It would only be called once per query or subquery
>containing the aggregate, right?

Initializer functions for count need to return 0, for min, max, avg, sum,
stddev and variance they need to set individual members to NULL (at least
that's how I see it now). A function returning this (with the new fmgr
code) would be very easy to implement (I hope ;-) ).

I'll hold my breath until the dust settles and we're starting 7.1.

Jeroen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zeugswetter Andreas SB 2000-01-24 10:53:45 AW: [HACKERS] Implementing STDDEV and VARIANCE
Previous Message Hannu Krosing 2000-01-24 10:03:42 Re: [HACKERS] Use of Indicies ...