Re: [HACKERS] Standard Deviation function.

From: dg(at)illustra(dot)com (David Gould) stuporg(at)erols(dot)com (Stupor Genius) pgsql-hackers(at)postgreSQL(dot)org, maillist(at)candle(dot)pha(dot)pa(dot)us Re: [HACKERS] Standard Deviation function. 1998-06-05 08:15:46 9806050815.AA04919@hawk.illustra.com (view raw, whole thread or download thread mbox) 1998-06-05 01:20:50 from Chris Albertson  1998-06-05 01:35:56 from "Stupor Genius"   1998-06-05 01:55:07 from Bruce Momjian    1998-06-05 03:22:09 from "Stupor Genius"     1998-06-05 03:24:16 from Bruce Momjian     1998-06-05 08:15:46 from dg(at)illustra(dot)com (David Gould) pgsql-hackers
```>
> > > > I really need a Standard Deviation aggregate function...
> > >
> > > got around to trying to implement it.  I was going to have some
> > > functions that worked on a structure of two doubles to track
> > > the sum and square instead of using only one simple type.
> >
> > needed the average WHILE scanning through the table, which required two
> > passes, which the aggregate system is not designed to do.  I may be
> > wrong on this, though.
>
> I had asked you how to calculate this and the variance early last
> year.  One (I think the variance) was two-pass because of the need
> for the average, but I thought the StdDev would work with the struct.
>
> Been a while and I still haven't configured #(*&^ FreeBSD ppp yet.

The Perl Module "Statistics/Descriptive" has on the fly variance calculation.

my \$self = shift;  ##Myself
my \$oldmean;
my (\$min,\$mindex,\$max,\$maxdex);

##Take care of appending to an existing data set
\$min    = (defined (\$self->{min}) ? \$self->{min} : \$_[0]);
\$max    = (defined (\$self->{max}) ? \$self->{max} : \$_[0]);
\$maxdex = \$self->{maxdex} || 0;
\$mindex = \$self->{mindex} || 0;

##Calculate new mean, pseudo-variance, min and max;
foreach (@_) {
\$oldmean = \$self->{mean};
\$self->{sum} += \$_;
\$self->{count}++;
if (\$_ >= \$max) {
\$max = \$_;
\$maxdex = \$self->{count}-1;
}
if (\$_ <= \$min) {
\$min = \$_;
\$mindex = \$self->{count}-1;
}
\$self->{mean} += (\$_ - \$oldmean) / \$self->{count};
\$self->{pseudo_variance} += (\$_ - \$oldmean) * (\$_ - \$self->{mean});
}

\$self->{min}          = \$min;
\$self->{mindex}       = \$mindex;
\$self->{max}          = \$max;
\$self->{maxdex}       = \$maxdex;
\$self->{sample_range} = \$self->{max} - \$self->{min};
if (\$self->{count} > 1) {
\$self->{variance}     = \$self->{pseudo_variance} / (\$self->{count} -1);
\$self->{standard_deviation}  = sqrt( \$self->{variance});
}
return 1;
}

-dg

David Gould            dg(at)illustra(dot)com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612