## Perl Standard Deviation function is wrong !

From: Andreas Zeugswetter
Recipients: "'colink(at)latticesemi(dot)com'", "'jason(at)wagner(dot)com'", "'dg(at)illustra(dot)com'", "'hackers(at)postgresql(dot)org'"
Subject: Perl Standard Deviation function is wrong !
Date: 1998-06-05 10:05:18
Message-ID: 01BD907A.6B7A9F10@zeugswettera.user.lan.at
List: pgsql-hackers
Thread: 1998-06-05 10:05:18 from Andreas Zeugswetter; 1998-06-05 15:16:27 from Brook Milligan; 1998-06-06 01:41:07 from Colin Kuskie
```
Hi,

First of all I would like to thank you for your work on the Statistics Module.
Unfortunately, a lot of books differ in their formulas for variance and stdev.
In Europe, the corrected definition below, in which stdev is not simply the sqrt
of variance, seems to be more popular.
For large populations (>400) the two calculations give almost the same result,
but for small populations (like 5) the calculation below gives a different one.

[Hackers] Please forget my last mail on this subject. It was wrong.
Tanx
Andreas Zeugswetter

David Gould wrote:
>The Perl Module "Statistics/Descriptive" has on the fly variance calculation.
>
>  my $self = shift;  ##Myself
>  my $oldmean;
>  my ($min,$mindex,$max,$maxdex);
>
>  ##Take care of appending to an existing data set
>  $min    = (defined ($self->{min}) ? $self->{min} : $_[0]);
>  $max    = (defined ($self->{max}) ? $self->{max} : $_[0]);
>  $maxdex = $self->{maxdex} || 0;
>  $mindex = $self->{mindex} || 0;
>
>  ##Calculate new mean, pseudo-variance, min and max;
>  foreach (@_) {
>    $oldmean = $self->{mean};
>    $self->{sum} += $_;
>    $self->{count}++;
>    if ($_ >= $max) {
>      $max = $_;
>      $maxdex = $self->{count}-1;
>    }
>    if ($_ <= $min) {
>      $min = $_;
>      $mindex = $self->{count}-1;
>    }
>    $self->{mean} += ($_ - $oldmean) / $self->{count};
>    $self->{pseudo_variance} += ($_ - $oldmean) * ($_ - $self->{mean});
>  }
>
>  $self->{min}          = $min;
>  $self->{mindex}       = $mindex;
>  $self->{max}          = $max;
>  $self->{maxdex}       = $maxdex;
>  $self->{sample_range} = $self->{max} - $self->{min};
>  if ($self->{count} > 1) {
>    $self->{variance}     = $self->{pseudo_variance} / ($self->{count} -1);
>    $self->{standard_deviation}  = sqrt( $self->{variance});

Most books state:
$self->{variance}     = $self->{pseudo_variance} / $self->{count};
$self->{standard_deviation}  = sqrt( $self->{pseudo_variance} / ( $self->{count} - 1 ))

>  }
>  return 1;
>}

```
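To make the size of the disagreement concrete, here is a minimal standalone Perl sketch of the running update quoted in the message. It is only an illustration: `running_stats` is a hypothetical helper, not part of the Statistics::Descriptive API, and the five data values are made up. It accumulates the same `mean` and `pseudo_variance` quantities in one pass, then divides by both `count` and `count - 1` so the two conventions can be compared on a small data set.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Descriptive): one pass over
# the data using the same mean / pseudo_variance update as the quoted code.
sub running_stats {
    my @data = @_;
    my ($count, $mean, $pseudo_variance) = (0, 0, 0);
    foreach my $x (@data) {
        my $oldmean = $mean;
        $count++;
        $mean            += ($x - $oldmean) / $count;
        $pseudo_variance += ($x - $oldmean) * ($x - $mean);
    }
    return ($count, $mean, $pseudo_variance);
}

my @sample = (2, 4, 4, 5, 10);    # made-up data, 5 values
my ($n, $mean, $pv) = running_stats(@sample);

# After the loop, pseudo_variance is the sum of squared deviations from the mean.
my $var_n   = $pv / $n;           # divide by count
my $var_nm1 = $pv / ($n - 1);     # divide by count - 1, as in the quoted code

printf "n=%d mean=%.4f sum_sq_dev=%.4f\n", $n, $mean, $pv;
printf "variance (n)   = %.4f, sqrt = %.4f\n", $var_n,   sqrt($var_n);
printf "variance (n-1) = %.4f, sqrt = %.4f\n", $var_nm1, sqrt($var_nm1);
```

For these five values the sum of squared deviations is 36, so the two variances are 7.2 and 9. The gap between the two denominators shrinks as the count grows, which matches the message's remark that the difference only matters for small populations.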
