Re: Do we need multiple forms of the SQL2003 statistics

From: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Do we need multiple forms of the SQL2003 statistics
Date: 2006-07-30 18:13:46
Message-ID: Pine.LNX.4.64.0607302205330.30743@lnfm1.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 29 Jul 2006, Tom Lane wrote:

> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> ... From a mathematician's point of view, however, some of these
>> functions normally produce irrational numbers anyway, so it seems
>> unlikely that numeric will be useful. But looking at the definition
>> of, say, regr_avgx(Y, X), if all the input values are integers, it
>> might be useful if I could get an exact integer or rational number as
>> output, instead of a float, that is.
>
> The question is whether this is useful enough to justify adding a
> thousand lines to numeric.c.
>
> I believe also that the numeric forms of the aggregates will be
> enormously slower than the float forms, at least on most modern
> machines with decent floating-point performance. I don't have time
> to do any performance testing though.
>

Based on just simple tests On my laptop (1.8Ghz Centrino) the numeric
versions are 5-10 times slower (depending on what 2arg function is used)

The example:

template1=# \d test
Table "public.test"
Column | Type | Modifiers
--------+------------------+-----------
na | numeric |
nb | numeric |
da | double precision |
db | double precision |
template1=# select count(*) from test;
count
---------
1000000
(1 row)

template1=# select regr_avgx(da,db) from test;
regr_avgx
-----------------
0.5002412120227
(1 row)

Time: 1052,893 ms
template1=# select regr_avgx(na,nb) from test;
regr_avgx
-------------------------
0.500040167263887822939
(1 row)

Time: 4459,739 ms

template1=# select regr_sxx(da,db) from test;
regr_sxx
------------------
83303.6317359119
(1 row)

Time: 1043,891 ms
template1=# select regr_sxx(na,nb) from test;
regr_sxx
--------------------------------------------------
83342.044294954140912267902323775495680113567986
(1 row)

Time: 8514,843 ms
template1=# select corr(da,db) from test;
corr
----------------------
0.000527588261283456
(1 row)

Time: 1074,948 ms
template1=# select corr(na,nb) from test;
corr
----------------------------------------------------------------------------------------

0.000759857150984988517883855238363403977440313567465424735082001422354119457463407737
(1 row)

Time: 18327,376 ms

Yes, that's significantly slower, but I don't think that it make the
functions unusable...

Regards,
Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math(at)sai(dot)msu(dot)ru

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2006-07-30 18:21:36 Re: [PATCHES] 8.2 features?
Previous Message Alvaro Herrera 2006-07-30 18:11:42 Re: PATCH to allow concurrent VACUUMs to not lock each