Re: Weighted Stats

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weighted Stats
Date: 2016-01-08 05:37:36
Message-ID: CAJrrPGf1KT-5BN+RPuNXMkCiGXNQs5_RatYqseun-xfF9H+8aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 21, 2015 at 1:50 PM, David Fetter <david(at)fetter(dot)org> wrote:
> On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote:
>> On 11/2/15 5:46 PM, David Fetter wrote:
>> >I'd like to add weighted statistics to PostgreSQL
>>
>> Anything happen with this? If community isn't interested, ISTM it'd be good
>> to put this in PGXN.
>
> I think it's already in PGXN as an extension, and I'll get another
> version out this early this week, as it involves mostly adding some
> tests.
>
> I'll do the float8 ones for core this week, too, and unless there's a
> really great reason to do more data types on the first pass, it should
> be in committable shape.

I reviewed the patch, following are my observations.

1. + precision</type>, <type>numeric</type>, or <type>interval</type>

with interval type it is giving problem. As interval data type is not supported,
so remove it in the list of supported inputs.

postgres=# select weighted_avg(f7,f1) from tbl;
ERROR: function weighted_avg(interval, smallint) does not exist at character 8
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.

2. +float8_weighted_avg(PG_FUNCTION_ARGS)

It will be helpful, if you provide some information as a function header,
how the weighted average is calculated similar like other weighted functions.

3. + transvalues = check_float8_array(transarray,
"float8_weighted_stddev_accum", 4);

The second parameter to check_float8_array should be "float8_weighted_accum".

4. There is an OID conflict of 4066 with latest master code.

5.+ A += newvalW * ( newvalX - transvalues[2] ) / W;
+ CHECKFLOATVAL(A, isinf(newvalW) || isinf(newvalX - transvalues[2])
|| isinf(1.0/W), true);

+ Q += newvalW * (newvalX - transvalues[2]) * (newvalX - A);
+ CHECKFLOATVAL(A, isinf(newvalX - transvalues[3]) || isinf(newvalX -
A) || isinf(1.0/W), true);

Is the need of calculation also needs to be passed to CHECKFLOATVAL?
Just passing
the variables involved in the calculation isn't enough? If expressions
are required then
it should be something as follows?

CHECKFLOATVAL(A, isinf(transvalues[2]) || isinf(newvalW) ||
isinf(newvalX - transvalues[2]) || isinf(1.0/W), true);

CHECKFLOATVAL(Q, isinf(transvalues[3]) || isinf(newvalX -
transvalues[2]) || isinf(newvalX - A) || isinf(1.0/W), true);

I verified the stddev transition and final function calculations
according to wikipedia
and they are fine.

Regards,
Hari Babu
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Catalin Iacob 2016-01-08 06:31:47 Re: proposal: PL/Pythonu - function ereport
Previous Message Peter Geoghegan 2016-01-08 05:11:23 Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses