Re: [HACKERS] Implementing STDDEV and VARIANCE

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeroen van Vianen <jeroen(at)design(dot)nl>
Cc: Don Baccus <dhogaza(at)pacifier(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Implementing STDDEV and VARIANCE
Date: 2000-01-24 15:29:58
Message-ID: 25171.948727798@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeroen van Vianen <jeroen(at)design(dot)nl> writes:
>> Right, that's pretty much what I'm visualizing. One minor detail: there
>> is not an "agginitfunction", there is an "agginitvalue". So your
>> special datatype to hold n/sx/sx2 must have at least a typinput function
>> that can convert the text string held in pg_aggregate into the desired
>> internal form of the initial state. (At least, that's how it's done
>> now. Do you want to argue to change it? As long as we're opening up
>> the AGG design for reconsideration, we could revisit that choice too.)

> I would suggest supplying an initfunction that initializes the datatype
> that holds n/sx/sx2 so you're able to set individual members to NULL if so
> desired. I also won't need to implement typinput for all required aggregate
> types, one small headache less ;-)

I've been thinking about this and have come to the conclusion that using
an initial value string and a typinput function is still the right design,
compared to using a parameterless initializer function. See, that way
you have a shot at reusing the same transition type (and typinput code)
for several different aggregates with slightly different initial
condition requirements, whereas with the initializer-function approach,
you have no choice but to write a separate initializer function for each
aggregate initial condition you need.

I have been thinking that we could save some effort (at a trivial cost
in memory) by defining a datatype or two that is specifically intended
to be an aggregate transition state datatype. For example, a struct
containing three or four float8 fields and as many bool fields would
serve nicely for AVG, STDDEV, and probably some other aggregates; some
of them wouldn't use all the fields, but so what? The only support code
this datatype would need would be a typinput function to convert a
string from pg_aggregate into initial struct contents. By doing it that
way, the same datatype can support several aggregates with different
initial condition requirements, without having to write a separate
initializer function for each one.

I think the point about initializing struct contents to NULL is a red
herring. For basic C types like float8, there isn't such a thing as
NULL really; you have to have a separate flag field or count field to
tell you that you've seen no input yet. That can be initialized equally
well by a typinput function or by an initializer function. But the
typinput function gets to have a parameter taken from pg_aggregate;
an initializer function would not.

> Let me wait for you to overhaul the fmgr code and do all the aggregate
> stuff right in one sweep. Only thing is: how do we deal with current
> user-defined aggregates?

Well, a user aggregate that only used transfn1, or only transfn2, would
convert directly. This scheme would break user aggregates that used
both, which is why I'm running it up the flagpole early --- to see if
anyone complains.

regards, tom lane

In response to

Responses

  • unsubscribe at 2000-01-24 15:44:59 from Nguyen, Thuan X

Browse pgsql-hackers by date

  From Date Subject
Next Message Nguyen, Thuan X 2000-01-24 15:44:59 unsubscribe
Previous Message Jose Soares 2000-01-24 15:26:44 Re: [INTERFACES] Re: ODBC drive strange behavior