Idea for aggregates

From: Greg Stark <stark(at)mit(dot)edu>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Idea for aggregates
Date: 2014-04-04 22:13:40
Message-ID: CAM-w4HNswsWehGSe4WO+tZR9KwrVPX3uJm_htUDX5pLbQZd-XQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon, Dmitri, Peter Eisentraut, and I were chatting at PGConfNYC and
we had an idea for something interesting to do with aggregates.
Interestingly Simon and I came at this from two different directions
but realized we needed the same functionality to implement what we
wanted.

The basic idea is to separate the all the properties of the aggregate
functions except the final function from the final function into a
separate object. Giving the optimizer the knowledge that multiple
aggregate functions use the share the same basic machinery and
semantics for the state is the magic sauce that's a prerequisite for
the several ideas we were each thinking of.

I'm imagining something like (though I'm really not wedded to this
syntax at all):

CREATE AGGREGATE CLASS class_name ( input_data_type [ , ... ] ) (
SFUNC = sfunc,
STYPE = state_data_type
[ , INITCOND = initial_condition ]
[ , SORTOP = sort_operator ]
)

CREATE AGGREGATE func_name ( class_name ) (
SFUNC = sfunc
)

The idea then is that this should enable a number of useful optimizations:

1) If the planner sees multiple aggregates in the target list
belonging to the same class and having the same arguments then it
knows it can keep just one transition state varible for all of them.

2) For materialized views the materialized view can keep just the
state variable rather than the final aggregate needed for any
aggregates in the materialized view. This would enable users to query
any other aggregate function in the same class on the same column even
if it wasn't included in the original materialized view definition.

This kind of thing is probably even more important down the road for
incrementally updating materialized views.

3) I envision treating the aggregate classes as real types so you
could do something like

CREATE TABLE user (userid integer, page_timing aggregate class
numeric_agg_class)
And later update the row by doing something like update user set
page_timing = page_timing + $1 where id = $2
And still later call select page_timing.count(), page_timing.sum(),
page_timing.stddev() from user where id = $1

Except of course this syntax wouldn't work quite right. I haven't
thought of a better syntax yet though.

--
greg

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-04-04 22:21:47 Re: [PATCH] Add transforms feature
Previous Message Joshua Yanovski 2014-04-04 21:51:06 Re: Proposal: COUNT(*) (and related) speedup