Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-03-14 02:20:08
Message-ID: CAKJS1f9u0eC-U0cE_ottqYFuNx2Ucu_73KXPFVr-48zZ3Gkzew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12 March 2016 at 06:28, David Steele <david(at)pgmasters(dot)net> wrote:
> As far as I can tell this is the most recent version of the patch but it
> doesn't apply on master. I'm guessing that's mostly because of Tom's
> UPP patch.
>
> This is a long and confusing thread and I should know since I just read
> through the entire thing. I'm setting this back to "waiting for author"
> and I'd like see a rebased patch and a summary of what's in the patch
> before setting it back to "needs review".

Many thank for taking the time to read through this thread.

Let me try to summarise the current status.

Overview:
Combine Aggregate states is required to allow aggregate functions to
be calculated in multiple stages. The primary intended use for this,
as of now is for allowing parallel aggregate. A simple combine
function for count(*) would perform something very similar to sum() of
all of the partially calculated aggregate states. In this case
"partially" means aggregated states without the final function called.

Status:
As of [1] we have basic infrastructure in place to allow this, but
that commit only added the infrastructure for aggregate functions
which don't have an INTERNAL state type. Many aggregates do have an
internal state type, and for the case of parallel aggregate, since
these internal states are just passed around as pointers to memory
which belongs to the process wherever the member was allocated, we're
unable to pass these back to the main process for the final aggregate
stage. This patch allows this to happen by way of serializing these
internal states into some type that can be copied into the main
backend process.
Robert was quite right to leave this part out of the commit as at the
time the method I was using was under debate, and there was no reason
to hold things up, as the more simple case could be committed to allow
work to continue on features which would use this new infrastructure.

These threads are a bit of a mess as there's often confusion (from me
too) about which code belongs in which patch. Quite possibly the
setrefs.c stuff of the parallel aggregate patch actually do belong
here, but it's probably not worth shuffling things around as I think
the parallel agg stuff should go in before the additional serial
states stuff.

Current patch:
I've now updated the patch to base it on top of the parallel aggregate
patch in [2]. To apply the attached, you must apply [2] first!
The changes to numeric.c are not really intended for commit for now. I
plan on spending a bit of time first on looking at using send
functions and bytea serial types, which if that works should be far
more efficient than converting the serialized output into a string.
The example combine function in numeric.c is just intended to show
that this does actually work.

I've also left one opr_sanity test failing for now. This test is
complaining that the de-serialisation function returns an INTERNAL
type and does not accept one (which would disallow it from being
called from SQL interface). I think this is safe to allow as I've
coded the function to fail if it's not called in aggregate context.
Perhaps I'm wrong though, so I left the test failing to highlight that
I'd like someone else to look and maybe think about this a bit. It
seems very important not to get this part wrong.

Comments and reviews etc all welcome. Please remember to apply [2]
before the attached.

[1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a7de3dc5c346e07e0439275982569996e645b3c2
[2] http://www.postgresql.org/message-id/CAKJS1f9Tr-+9aKWZ1XuHUnEJZ7GKKfo45b+4fCNj8DkrDZYK4g@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
serialize_states_2016-03-14.patch application/octet-stream 106.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2016-03-14 02:29:51 Re: Relation extension scalability
Previous Message Thomas Munro 2016-03-14 01:54:04 pg_stat_get_progress_info(NULL) blows up