Re: using custom scan nodes to prototype parallel sequential scan

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: using custom scan nodes to prototype parallel sequential scan
Date: 2014-11-14 07:24:20
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801077AB8@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Fri, Nov 14, 2014 at 2:12 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>
>
> > On 12 November 2014 07:54, David Rowley
> <dgrowleyml(at)gmail(dot)com>
> > Take int4_avg_accum() for example it does:
> >
> >
> > transdata->count++;
> > transdata->sum += newval;
> >
> > The merge function would need to perform something like:
> >
>
> > transdata->count += transdata2merge.count; sum +=
> transdata2merge.sum;
> >
> > Then the final function could be called on the merged aggregate
> state.
> >
> More simplify, we can describe parallel aware aggregate function.
> Please assume AVG(X) function that takes nrows and sum of X. Its
> transition
> function performs as like you described above, then final function
> works as
> usual.
>
> The job of parallel seq scan needs to do is:
> 1. replace AVG(X) by AVG(nrows, sum(X)
> 2. generate count(*) on the partial relation being grouped.
> 3. generate sum(X) on the partial relation being grouped.
>
> It looks like the following query:
> SELECT AVG(nrows, sum_X) FROM (
> SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno
> between 0 and 999 GROUP BY cat
> UNION
> SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno
> between 1000 and 1999 GROUP BY cat
> UNION
> SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno
> between 2000 and 2999 GROUP BY cat
> UNION
> SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno
> between 3000 and 3999 GROUP BY cat
> );
>
>
>
>
> Well, this would require giving the planner some kind of knowledge of what
> AVG() is. It currently knows nothing about that. It currently calls the
> transition function for each row, and the final function at the end and
> does not care or need to care about what either of those functions actually
> does. The transformation above looks like it would need to care and the
> logic to add that would be way more complex than aggregate merge functions.
>
It may make sense to have an extra catalog to indicate how usual aggregate
function can be broken down (or unavailable).
Like, AVG(X) = {COUNT(X) + SUM(X)} x N-partitions

> Likely for most aggregates, like count, sum, max, min, bit_and and bit_or
> the merge function would be the same as the transition function, as the
> state type is just the same as the input type. It would only be aggregates
> like avg(), stddev*(), bool_and() and bool_or() that would need a new merge
> function made... These would be no more complex than the transition
> functions... Which are just a few lines of code anyway.
>
> We'd simply just not run parallel query if any aggregates used in the query
> didn't have a merge function.
>
> When I mentioned this, I didn't mean to appear to be placing a road block.I
> was just bringing to the table the information that COUNT(*) + COUNT(*)
> works ok for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n) does not.
>
> Merge functions should be a simple patch to write. If I thought there was
> going to be a use for them in this release, I'd put my hand up to put a
> patch together.
>
Things I'm uncertain is, how caller of aggregate function distinguish a context
to call usual translation function, or new merge function.

Let's back an example of:
SELECT cat, COUNT(*), AVG(X) FROM t GROUP BY cat;

Its plan tree is probably as follows:
HashAggregate
Group Key: cat
-> Custom Scan (Parallel Scan)
# of workers: 4

The caller of translation/merge/final function is HashAggregate node.
On the other hand, it has to know whether the underlying plan returns every
records of underlying table or sub-total by parallel scan.
Please correct me, if my assumption is wrong.

Once HashAggregate can know that sub-plan returns sub-total of the relation,
it can chose merge function instead of translation function.
However, what I want to clarify is how to inform HashAggregate node its sub-
plan intends to return sub-total, instead of individual rows.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2014-11-14 07:37:52 Re: using custom scan nodes to prototype parallel sequential scan
Previous Message Amit Kapila 2014-11-14 07:23:46 Re: WAL format and API changes (9.5)