Re: [HACKERS] WIP: Aggregation push-down

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2018-02-28 16:02:43
Message-ID: 31585.1519833763@localhost
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >> > I'm not about to add any other features now. Implementation of the missing
> >> > parts (see the TODO comments in the code) is the next step. But what I'd
> >> > appreciate most is a feedback on the design. Thanks.
> >>
> >> I am getting a conflict after applying patch 5 but this did not get
> >> any reviews so moved to next CF with waiting on author as status.
> >
> > Attached is the next version.

> translate_expression_to_rels() looks unsafe. Equivalence members are
> known to be equal under the semantics of the relevant operator class,
> but that doesn't mean that one can be freely substituted for another.
> For example:
> rhaas=# create table one (a numeric);
> rhaas=# create table two (a numeric);
> rhaas=# insert into one values ('0');
> INSERT 0 1
> rhaas=# insert into two values ('0.00');
> INSERT 0 1
> rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1;
> a | count
> ---+-------
> 0 | 1
> (1 row)
> rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1;
> a | count
> ------+-------
> 0.00 | 1
> (1 row)
> There are, admittedly, a large number of data types for which such a
> substitution would work just fine. numeric will not, citext will not,
> but many others are fine. Regrettably, we have no framework in the
> system for identifying equality operators which actually test identity
> versus some looser notion of equality. Cross-type operators are a
> problem, too; if we have foo.x = bar.y in the query, one might be int4
> and the other int8, for example, but they can still belong to the same
> equivalence class.
> Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
> agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
> p.i GROUP BY p.i" you assume that it's OK to do a Partial
> HashAggregate over c1.parent rather than p.i. This will be false if,
> say, c1.parent is of type citext and p.i is of type text; this will
> get grouped together that shouldn't. It will also be false if the
> grouping expression is something like GROUP BY length(p.i::text),
> because one value could be '0'::numeric and the other '0.00'::numeric.
> I can't think of a reason why it would be false if the grouping
> expressions are both simple Vars of the same underlying data type, but
> I'm a little nervous that I might be wrong even about that case.
> Maybe you've handled all of this somehow, but it's not obvious to me
> that it has been considered.

These problems are still being investigated, see [1]

> Another thing I noticed is that the GroupedPathInfo looks a bit like a
> stripped-down RelOptInfo, in that for example it has both a pathlist
> and a partial_pathlist. I'm inclined to think that we should build new
> RelOptInfos instead. As you have it, there are an awful lot of things
> that seem to know about the grouped/ungrouped distinction, many of
> which are quite low-level functions like add_path(),
> add_paths_to_append_rel(), try_nestloop_path(), etc. I think that
> some of this would go away if you had separate RelOptInfos instead of
> GroupedPathInfo.

I've reworked the patch so that separate RelOptInfo is used for grouped
relation. The attached patch is only the core part. Neither postgres_fdw
changes nor the part that tries to avoid the final aggregation is included
here. I'll add these when the patch does not seem to need another major rework.


Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt

Attachment Content-Type Size
agg_pushdown_v6.diff text/x-diff 293.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2018-02-28 16:03:11 Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Previous Message Robert Haas 2018-02-28 15:59:22 Re: server crash in nodeAppend.c