Re: [HACKERS] WIP: Aggregation push-down

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2018-02-28 16:02:43
Message-ID: 31585.1519833763@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >> > I'm not about to add any other features now. Implementation of the missing
> >> > parts (see the TODO comments in the code) is the next step. But what I'd
> >> > appreciate most is a feedback on the design. Thanks.
> >>
> >> I am getting a conflict after applying patch 5 but this did not get
> >> any reviews so moved to next CF with waiting on author as status.
> >
> > Attached is the next version.

> translate_expression_to_rels() looks unsafe. Equivalence members are
> known to be equal under the semantics of the relevant operator class,
> but that doesn't mean that one can be freely substituted for another.
> For example:
>
> rhaas=# create table one (a numeric);
> CREATE TABLE
> rhaas=# create table two (a numeric);
> CREATE TABLE
> rhaas=# insert into one values ('0');
> INSERT 0 1
> rhaas=# insert into two values ('0.00');
> INSERT 0 1
> rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1;
> a | count
> ---+-------
> 0 | 1
> (1 row)
>
> rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1;
> a | count
> ------+-------
> 0.00 | 1
> (1 row)
>
> There are, admittedly, a large number of data types for which such a
> substitution would work just fine. numeric will not, citext will not,
> but many others are fine. Regrettably, we have no framework in the
> system for identifying equality operators which actually test identity
> versus some looser notion of equality. Cross-type operators are a
> problem, too; if we have foo.x = bar.y in the query, one might be int4
> and the other int8, for example, but they can still belong to the same
> equivalence class.
>
> Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
> agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
> p.i GROUP BY p.i" you assume that it's OK to do a Partial
> HashAggregate over c1.parent rather than p.i. This will be false if,
> say, c1.parent is of type citext and p.i is of type text; this will
> get grouped together that shouldn't. It will also be false if the
> grouping expression is something like GROUP BY length(p.i::text),
> because one value could be '0'::numeric and the other '0.00'::numeric.
> I can't think of a reason why it would be false if the grouping
> expressions are both simple Vars of the same underlying data type, but
> I'm a little nervous that I might be wrong even about that case.
> Maybe you've handled all of this somehow, but it's not obvious to me
> that it has been considered.

These problems are still being investigated, see [1]

> Another thing I noticed is that the GroupedPathInfo looks a bit like a
> stripped-down RelOptInfo, in that for example it has both a pathlist
> and a partial_pathlist. I'm inclined to think that we should build new
> RelOptInfos instead. As you have it, there are an awful lot of things
> that seem to know about the grouped/ungrouped distinction, many of
> which are quite low-level functions like add_path(),
> add_paths_to_append_rel(), try_nestloop_path(), etc. I think that
> some of this would go away if you had separate RelOptInfos instead of
> GroupedPathInfo.

I've reworked the patch so that separate RelOptInfo is used for grouped
relation. The attached patch is only the core part. Neither postgres_fdw
changes nor the part that tries to avoid the final aggregation is included
here. I'll add these when the patch does not seem to need another major rework.

[1] https://www.postgresql.org/message-id/CA+Tgmoa5Pp-DBJg=W8Xj8Czf-32PfxPgxwFPkA6qN2w_wPX8bg@mail.gmail.com

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
agg_pushdown_v6.diff text/x-diff 293.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2018-02-28 16:03:11 Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Previous Message Robert Haas 2018-02-28 15:59:22 Re: server crash in nodeAppend.c