|From:||Antonin Houska <ah(at)cybertec(dot)at>|
|To:||Robert Haas <robertmhaas(at)gmail(dot)com>|
|Subject:||Re: [HACKERS] WIP: Aggregation push-down|
|Views:||Raw Message | Whole Thread | Download mbox|
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >> > I'm not about to add any other features now. Implementation of the missing
> >> > parts (see the TODO comments in the code) is the next step. But what I'd
> >> > appreciate most is a feedback on the design. Thanks.
> >> I am getting a conflict after applying patch 5 but this did not get
> >> any reviews so moved to next CF with waiting on author as status.
> > Attached is the next version.
> I've been a bit confused for a while about what this patch is trying
> to do, so I spent some time today looking at it to try to figure it
> There's a lot I don't understand yet, but it seems like the general idea is
> to build, potentially for each relation in the join tree, not only the
> regular list of paths but also a list of "grouped" paths. If the
> pre-grouped path wins, then we can get a final path that looks like Finalize
> Aggregate -> Some Joins -> Partial Aggregate -> Maybe Some More Joins ->
> Base Table Scan.
Yes. The regression test output shows some more plans.
> In some cases the patch seems to replace that uppermost Finalize Aggregate
> with a Result node.
This is a feature implemented in 09_avoid_agg_finalization.diff. You can
ignore it for now, it's of lower priority than the preceding parts and needs
more work to be complete.
> translate_expression_to_rels() looks unsafe. Equivalence members are
> known to be equal under the semantics of the relevant operator class,
> but that doesn't mean that one can be freely substituted for another.
> For example:
> rhaas=# create table one (a numeric);
> CREATE TABLE
> rhaas=# create table two (a numeric);
> CREATE TABLE
> rhaas=# insert into one values ('0');
> INSERT 0 1
> rhaas=# insert into two values ('0.00');
> INSERT 0 1
> rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1;
> a | count
> 0 | 1
> (1 row)
> rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1;
> a | count
> 0.00 | 1
> (1 row)
> There are, admittedly, a large number of data types for which such a
> substitution would work just fine. numeric will not, citext will not,
> but many others are fine. Regrettably, we have no framework in the
> system for identifying equality operators which actually test identity
> versus some looser notion of equality. Cross-type operators are a
> problem, too; if we have foo.x = bar.y in the query, one might be int4
> and the other int8, for example, but they can still belong to the same
> equivalence class.
> Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
> agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
> p.i GROUP BY p.i" you assume that it's OK to do a Partial
> HashAggregate over c1.parent rather than p.i. This will be false if,
> say, c1.parent is of type citext and p.i is of type text; this will
> get grouped together that shouldn't.
I've really missed this problem, thanks for bringing it up! Your
considerations made me think of the specifics of the types like numeric or
citext. Although your example does not demonstrate what I consider the core
issue, I believe we eventually think of the same.
Consider this example query (using the tables you defined upthread):
FROM one, two
WHERE one.a = two.a AND scale(two.a) > 1
GROUP BY 1
I we first try to aggregate "two", then evaluate WHERE clause and then
finalize the aggregation, the partial aggregation of "two" can put various
binary values of the "a" column (0, 0.0, etc.) into the same group so the
scale of the output (of the partial agregation) will be undefined. Thus the
aggregation push-down will change the input of the WHERE clause.
So one problem is that the grouping expression can be inappropriate for
partial aggregation even if there's no type change during the
translation. What I consider typical for this case is that the equality
operator used to identify groups (SortGroupClause.eqop) can return true even
if binary (stored) values of the inputs differ. The partial aggregation could
only take place if we had a special equality operator which distinguishes the
*binary* values (I don't know yet how to store this operator the catalog ---
in pg_opclass recors for the hash opclasses?)..
On the other hand, if the grouping expression is not a plain Var and there's
no type change during the translation and the expression output type is not of
the "identity-unsafe type" (numeric, citext, etc.), input vars of the
"identity-unsafe type"should not prevent us from using the expression for
partial aggregation: in such a case the grouping keys are computed in the same
way they would be w/o the partial aggregation.
The other problem is which type changes are legal. We should not allow any
type-specific information (such as scale or precision of the numeric type) to
bet lost or ambiguous. In this example
> It will also be false if the grouping expression is something like GROUP BY
> length(p.i::text), because one value could be '0'::numeric and the other
you show that not only numeric -> int is a problem but also int ->
numeric. This is the tricky part.
One of my ideas is to check whether the source and destination types are
binary coercible (i.e. pg_cast(castfunc) = 0) but this might be a misuse of
the binary coercibility. Another idea is to allow only such changes that the
destination type is in the same operator class as the source, and explicitly
enumerate the "safe opclasses". But that would mean that user cannot define
new opclasses within which the translation is possible --- not sure this is a
Perhaps the most consistent solution is that the translation is not performed
if any input variable of the grouping expression should change its data type.
> Another thing I noticed is that the GroupedPathInfo looks a bit like a
> stripped-down RelOptInfo, in that for example it has both a pathlist
> and a partial_pathlist. I'm inclined to think that we should build new
> RelOptInfos instead. As you have it, there are an awful lot of things
> that seem to know about the grouped/ungrouped distinction, many of
> which are quite low-level functions like add_path(),
> add_paths_to_append_rel(), try_nestloop_path(), etc. I think that
> some of this would go away if you had separate RelOptInfos instead of
This was proposed by Ashutosh Bapat upthread. I actually did this in the early
(not published) prototype of the patch and I considered it too invasive for
standard_join_search() and subroutines. IIRC an array like simple_rel_array
had to be introduced for the grouped relation in which the meaning of NULL was
twofold: either the relation is not a base relation or it is base relation but
no grouped relation exists for it. Also I thought there would be too much
duplicate data if the grouped relation had a separate RelOptInfo. Now that I'm
done with one approach I can consider the original approach aggain.
> Some compiler noise:
> Core dump running the regression tests:
I'll fix these, thanks.
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
|Next Message||Bruce Momjian||2018-01-26 13:13:29||Re: AS OF queries|
|Previous Message||Michael Paquier||2018-01-26 12:45:52||Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables|