Re: [HACKERS] WIP: Aggregation push-down

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2019-02-28 16:02:54
Message-ID: 23468.1551369774@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Antonin Houska <ah(at)cybertec(dot)at> writes:
> > Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> Latest patch set fails to apply, so moved to next CF, waiting on
> >> author.
>
> > Rebased.
>
> This is in need of rebasing again :-(. I went ahead and pushed the 001
> part, since that seemed fairly uncontroversial.

ok, thanks.

> I did not spend a whole lot of time looking at the patch today, but
> I'm still pretty distressed at the data structures you've chosen.
> I remain of the opinion that a grouped relation and a base relation
> are, er, unrelated, even if they happen to share the same relid set.
> So I do not see the value of the RelOptInfoSet struct you propose here,

ok. As you suggested upthread, I try now to reuse the join_rel_list /
join_rel_hash structures, see v11-001-Introduce_RelInfoList.patch.

> and I definitely don't think there's any value in having, eg,
> create_seqscan_path or create_index_path dealing with this stuff.

Originally I tried to aggregate any path that ever gets passed to agg_path(),
but that's probably not worth the code complexity. Now the partial aggregation
is only applied to paths that survived agg_path() on the plain relation.

> I also don't like changing create_nestloop_path et al to take a PathTarget
> rather than using the RelOptInfo's pathtarget; IMO, it's flat out wrong
> for a path to generate a tlist different from what its parent RelOptInfo
> says that the relation produces.

Likewise, the current patch version is less invasive, so create_nestloop_path
et al are not touched at all.

> I think basically the way this ought to work is that we generate baserel
> paths pretty much the same as today, and then we run through the baserels
> and see which ones are potentially worth doing partial aggregation on,
> and for each one that is, we create a separate "upper relation" RelOptInfo
> describing that. The paths for this RelOptInfo would be
> partial-aggregation paths using paths from the corresponding baserel as
> input. Then we'd run a join search that only considers joining grouped
> rels with plain rels (I concur that joining two grouped rels is not worth
> coping with, at least for now).

make_join_rel() is the core of my implementation: besides joining two plain
relations it tries to join plain relation to grouped one, and also to
aggregate the join of the two plain relations. I consider this approach less
invasive and more efficient than running the whole standard_join_search again
for the grouped rels.

The problem of numeric-like data types (i.e. types for wich equality of two
values of the grouping key does not justify putting them into the same group
because information like scale would be discarded this way) remains open. My
last idea was to add a boolean flag to operator class which tells that
equality implies "bitwise equality", and to disallow aggregate push-down if
SortGroupClause.eqop is in an opclass which has this field FALSE. I'd like to
hear your opinion before I do any coding.

--
Antonin Houska
https://www.cybertec-postgresql.com

Attachment Content-Type Size
v11-001-Introduce_RelInfoList.patch text/x-diff 11.6 KB
v11-002-Introduce_make_join_rel_common.patch text/x-diff 2.5 KB
v11-003-Agg_pushdown_basic.patch text/x-diff 107.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua Brindle 2019-02-28 16:03:28 Re: Re: Row Level Security − leakproof-ness and performance implications
Previous Message Robert Haas 2019-02-28 15:58:49 Re: Why don't we have a small reserved OID range for patch revisions?