Re: [HACKERS] WIP: Aggregation push-down

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Richard Guo <riguo(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2020-02-07 09:05:34
Message-ID: 53726.1581066334@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> Hi,
>
> I've been looking at the 'make_join_rel()' part of the patch, and I'm
> aware that, if we are joining A and B, a 'grouped join rel (AB)' would
> be created besides the 'plain join rel (AB)', and paths are added by 1)
> applying partial aggregate to the paths of the 'plain join rel (AB)', or
> 2) joining grouped A to plain B or joining plain A to grouped B.
>
> This is a smart idea. One issue I can see is that some logic would have
> to be repeated several times. For example, the validity check for the
> same proposed join would be performed at most three times by
> join_is_legal().
>
> I'm thinking of another idea that, instead of using a separate
> RelOptInfo for the grouped rel, we add in RelOptInfo a
> 'grouped_pathlist' for the grouped paths, just like how we implement
> parallel query via adding 'partial_pathlist'.
>
> For base rel, if we decide it can produce grouped paths, we create the
> grouped paths by applying partial aggregation to the paths in 'pathlist'
> and add them into 'grouped_pathlist'.
>
> For join rel (AB), we can create the grouped paths for it by:
> 1) joining a grouped path from 'A->grouped_pathlist' to a plain path
> from 'B->pathlist', or vice versa;
> 2) applying partial aggregation to the paths in '(AB)->pathlist'.
>
> This is basically the same idea, just moves the grouped paths from the
> grouped rel to a 'grouped_pathlist'. With it we should not need to make
> any code changes to 'make_join_rel()'. Most code changes would happen in
> 'add_paths_to_joinrel()'.
>
> Will this idea work? Is it better or worse?

I tried such approach in an earlier version of the patch [1], and I think the
reason also was to avoid repeated calls of functions like
join_is_legal(). However there were objections against such approach,
e.g. [2], and I admit that it was more invasive than what the current patch
version does.

Perhaps we can cache the result of join_is_legal() that we get for the plain
relation, and use it for the group relation. I'll consider that. Thanks.

[1] https://www.postgresql.org/message-id/18007.1513957437%40localhost
[2] https://www.postgresql.org/message-id/CA%2BTgmob8og%2B9HzMg1vM%2B3LwDm2f_bHUi9%2Bg1bqLDTjqpw5s%2BnQ%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2020-02-07 09:19:14 Fix comment for max_cached_tuplebufs definition
Previous Message Amit Langote 2020-02-07 08:54:07 Re: In PG12, query with float calculations is slower than PG11