Re: [HACKERS] WIP: Aggregation push-down

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Richard Guo <riguo(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2020-02-06 08:30:32
Message-ID: CAMbWs4_F_S22tn7Nmxx=x0jZY2J4qA_hB_SLMDvcJuNhnCLu+g@mail.gmail.com
Lists: pgsql-hackers

Hi,

I've been looking at the 'make_join_rel()' part of the patch. As I
understand it, when we join A and B, a 'grouped join rel (AB)' is
created besides the 'plain join rel (AB)', and paths are added to it by
1) applying partial aggregation to the paths of the 'plain join rel
(AB)', or 2) joining grouped A to plain B, or plain A to grouped B.

This is a smart idea. One issue I can see is that some logic would have
to be repeated several times. For example, the validity check for the
same proposed join would be performed up to three times by
join_is_legal().

Another idea I'm thinking of is that, instead of using a separate
RelOptInfo for the grouped rel, we add a 'grouped_pathlist' to
RelOptInfo for the grouped paths, just like how parallel query is
implemented by adding 'partial_pathlist'.

For a base rel, if we decide it can produce grouped paths, we create the
grouped paths by applying partial aggregation to the paths in 'pathlist'
and add them to 'grouped_pathlist'.
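
To make the base rel case concrete, here is a rough sketch in C. It uses
toy structs only: ToyRel, ToyPath, and the 'can_group' and 'ngroups'
fields are made up for illustration and are not the real RelOptInfo/Path
definitions, and the row estimate is just a placeholder. It only shows
the shape of the idea: one rel, two lists.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PATHS 8

/* Toy stand-ins for the planner structs; NOT the real definitions. */
typedef enum { PATH_PLAIN, PATH_PARTIAL_AGG } ToyPathKind;

typedef struct ToyPath
{
    ToyPathKind kind;
    double      rows;               /* estimated output rows */
    const struct ToyPath *subpath;  /* input of a partial agg, else NULL */
} ToyPath;

typedef struct ToyRel
{
    ToyPath pathlist[MAX_PATHS];         /* plain paths */
    int     npaths;
    ToyPath grouped_pathlist[MAX_PATHS]; /* partially aggregated paths */
    int     ngrouped;
    int     can_group;  /* hypothetical: did we decide grouping is possible? */
    double  ngroups;    /* hypothetical: estimated number of groups */
} ToyRel;

/* Add a plain path with the given row estimate to 'pathlist'. */
static void
toy_add_path(ToyRel *rel, double rows)
{
    ToyPath *p;

    assert(rel->npaths < MAX_PATHS);
    p = &rel->pathlist[rel->npaths++];
    p->kind = PATH_PLAIN;
    p->rows = rows;
    p->subpath = NULL;
}

/*
 * For every plain path, put a partial aggregation on top of it and store
 * the result in 'grouped_pathlist'. The plain 'pathlist' is left untouched.
 */
static void
toy_create_grouped_paths(ToyRel *rel)
{
    int i;

    if (!rel->can_group)
        return;

    for (i = 0; i < rel->npaths; i++)
    {
        const ToyPath *sub = &rel->pathlist[i];
        ToyPath *g;

        assert(rel->ngrouped < MAX_PATHS);
        g = &rel->grouped_pathlist[rel->ngrouped++];
        g->kind = PATH_PARTIAL_AGG;
        /* placeholder estimate: no more rows than groups or input rows */
        g->rows = (sub->rows < rel->ngroups) ? sub->rows : rel->ngroups;
        g->subpath = sub;
    }
}
```

In the real planner the grouped paths would presumably go through
something like add_path() so that dominated ones get discarded; the
sketch skips that.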

For a join rel (AB), we can create the grouped paths for it by:
1) joining a grouped path from 'A->grouped_pathlist' to a plain path
from 'B->pathlist', or vice versa;
2) applying partial aggregation to the paths in '(AB)->pathlist'.
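
The two cases above can be sketched in C like this. Again this is a toy
model, not planner code: ToyRel, ToyGroupedPath, and the function names
are hypothetical, plain paths are represented only by their index, and
no costing or pruning is done. The point is that everything happens on
the single join rel for (AB).

```c
#include <assert.h>

#define MAX_GROUPED 32

/* How a grouped path was built (toy model, not real planner node types). */
typedef enum
{
    GP_AGG_OVER_PLAIN,      /* partial agg on top of the rel's own plain path */
    GP_JOIN_GROUPED_OUTER,  /* grouped outer path joined to plain inner path */
    GP_JOIN_GROUPED_INNER   /* plain outer path joined to grouped inner path */
} ToyGroupedKind;

typedef struct ToyGroupedPath
{
    ToyGroupedKind kind;
    int outer;  /* index of the outer (or only) input path */
    int inner;  /* index of the inner input path, -1 if none */
} ToyGroupedPath;

typedef struct ToyRel
{
    int            npaths;   /* number of entries in the plain 'pathlist' */
    ToyGroupedPath grouped_pathlist[MAX_GROUPED];
    int            ngrouped;
} ToyRel;

/* Append one grouped path to the rel's 'grouped_pathlist'. */
static void
toy_add_grouped(ToyRel *rel, ToyGroupedKind kind, int outer, int inner)
{
    ToyGroupedPath *g;

    assert(rel->ngrouped < MAX_GROUPED);
    g = &rel->grouped_pathlist[rel->ngrouped++];
    g->kind = kind;
    g->outer = outer;
    g->inner = inner;
}

/* Fill joinrel->grouped_pathlist for the join of rels a and b. */
static void
toy_add_grouped_paths_to_joinrel(ToyRel *joinrel,
                                 const ToyRel *a, const ToyRel *b)
{
    int i, j;

    /* 1) grouped path of A joined to plain path of B ... */
    for (i = 0; i < a->ngrouped; i++)
        for (j = 0; j < b->npaths; j++)
            toy_add_grouped(joinrel, GP_JOIN_GROUPED_OUTER, i, j);

    /* ... or vice versa: plain path of A joined to grouped path of B */
    for (i = 0; i < a->npaths; i++)
        for (j = 0; j < b->ngrouped; j++)
            toy_add_grouped(joinrel, GP_JOIN_GROUPED_INNER, i, j);

    /* 2) partial aggregation on top of the join rel's own plain paths */
    for (i = 0; i < joinrel->npaths; i++)
        toy_add_grouped(joinrel, GP_AGG_OVER_PLAIN, i, -1);
}
```

For example, if A has 2 plain paths and 1 grouped path, B has 3 plain
paths and no grouped paths, and (AB) has 4 plain paths, the sketch
produces 1*3 + 2*0 + 4 = 7 candidate grouped paths for (AB).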

This is basically the same idea; it just moves the grouped paths from
the grouped rel to a 'grouped_pathlist'. With it, we should not need to
make any code changes to 'make_join_rel()'. Most code changes would
happen in 'add_paths_to_joinrel()'.

Will this idea work? Is it better or worse?

Thanks
Richard
