Re: [HACKERS] WIP: Aggregation push-down

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Richard Guo <riguo(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2020-01-11 01:11:24
Message-ID: 20200111011124.efacqhdsptc36pm2@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've been looking at the last version (v14) of this patch series,
submitted way back in July and unfortunately quiet since then. Antonin
is probably right one of the reasons for the lack of reviews is that it
requires interest/experience with planner.

Initially it was also a bit hard to understand what are the benefits
(and the patch shifted a bit), which is now mostly addressed by the
README in the last patch. The trouble is that's hidden in the patch and
so not immediately obvious to people considering reviewing it :-( Tom
posted a nice summary in November 2018, but it was perhaps focused on
the internals.

So my first suggestion it to write a short summary with example how it
affects practical queries (plan change, speedup) etc.

My second suggestion is to have meaningful commit messages for each part
of the patch series, explaining why we need that particular part. It
might have been explained somewhere in the thread, but as a reviewer I
really don't want to reverse-engineer the whole thread.

Now, regarding individual parts of the patch:

1) v14-0001-Introduce-RelInfoList-structure.patch
-------------------------------------------------

- I'm not entirely sure why we need this change. We had the list+hash
before, so I assume we do this because we need the output functions?

- The RelInfoList naming is pretty confusing, because it's actually not
a list but a combination of list+hash. And the embedded list is called
items, to make it yet a bit more confusing. I suggest we call this
just RelInfo or RelInfoLookup or something else not including List.

- RelInfoEntry seems severely under-documented, particularly the data
part (which is void* making it even harder to understand what's it for
without having to read the functions). AFAICS it's just simply a link
to the embedded list, but maybe I'm wrong.

- I suggest we move find_join_rel/add_rel_info/add_join_rel in relnode.c
right before build_join_rel. This makes diff clearer/smaller and
visual diffs nicer.

2) v14-0002-Introduce-make_join_rel_common-function.patch
---------------------------------------------------------

I see no point in keeping this change in a separate patch, as it prety
much just renames make_join_rel to make_join_rel_common and then adds

make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
{
return make_join_rel_common(root, rel1, rel2);
}

which seems rather trivial and useless on it's own. I'd just merge it
into 0003 where we use the _common function more extensively.

3) v14-0003-Aggregate-push-down-basic-functionality.patch
---------------------------------------------------------

I haven't done much review on this yet, but I've done some testing and
benchmarking so let me share at least that. The queries I used were of
the type I mentioned earlier in this thread - essentially a star schema,
i.e. fact table referencing dimensions, with aggregation per columns in
the dimension. So something like

SELECT d.c, sum(f) FROM fact JOIN dimension d ON (d.id = f.d_id)
GROUP BY d.c;

where "fact" table is much much larger than the dimension.

Attached is a script I used for testing with a bunch of queries and a
number of parameters (parallelism, size of dimension, size of fact, ...)
and a spreadsheed summarizing the results.

Overall, the results seem pretty good - in many cases the queries get
about 2x faster and I haven't seen any regressions. That's pretty nice.

One annoying thing is that this only works for non-parallel queries.
That is, I saw no improvements with parallel queries. It was still
faster than the non-parallel query with aggregate pushdown, but the
parallel query did not improve.

An example of timing looks like this:

non-parallel (pushdown = off): 918 ms
non-parallel (pushdown = on): 561 ms
parallel (pushdown = off): 313 ms
parallel (pushdown = on): 313 ms

The non-parallel query gets faster because after enabling push-down the
plan changes (and we get partial aggregate below the join). With
parallel query that does not happen, the plans stay the same.

I'm not claiming this is a bug - we end up picking the fastest execution
plan (i.e. we don't make a mistake by e.g. switching to the non-parallel
one with pushdown).

My understanding is that the aggregate pushdown can't (currently?) be
used with queries that use partial aggregate because of parallelism.
That is we can either end up with a plan like this:

Finalize Aggregate
-> Join
-> Partial Aggregate
-> ...

or a parallel plan like this:

Finalize Aggregate
-> Gather
-> Partial Aggregate
-> Join
-> ...
-> ...

but we currently don't support a combination of both

Finalize Aggregate
-> Gather
-> Join
-> Partial Aggregate
-> ...

I wonder if that's actually even possible and what would it take to make
it work?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
pushdown-agg.ods application/vnd.oasis.opendocument.spreadsheet 18.1 KB
pushdown.sh application/x-sh 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-01-11 01:34:20 Re: [Logical Replication] TRAP: FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX"
Previous Message nuko yokohama 2020-01-11 00:27:58 Re: Implementing Incremental View Maintenance