Re: PoC: Grouped base relation

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PoC: Grouped base relation
Date: 2017-01-18 00:11:55
Message-ID: cb2065d8-ae20-4fc9-f0bb-9aab47725d84@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/17/2017 08:05 PM, Antonin Houska wrote:
> [ Trying to respond to both Tomas and David. I'll check tomorrow if anything
> else of the thread needs my comment. ]
>
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>> On 01/17/2017 12:42 AM, David Rowley wrote:
>>> On 10 January 2017 at 06:56, Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
>>> I've been thinking about this aggtransmultifn and I'm not sure if it's
>>> really needed. Adding a whole series of new transition functions is
>>> quite a pain. At least I think so, and I have a feeling Robert might
>>> agree with me.
>>>
>>> Let's imagine some worst case (and somewhat silly) aggregate query:
>>>
>>> SELECT count(*)
>>> FROM million_row_table
>>> CROSS JOIN another_million_row_table;
>>>
>>> Today that's going to cause 1 TRILLION transitions! Performance will
>>> be terrible.
>>>
>>> If we pushed the aggregate down into one of those tables and performed
>>> a Partial Aggregate on that, then a Finalize Aggregate on that single
>>> row result (after the join), then that's 1 million transfn calls, and
>>> 1 million combinefn calls, one for each row produced by the join.
>>>
>>> If we did it your way (providing I understand your proposal correctly)
>>> there's 1 million transfn calls on one relation, then 1 million on the
>>> other and then 1 multiplyfn call. which does 1000000 * 1000000
>>>
>>> What did we save vs. using the existing aggcombinefn infrastructure
>>> which went into 9.6? Using this actually costs us 1 extra function
>>> call, right? I'd imagine the size of the patch to use aggcombinefn
>>> instead would be a fraction of the size of the one which included all
>>> the new aggmultiplyfns and pg_aggregate.h changes.
>>>
>
>> I think the patch relies on the assumption that the grouping reduces
>> cardinality,
>
> Yes.
>
>> so a CROSS JOIN without a GROUP BY clause may not be the best
>> counterexample.
>
> Yet it tells me that my approach is not ideal in some cases ...
>
>>> It sounds like a much more manageable project by using aggcombinefn
>>> instead. Then maybe one day when we can detect if a join did not cause
>>> any result duplication (i.e Unique Joins), we could finalise the
>>> aggregates on the first call, and completely skip the combine state
>>> altogether.
>>>
>
>> I don't quite see how the patch could use aggcombinefn without sacrificing a
>> lot of the benefits. Sure, it's possible to run the aggcombinefn in a loop
>> (with number of iterations determined by the group size on the other side of
>> the join), but that sounds pretty expensive and eliminates the reduction of
>> transition function calls. The join cardinality would still be reduced,
>> though.
>
> That's what I think. The generic case is that neither side of the join is
> unique. If it appears that both relations should be aggregated below the join,
> aggcombinefn would have to be called multiple times on each output row of the
> join to achieve the same result as the calling aggmultiplyfn.
>
>> I do have other question about the patch, however. It seems to rely on the
>> fact that the grouping and joins both reference the same columns. I wonder how
>> uncommon such queries are.
>>
>> To give a reasonable example, imagine the typical start schema, which is
>> pretty standard for large analytical databases. A dimension table is
>> "products" and the fact table is "sales", and the schema might look like this:
>>
>> CREATE TABLE products (
>> id SERIAL PRIMARY KEY,
>> name TEXT,
>> category_id INT,
>> producer_id INT
>> );
>>
>> CREATE TABLE sales (
>> product_id REFERENCES products (id),
>> nitems INT,
>> price NUMERIC
>> );
>>
>> A typical query then looks like this:
>>
>> SELECT category_id, SUM(nitems), SUM(price)
>> FROM products p JOIN sales s ON (p.id = s.product_id)
>> GROUP BY p.category_id;
>>
>> which obviously uses different columns for the grouping and join, and so the
>> patch won't help with that. Of course, a query grouping by product_id would
>> allow the patch to work
>
> Right, the current version does not handle this. Thanks for suggestion.
>

So you're saying it's merely a limitation of the initial patch version
and not an inherent limitation?

>
>> Another thing is that in my experience most queries do joins on foreign keys
>> (so the PK side is unique by definition), so the benefit on practical examples
>> is likely much smaller.
>
> ok. So in some cases the David's approach might be better.
>

In which cases would David's approach be more efficient? But even if
there are such cases, I assume we could generate both paths and decide
based on cost, just like with all other alternative paths.

>
> However I think the ability to join 2 grouped (originally non-unique)
> relations is still important. Consider a query involving "sales" as well as
> another table which also has many-to-one relationship to "products".
>

Well, can you give a practical example? What you describe seems like a
combination of two fact tables + a dimension, something like this:

CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
category_id INT,
producer_id INT
);

CREATE TABLE sales (
product_id REFERENCES products (id),
nitems INT,
price NUMERIC
);

CREATE TABLE reviews (
product_id REFERENCES products (id),
stars INT
);

But how exactly do you join that together? Because

SELECT * FROM products p JOIN sales s ON (p.id = s.product_id)
JOIN reviews r ON (p.id = r.product_id)

is clearly wrong - it's essentially M:N join between the two fact
tables, increasing the number of rows.

It'd helpful to have an example of a practical query optimized by the
patch. I'm not claiming it does not exist, but I've been unable to come
up with something reasonable at the moment.

>> But I guess my main question is if there are actual examples of queries the
>> patch is trying to improve, or whether the general benefit is allowing
>> parallel plans for queries where it would not be possible otherwise.
>
> In fact I did all this with postgres_fdw in mind.
>

I assume there's not much difference between pushing down aggregates to
local workers and to remote nodes. There'll be costing differences, but
are there any other differences?

>
> From this perspective, David's approach can be slightly more efficient if all
> the tables are local, but aggregation of multiple base relations below the
> join can save a lot of effort if the tables are remote (as it reduces the
> amount of data transferred over network).
>
> I'm not terribly happy about changing the system catalog, but adding something
> like pg_aggregate(aggtransmultifn) is currently the best idea I have.
>

I personally don't see that as a major problem, my impression it can be
mostly copied from the partial aggregate patch - it's not trivial, but
manageable. Propagating it to the optimizer will be less trivial, but
well, if it's necessary ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2017-01-18 00:55:47 Re: Parallel Index Scans
Previous Message Tom Lane 2017-01-17 22:53:36 Re: [COMMITTERS] pgsql: Generate fmgr prototypes automatically