Re: PoC: Grouped base relation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC: Grouped base relation
Date: 2017-01-19 00:05:09
Message-ID: CA+TgmobmmQ5sOACZSTdCj6h=_3_JkuS7VK4d+paQ3sojx7skAg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 18, 2017 at 5:14 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 19 January 2017 at 07:32, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Jan 17, 2017 at 11:33 PM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> I don't think aggcombinefn is missing because we couldn't write it
>>> for array_agg() or string_agg(). I guess it wouldn't be efficient to
>>> have those aggregates combined across parallel workers.
>>
>> I think there are many cases where it would work fine. I assume that
>> David just didn't make it a priority to write those functions because
>> it wasn't important to the queries he wanted to optimize. But
>> somebody can submit a patch for it any time and I bet it will have
>> practical use cases. There might be some performance problems shoving
>> large numbers of lengthy values through a shm_mq, but we won't know
>> that until somebody tries it.
>
> I had assumed that the combine function which combines a large array
> or a large string would not be any cheaper than doing that
> incrementally with the transfn. Of course some of this would happen in
> parallel, but it still doubles up some of the memcpy()ing, so perhaps
> it would be slower? ... I didn't ever get a chance to test it.

Even if that particular bit is not very much faster, it might have the
advantage of letting other parts of the plan be parallelized, and you
can still win that way. In the internal-to-EnterpriseDB experiments
we've been doing over the last few months, we've seen that kind of
thing a lot, and it informs a lot of the patches that my colleagues
have been submitting. But I also wouldn't be surprised if there are
cases where it wins big even without that. For example, if you're
doing an aggregate with lots of groups and good physical-to-logical
correlation, the normal case might be for all the rows in a group to
be on the same page. So you parallel seq scan the table and have
hardly any need to run the combine function in the leader (but of
course you have to have it available just in case you do need it).
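
As a rough illustration of the point above, here is a toy model in
Python (not PostgreSQL code). It assumes pages are handed to workers
round-robin, which is only a stand-in for how a parallel seq scan
distributes blocks, and it uses hypothetical transfn/combinefn helpers
that mimic an array_agg()-style aggregate. It counts how often the
leader actually has to merge two partial states for the same group.

```python
from collections import defaultdict


def transfn(state, value):
    # array_agg-style transition: append one input value to the running state.
    state.append(value)
    return state


def combinefn(left, right):
    # array_agg-style combine: concatenate two partial states.
    return left + right


def parallel_array_agg(pages, n_workers):
    # Assumption: pages are dealt out round-robin to the workers.
    partials = [defaultdict(list) for _ in range(n_workers)]
    for page_no, page in enumerate(pages):
        worker_state = partials[page_no % n_workers]
        for key, value in page:
            transfn(worker_state[key], value)

    # Leader: merge the per-worker partial states and count the real merges,
    # i.e. the cases where a group was seen by more than one worker.
    result = {}
    merges = 0
    for worker_state in partials:
        for key, state in worker_state.items():
            if key in result:
                result[key] = combinefn(result[key], state)
                merges += 1
            else:
                result[key] = state
    return result, merges


def make_pages(rows, rows_per_page):
    # Split a list of (group_key, value) rows into fixed-size pages.
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


# 100 groups of 4 rows each, stored so that every group fills exactly one page.
correlated_rows = [(g, i) for g in range(100) for i in range(4)]
# The same rows, interleaved so that each group is spread over four pages.
scattered_rows = [(g, i) for i in range(4) for g in range(100)]

correlated_result, correlated_merges = parallel_array_agg(
    make_pages(correlated_rows, 4), n_workers=4)
scattered_result, scattered_merges = parallel_array_agg(
    make_pages(scattered_rows, 4), n_workers=4)

print("merges with correlated layout:", correlated_merges)
print("merges with scattered layout:", scattered_merges)
```

In this model the correlated layout needs no merges at all in the
leader, while the scattered layout needs several per group. The real
executor differs in many ways (partial states also have to travel
through a shm_mq, for one), so this says nothing about actual timings.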

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
