Re: Parallel Aggregates for string_agg and array_agg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-04-05 20:46:26
Message-ID: fdbf52dc-2e80-f5bc-5d43-b66a2deba021@2ndquadrant.com
Lists: pgsql-hackers

On 04/05/2018 09:10 PM, Tels wrote:
> Moin,
>
> On Wed, April 4, 2018 11:41 pm, David Rowley wrote:
>> Hi Tomas,
>>
>> Thanks for taking another look.
>>
>> On 5 April 2018 at 07:12, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
>> wrote:
>>> Other than that, the patch seems fine to me, and it's already marked as
>>> RFC so I'll leave it at that.
>>
>> Thanks.
>
> I have one more comment - sorry for not writing sooner, the flu got to me ...
>
> Somewhere in the code there is a new allocation of memory when the string
> grows beyond the current size - and that doubles the size. This can lead
> to a lot of wasted space (think: constructing a string that is a bit over
> 1 GByte, which would presumably allocate 2 GByte).
>

I don't think we support memory chunks of 1GB or more, so that's likely
going to fail anyway. See src/include/utils/memutils.h:

#define MaxAllocSize ((Size) 0x3fffffff) /* 1 gigabyte - 1 */
#define AllocSizeIsValid(size) ((Size) (size) <= MaxAllocSize)
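
To illustrate how that limit interacts with doubling, here is a minimal
standalone sketch. It is not the code from the patch or from stringinfo.c;
next_capacity() is a hypothetical helper, and Size is assumed to be a plain
size_t. It doubles the capacity until the request fits, clamps the result
to MaxAllocSize, and rejects requests that can never be valid.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Size;			/* assumption: stand-in for PostgreSQL's Size */

#define MaxAllocSize ((Size) 0x3fffffff) /* 1 gigabyte - 1 */
#define AllocSizeIsValid(size) ((Size) (size) <= MaxAllocSize)

/*
 * Hypothetical helper: choose a new capacity by doubling "current" until
 * it can hold "needed" bytes. Returns 0 if "needed" exceeds MaxAllocSize.
 */
Size
next_capacity(Size current, Size needed)
{
	Size		newcap = current;

	assert(current > 0);

	/* A request above the limit fails no matter how we grow. */
	if (!AllocSizeIsValid(needed))
		return 0;

	while (newcap < needed)
		newcap *= 2;

	/* Doubling may overshoot the limit even though "needed" fits. */
	if (newcap > MaxAllocSize)
		newcap = MaxAllocSize;

	return newcap;
}
```

Under these assumptions, a buffer of 512MB that needs one more byte ends up
capped at MaxAllocSize rather than at a full 1GB, and a request for 1GB or
more is rejected outright.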

I get your point, though - we may be wasting space here. That's hardly
something this patch should mess with, however - it's a more generic
allocation question.

> The same issue happens when each worker allocates 512 MByte for a 256
> MByte + 1 byte result.
>
> IMHO a factor of like 1.4 or 1.2 would work better here - not sure what
> the current standard in situations like this in PG is.
>

With a 2x scale factor, we only waste 25% of the allocated space on
average. Consider that you're growing because you've reached the current
size, and you double the size - say, from 1MB to 2MB. The full 1MB of
wasted space is the worst case, though - in reality we'll end up using
something between 1MB and 2MB, so 1.5MB on average (assuming the final
size is uniformly distributed over that range). At which point we've
wasted just 0.5MB of the 2MB allocated, i.e. 25%.
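
The same arithmetic can be written down for an arbitrary growth factor. This
is only a sketch under the assumption that the final size is uniformly
distributed between the old and the new capacity; average_waste_fraction()
is a hypothetical name, not something that exists in PostgreSQL.

```c
#include <assert.h>

/*
 * Hypothetical helper: expected fraction of the allocated space left
 * unused after growing by "factor", assuming the final size is uniformly
 * distributed between the old capacity (normalized to 1.0) and the new
 * capacity (equal to "factor").
 */
double
average_waste_fraction(double factor)
{
	double		mean_used;

	assert(factor > 1.0);

	/* Midpoint between the old capacity 1.0 and the new capacity. */
	mean_used = (1.0 + factor) / 2.0;

	/* Unused space relative to what was allocated. */
	return (factor - mean_used) / factor;
}
```

For a factor of 2.0 this returns 0.25, matching the 25% above. For 1.4 it
is about 0.14, and for 1.2 about 0.08.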

That sounds perfectly reasonable to me. A lower factor would be more
expensive in terms of repalloc calls, for example.
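
To get a rough feeling for that cost, one can count how many enlargements
are needed to reach a given size. Again just a sketch: count_enlargements()
is a hypothetical helper, it tracks the capacity as a double to sidestep
integer rounding, and it ignores the cost of copying data on each
enlargement.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of enlargements needed to grow a buffer
 * from "initial" bytes to at least "target" bytes when every enlargement
 * multiplies the capacity by "factor".
 */
int
count_enlargements(size_t initial, size_t target, double factor)
{
	double		cap = (double) initial;
	int			n = 0;

	assert(initial > 0);
	assert(factor > 1.0);

	while (cap < (double) target)
	{
		cap *= factor;
		n++;
	}

	return n;
}
```

Growing from 1kB to 1GB takes 20 enlargements with a factor of 2.0. With
1.4 it takes roughly twice as many, and with 1.2 considerably more, each of
which may have to copy the data accumulated so far.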

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
