Re: 9.5: Better memory accounting, towards memory-bounded HashAgg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date: 2015-02-22 17:52:37
Message-ID: 54EA1765.5080405@2ndquadrant.com
Lists: pgsql-hackers

On 22.2.2015 09:14, Jeff Davis wrote:
>
> On Wed, 2015-01-07 at 20:07 +0100, Tomas Vondra wrote:
>> So I started digging in the code and I noticed this:
>>
>> hash_mem = MemoryContextMemAllocated(aggstate->hashcontext, true);
>>
>> which is IMHO obviously wrong, because that accounts only for the
>> hashtable itself. It might be correct for aggregates with state passed
>> by value, but it's incorrect for state passed by reference (e.g.
>> Numeric, arrays etc.), because initialize_aggregates does this:
>>
>> oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
>> pergroupstate->transValue = datumCopy(peraggstate->initValue,
>>                                       peraggstate->transtypeByVal,
>>                                       peraggstate->transtypeLen);
>> MemoryContextSwitchTo(oldContext);
>>
>> and it's also wrong for all the user-defined aggregates that have no
>> access to hashcontext at all, and only get reference to aggcontext using
>> AggCheckCallContext(). array_agg() is a prime example.
>
> Actually, I believe the first one is right, and the second one is
> wrong. If we allocate pass-by-ref transition states in aggcontext,
> that defeats the purpose of the patch, because we can't release the
> memory when we start a new batch (because we can't reset aggcontext).
>
> Instead, the transition states should be copied into hashcontext; we
> should count the memory usage of hashcontext; AggCheckCallContext
> should return hashcontext; and after each batch we should reset
> hashcontext.

Hmmm, maybe. I'd suggest, however, not stuffing everything into a single
memory context, but creating a separate 'transvalue' context next to
hashcontext and returning that one from AggCheckCallContext(). I.e.
something like this:

aggcontext
    hashcontext  - hash table etc.
    transcontext - new context for transition values

It's better to keep things separated, I guess.
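
In code it might look roughly like the sketch below - untested, just to
illustrate the layout. The 'transcontext' field and the function names
are made up, and where exactly the per-batch reset happens depends on how
the batching logic ends up looking:

/*
 * Sketch only - the AggState fields and function names are illustrative,
 * not from any committed code.  The point is the context layout and the
 * per-batch reset.
 */
static void
build_hash_memory(AggState *aggstate)
{
    /* hash table lives in its own child context of aggcontext */
    aggstate->hashcontext =
        AllocSetContextCreate(aggstate->aggcontext,
                              "HashAgg hash table",
                              ALLOCSET_DEFAULT_MINSIZE,
                              ALLOCSET_DEFAULT_INITSIZE,
                              ALLOCSET_DEFAULT_MAXSIZE);

    /*
     * By-ref transition values get copied in here, and this is what
     * AggCheckCallContext() would hand out to the aggregates.
     */
    aggstate->transcontext =
        AllocSetContextCreate(aggstate->aggcontext,
                              "HashAgg transition values",
                              ALLOCSET_DEFAULT_MINSIZE,
                              ALLOCSET_DEFAULT_INITSIZE,
                              ALLOCSET_DEFAULT_MAXSIZE);
}

static void
reset_hash_batch(AggState *aggstate)
{
    /*
     * Throw away the hash table and the transition values in one go,
     * without touching the long-lived stuff in aggcontext.
     */
    MemoryContextReset(aggstate->hashcontext);
    MemoryContextReset(aggstate->transcontext);
}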

> It might be worth refactoring a bit to call it the "groupcontext" or
> something instead, and use it for both HashAgg and GroupAgg. That
> would reduce the need for conditionals.

Not sure that's worth it. The patch already seems complex enough.

> [ ... problems with array_agg subcontexts ... ]
>
>> and it runs indefinitely (I gave up after a few minutes). I believe this
>> renders the proposed memory accounting approach dead.
>
> ...
>
>> The array_agg() patch I submitted to this CF would fix this
>> particular query, as it removes the child contexts (so there's no
>> need for recursion in MemoryContextMemAllocated), but it does
>> nothing to the user-defined aggregates out there. And it's not
>> committed yet.
>
> Now that your patch is committed, I don't think I'd call the memory
> accounting patch I proposed "dead" yet. We decided that creating one
> context per group is essentially a bug, so we don't need to optimize
> for that case.

Yes, but the question is how many other aggregates designed like
array_agg are out there (in extensions etc.), and whether we actually
care about them, because they will be equally impacted by this and we
can't fix them.

But maybe we don't really care, and it's the authors' responsibility to
test their aggregates on a new major version and fix any performance
issues, just like we did with array_agg().

>
> Your approach may be better, though.
>

I'm not sure - it really boils down to the goal and the assumptions.

If we restrict ourselves to a HashAggregate-only solution, under the
assumption that the number of child contexts is small (and leave the fix
up to the extension author), then your approach is probably simpler and
better than the one I proposed.

If we're aiming for a general-purpose solution that might be used
elsewhere, then I'm afraid we can't make the 'small number of child
contexts' assumption and the simple solution may not be the right one.
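
To spell out why that assumption matters: the accounting has to walk the
whole context tree recursively, so the cost of each check is proportional
to the number of child contexts - roughly like this (a simplified sketch,
not the actual patch code; 'mem_allocated' stands for whatever per-context
counter the accounting patch maintains):

static Size
context_mem_allocated_recurse(MemoryContext context)
{
    /* per-context counter maintained by the accounting patch */
    Size        total = context->mem_allocated;
    MemoryContext child;

    /*
     * Walk all children - with one context per group (as in the old
     * array_agg) this is O(number of groups) on every check.
     */
    for (child = context->firstchild; child != NULL; child = child->nextchild)
        total += context_mem_allocated_recurse(child);

    return total;
}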

Although I believe that we shouldn't be creating large numbers of
'parallel' memory contexts anyway.

> Thank you for reviewing. I'll update my patches and resubmit for the
> next CF.

Thanks for working on this.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
