From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date: 2014-11-17 17:13:09
Message-ID: 546A2CA5.2040705@fuzzy.cz
Lists: pgsql-hackers

On 17.11.2014 08:31, Jeff Davis wrote:
> On Sat, 2014-11-15 at 21:36 +0000, Simon Riggs wrote:
>> Do I understand correctly that we are trying to account for exact
>> memory usage at palloc/pfree time? Why??
>
> Not palloc chunks, only tracking at the level of allocated blocks
> (that we allocate with malloc).
>
> It was a surprise to me that accounting at that level would have any
> measurable impact, but Robert found a reasonable case on a POWER
> machine that degraded a couple percent. I wasn't able to reproduce
> it consistently on x86.
>
>> Or alternatively, can't we just sample the allocations to reduce
>> the overhead?
>
> Not sure quite what you mean by "sample", but it sounds like
> something along those lines would work.

Maybe. It might also cause unexpected volatility / nondeterminism in
query execution.

Imagine a Hash Aggregate where a small number of the requests are
considerably larger than the rest. If the sampling happens to miss most
of those large requests, the estimate stays low and the whole hash
aggregate runs in memory; if it catches them, the aggregate starts
batching. Users will observe this as random variation in runtimes -
same query, same parameters, idle machine, sudden changes to the
plan ...

Maybe I'm too wary, but I'd guess there are use cases where latency
uniformity is a concern.
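
Just to illustrate what I mean, here's a minimal sketch of sampled
accounting in the block alloc path (entirely hypothetical - the names
and the struct are made up, this is not code from any patch):

    #include <stddef.h>

    #define SAMPLE_RATE 16          /* account for 1 in 16 block allocations */

    typedef struct SampledAccounting
    {
        size_t      est_bytes;      /* estimated total allocated bytes */
        unsigned    alloc_counter;  /* picks which allocations to sample */
    } SampledAccounting;

    static void
    account_block_alloc(SampledAccounting *acct, size_t blksize)
    {
        /* only every SAMPLE_RATE-th block updates the counter, scaled up */
        if (acct->alloc_counter++ % SAMPLE_RATE == 0)
            acct->est_bytes += blksize * SAMPLE_RATE;
    }

With a skewed distribution of block sizes, whether the few huge blocks
land on a sampled slot or not swings est_bytes by a large factor - and
with it the decision whether to start batching.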

> Attached is a patch that does something very simple: only tracks
> blocks held in the current context, with no inheritance or anything
> like it. This reduces it to a couple arithmetic instructions added
> to the alloc/dealloc path, so hopefully that removes the general
> performance concern raised by Robert[1].
>
> To calculate the total memory used, I included a function
> MemoryContextMemAllocated() that walks the memory context and its
> children recursively.
>
> Of course, I was originally trying to avoid that, because it moves
> the problem to HashAgg. For each group, it will need to execute
> MemoryContextMemAllocated() to see if work_mem has been exceeded. It
> will have to visit a couple contexts, or perhaps many (in the case
> of array_agg, which creates one per group), which could be a
> measurable regression for HashAgg.

:-(
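
Just so we're on the same page, the walk I imagine looks roughly like
this (a minimal stand-in for the context struct, field names
hypothetical; the per-context counter is the one your patch maintains):

    #include <stddef.h>

    /* minimal stand-in for MemoryContextData - the real struct has more fields */
    typedef struct MemCtx
    {
        size_t          mem_allocated;  /* blocks held by this context */
        struct MemCtx  *parent;         /* parent context */
        struct MemCtx  *firstchild;     /* head of child list */
        struct MemCtx  *nextchild;      /* next sibling */
    } MemCtx;

    /* recursive walk - visits every context, e.g. one per group for array_agg */
    static size_t
    mem_allocated_recurse(const MemCtx *ctx)
    {
        size_t          total = ctx->mem_allocated;
        const MemCtx   *child;

        for (child = ctx->firstchild; child != NULL; child = child->nextchild)
            total += mem_allocated_recurse(child);

        return total;
    }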

> But if that does turn out to be a problem, I think it's solvable.
> First, I could micro-optimize the function by making it iterative
> rather than recursive, to save on function call overhead. Second, I
> could offer a way to prevent the HTAB from creating its own context,
> which would be one less context to visit. And if those don't work,
> perhaps I could resort to a sampling method of some kind, as you
> allude to above.

Do you plan to try this approach with your hashagg patch, and do some
measurements?

I'd expect the performance hit to be significant, but I'd like to see
some numbers.
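
FWIW, the iterative variant you mention might look something like this
(same hypothetical MemCtx struct as above, using the parent pointer to
climb back up instead of the call stack):

    /* iterative walk over the same context tree, no recursion */
    static size_t
    mem_allocated_iterative(MemCtx *root)
    {
        size_t      total = 0;
        MemCtx     *ctx = root;

        while (ctx != NULL)
        {
            total += ctx->mem_allocated;

            if (ctx->firstchild != NULL)
                ctx = ctx->firstchild;      /* descend */
            else
            {
                /* climb until we find a sibling, or get back to the root */
                while (ctx != root && ctx->nextchild == NULL)
                    ctx = ctx->parent;
                ctx = (ctx == root) ? NULL : ctx->nextchild;
            }
        }

        return total;
    }

That saves the call overhead, but it still has to visit every context,
so for array_agg with one context per group I doubt it changes the
overall picture much.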

kind regards
Tomas
