Re: Memory Accounting v11

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Memory Accounting v11
Date: 2015-07-07 04:49:58
Message-ID: CAKJS1f8yvvvj-sVDv_bcxkzcZKq0ZOTVhX0dHfnYDct2Mycq5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15 June 2015 at 07:43, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> This patch tracks memory usage (at the block level) for all memory
> contexts. Individual palloc()s aren't tracked; only the new blocks
> allocated to the memory context with malloc().
>
> It also adds a new function, MemoryContextMemAllocated() which can
> either retrieve the total allocated for the context, or it can recurse
> and sum up the allocations for all subcontexts as well.
>
> This is intended to be used by HashAgg in an upcoming patch that will
> bound the memory and spill to disk.
>
> Previous discussion here:
>
> http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop
>
> Previous concerns:
>
> * There was a slowdown reported of around 1-3% (depending on the exact
> version of the patch) on an IBM power machine when doing an index
> rebuild. The results were fairly noisy for me, but it seemed to hold up.
> See http://www.postgresql.org/message-id/CA
> +Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajP3Cro9dBRaA(at)mail(dot)gmail(dot)com
> * Adding a struct field to MemoryContextData may be bad for the CPU
> caching behavior, and may be the cause of the above slowdown.
>

I've been looking at this patch and trying to reproduce the reported
slowdown by using Tomas' function to try to exercise palloc() with minimal
overhead of other code:

https://github.com/tvondra/palloc_bench

I also wanted to attempt to determine if the slowdown was due to the
mem_allocated maintenance or if it was down to the struct size increasing.
I decided to test this just by simply commenting out all of
the context->mem_allocated += / -= and just keep the mem_allocated field in
the struct, but I've been really struggling to see any sort of performance
decrease anywhere. I'm just getting too much variation in the test results
to get any sort of idea.

I've attached a spreadsheet of the results I saw. It would be really good
if Robert was able to test with the IBM PPC just with the extra field in
the struct so we can see if the 1-3% lies there, or if it's in overhead of
keeping mem_allocated up-to-date.

My main question here is: How sure are you that none of your intended use
cases will need to call MemoryContextMemAllocated with recurse = true? I'd
imagine if you ever have to use MemoryContextMemAllocated(ctx, true) then
we'll all be sitting around with our stopwatches out to see if the call
incurs too much of a performance penalty.

Other small issues:

+ oldblksize = block->endptr - ((char *)block);
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
return NULL;
+
+ context->mem_allocated += blksize - oldblksize;
+

Shouldn't you be setting oldblksize after the "if"? I'd imagine the
realloc() failure is rare, but it just seems better to do it just before
it's required and only when required.

+ if (recurse)
+ {
+ MemoryContext child = context->firstchild;
+ for (child = context->firstchild;
+ child != NULL;

Here there's a double assignment of "child".

***************
*** 632,637 **** AllocSetDelete(MemoryContext context)
--- 637,644 ----
{
AllocBlock next = block->next;

+ context->mem_allocated -= block->endptr - ((char *) block);
+

I might be mistaken here, but can't you just set context->mem_allocted = 0;
after that loop?
Or maybe it would be an improvement to only do the decrement
if MEMORY_CONTEXT_CHECKING is defined, then Assert() that mem_allocated ==
0 after the loop...

One other thing that I don't remember seeing discussed would be to just
store a List in each context which would store all of the mem_allocated's
that need to be updated during each allocation and deallocation of memory.
The list would be setup once when the context is created. When a child
context is setup we'd just loop up the parent context chain and
list_concat() all of the parent's lists onto the child's lists. Would this
method add too much overhead when a context is deleted? Has this been
explored already?

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
mem_accounting_benchmark.ods application/vnd.oasis.opendocument.spreadsheet 14.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-07-07 04:56:36 Re: Support for N synchronous standby servers - take 2
Previous Message Amit Kapila 2015-07-07 04:17:27 Re: Parallel Seq Scan