Re: 9.5: Better memory accounting, towards memory-bounded HashAgg

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date: 2014-08-23 01:02:52
Lists: pgsql-hackers

On 20.8.2014 08:11, Jeff Davis wrote:
> On Tue, 2014-08-19 at 12:54 +0200, Tomas Vondra wrote:
> It would be easier to resolve the performance concern if I could
> reliably get the results Robert is getting. I think I was able to
> reproduce the regression with the old patch, but the results were still
> noisy.

I created a small extension for this purpose, it's available here:

In short, it creates a chain of memory contexts, and then repeats
palloc/pfree a given number of times (with a chosen request size).

It either calls AllocSetContextCreate or AllocSetContextCreateTracked,
depending on whether there's a


so either leave it there or comment it out. The time is printed as a

I did a number of tests to get an idea of the overhead, using this call:

select palloc_bench(10,100000000,32768);

which means 10 memory contexts and 100000000 palloc/free cycles with 32kB requests.

And I got these results:

-- master (no accounting)
3246.03 ms
3125.24 ms
3247.86 ms
3251.85 ms
3113.03 ms
3140.35 ms

-- v3 patch (AllocSetContextCreate => no tracking)
3303.64 ms
3278.57 ms
3295.11 ms
3325.63 ms
3329.84 ms
3439.27 ms

-- v3 patch (AllocSetContextCreateTracked => tracking)
6296.43 ms
5228.83 ms
5271.43 ms
5158.60 ms
5404.57 ms
5240.40 ms

-- v4 (tracking all the time)
6728.84 ms
6478.88 ms
6478.82 ms
6473.57 ms
6554.96 ms
6528.66 ms

I think this makes the overhead clearly visible. It's also worth mentioning
that this does nothing except palloc/free, which is not really
what a real workload does. And 100000000 palloc/free cycles of 32kB blocks
means ~3TB of RAM, unless my math is broken.

So looking at the numbers and saying "7 seconds >> 3 seconds, all is
lost" is not really appropriate IMHO.
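To put the two claims above into numbers, here is a quick C sketch (my own arithmetic, not part of the benchmark extension) that computes the cumulative volume requested in one run and the mean extra cost per palloc/pfree cycle between the first series of results and the "v4 (tracking all the time)" series, using the timings exactly as listed. It assumes the 100000000 cycles and 32768-byte request size from the palloc_bench call above; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CYCLES    100000000.0   /* palloc/pfree cycles per run */
#define REQ_BYTES 32768.0       /* request size passed to palloc_bench */

/* Timings (ms) copied from the first series and the "v4 (tracking all the time)" series. */
static const double first_series[] = {3246.03, 3125.24, 3247.86, 3251.85, 3113.03, 3140.35};
static const double v4_series[]    = {6728.84, 6478.88, 6478.82, 6473.57, 6554.96, 6528.66};

/* Arithmetic mean of n timings. */
static double mean_ms(const double *t, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double) n;
}

/* Cumulative bytes requested over one run (each block is freed right away). */
static double total_bytes(void)
{
    return CYCLES * REQ_BYTES;
}

/* Extra nanoseconds per palloc/pfree cycle of series b over series a. */
static double overhead_ns_per_cycle(const double *a, const double *b, size_t n)
{
    double diff_ms = mean_ms(b, n) - mean_ms(a, n);

    return diff_ms * 1e6 / CYCLES;   /* ms -> ns, spread over all cycles */
}
```

With these inputs total_bytes() comes to 3.2768e12 bytes (about 3.3 TB, or 2.98 TiB), and the v4 difference works out to roughly 33.5 ns per palloc/pfree cycle.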

Anyway, ISTM that v4 is actually a bit more expensive than v3 for some
reason. I'm not entirely sure why, but I suspect it's because of
updating the few additional memory contexts up to TopMemoryContext.
That's something v3 didn't do.

I tried to hack a bit on the idea of using a single byte for the flags
(isReset and track_mem) - patch attached. That got me pretty much to v3
performance (or maybe slightly better):

-- v4 + flag (tracking all the time)
5222.38 ms
4958.37 ms
5072.21 ms
5100.43 ms
5059.65 ms
4995.52 ms

But it's nothing that'd magically save the day ... and with tracking
disabled (track_mem=false) we get pretty much the v3 numbers. So this
gymnastics gave us pretty much nothing ...
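For reference, the single-byte flag idea amounts to something like the following sketch. The bit values, the struct and the function names are hypothetical (they are not taken from the attached patch); the point is only that one load of one byte answers both "is the context reset?" and "does it track memory?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, not necessarily what the patch uses. */
#define CTX_FLAG_IS_RESET   ((uint8_t) 0x01)
#define CTX_FLAG_TRACK_MEM  ((uint8_t) 0x02)

/* Hypothetical cut-down context: isReset and track_mem packed into one byte. */
typedef struct FlagContext
{
    uint8_t flags;
} FlagContext;

static void ctx_set_flag(FlagContext *c, uint8_t flag, bool on)
{
    if (on)
        c->flags |= flag;
    else
        c->flags &= (uint8_t) ~flag;
}

static bool ctx_is_reset(const FlagContext *c)
{
    return (c->flags & CTX_FLAG_IS_RESET) != 0;
}

static bool ctx_tracks_mem(const FlagContext *c)
{
    return (c->flags & CTX_FLAG_TRACK_MEM) != 0;
}

/*
 * What an allocation would do: clear isReset and find out whether the
 * accounting has to be updated, with a single load and store of the byte.
 */
static bool ctx_mark_allocated(FlagContext *c)
{
    uint8_t f = c->flags;

    c->flags = (uint8_t) (f & ~CTX_FLAG_IS_RESET);
    return (f & CTX_FLAG_TRACK_MEM) != 0;
}
```

As the numbers show, this only trims the constant part of the cost; it does nothing about the walk over the parents.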

But I have realized that maybe the problem is that we're handling memory
contexts and accounting as 1:1. But that's not really the case - most
contexts are not interested in this at all. They don't use accounting
now, and that won't change. So only a small fraction of memory contexts will
ask for accounting. Yet all the contexts are forced to pass accounting
info from their children to their parent if there happens to be a
context with track_mem=true somewhere above them.

And that's exactly the problem, because most of the time is spent in
update_accounting, in the loop over parents.
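A simplified model of why the parent walk hurts: in this sketch (hypothetical types, not the actual patch code) every allocation walks the whole chain of ancestors, and only the contexts with track_mem=true actually record anything. The function returns the number of contexts visited; for a tracked context at the top of a chain of 10 untracked ones, that is 11 contexts on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down context: only what the accounting needs. */
typedef struct Ctx
{
    struct Ctx *parent;
    bool        track_mem;
    size_t      total_allocated;
} Ctx;

/*
 * Walk every ancestor on each allocation (delta > 0) or free (delta < 0),
 * whether or not it asked for tracking.  Returns the number of contexts
 * visited: the cost follows the depth of the context, not the number of
 * contexts that actually track memory.
 */
static int update_accounting(Ctx *ctx, long delta)
{
    int visited = 0;

    for (Ctx *c = ctx; c != NULL; c = c->parent)
    {
        visited++;
        if (c->track_mem)
            c->total_allocated += (size_t) delta;   /* unsigned wraparound handles delta < 0 */
    }
    return visited;
}
```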

So my proposal is to separate those two things into two hierarchies
that are somewhat parallel, but not exactly.

That means:

(1) creating a structure with the accounting info

typedef struct MemoryAccountingData {

    Size total_allocated;
    Size self_allocated;

    struct MemoryAccountingData *parent;

} MemoryAccountingData;

(2) adding a pointer to MemoryAccountingData to MemoryContextData, and
a 'track_mem' flag for contexts that requested tracking

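typedef struct MemoryContextData
{
    /* ... existing fields ... */
    bool        track_mem;          /* did this context request accounting? */
    MemoryAccounting accounting;    /* pointer to MemoryAccountingData */
} MemoryContextData;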

(3) when a context does not request accounting, it just uses the
accounting pointer from the parent context, and track_mem=false

(4) when the context requests accounting, it allocates its own
accounting structure, sets accounting->parent to the accounting
from parent, and sets track_mem=true
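Rules (3) and (4) could look roughly like this when a context is created. This is a standalone sketch under my own assumptions: SketchContext is a hypothetical stand-in for MemoryContextData, MemoryAccounting is assumed to be a pointer typedef for MemoryAccountingData, and calloc stands in for whatever allocator the real patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct MemoryAccountingData
{
    size_t total_allocated;
    size_t self_allocated;
    struct MemoryAccountingData *parent;
} MemoryAccountingData;

typedef MemoryAccountingData *MemoryAccounting;   /* assumed pointer typedef */

/* Hypothetical cut-down context, just enough for the sketch. */
typedef struct SketchContext
{
    struct SketchContext *parent;
    bool             track_mem;
    MemoryAccounting accounting;
} SketchContext;

/*
 * A context that does not ask for accounting reuses its parent's pointer
 * (rule 3); one that does gets its own instance, chained to the parent's
 * (rule 4).  Returns false if the accounting struct cannot be allocated.
 */
static bool sketch_context_init(SketchContext *ctx, SketchContext *parent, bool track_mem)
{
    MemoryAccounting inherited = parent ? parent->accounting : NULL;

    ctx->parent = parent;
    ctx->track_mem = track_mem;

    if (!track_mem)
    {
        ctx->accounting = inherited;        /* rule (3): may be NULL */
        return true;
    }

    ctx->accounting = calloc(1, sizeof(MemoryAccountingData));   /* rule (4) */
    if (ctx->accounting == NULL)
        return false;
    ctx->accounting->parent = inherited;
    return true;
}
```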

Now all the contexts have a direct pointer to the accounting of the
nearest parent context that explicitly requested accounting, and don't
need to walk through all the parents.

Contexts that did not request tracking have track_mem=false, and their
accounting points to the parent with explicit accounting, or is NULL if
there's no such parent. For these contexts, GetAllocated always returns
0, but that's OK because they haven't requested accounting anyway.

Contexts that requested tracking have track_mem=true, and have
their own specific accounting instance. The accounting->parent plays
pretty much the same role as 'accounting' with 'track_mem=false' (see
previous paragraph). These contexts return GetAllocated properly.
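The update path and GetAllocated from the two paragraphs above, again as a hypothetical standalone sketch rather than the patch itself. I'm assuming here that self_allocated counts memory allocated through that accounting instance directly, while total_allocated also includes tracked descendants. The loop only visits accounting instances, so its length depends on the number of tracked ancestors, not on the depth of the context tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MemoryAccountingData
{
    size_t total_allocated;
    size_t self_allocated;
    struct MemoryAccountingData *parent;
} MemoryAccountingData;

typedef MemoryAccountingData *MemoryAccounting;   /* assumed pointer typedef */

/* Hypothetical cut-down context, just enough for the sketch. */
typedef struct SketchContext
{
    bool             track_mem;
    MemoryAccounting accounting;   /* own instance, or nearest tracked ancestor's */
} SketchContext;

/*
 * Called on every block allocation (delta > 0) or release (delta < 0).
 * Untracked intermediate contexts are never visited.
 */
static void sketch_update_accounting(SketchContext *ctx, long delta)
{
    MemoryAccounting acc = ctx->accounting;

    if (acc == NULL)
        return;                                  /* nobody above us is tracking */

    acc->self_allocated += (size_t) delta;       /* unsigned wraparound handles delta < 0 */
    for (; acc != NULL; acc = acc->parent)
        acc->total_allocated += (size_t) delta;
}

/* Untracked contexts always report 0, tracked ones their own total. */
static size_t sketch_get_allocated(const SketchContext *ctx)
{
    return ctx->track_mem ? ctx->accounting->total_allocated : 0;
}
```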

Now, I did a quick test with palloc_bench - 1 context with tracking
enabled, and a chain of 10 contexts without tracking (but updating the
accounting of the first context).

And I see this - with tracking enabled:

3235.57 ms
3240.09 ms
3225.47 ms
3306.95 ms
3249.14 ms
3225.56 ms

and with tracking disabled:

3193.43 ms
3169.57 ms
3156.48 ms
3147.12 ms
3142.25 ms
3161.91 ms
3149.97 ms

Which is quite good, IMHO. Disabled is pretty much exactly as master
(i.e. no accounting at all), and enabled is about equal to v3 with tracking
disabled.
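To quantify "quite good", a small C helper (not part of palloc_bench) that compares the means of the two series above. Note that the disabled series has 7 runs and the enabled one 6, so each mean is taken over its own count.

```c
#include <assert.h>
#include <stddef.h>

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Timings (ms) copied from the two series above. */
static const double tracking_enabled[]  = {3235.57, 3240.09, 3225.47, 3306.95, 3249.14, 3225.56};
static const double tracking_disabled[] = {3193.43, 3169.57, 3156.48, 3147.12, 3142.25, 3161.91, 3149.97};

/* Arithmetic mean of n timings. */
static double mean_ms(const double *t, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double) n;
}

/* Relative slowdown of series a compared to series b, in percent. */
static double overhead_pct(const double *a, size_t na, const double *b, size_t nb)
{
    double ma = mean_ms(a, na);
    double mb = mean_ms(b, nb);

    return (ma - mb) / mb * 100.0;
}
```

With the timings above, overhead_pct(tracking_enabled, NELEMS(tracking_enabled), tracking_disabled, NELEMS(tracking_disabled)) gives roughly 2.75% (about 3247 ms vs 3160 ms on average).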

But maybe I did something stupid in those patches, it's 3AM here ...

Patch attached, consider it an early alpha version.


Attachment Content-Type Size
memory-accounting-tomas.patch text/x-diff 11.9 KB