
Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
Date: 2013-10-20 02:48:01
Message-ID: 52634461.2020002@fuzzy.cz
Lists: pgsql-bugs
Hi,

On 19.10.2013 15:38, Frank van Vugt wrote:
> L.S.
> 
> Something seems wrong here.... when applying array_agg() on a large recordset, 
> above a certain size 'things fall over' and memory usage races off until the 
> system runs out of it:

...

> with g as (select * from f)
> 	select comcat(id::text), min(value)
> 	from g
> 	group by g.value
> 	having count(1) > 1;
> Time: 18660,326 ms

...

> with g as (select * from f)
> 	select array_agg(id), min(value)
> 	from g
> 	group by g.value
> 	having count(1) > 1;
> Time: <none, fails>

Hmmmm. I initially thought that this was because of 2x resizing the
array in accumArrayResult, but that clearly is not the case (e.g.
because the example is constructed so that groups with multiple elements
are very unlikely).

Then I thought that it's because the array is pre-allocated with 64
elements, but that's not an issue either (at least not the main one),
because even with 1e7 elements this amounts to about

   10M * 8 * 64 = ~5GB

and the report says that it consumed ~32GB RAM and swap (not sure how
much, but probably not a small amount). On my machine it easily ate 8GB
of RAM and 4GB of swap (and then got shot by OOM).

Anyway, I disabled the preallocation (i.e. 1 element initially, +1 for
each iteration) which should be ~80MB of data, but even then I was
unable to execute that query.

The effect on smaller queries (say, 1e6 rows) was negligible too - it
consumed more or less the same amount of memory, irrespective of the
preallocation.

Then I started to wonder about the local memory context, which is
defined like this:

    arr_context = AllocSetContextCreate(rcontext,
                "accumArrayResult",
                ALLOCSET_DEFAULT_MINSIZE,
                ALLOCSET_DEFAULT_INITSIZE,
                ALLOCSET_DEFAULT_MAXSIZE);

which means

    arr_context = AllocSetContextCreate(rcontext,
                "accumArrayResult",
                0,
                (8*1024),
                (8*1024*1024));

I may be wrong, but it seems to me that each group has its own
allocation context, and once something gets allocated (which is pretty
much guaranteed for each group), it allocates at least 8kB of memory.

With 1e6 groups that's ~8GB (not really far from what I see here), and
with 1e7 groups it's ~80GB. Not the most efficient approach for 80MB of
values.

I tried this:

#define ALLOCSET_TINY_MINSIZE	 0
#define ALLOCSET_TINY_INITSIZE  (1 * 256)
#define ALLOCSET_TINY_MAXSIZE	(8 * 1024)

    arr_context = AllocSetContextCreate(rcontext,
                "accumArrayResult",
                ALLOCSET_TINY_MINSIZE,
                ALLOCSET_TINY_INITSIZE,
                ALLOCSET_TINY_MAXSIZE);

and after a minor tweak in aset.c, which by default enforces that init
size >= 1024:

	if (initBlockSize < 1024)
		initBlockSize = 1024;

I'm able to process even the 1e7 values, although it still consumes a
significant amount of memory.

The failing query is slightly artificial, but pretty much any array_agg
query with a large number of groups is going to fail in exactly the same
way. So if we could improve that somehow, that'd be nice.

However the more I think about the local memory context, the more I
think the idea to use a dedicated memory context for each group is
flawed. I think a single context for the whole aggregate would work much
better, and indeed - after replacing the local memory context by
rcontext (which is just aggcontext passed from array_agg_transfn) the
memory consumption was cut in half.

The other thing I think we should reevaluate is the preallocation, i.e.
the initial size (currently 64) and if growing to 2x the size is a good
idea.

Based on experience and measurements with multiple custom aggregates I
did in the past few weeks, I think a significantly lower initial size
(say 16, or maybe even 8) would work equally well. We're talking about
a difference of a few percent vs. a higher probability of running out
of memory.

But let's say it's a minor issue, and by switching to a shared memory
context we've already saved ~50% memory for cases like these (a single
element in a group, 1kB chunk - 64*8B = 512B).

Regarding the growth rate - I think once the array reaches a certain
size (e.g. 1024 elements), we should probably decrease the growth rate.
For example +128 instead of x2, or something like that. With the
current approach, we're pretty much bound to have ~50% overhead because
of the doubling.

regards
Tomas

