Skip site navigation (1) Skip section navigation (2)

array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion

From: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
To: pgsql-bugs(at)postgresql(dot)org
Cc: ftm(dot)van(dot)vugt(at)foxi(dot)nl
Subject: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
Date: 2013-10-19 13:38:40
Message-ID: 3839201.Nfa2RvcheX@techfox.foxi (view raw, whole thread or download thread mbox)
Thread:
Lists: pgsql-bugs
L.S.

Something seems wrong here.... when applying arrag_agg() on a large recordset, 
above a certain size 'things fall over' and memory usage races off until the 
system runs out of it:


# select version();
                                      version                                      
-----------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.1, 
64-bit


# \! free -g
             total       used       free     shared    buffers     cached
Mem:            31          1         30          0          0          0
-/+ buffers/cache:          0         30
Swap:           31          0         31



====================
Create concatenate function and aggregate:

CREATE OR REPLACE FUNCTION comma_cat(text, text)
        RETURNS text
        LANGUAGE 'plpgsql'
        IMMUTABLE
        STRICT
        SECURITY INVOKER
        AS '
                BEGIN
                        IF $1 = '''' THEN
                                RETURN $2;
                        ELSIF $2 = '''' THEN
                                RETURN $1;
                        ELSE
                                RETURN $1 || '', '' || $2;
                        END IF;
                END;';
CREATE AGGREGATE comcat(text) ( SFUNC = comma_cat, STYPE = text );


Activate timing:

\timing
Timing is on.


Create demo data:

create temp table f as 
	select id, random() as value 
	from generate_series(1, 1e7::int) as f(id);

Time: 7036,917 ms



====================
Don't mind the 'usefulness' of the exact query below, I ran into this issue 
when experimenting a bit using random().




On my system, using the comcat() aggregate is no problem regardless of the 
size of the recordset:

with g as (select * from f limit 1e5)
	select comcat(id::text), min(value)
	from g
	group by g.value
	having count(1) > 1;
Time: 189,835 ms


with g as (select * from f limit 1e6)
	select comcat(id::text), min(value)
	from g
	group by g.value
	having count(1) > 1;
Time: 1815,756 ms


with g as (select * from f)
	select comcat(id::text), min(value)
	from g
	group by g.value
	having count(1) > 1;
Time: 18660,326 ms

====================
However, using the array_agg() this breaks (on my system ) on the largest set:

with g as (select * from f limit 1e5)
	select array_agg(id), min(value)
	from g
	group by g.value
	having count(1) > 1;
Time: 361,242 ms


with g as (select * from f limit 1e6)
	select array_agg(id), min(value)
	from g
	group by g.value
	having count(1) > 1;
Time: 3310,347 ms


with g as (select * from f)
	select array_agg(id), min(value)
	from g
	group by g.value
	having count(1) > 1;
Time: <none, fails>


=> the last query eats up all 32GB main memory in seconds, then starts on the 
32GB swap (which obviously takes a bit longer to digest) until eventually the 
child process gets killed by the oom-killer and postgresql restarts....






-- 

Best,




Frank.



Responses

pgsql-bugs by date

Next:From: Valentine GogichashviliDate: 2013-10-19 19:02:25
Subject: Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
Previous:From: Honza HorakDate: 2013-10-18 11:55:09
Subject: random() generates collisions too early

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group