array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion

From: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
To: pgsql-bugs(at)postgresql(dot)org
Cc: ftm(dot)van(dot)vugt(at)foxi(dot)nl
Subject: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
Date: 2013-10-19 13:38:40
Message-ID: 3839201.Nfa2RvcheX@techfox.foxi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

L.S.

Something seems wrong here.... when applying arrag_agg() on a large recordset,
above a certain size 'things fall over' and memory usage races off until the
system runs out of it:

# select version();
version
-----------------------------------------------------------------------------------
PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.1,
64-bit

# \! free -g
total used free shared buffers cached
Mem: 31 1 30 0 0 0
-/+ buffers/cache: 0 30
Swap: 31 0 31

====================
Create concatenate function and aggregate:

CREATE OR REPLACE FUNCTION comma_cat(text, text)
RETURNS text
LANGUAGE 'plpgsql'
IMMUTABLE
STRICT
SECURITY INVOKER
AS '
BEGIN
IF $1 = '''' THEN
RETURN $2;
ELSIF $2 = '''' THEN
RETURN $1;
ELSE
RETURN $1 || '', '' || $2;
END IF;
END;';
CREATE AGGREGATE comcat(text) ( SFUNC = comma_cat, STYPE = text );

Activate timing:

\timing
Timing is on.

Create demo data:

create temp table f as
select id, random() as value
from generate_series(1, 1e7::int) as f(id);

Time: 7036,917 ms

====================
Don't mind the 'usefulness' of the exact query below, I ran into this issue
when experimenting a bit using random().

On my system, using the comcat() aggregate is no problem regardless of the
size of the recordset:

with g as (select * from f limit 1e5)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 189,835 ms

with g as (select * from f limit 1e6)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 1815,756 ms

with g as (select * from f)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 18660,326 ms

====================
However, using the array_agg() this breaks (on my system ) on the largest set:

with g as (select * from f limit 1e5)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: 361,242 ms

with g as (select * from f limit 1e6)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: 3310,347 ms

with g as (select * from f)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: <none, fails>

=> the last query eats up all 32GB main memory in seconds, then starts on the
32GB swap (which obviously takes a bit longer to digest) until eventually the
child process gets killed by the oom-killer and postgresql restarts....

--

Best,

Frank.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Valentine Gogichashvili 2013-10-19 19:02:25 Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
Previous Message Honza Horak 2013-10-18 11:55:09 random() generates collisions too early