Memory leaks on SRF rescan

From: Neil Conway <neilc(at)samurai(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Memory leaks on SRF rescan
Date: 2008-02-22 02:11:06
Message-ID: 1203646266.20306.67.camel@goldbach
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Consider this test case:

create table baz (a int, b int);
insert into baz (select 1, generate_series(1, 3000000));

select baz.*, (select 1 from generate_series(1, 1) g(a)
where g.a < baz.b)
from baz;

The final query consumes ~800 MB of memory on my system before it
finishes executing. The guilty memory context is pretty obvious:

ExecutorState: 612360192 total in 82 blocks; 1664004 free (117
chunks); 610696188 used

Two different classes of allocations in the EState's per-query context
are leaked:

(1) In FunctionNext(), we call ExecMakeTableFunctionResult() to compute
the result set of the SRF, which allocates the TupleDesc out-parameter
in the per-query memory context. Since rescan of a function scan (with
chgParam != NULL) results in taking this code path again, we can leak 1
TupleDesc for each rescan of a function scan. I think this is plainly a
bug -- the first attached patch fixes it.

(2) In various SRFs, allocations are made in the "multi-call context"
but are not released before calling SRF_RETURN_DONE. This results in
leaking those allocations for the duration of the query, since the
multi-call context is the EState's per-query context. There appears to
have been some thought that the SRF author would be enlightened enough
to free allocations made in the multi-call context before calling
SRF_RETURN_DONE, but this is rarely done. From a cursory look at the
builtin SRFs, generate_series_step_int4, generate_series_step_int8,
pg_logdir_ls, ts_token_type_byid, ts_token_type_byname, ts_parse_byid,
ts_parse_byname and pg_timezone_abbrevs all get this wrong, at which
point I stopped looking further. I wouldn't think very many user-written
SRFs have gotten this right either.

We could fix this by patching all the guilty SRFs (the second attached
patch does this for the int4 and int8 variants of generate_series()),
but I think it would be more robust to arrange for the FuncCallContext's
multi-call context to have a shorter lifetime: instead of using the
EState's per-query context, we could instead use a context that is
deleted after SRF_RETURN_DONE() is called (or at a minimum, reset each
time the function scan is rescanned).

BTW, it seems to me that shutdown_MultiFuncCall() is wrong in any case:
it frees the SRF's AttInMetadata, but not any of its fields.

Thanks to my colleague Alan Li at Truviso for reporting this issue.

-Neil

Attachment Content-Type Size
fscan_tdesc_rescan_leak-1.patch text/x-patch 654 bytes
int4_int8_gen_series_leak-1.patch text/x-patch 1.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2008-02-22 02:24:35 Re: Including PL/PgSQL by default
Previous Message ITAGAKI Takahiro 2008-02-22 00:57:48 Re: Batch update of indexes on data loading