Re: PATCH: decreasing memory needlessly consumed by array_agg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: decreasing memory needlessly consumed by array_agg
Date: 2015-01-08 22:34:37
Message-ID: 54AF05FD.6050008@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 8.1.2015 08:53, Ali Akbar wrote:
> In the CF, the status becomes "Needs Review". Let's continue our
> discussion of makeArrayResult* behavior if subcontext=false and
> release=true (more below):
> 2014-12-22 8:08 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com
> <mailto:the(dot)apaan(at)gmail(dot)com>>:
>
>
> With this API, i think we should make it clear if we call
> initArrayResult with subcontext=false, we can't call
> makeArrayResult, but we must use makeMdArrayResult directly.
>
> Or better, we can modify makeArrayResult to release according to
> astate->private_cxt:
>
> @@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
> dims[0] = astate->nelems;
> lbs[0] = 1;
>
> - return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
> true);
> + return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
> astate->private_cxt);

I've done this, so makeArrayResult() uses the private_cxt flag.

> Or else we implement what you suggest below (more comments below):
>
> Thinking about the 'release' flag a bit more - maybe we could do
> this
> instead:
>
> if (release && astate->private_cxt)
> MemoryContextDelete(astate->mcontext);
> else if (release)
> {
> pfree(astate->dvalues);
> pfree(astate->dnulls);
> pfree(astate);
> }
>
> i.e. either destroy the whole context if possible, and just free the
> memory when using a shared memory context. But I'm afraid this would
> penalize the shared memory context, because that's intended for
> cases where all the build states coexist in parallel and then at some
> point are all converted into a result and thrown away. Adding pfree()
> calls is no improvement here, and just wastes cycles.
>
>
> As per Tom's comment, i'm using "parent memory context" instead of
> "shared memory context" below.
>
> In the future, if some code writer decided to use subcontext=false,
> to save memory in cases where there are many array accumulation, and
> the parent memory context is long-living, current code can cause
> memory leak. So i think we should implement your suggestion
> (pfreeing astate), and warn the implication in the API comment. The
> API user must choose between release=true, wasting cycles but
> preventing memory leak, or managing memory from the parent memory
> context.

I'm wondering whether this is necessary after fixing makeArrayResult to
use the privat_cxt flag. It's still possible to call makeMdArrayResult
directly (with the wrong 'release' value).

Another option might be to get rid of the 'release' flag altogether, and
just use the 'private_cxt' - I'm not aware of a code using release=false
with private_cxt=true (e.g. to build the same array twice from the same
astate). But maybe there's such code, and another downside is that it'd
break the existing API.

> In one possible use case, for efficiency maybe the caller will
> create a special parent memory context for all array accumulation.
> Then uses makeArrayResult* with release=false, and in the end
> releasing the parent memory context once for all.

Yeah, although I'd much rather not break the existing code at all. That
is - my goal is not to make it slower unless absolutely necessary (and
in that case the code may be fixed per your suggestion). But I'm not
convinced it's worth it.

> As for the v6 patch:
> - the patch applies cleanly to master
> - make check is successfull
> - memory benefit is still there
> - performance benefit i think is negligible
>
> Reviewing the code, found this:
>
> @@ -573,7 +578,22 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
> elog(ERROR, "array_agg_array_transfn called in
> non-aggregate context");
> }
>
> - state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *)
> PG_GETARG_POINTER(0);
> +
> + if (PG_ARGISNULL(0))
> + {
> + Oid element_type = get_element_type(arg1_typeid);
> +
> + if (!OidIsValid(element_type))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATATYPE_MISMATCH),
> + errmsg("data type %s is not an array type",
> + format_type_be(arg1_typeid))));
>
>
> digging more, it looks like those code required because
> accumArrayResultArr checks the element type:
>
> /* First time through --- initialize */
> Oid element_type = get_element_type(array_type);
>
> if (!OidIsValid(element_type))
> ereport(ERROR,
> (errcode(ERRCODE_DATATYPE_MISMATCH),
> errmsg("data type %s is not an array type",
> format_type_be(array_type))));
> astate = initArrayResultArr(array_type, element_type,
> rcontext, true);
>
>
> I think it should be in initArrayResultArr, because it is an
> initialization-only check, shouldn't it?

Yes, seems reasonable. There are three places where initArrayResultArr
is called, and at two of them it looks just like the code you posted.

Oid element_type = get_element_type(array_type);

if (! OidIsValid(element_type)) { ... error ... }

astate = initArrayResultArr(array_type, element_type, ...)

The last place is initArrayResultAny() is different because it uses the
element_type to decide whether to use initArrayResultArr or plain
initArrayResult. So the error-handling is missing here.

But I think it makes sense to move the error handling into
initArrayResultArr(), including the get_element_type() call, and remove
the element_type from the signature. This means initArrayResultAny()
will call the get_element_type() twice, but I guess that's negligible.
And everyone who calls initArrayResultArr() will get the error handling
for free.

Patch v7 attached, implementing those two changes, i.e.

* makeMdArrayResult(..., astate->private_cxt)
* move error handling into initArrayResultArr()
* remove element_type from initArrayResultArr() signature

regards
Tomas Vondra

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
array-agg-v7.patch text/x-diff 12.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2015-01-08 23:50:23 Re: Turning recovery.conf into GUCs
Previous Message Peter Eisentraut 2015-01-08 21:01:39 Re: Turning recovery.conf into GUCs