Re: PATCH: decreasing memory needlessly consumed by array_agg

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: decreasing memory needlessly consumed by array_agg
Date: 2014-12-20 08:26:51
Message-ID: CACQjQLrsbbN=MQxOAAnS1k10gkK3H-EH3ERHELkWmqp-dh2dNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2014-12-16 11:01 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:
>
>
>
> 2014-12-16 10:47 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:
>>
>>
>> 2014-12-16 6:27 GMT+07:00 Tomas Vondra <tv(at)fuzzy(dot)cz>:
>> Just fast-viewing the patch.
>>
>> The patch is not implementing the checking for not creating new context
>> in initArrayResultArr. I think we should implement it also there for
>> consistency (and preventing future problems).
>>
>

Testing the performance with your query, looks promising: speedup is
between 12% ~ 15%.

Because i'm using 32-bit systems, setting work_mem to 1024GB failed:

> ERROR: 1073741824 is outside the valid range for parameter "work_mem" (64
> .. 2097151)
> STATEMENT: SET work_mem = '1024GB';
> psql:/media/truecrypt1/oss/postgresql/postgresql/../patches/array-agg.sql:20:
> ERROR: 1073741824 is outside the valid range for parameter "work_mem" (64
> .. 2097151)
>

Maybe because of that, in the large groups a test, the speedup is awesome:

> master: 16,819 ms
>
with patch: 1,720 ms
>
Looks like with master, postgres resort to disk, but with the patch it fits
in memory.

Note: I hasn't tested the large dataset.

As expected, testing array_agg(anyarray), the performance is still the
same, because the subcontext hasn't implemented there (test script modified
from Tomas', attached).

I implemented the subcontext checking in initArrayResultArr by changing the
v3 patch like this:

> +++ b/src/backend/utils/adt/arrayfuncs.c
> @@ -4797,10 +4797,11 @@ initArrayResultArr(Oid array_type, Oid
> element_type, MemoryContext rcontext,
> bool subcontext)
> {
> ArrayBuildStateArr *astate;
> - MemoryContext arr_context;
> + MemoryContext arr_context = rcontext; /* by default use the parent
> ctx */
>
> /* Make a temporary context to hold all the junk */
> - arr_context = AllocSetContextCreate(rcontext,
> + if (subcontext)
> + arr_context = AllocSetContextCreate(rcontext,
> "accumArrayResultArr",
> ALLOCSET_DEFAULT_MINSIZE,
> ALLOCSET_DEFAULT_INITSIZE,
>

Testing the performance, it got the 12%~15% speedup. Good. (patch attached)

Looking at the modification in accumArrayResult* functions, i don't really
> comfortable with:
>
> 1. Code that calls accumArrayResult* after explicitly calling
> initArrayResult* must always passing subcontext, but it has no effect.
> 2. All existing codes that calls accumArrayResult must be changed.
>
> Just an idea: why don't we minimize the change in API like this:
>
> 1. Adding parameter bool subcontext, only in initArrayResult*
> functions but not in accumArrayResult*
> 2. Code that want to not creating subcontext must calls
> initArrayResult* explicitly.
>
> Other codes that calls directly to accumArrayResult can only be changed in
> the call to makeArrayResult* (with release=true parameter). In places that
> we don't want to create subcontext (as in array_agg_transfn), modify it to
> use initArrayResult* before calling accumArrayResult*.
>
> What do you think?
>

As per your concern about calling initArrayResult* with subcontext=false,
while makeArrayResult* with release=true:

> Also, it seems that using 'subcontext=false' and then 'release=true'
> would be a bug. Maybe it would be appropriate to store the
> 'subcontext' value into the ArrayBuildState and then throw an error
> if makeArrayResult* is called with (release=true && subcontext=false).
>

Yes, i think we should do that to minimize unexpected coding errors. In
makeArrayResult*, i think its better to not throwing an error, but using
assertions:

> Assert(release == false || astate->subcontext == true);
>

Regards,
--
Ali Akbar

Attachment Content-Type Size
array-agg-anyarray.sql application/sql 3.3 KB
array-agg-v3-modified.patch text/x-diff 15.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2014-12-20 10:16:47 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Christian Ullrich 2014-12-20 08:06:13 Re: New Python vs. old PG on raccoon and jaguarundi