Re: PATCH: decreasing memory needlessly consumed by array_agg

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: decreasing memory needlessly consumed by array_agg
Date: 2014-03-31 19:04:03
Message-ID: CA+TgmobM_NsGf2eGuw8KhqgnHNmnGTuGN52LBnFRuf9XEU5CAg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 27, 2014 at 10:00 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> The patch also does one more thing - it changes how the arrays (in the
> aggregate state) grow. Originally it worked like this
>
> /* initial size */
> astate->alen = 64;
>
> /* when full, grow exponentially */
> if (astate->nelems >= astate->alen)
>     astate->alen *= 2;
>
> so the array length would grow like this 64 -> 128 -> 256 -> 512 ...
> (note we're talking about elements, not bytes, so with 32-bit
> integers it's actually 256B -> 512B -> 1024B -> ...).
>
> While I do understand the point of this (minimizing palloc overhead), I
> find this pretty dangerous, especially in the case of array_agg(). I've
> modified the growth like this:
>
> /* initial size */
> astate->alen = 4;
>
> /* when full, grow linearly */
> if (astate->nelems >= astate->alen)
>     astate->alen += 4;
>
> I admit that might be a bit too aggressive, and maybe there's a better
> way to do this - with better balance between safety and speed. I was
> thinking about something like this:
>
>
> /* initial size */
> astate->alen = 4;
>
> /* when full, grow exponentially up to a cap, then linearly */
> if (astate->nelems >= astate->alen)
> {
>     if (astate->alen < 128)
>         astate->alen *= 2;
>     else
>         astate->alen += 128;
> }
>
> i.e. a small initial size, exponential growth up to 128 elements, then
> linear growth by 128 elements at a time.

So I think this kind of thing is very sensible, but the last time I
suggested something similar, I got told "no":

http://www.postgresql.org/message-id/CAEYLb_WLGHT7yJLaRE9PPeRt5RKd5ZJbb15tE+kpgejgQKORyA@mail.gmail.com

But I think you're right and the objections previously raised are
wrong. I suspect that the point at which we should stop doubling is
higher than 128 elements, because that's only 8kB, which really isn't
that big - and the idea that the resizing overhead takes only
amortized constant time is surely appealing. But I still think that
doubling *forever* is a bad idea, both here and there. The fact that we've
written the code that way in lots of places doesn't make it the right
algorithm.
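
To make the shape of that policy concrete, here is a rough sketch of a
capped-doubling growth rule. This is illustrative only: the helper name,
the macro, and the 1024-element cap are made up for the example, not
something the patch actually contains.

/*
 * Illustrative sketch only (not from the patch): double the allocated
 * length while it is below a cap, then grow by the cap each time.
 * Doubling keeps appends amortized O(1) while the array is small;
 * switching to linear growth bounds the wasted space once the array is
 * large.  The cap of 1024 elements is an arbitrary example value.
 */
#define GROWTH_CAP_ELEMS 1024

static int
next_array_len(int curlen)
{
    if (curlen < GROWTH_CAP_ELEMS)
        return curlen * 2;
    else
        return curlen + GROWTH_CAP_ELEMS;
}

With something along those lines, the aggregate code could do
astate->alen = next_array_len(astate->alen) instead of the bare doubling,
and the cap could be tuned independently of the initial size.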

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
