Re: Avoid sorting when doing an array_agg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kiriakos Georgiou <kg(dot)postgresql(at)olympiakos(dot)com>, Alexis Woo <awoo2611(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Avoid sorting when doing an array_agg
Date: 2016-12-04 01:20:58
Message-ID: 1480814458.3931.11.camel@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sat, 2016-12-03 at 13:08 -0500, Kiriakos Georgiou wrote:
> The array_agg() has nothing to do with it.  It’s the group by.
> Without knowing what you are conceptually trying to accomplish, I
> can’t say much.

It *IS* caused by array_agg(). PostgreSQL can only do HashAggregate
when everything fits into memory, and in this case has to deal with
aggregate states of unknown size, so assumes each state is 1kB IIRC.

Per the plan the group by is expected to produce ~27k groups, so needs
about 30MB for the HashAggregate.  

> On my test 9.4.10 db, a similar example does a HashAggregate, so no
> sorting (google HashAggregate vs GroupAggregate).  But still it’s an
> expensive query because of all the I/O.

The query does almost no I/O, actually. The bitmap heap scan takes only
~230ms, which is not bad considering it produces ~1M rows. The
expensive part here seems to be the sort, but I doubt it's because of
I/O because it only produces temporary files that likely stay in RAM
anyway.

So the sort is probably slow because of CPU, as it compares strings. In
some locales that may be very expensive - not sure which locale is used
in this case, as it was not mentioned. 

> If I wanted to instantly have the user ids for a specific first, last
> name and category combo, I’d maintain a summary table via an insert
> trigger on the users table.
>  

Maybe. The question is whether it'll be a net win - maintenance of the
summary table will not be for free, especially with arrays of ids.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Steve Atkins 2016-12-04 02:32:05 Re: Index size
Previous Message Gmail 2016-12-04 00:59:22 Re: Postrgres-XL and Postgres-BDR