From: | Kiriakos Georgiou <kg(dot)postgresql(at)olympiakos(dot)com> |
---|---|
To: | Alexis Woo <awoo2611(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Avoid sorting when doing an array_agg |
Date: | 2016-12-03 18:08:23 |
Message-ID: | 58C699B6-4616-405A-B28E-ED177866E1F3@olympiakos.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
The array_agg() has nothing to do with it. It’s the group by.
Without knowing what you are conceptually trying to accomplish, I can’t say much.
On my test 9.4.10 db, a similar example does a HashAggregate, so no sorting (google HashAggregate vs GroupAggregate). But still it’s an expensive query because of all the I/O.
If I wanted to instantly have the user ids for a specific first, last name and category combo, I’d maintain a summary table via an insert trigger on the users table.
Kiriakos
From: <pgsql-general-owner(at)postgresql(dot)org> on behalf of Alexis Woo <awoo2611(at)gmail(dot)com>
Date: Friday, December 2, 2016 at 12:25 PM
To: <pgsql-general(at)postgresql(dot)org>
Subject: [GENERAL] Avoid sorting when doing an array_agg
I have a users table which contains ~70 million rows that looks like this:
Column | Type |
-------------+-------------------+
id | integer |
first_name | character varying |
last_name | character varying |
category_id | integer |
Indexes:
"users_id_idx" btree (id)
"users_category_id_first_name_last_name_idx" btree (category_id, first_name, last_name)
I'm trying to retrieve the ids for each (first_name, last_name) couple for one specific category_id.
The query that I'm currently doing is the following:
select array_agg(id)
from users
where category_id = 5432
group by first_name, last_name;
For which the explain analyze output is the following:
GroupAggregate (cost=618461.35..626719.42 rows=26881 width=19) (actual time=1683.139..2613.386 rows=102943 loops=1)
Group Key: first_name, last_name
-> Sort (cost=618461.35..620441.86 rows=792206 width=19) (actual time=1683.116..2368.904 rows=849428 loops=1)
Sort Key: first_name, last_name
Sort Method: external merge Disk: 25304kB
-> Bitmap Heap Scan on users (cost=26844.16..524595.92 rows=792206 width=19) (actual time=86.046..229.469 rows=849428 loops=1)
Recheck Cond: (category_id = 5432)
Heap Blocks: exact=7938
-> Bitmap Index Scan on users_category_id_first_name_last_name_idx (cost=0.00..26646.11 rows=792206 width=0) (actual time=85.006..85.006 rows=849428 loops=1)
Index Cond: (category_id = 5432)
What seems to greatly decrease the performance of the query is the "Sort Method: external merge Disk: 7526kB."
Is it possible to aggregate the ids without doing a sort ?
If not, what other options, apart from increasing the work_mem, do I have ?
Thanks,
Alexis
From | Date | Subject | |
---|---|---|---|
Next Message | Joseph Brenner | 2016-12-03 20:08:30 | Select works only when connected from login postgres |
Previous Message | rob stone | 2016-12-03 15:23:33 | Re: No select privileges when not connecting from login postgres |