Skip site navigation (1) Skip section navigation (2)

Re: Memory Issue with array_agg?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Sosinski <rsosinski(at)ticketevolution(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Memory Issue with array_agg?
Date: 2013-08-19 15:12:15
Message-ID: CAFj8pRC=A1hFGP1GOv=gzzNMZg5oMT4XoAPCZqcV4naqu7UjHQ@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-general
2013/8/19 Robert Sosinski <rsosinski(at)ticketevolution(dot)com>

> Hi Pavel,
>
> What kind of example do you need?  I cant give you the actual data I have
> in the table, but I can give you an example query and the schema attached
> below.  From there, I would just put in 2 million rows worth 1.2 Gigs of
> data.  Average size of the the extended columns (using the pg_column_size
> function) in bytes are:
>
> guid: 33
> name: 2.41
> currency: 4
> fields: 120.32
>
> example query:
>
> -- find duplicate records using a guid
> select guid, array_agg(id) from orders group by guid;
>

how much distinct guid is there, and how much duplicates

??

regards

Pavel



>
> example schema:
>                                      Table "public.things"
>
>    Column   |            Type             |                      Modifiers
>                      | Storage  | Stats target | Description
>
> ------------+-----------------------------+-----------------------------------------------------+----------+--------------+-------------
>  id         | integer                     | not null default
> nextval('things_id_seq'::regclass) | plain    |              |
>  version    | integer                     | not null
>                      | plain    |              |
>  created_at | timestamp without time zone | not null
>                      | plain    |              |
>  updated_at | timestamp without time zone | not null
>                      | plain    |              |
>  foo_id     | integer                     | not null
>                      | plain    |              |
>  bar_id     | integer                     | not null
>                      | plain    |              |
>  baz_id     | integer                     | not null
>                      | plain    |              |
>  guid       | character varying           | not null
>                      | extended |              |
>  name       | character varying           | not null
>                      | extended |              |
>  price      | numeric(12,2)               | not null
>                      | main     |              |
>  currency   | character varying           | not null
>                      | extended |              |
>  amount     | integer                     | not null
>                      | plain    |              |
>  the_date   | date                        | not null
>                      | plain    |              |
>  fields     | hstore                      |
>                       | extended |              |
> Indexes:
>     "things_pkey" PRIMARY KEY, btree (id)
>     "things_foo_id_idx" btree (foo_id)
>     "things_bar_id_idx" btree (bar_id)
>     "things_baz_id_idx" btree (baz_id)
>     "things_guid_uidx" UNIQUE, btree (guid)
>     "things_lpad_lower_name_eidx" btree (lpad(lower(name::text), 10,
> '0'::text))
>     "things_price_idx" btree (price)
>
> Foreign-key constraints:
>     "things_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foos(id)
>     "things_bar_id_fkey" FOREIGN KEY (bar_id) REFERENCES bars(id)
>     "things_baz_id_fkey" FOREIGN KEY (baz_id) REFERENCES bazs(id)
> Triggers:
>     timestamps_trig BEFORE INSERT OR UPDATE ON things FOR EACH ROW EXECUTE
> PROCEDURE timestamps_tfun()
>
> Let me know if you need anything else.
>
> Thanks,
>
>
> On Mon, Aug 19, 2013 at 3:29 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:
>
>> Hello
>>
>> please, can you send some example or test?
>>
>> Regards
>>
>> Pavel Stehule
>>
>>
>> 2013/8/19 Robert Sosinski <rsosinski(at)ticketevolution(dot)com>
>>
>>> When using array_agg on a large table, memory usage seems to spike up
>>> until Postgres crashes with the following error:
>>>
>>> 2013-08-17 18:41:02 UTC [2716]: [2] WARNING: terminating connection
>>> because of crash of another server process
>>> 2013-08-17 18:41:02 UTC [2716]: [3] DETAIL: The postmaster has commanded
>>> this server process to roll back the current transaction and exit, because
>>> another server process exited abnormally and possibly corrupted shared
>>> memory.
>>> 2013-08-17 18:41:02 UTC [2716]: [4] HINT: In a moment you should be able
>>> to reconnect to the database and repeat your command.
>>>
>>> I've definitely isolated it down to using array_agg, as when I changed
>>> the query to use string_agg, it worked fine.  I also tried using array_agg
>>> on a few different queries, all yielding the same issue.  Swapping in
>>> string_agg fixed the issue once more.
>>>
>>> This particular table has over 2 million rows and is 1.2 Gigs, and when
>>> I ran the query while viewing htop, the virtual size of the Postgres
>>> process ballooned to 13.9G until crashing.
>>>
>>> The version of Postgres I am using is: PostgreSQL 9.2.4 on
>>> x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1)
>>> 4.7.2, 64-bit
>>>
>>> Any help would be much appreciated, thanks!
>>>
>>> -Robert
>>>
>>
>>
>

In response to

Responses

pgsql-general by date

Next:From: Robert SosinskiDate: 2013-08-19 15:59:38
Subject: Re: Memory Issue with array_agg?
Previous:From: Robert SosinskiDate: 2013-08-19 15:06:47
Subject: Re: Memory Issue with array_agg?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group