Re: Memory Issue with array_agg?

From: Robert Sosinski <rsosinski(at)ticketevolution(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Memory Issue with array_agg?
Date: 2013-08-19 15:59:38
Message-ID: CAH1hji3h=v5VyQYaE732KpcTn-DXQqmmey=d=TKXFcPWSyNrcw@mail.gmail.com
Lists: pgsql-general

At the moment, all guids are distinct; however, before I zapped the
duplicates, there were 280 duplicates.

Currently, there are over 2 million distinct guids.
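
For reference, the counts can be double-checked with something like this (a
rough sketch against the example schema below; the real table name differs):

select count(*) as total_rows,
       count(distinct guid) as distinct_guids,
       count(*) - count(distinct guid) as duplicate_rows
from things;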

-Robert

On Mon, Aug 19, 2013 at 11:12 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>
>
>
> 2013/8/19 Robert Sosinski <rsosinski(at)ticketevolution(dot)com>
>
>> Hi Pavel,
>>
>> What kind of example do you need? I can't give you the actual data I have
>> in the table, but I can give you an example query and the schema attached
>> below. From there, I would just put in 2 million rows, roughly 1.2 GB of
>> data. The average sizes of the extended columns (using the pg_column_size
>> function), in bytes, are:
>>
>> guid: 33
>> name: 2.41
>> currency: 4
>> fields: 120.32
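>>
>> Those averages came from pg_column_size; something roughly like this
>> (a sketch using the anonymized table name) should reproduce them:
>>
>> select avg(pg_column_size(guid))     as avg_guid_bytes,
>>        avg(pg_column_size(name))     as avg_name_bytes,
>>        avg(pg_column_size(currency)) as avg_currency_bytes,
>>        avg(pg_column_size(fields))   as avg_fields_bytes
>> from things;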
>>
>> example query:
>>
>> -- find duplicate records using a guid
>> select guid, array_agg(id) from orders group by guid;
>>
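>> To limit both the output and the aggregated arrays to actual duplicates,
>> the same query can be narrowed with a HAVING clause, e.g. (untested sketch):
>>
>> select guid, array_agg(id)
>> from orders
>> group by guid
>> having count(*) > 1;
>>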
>
> how many distinct guids are there, and how many duplicates?
>
> regards
>
> Pavel
>
>
>
>>
>> example schema:
>>                                                          Table "public.things"
>>    Column   |            Type             |                      Modifiers                      | Storage  | Stats target | Description
>> ------------+-----------------------------+-----------------------------------------------------+----------+--------------+-------------
>>  id         | integer                     | not null default nextval('things_id_seq'::regclass) | plain    |              |
>>  version    | integer                     | not null                                             | plain    |              |
>>  created_at | timestamp without time zone | not null                                             | plain    |              |
>>  updated_at | timestamp without time zone | not null                                             | plain    |              |
>>  foo_id     | integer                     | not null                                             | plain    |              |
>>  bar_id     | integer                     | not null                                             | plain    |              |
>>  baz_id     | integer                     | not null                                             | plain    |              |
>>  guid       | character varying           | not null                                             | extended |              |
>>  name       | character varying           | not null                                             | extended |              |
>>  price      | numeric(12,2)               | not null                                             | main     |              |
>>  currency   | character varying           | not null                                             | extended |              |
>>  amount     | integer                     | not null                                             | plain    |              |
>>  the_date   | date                        | not null                                             | plain    |              |
>>  fields     | hstore                      |                                                      | extended |              |
>> Indexes:
>>     "things_pkey" PRIMARY KEY, btree (id)
>>     "things_foo_id_idx" btree (foo_id)
>>     "things_bar_id_idx" btree (bar_id)
>>     "things_baz_id_idx" btree (baz_id)
>>     "things_guid_uidx" UNIQUE, btree (guid)
>>     "things_lpad_lower_name_eidx" btree (lpad(lower(name::text), 10, '0'::text))
>>     "things_price_idx" btree (price)
>> Foreign-key constraints:
>>     "things_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foos(id)
>>     "things_bar_id_fkey" FOREIGN KEY (bar_id) REFERENCES bars(id)
>>     "things_baz_id_fkey" FOREIGN KEY (baz_id) REFERENCES bazs(id)
>> Triggers:
>>     timestamps_trig BEFORE INSERT OR UPDATE ON things FOR EACH ROW EXECUTE PROCEDURE timestamps_tfun()
>>
>> Let me know if you need anything else.
>>
>> Thanks,
>>
>>
>> On Mon, Aug 19, 2013 at 3:29 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>>> Hello
>>>
>>> please, can you send some example or test?
>>>
>>> Regards
>>>
>>> Pavel Stehule
>>>
>>>
>>> 2013/8/19 Robert Sosinski <rsosinski(at)ticketevolution(dot)com>
>>>
>>>> When using array_agg on a large table, memory usage seems to spike up
>>>> until Postgres crashes with the following error:
>>>>
>>>> 2013-08-17 18:41:02 UTC [2716]: [2] WARNING: terminating connection
>>>> because of crash of another server process
>>>> 2013-08-17 18:41:02 UTC [2716]: [3] DETAIL: The postmaster has
>>>> commanded this server process to roll back the current transaction and
>>>> exit, because another server process exited abnormally and possibly
>>>> corrupted shared memory.
>>>> 2013-08-17 18:41:02 UTC [2716]: [4] HINT: In a moment you should be
>>>> able to reconnect to the database and repeat your command.
>>>>
>>>> I've definitely isolated it to array_agg: when I changed the query to
>>>> use string_agg, it worked fine. I also tried array_agg on a few different
>>>> queries, all of which hit the same issue, and swapping in string_agg fixed
>>>> it each time.
>>>>
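>>>> For reference, the string_agg variant that completed looked roughly like
>>>> this (simplified sketch; the real query may have differed):
>>>>
>>>> select guid, string_agg(id::text, ',')
>>>> from orders
>>>> group by guid;
>>>>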
>>>> This particular table has over 2 million rows and is about 1.2 GB. When
>>>> I ran the query while watching htop, the virtual size of the Postgres
>>>> process ballooned to 13.9 GB before it crashed.
>>>>
>>>> The version of Postgres I am using is: PostgreSQL 9.2.4 on
>>>> x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1)
>>>> 4.7.2, 64-bit
>>>>
>>>> Any help would be much appreciated, thanks!
>>>>
>>>> -Robert
>>>>
>>>
>>>
>>
>
