From: | luben karavelov <karavelov(at)mail(dot)bg> |
---|---|
To: | Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #6763: Severe memory leak with arrays and hstore |
Date: | 2012-07-26 13:55:05 |
Message-ID: | 10398EEF-03DE-405E-B276-06F73ABF9E3A@mail.bg |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Jul 26, 2012, at 11:17 AM, Craig Ringer wrote:
> On 07/26/2012 09:32 AM, karavelov(at)mail(dot)bg wrote:
>> Finally I have managed to migrate it in batches of 100-200k user ids and
>> disconnecting after each query in order to free the backend and leaked
>> memory.
> If you do it in batches, but you do NOT disconnect and reconnect, does the backend continue to grow?
>
> What's the output of:
>
> SELECT count(sub.user_id), to_char(AVG(sub.n_prefs), '99999.99') FROM (
> SELECT user_id, count(name) AS n_prefs FROM old_prefs GROUP BY user_id) AS sub;
>
> and
>
> SELECT pg_size_pretty(pg_total_relation_size('old_prefs'));
>
> ?
>
> --
> Craig Ringer
>
-
Sorry for the broken formatting in the last message. Here it is again:
Ok, I will do the procedure again with taking notes on each step.
First, here are the results of the queries you asked:
pg=> SELECT count(sub.user_id), to_char(AVG(sub.n_prefs), '99999.99') FROM (
SELECT user_id, count(name) AS n_prefs FROM old_prefs GROUP BY user_id) AS sub;
count | to_char
---------+-----------
1257262 | 2.26
(1 row)
pg=> SELECT pg_size_pretty(pg_total_relation_size('old_prefs'));
pg_size_pretty
----------------
264 MB
(1 row)
pg=> \d old_prefs
Table "public.old_prefs"
Column | Type | Modifiers
---------+-------------------+-----------
user_id | integer | not null
name | character varying | not null
value | character varying | not null
Indexes:
"old_prefs_user_id_ids" btree (user_id)
Also there are max of 34 rows per user_id in old_prefs
Here is the new table I just created:
pg=> \d new_preferences
Table "public.new_preferences"
Column | Type | Modifiers
---------+---------+-----------
user_id | integer | not null
prefs | hstore |
Indexes:
"new_preferences_pkey" PRIMARY KEY, btree (user_id)
Foreign-key constraints:
"new_preferences_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
Here is the newly connected backend:
root(at)pg:/var/log# ps axu | egrep '10.0.2.71|USER' | grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 0.0 0.0 2266944 3448 ? Ss 15:23 0:00 postgres: pg pg 10.0.2.71(51734) idle
Migrating the first 200k of the users to the new scheme:
pg=> select count(*) from old_prefs where user_id<200000;
count
--------
174767
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id<200000 GROUP BY user_id;
INSERT 0 48993
pg=> commit;
COMMIT
Here is the backend:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 0.8 7.1 3081772 582712 ? Ss 15:23 0:02 postgres: pg pg 10.0.2.71(51734) idle
Migrating another batch of users:
pg => select count(*) from old_prefs where user_id>=200000 and user_id<600000;
count
--------
193824
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=200000 AND user_id<600000 GROUP BY user_id;
INSERT 0 54157
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 1.1 8.5 3176164 697444 ? Ss 15:23 0:05 postgres: pg pg 10.0.2.71(51734) idle
Another batch:
pg=> select count(*) from old_prefs where user_id>=600000 and user_id<1100000;
count
--------
190504
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=600000 AND user_id<1100000 GROUP BY user_id;
INSERT 0 56199
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 0.7 9.6 3210224 791404 ? Ss 15:23 0:08 postgres: pg pg 10.0.2.71(51734) idle
Another batch:
pg=> select count(*) from old_prefs where user_id>=1100000 and user_id<1600000;
count
--------
194965
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=1100000 AND user_id<1600000 GROUP BY user_id;
INSERT 0 60257
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 0.9 10.8 3277412 889860 ? Ss 15:23 0:11 postgres: pg pg 10.0.2.71(51734) idle
So Pg backeng keep growing with 100M per 200k row from old table that became 50-60k rows in the new table
Proceeding with another batch:
pg=> select count(*) from old_prefs where user_id>=1600000 and user_id<2400000;
count
--------
170858
(1 row)
Time: 83,994 ms
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=1600000 AND user_id<2400000 GROUP BY user_id;
INSERT 0 55447
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 0.9 11.5 3277412 945560 ? Ss 15:23 0:15 postgres: pg pg 10.0.2.71(51734) idle
Another batch:
pg=> select count(*) from old_prefs where user_id>=2400000 and user_id<3400000;
count
--------
200614
(1 row)
Time: 83,409 ms
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=2400000 AND user_id<3400000 GROUP BY user_id;
INSERT 0 87940
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 1.2 16.2 3736968 1331796 ? Ss 15:23 0:20 postgres: pg pg 10.0.2.71(51734) idle
Another batch:
pg => select count(*) from old_prefs where user_id>=3400000 and user_id<3800000;
count
--------
161390
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3400000 AND user_id<3800000 GROUP BY user_id;
ERROR: insert or update on table "new_preferences" violates foreign key constraint "new_preferences_user_id_fkey"
DETAIL: Key (user_id)=(3615131) is not present in table "users".
pg=> rollback;
ROLLBACK
Ops.. have to cleanup the old_prefs, some users were deleted in the meantime:
pg=> delete from old_prefs where user_id not in (select user_id from users);
DELETE 7
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 1.4 26.4 4469520 2157588 ? Ss 15:23 0:29 postgres: pg pg 10.0.2.71(51734) idle
Near 1G grow on rolled back transaction....
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3400000 AND user_id<3800000 GROUP BY user_id;
INSERT 0 131803
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 1.7 26.6 4479944 2180536 ? Ss 15:23 0:35 postgres: pg pg 10.0.2.71(51734) idle
Another batch, bigger this time:
pg=> select count(*) from old_prefs where user_id>=3800000 and user_id<4200000;
count
--------
327374
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=3800000 AND user_id<4200000 GROUP BY user_id;
INSERT 0 177044
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 1.9 33.1 5238968 2710756 ? Ss 15:23 0:45 postgres: pg pg 10.0.2.71(51734) idle
Another big batch:
pg=> select count(*) from old_prefs where user_id>=4200000 and user_id<4400000;
count
--------
375352
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4200000 AND user_id<4400000 GROUP BY user_id;
INSERT 0 189095
pg=> commit;
COMMIT
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 2.2 35.7 5438412 2918720 ? Ss 15:23 0:55 postgres: pg pg 10.0.2.71(51734) idle
Now a smaller batch:
pg=> select count(*) from old_prefs where user_id>=4400000 and user_id<4500000;
count
--------
219249
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4400000 AND user_id<4500000 GROUP BY user_id;
INSERT 0 99782
pg=> commit;
COMMIT
RSS keeps growing:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 2.2 36.0 5438412 2943784 ? Ss 15:23 1:00 postgres: pg pg 10.0.2.71(51734) idle
Lets see if a bigger batch will pass:
pg=> select count(*) from old_prefs where user_id>=4500000;
count
--------
631911
(1 row)
pg=> INSERT INTO new_preferences SELECT user_id,hstore(array_agg(name), array_agg(value)) FROM old_prefs WHERE user_id>=4500000 GROUP BY user_id;
INSERT 0 296541
pg=> commit;
COMMIT
Ok, this time it passed, but the backend is over 4G
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 19121 2.2 50.0 7227968 4088928 ? Ss 15:23 1:17 postgres: pg pg 10.0.2.71(51734) idle
Some observations:
1. Backend does not free allocated memory between transactions.
2. Rolled back transactions also leak memory.
3. Leaked memory is not linear to work done - 2 transactions with 200k keys will leak less than 1 transaction with 400k keys
Regarding Tom's question:
The old_prefs does not fit in work_mem but is quite small regarding the total RAM. Isn't the "work_mem" a limit of the memory each backend could allocate for
sorting, grouping and aggregation? My understanding is that bigger allocation will overflow to disk and will not kill the server. I could be wrong though.
Thanks in advance and best regards
--
Luben Karavelov
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-07-26 16:41:19 | Re: BUG #6758: ./configure script sets HAVE_WCSTOMBS_L 1 |
Previous Message | karavelov | 2012-07-26 13:37:21 | Re: BUG #6763: Severe memory leak with arrays and hstore |