out of memory woes

From: "Angva" <angvaw(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: out of memory woes
Date: 2006-12-13 21:49:08
Message-ID: 1166046548.628294.222790@n67g2000cwd.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi everyone,

First, this group has been good to me, and I thank you guys for the
valuable help I've found here. I come seeking help with another
problem. I am not even sure my problem lies in Postgres, but perhaps
someone here has had a similar problem and could point me in the right
direction.

As I've mentioned in a few other posts, I run a daily job that loads
large amounts of data into a Postgres database. It must run
efficiently, so one of the tricks I do is run table loads, and commands
such as cluster, in parallel. I am having a problem where my cluster
job consistently fails with an out of memory error. It did not always
do this, but at some point a memory leak or "something" built up to the
point where clustering would always fail on one of the two largest
tables. That is, four tables are clustered in parallel. The smaller of
the two finish successfully. The remaining two - the largest - run for
several minutes. Then one of the tables - not always the same one -
gets an out of memory error and fails.

So, suspecting a memory leak, I tried bouncing Postgres, and ran the
clusters again. No luck - failed in the same manner.

I don't know if it means anything, but swap never seems to be used by
the postgres processes (I stared at top and vmstat for a while), though
the VIRT column in top definitely shows something. Here are sample
lines from top while two processed are running:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30692 postgres 25 0 1033m 854m 459m R 100 14.3 1:10.68 postgres:
secmaster dev_stage [local] SELECT
30691 postgres 23 0 1752m 1.6g 459m R 100 27.5 2:55.60 postgres:
secmaster dev_stage [local] SELECT

(You see SELECT instead of CLUSTER because I wrapped up my cluster
commands in functions. I call them from psql by selecting the
function.)

Sample vmstat output:

procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
r b swpd free buff cache si so bi bo in cs us
sy id wa
3 2 724 18392 5052 5599624 0 0 915 2352 5 3 11
2 80 8

It seems that the postgres processes do not want to use swap -- swap
never increases as the processes run. Again I am not sure whether this
is significant.

If I run the clusters sequentially, there is no error - they just take
too long. I was out of ideas so I bounced the server entirely and ran
the clusters in parallel a few times - success! But I don't want to
have to bounce the server regularly.

So, I'm not sure what to do. I need to figure out why the server would
deteriorate in such a way. Any advice that can be offered would be much
appreciated. I can provide any additional information that might be
necessary.

I am running Postgres 8.1.2 on CentOS 4.4 64-bit.

Thanks a lot,
Mark

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Brandon Aiken 2006-12-13 21:55:00 INSERT INTO row value constructors
Previous Message Scott Marlowe 2006-12-13 21:43:40 Re: about the RULE system