extreme memory use when loading in a lot of data

From: Vivek Khera <khera(at)kcilink(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: extreme memory use when loading in a lot of data
Date: 2004-05-21 19:02:20
Message-ID: x7vfipmwn7.fsf@yertle.int.kciLink.com
Lists: pgsql-general

I have some historic data that I want to analyze. To do this I set up
postgres on a spare box I picked up for cheap, which just lucked into
having tons of RAM (1.5G). I set up postgres to use 10000 buffers,
and recompiled the kernel to allow a 2 GB data-size limit per process.
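(For reference, that buffer count is a postgresql.conf line like the one
below — 7.4-era syntax; with 8 kB pages, 10000 buffers works out to
roughly 80 MB of shared memory:)

```
shared_buffers = 10000    # ~80 MB at 8 kB per buffer (PostgreSQL 7.4)
```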

Since this is historical data, I'm actually merging a couple of dumps
that span the time range. I've dealt with eliminating any conflicting
data (i.e., clashing unique keys), but I'm not 100% sure that the foreign
key constraints are all met. Thus, when loading the data from the
second dump, I am leaving the FK triggers on.
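(One could instead verify the constraints up front with an anti-join
against the referenced table — the table and column names below are made
up for illustration; any rows returned would be FK violations on a
triggerless load:)

```sql
-- Hypothetical schema: suppose msg_recipients.msg_id references messages(id).
-- Orphaned recipients, i.e. rows that would violate the FK:
SELECT r.msg_id, count(*)
FROM msg_recipients r
LEFT JOIN messages m ON m.id = r.msg_id
WHERE m.id IS NULL
GROUP BY r.msg_id;
```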

Now, this is where my trouble began... On importing row 29,796,801
for the first big table, I get this (after 27 hours!):

pg_restore: ERROR: out of memory
DETAIL: Failed on request of size 44.
CONTEXT: COPY msg_recipients, line 29796801: "75312 3434358 sent"
pg_restore: [archiver (db)] error returned by PQendcopy

(I had originally tried this with a 1Gb data size limit on the
process, and it died at around row 15 million.)

The curious thing, watching the process size from another window, is
that it shrank considerably a few minutes before it croaked:

% ps axuw | grep postg
pgsql 26032 6.0 7.9 2186888 123440 ?? D Thu09AM 122:26.10 postmaster: khera vkmlm [local] COPY (postgres)

Note here that we are using nearly 2 GB of virtual memory (I'm sure a
bunch of that is stack and the shared memory segment).

A few minutes later I saw this:

% ps axuw | grep postg
pgsql 26032 10.4 5.5 91840 85624 ?? S Thu09AM 123:17.24 postmaster: khera vkmlm [local] idle (postgres)

Note the process size is down to 91M. A few minutes later I got the
out-of-memory error. This is very curious, because I don't expect the
process to release memory back to the OS like that.

There are about 157 million records in this table, three columns wide.

FreeBSD 4.10-PRERELEASE, PostgreSQL 7.4.2

So, is there any way to load this in, or do I need to either break it
into chunks (no small feat) or disable triggers during the load and
hope and pray there are no FK violations?
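(The trigger-disabling route would look something like the sketch below —
this is the 7.4-era catalog hack that pg_restore's --disable-triggers
option performs, and it requires superuser; the table name is from my
schema above:)

```sql
-- Zero the trigger count so COPY skips the FK checks during the load...
UPDATE pg_class SET reltriggers = 0 WHERE relname = 'msg_recipients';

-- ... load the data here ...

-- ... then restore the real trigger count from pg_trigger afterwards.
UPDATE pg_class
SET reltriggers = (SELECT count(*) FROM pg_trigger
                   WHERE pg_trigger.tgrelid = pg_class.oid)
WHERE relname = 'msg_recipients';
```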

Is there some leak, or does it just take that much RAM to load data
into a table?

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: khera(at)kciLink(dot)com Rockville, MD +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
