9.0 Out of memory

From: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: 9.0 Out of memory
Date: 2011-04-13 04:02:20
Message-ID: 666FB8D75E95AE42965A0E76A5E5337E06DCFAEA37@prdlsmmsg01.ad.linz.govt.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi All,

I running PostgreSQL 9.0.3 and getting an out of memory error while running a big transaction. This error does not crash the backend.

The nature of this transaction is it is sequentially applying data updates to a large number (104) of tables, then after applying those updates, a series of materialised views are created or updated using a set of pl/pgsql function. I need this operation to be one transaction so I can rollback the update if any issue occurs.

The size of the database and the database is 350GB and I would characterise this system as a data warehouse.

At this stage I can't isolate the problem down to a simple use case or even smaller subset of the transaction, which would have been nice for posting to this list. I can only replicate the error when I run the "entire" transaction - running the table update part or materialised views parts separately work fine. To make matter worse I do not see any context around my error "out of memory" message - which makes the message useless.

The actual error occurs during a table compare operation within a function which uses 2 cursor to scan for differences. However directly before this I have a query that generates a temp table that I had trouble with earlier (when initially tunning the server) and had to set the work_mem to 1MB so it would run:

CREATE TEMP TABLE tmp_titles AS
SELECT
TTL.audit_id AS id,
TTL.title_no,
TTL.status,
TTLT.char_value AS type,
LOC.name AS land_district,
TTL.issue_date,
TTLG.char_value AS guarantee_status,
string_agg(
DISTINCT(
ETTT.char_value || ', ' ||
ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '')
),
E'\r\n'
ORDER BY
ETTT.char_value || ', ' ||
ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '') ASC
) AS estate_description,
string_agg(
DISTINCT
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END,
', '
ORDER BY
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END ASC
) AS owners,
count(
DISTINCT
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END
) AS number_owners,
TPA.title_no IS NOT NULL AS part_share,
-- With Postgis 1.5.2 the ST_Collect aggregate returns a truncated
-- collection when a null value is found. To fix this the shapes
-- are order so all null shapes row are at the end of input list.
ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
FROM
crs_title TTL
LEFT JOIN crs_title_estate ETT ON TTL.title_no = ETT.ttl_title_no AND ETT.status = 'REGD'
LEFT JOIN crs_estate_share ETS ON ETT.id = ETS.ett_id AND ETT.status = 'REGD'
LEFT JOIN crs_proprietor PRP ON ETS.id = PRP.ets_id AND PRP.status = 'REGD'
LEFT JOIN crs_legal_desc LGD ON ETT.lgd_id = LGD.id AND LGD.type = 'ETT' AND LGD.status = 'REGD'
LEFT JOIN crs_legal_desc_prl LGP ON LGD.id = LGP.lgd_id
LEFT JOIN (
SELECT
title_no
FROM
tmp_parcel_titles
GROUP BY
title_no
HAVING
count(*) > 1
) TPA ON TTL.title_no = TPA.title_no
LEFT JOIN (
SELECT
id,
(ST_Dump(shape)).geom AS shape
FROM
crs_parcel
WHERE
status = 'CURR' AND
ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
) PAR ON LGP.par_id = PAR.id
JOIN crs_locality LOC ON TTL.ldt_loc_id = LOC.id
JOIN crs_sys_code TTLG ON TTL.guarantee_status = TTLG.code AND TTLG.scg_code = 'TTLG'
JOIN crs_sys_code TTLT ON TTL.type = TTLT.code AND TTLT.scg_code = 'TTLT'
LEFT JOIN crs_sys_code ETTT ON ETT.type = ETTT.code AND ETTT.scg_code = 'ETTT'
WHERE
TTL.status IN ('LIVE', 'PRTC') AND
TTL.title_no NOT IN (SELECT title_no FROM tmp_excluded_titles)
GROUP BY
TTL.audit_id,
TTL.title_no,
TTL.status,
TTLT.char_value,
LOC.name,
TTL.issue_date,
TTLG.char_value,
TPA.title_no;

This query does a lot of string concatenation and uses the new 9.0 string_agg function. The result set is about 2million rows.

So down to questions:

1) Is there anything I can do to get more information out about this memory error message? Debug build maybe, attach a debugger and set some break points?

2) Is there anything I can do to optimise the memory so this transaction might be able to run?

3) I noticed in the Top Memory Context dump in the server log contained a lot of the MCs for table indexes/PKs (about 850) See http://pastebin.com/346zi2sS. Is this a problem? Could these MCs be cleaned-up part way thought the transaction? Maybe use a savepoint? Or do they exist for the life of session or transaction?

4) Can anyone help me make sense of the top transaction memory error to help track down the issue? What number should I be looking for?

Well I'm pretty much at the end of the line of in terms of getting PostgreSQL to work for this task. So any guidance would be soooo much appreciated.

Thanks again,
Jeremy

PS. My config:

The OS I'm running is:

Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux.

It's a running on VMWare and, has 2 CPU's and 8GB of RAM. This VM is dedicated to PostgreSQL, not much else is running other than cacti, ssh and ftp server daemons. The main OS parameters I have tuned are:

vm.swappiness=0
vm.overcommit_memory=2
kernel.shmmax = 4196769792
kernel.shmall = 1024602

And the PostgreSQL is:

PostgreSQL 9.0.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit.

The main changed postgresql.conf parameters I've tuned are:

shared_buffers = 512MB
maintenance_work_mem = 512MB
temp_buffers = 256MB
work_mem = 1MB
wal_buffers = 16MB
effective_cache_size = 4094MB

The typical number of users connected to the database is 1 or 2.

______________________________________________________________________________________________________

This message contains information, which is confidential and may be subject to legal privilege.
If you are not the intended recipient, you must not peruse, use, disseminate, distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 0800 665 463 or info(at)linz(dot)govt(dot)nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any attachments, after its transmission from LINZ.

Thank you.
______________________________________________________________________________________________________

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Carlo Stonebanks 2011-04-13 04:10:54 Re: Revisiting UPDATE FROM ... ORDER BY not respected
Previous Message Fujii Masao 2011-04-13 02:28:12 Re: Trying out replication: cp cannot stat log file during recovery