Re: 8.3rc1 Out of memory when performing update

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-28 05:04:00
Message-ID: F0238EBA67824444BC1CB4700960CB48048E0ABA@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > So dropping the fk constraint and index results in
> successful query execution with constant memory usage. Does
> this confirm that the memory leak you found is the one I was
> suffering from?
>
> Well, it confirms that you were suffering from that memory
> leak. What's
> not clear is whether the leak completely explains the bad performance
> you saw. The leak would have resulted in extra swapping, but
> I wouldn't
> have thought it would drive the machine completely into swap hell.

The query crashing from out of memory did so after an hour,
which isn't bad performance given the workaround with less indexes to update succeeded after 45 minutes.

It was the rewritten one which I killed after 3 days.

> Would the monitoring tools you were using have shown swapping I/O?

I was using Process Explorer, which shows page faults and deltas,
which are not included in the read & write IO stats.
The query with poor IO performance wasn't swapping.

> > I'd also class the extremely poor performance of the
> alternate query as a bug.
>
> Yeah, we should look into that. The plan you showed before estimated
> about 16.9M rows in "document" --- is that about right? What have you
> got work_mem set to?

Yes, 16894164 rows.
Exactly the same number of rows in document as in document_file.
[count(*) queries taking 38 and 63 seconds]

work_mem appears to be left as the default 1MB

I get 1023 temp files created straight away, which take four minutes (250s) to grow to about 448K each
(reading @ 5MB/s writing @ 2MB/s)
memory usage during this first phase slowly increased from 13.4M to 14.4M
then 1023 more temp files are created, and they grow to about 2170K each
(reading @ 2MB/s writing @ 2MB/s until the checkpoint starts, when the speed decreases to 200K/s, and doesn't increase again after the checkpoint finishes.)
memory usage during this first phase slowly increased from 22.5M to 26.4M
My concern is with what it then does. (Spends a fortnight doing really slow IO)

An hour's worth of logs from during this phase show 6 checkpoints, and 6 temp files reported (which seems to coincide with them being deleted):

2008-01-26 06:02:08.086 NZDT [3432]: [233-1] LOG: checkpoint starting: time
2008-01-26 06:03:28.916 NZDT [3432]: [234-1] LOG: checkpoint complete: wrote 899 buffers (21.9%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=77.798 s, sync=2.750 s, total=80.830 s
2008-01-26 06:12:08.094 NZDT [3432]: [235-1] LOG: checkpoint starting: time
2008-01-26 06:12:23.407 NZDT [3824]: [209-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1321", size 2224520
2008-01-26 06:12:23.407 NZDT [3824]: [210-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:12:24.157 NZDT [3824]: [211-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.477", size 461356
2008-01-26 06:12:24.157 NZDT [3824]: [212-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:13:21.876 NZDT [3432]: [236-1] LOG: checkpoint complete: wrote 724 buffers (17.7%); 0 transaction log file(s) added, 0 removed, 17 recycled; write=71.500 s, sync=2.108 s, total=73.781 s
2008-01-26 06:22:08.024 NZDT [3432]: [237-1] LOG: checkpoint starting: time
2008-01-26 06:23:25.415 NZDT [3432]: [238-1] LOG: checkpoint complete: wrote 877 buffers (21.4%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=74.141 s, sync=2.985 s, total=77.391 s
2008-01-26 06:29:36.311 NZDT [3824]: [213-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1350", size 2220990
2008-01-26 06:29:36.311 NZDT [3824]: [214-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:29:36.982 NZDT [3824]: [215-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.516", size 463540
2008-01-26 06:29:36.982 NZDT [3824]: [216-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:32:08.016 NZDT [3432]: [239-1] LOG: checkpoint starting: time
2008-01-26 06:33:19.501 NZDT [3432]: [240-1] LOG: checkpoint complete: wrote 872 buffers (21.3%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=69.062 s, sync=2.171 s, total=71.484 s
2008-01-26 06:42:08.101 NZDT [3432]: [241-1] LOG: checkpoint starting: time
2008-01-26 06:43:27.431 NZDT [3432]: [242-1] LOG: checkpoint complete: wrote 813 buffers (19.8%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=76.579 s, sync=2.592 s, total=79.329 s
2008-01-26 06:46:45.558 NZDT [3824]: [217-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1940", size 2229130
2008-01-26 06:46:45.558 NZDT [3824]: [218-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:46:46.246 NZDT [3824]: [219-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.631", size 459564
2008-01-26 06:46:46.246 NZDT [3824]: [220-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:52:08.078 NZDT [3432]: [243-1] LOG: checkpoint starting: time
2008-01-26 06:53:31.173 NZDT [3432]: [244-1] LOG: checkpoint complete: wrote 983 buffers (24.0%); 0 transaction log file(s) added, 0 removed, 13 recycled; write=78.203 s, sync=4.641 s, total=83.094 s

Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tory M Blue 2008-01-28 06:09:12 Re: Postgres 8.2 memory weirdness
Previous Message Merlin Moncure 2008-01-28 04:25:26 Re: 1 or 2 servers for large DB scenario.