Re: 8.3rc1 Out of memory when performing update

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-27 21:55:10
Message-ID: F0238EBA67824444BC1CB4700960CB480482B748@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
>>> I altered the update statement slightly, and reran the query.
>>> The altered query has been running over 3 hours now,
>>> without using lots of memory (38M private bytes).
>>> 2046 temp files were created (2.54GB worth),
>>> which have recently changed from slowly growing in size
>>> to very very slowly reducing in number.

To which Tom Lane replied:
>>Hmm. I think what that really means is you haven't got to the part of
>>the query where the leak is :-(.

I then said:
> It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.

Well, it is now Monday morning for me, and those temp files are still slowly reducing in number.
There are now only 1629 of them left, so I'm guessing that the query is about 20% done.
The PC seems to have been steadily but very slowly working away at this very simple query for close to 70 hours.
I decided not to leave this query running for a fortnight to find out if I then strike the memory leak.
Private Bytes had grown to 685MB
I cancelled the query.

Rough snapshot of what was happening with IO (a single 7200 IDE disk):

The process for the update query was reading about 500KB/second , writing between 80KB/second to 200KB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was writing about 400KB/second
Checkpoints were 10 minutes apart, taking about 85 seconds to write 1000 buffers.

What could cause such poor performance?
I presume that the disk was being forced to move the head a great deal.

I also asked:
> If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?

I dropped two indexes and one fk constraint and ran VACUUM FULL VERBOSE ANALYZE document_file;
As an indication of the disk performance: at its peak the vacuum process was reading and writing 20MB/seconds (sustained), completing in less than 11 minutes.

I reran the original query.
It used constant memory (6.7MB private bytes)
It was reading 2 to 3MB/second, writing 3 to 6MB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was initially writing about 1MB/second, increasing to about 3MB/second

Checkpoints in the middle of this query were taking up to 13 seconds to write 100 buffers.

The checkpoint after the query took 300 seconds (exactly half the checkpoint interval), and was writing about 60KB/second. It wrote 2148 buffers.

So dropping the fk constraint and index results in successful query execution with constant memory usage. Does this confirm that the memory leak you found is the one I was suffering from?

I'd also class the extremely poor performance of the alternate query as a bug.
Why take a fortnight when you could take three quarters of an hour? (Granted there where two less indexes to update, but that is still too long.)

Aside: I must say that I am impressed with PostgreSQL's handling of this connection. It recovers extremely well from running out of memory, cancelling very long running queries, reloading config (to turn autovacuum off), and continues to work as expected (the 3 day old connection that is).

Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2008-01-27 23:01:08 Re: Slow set-returning functions
Previous Message Marcin Stępnicki 2008-01-27 19:16:20 Re: Slow set-returning functions