
Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

From: david(at)lang(dot)hm
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception
Date: 2008-08-28 06:16:22
Message-ID: alpine.DEB.1.10.0808272305410.30743@asgard.lang.hm
Lists: pgsql-performance
On Wed, 27 Aug 2008, Craig James wrote:

> The OOM killer is a terrible idea for any serious database server.  I wrote a 
> detailed technical paper on this almost 15 years ago when Silicon Graphics 
> had this same feature, and Oracle and other critical server processes 
> couldn't be made reliable.
>
> The problem with "overallocating memory" as Linux does by default is that 
> EVERY application, no matter how well designed and written, becomes 
> unreliable: It can be killed because of some OTHER process.  You can be as 
> clever as you like, and do all the QA possible, and demonstrate that there 
> isn't a single bug in Postgres, and it will STILL be unreliable if you run it 
> on a Linux system that allows overcommitted memory.
>
> IMHO, all Postgres servers should run with memory-overcommit disabled.  On 
> Linux, that means  /proc/sys/vm/overcommit_memory=2.
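
Before switching modes, it can help to confirm which mode a server is actually in. The following is a minimal sketch that assumes Python 3 on Linux. The function names are made up for illustration, and the mode descriptions summarize the kernel's documented meanings of vm.overcommit_memory (0, 1, 2). The setting itself is changed as root with `sysctl -w vm.overcommit_memory=2` and made persistent in /etc/sysctl.conf.

```python
from pathlib import Path

# Documented meanings of the vm.overcommit_memory sysctl on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the kernel default)",
    1: "always overcommit, no checks",
    2: "strict accounting, no overcommit beyond CommitLimit",
}


def describe_overcommit(raw: str) -> str:
    """Translate the raw contents of overcommit_memory into a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read and describe the running kernel's overcommit mode."""
    return describe_overcommit(Path(path).read_text())


if __name__ == "__main__":
    try:
        print(current_overcommit())
    except OSError:
        print("no /proc/sys/vm/overcommit_memory here (not Linux?)")
```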

It depends on how much stuff you allow others to run on the box. If you
have no control over that, then yes, the box is unreliable, but not just
because of the OOM killer: those other users can also eat up all the
other box resources (CPU, network bandwidth, disk bandwidth, etc.).

Even with overcommit disabled, the only way you can be sure that a
program will not fail is to make sure that it never needs to allocate
memory. With overcommit off you could have one program that eats up 100%
of your RAM without failing (it handles the memory allocation error so it
doesn't crash), but which will cause _every_ other program on the system
to fail, including any scripts. Every command executed requires a fork,
and without overcommit the fork has to reserve as much memory as your
shell has already allocated, just so it can run a trivial command (like
the ps or kill that you are trying to use to fix the problem).
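
The fork problem can be made concrete with the two /proc/meminfo fields the kernel uses for strict accounting. CommitLimit is the ceiling that applies when vm.overcommit_memory=2, and Committed_AS is the memory already committed. The sketch below is a simplified model. It assumes that a fork must commit roughly as much as the parent's private writable memory. parent_private_kb is a hypothetical input that you would have to measure yourself, and the kernel's real per-mapping accounting is more detailed.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines such as 'CommitLimit:  6000000 kB'.

    Values are returned as plain integers (kB for the memory fields)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo: dict[str, int]) -> int:
    """How much more memory can be committed before hitting CommitLimit."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def fork_would_fit(meminfo: dict[str, int], parent_private_kb: int) -> bool:
    """Rough model of a fork under vm.overcommit_memory=2: the child must be
    able to commit a copy of the parent's private writable memory, even if
    it only runs 'ps' and most pages are never actually copied."""
    return parent_private_kb <= commit_headroom_kb(meminfo)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print("commit headroom (kB):", commit_headroom_kb(info))
    except (OSError, KeyError):
        print("/proc/meminfo not available or missing commit fields")
```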

If you have a box with unpredictable memory use, disabling overcommit will
not make it reliable. It may make it less unreliable (the fact that the
Linux OOM killer will pick one of the worst possible processes to kill is
a problem), but less unreliable is not the same as reliable.
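
To see which processes the OOM killer would target on a given box, you can read the badness value the kernel exposes for each process in /proc/<pid>/oom_score. The process with the highest score is the preferred victim. Below is a minimal helper that ranks these scores, assuming Linux with /proc mounted. The function names are hypothetical. On newer kernels, the postmaster can be made a less attractive target by writing a negative value to its /proc/<pid>/oom_score_adj. Older kernels, like those current when this was written, used oom_adj instead. The PostgreSQL documentation on Linux memory overcommit discusses both.

```python
import os
from pathlib import Path


def read_oom_scores(proc: str = "/proc") -> dict[int, int]:
    """Map each PID to its current /proc/<pid>/oom_score."""
    scores = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            scores[int(entry)] = int(Path(proc, entry, "oom_score").read_text())
        except (OSError, ValueError):
            continue  # the process exited or the file is unreadable
    return scores


def rank_oom_candidates(scores: dict[int, int], n: int = 5) -> list[tuple[int, int]]:
    """Return the n (pid, score) pairs with the highest oom_score,
    i.e. the processes the OOM killer currently favors as victims."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    try:
        for pid, score in rank_oom_candidates(read_oom_scores()):
            print(pid, score)
    except OSError:
        print("/proc not available")
```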

It's also not that hard to have a process monitor the postmaster (along
with other box resources) and restart it if it is killed. At some point
you can get init to watch your watchdog, and the OOM killer will not kill
init. So while you can't prevent the postmaster from being killed, you can
set things up to recover from it.
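
A watchdog of the kind described above can be very small. The sketch below is a hypothetical restart loop built on Python's subprocess module, not a production supervisor. It restarts the command after any abnormal exit, stops after a clean shutdown, and records each exit status so that kills by signal are visible.

```python
import subprocess
import time


def supervise(cmd: list[str], max_restarts: int = 3, delay: float = 5.0) -> list[int]:
    """Run cmd and restart it whenever it exits abnormally.

    Stops after a clean exit (status 0) or after max_restarts restarts.
    Returns every exit status seen; on POSIX a negative value -N means the
    child was killed by signal N (an OOM kill shows up as -9, SIGKILL)."""
    statuses = []
    for attempt in range(max_restarts + 1):
        status = subprocess.run(cmd).returncode
        statuses.append(status)
        if status == 0:
            break  # deliberate shutdown: do not restart
        if attempt < max_restarts:
            time.sleep(delay)  # back off before restarting
    return statuses
```

For the postmaster, this might be called as supervise(["postgres", "-D", "/path/to/data"], max_restarts=10), where the data directory is a placeholder. The loop must run as the database's OS user, because postgres refuses to start as root. Following the point about init, the watchdog itself could be launched from an /etc/inittab line that uses the respawn action, so init restarts it if it dies.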

David Lang

Copyright © 1996-2014 The PostgreSQL Global Development Group