Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

From: david(at)lang(dot)hm
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Matthew Wakeling <matthew(at)flymine(dot)org>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception
Date: 2008-08-29 01:16:16
Message-ID: alpine.DEB.1.10.0808281814100.2713@asgard.lang.hm
Lists: pgsql-performance

On Thu, 28 Aug 2008, Scott Marlowe wrote:

> On Thu, Aug 28, 2008 at 5:08 PM, <david(at)lang(dot)hm> wrote:
>> On Thu, 28 Aug 2008, Scott Marlowe wrote:
>>
>>> On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling <matthew(at)flymine(dot)org>
>>> wrote:
>>>
>>>> Another point is that from a business perspective, a database that has
>>>> stopped responding is equally bad regardless of whether that is because
>>>> the OOM killer has appeared or because the machine is thrashing. In both
>>>> cases, there is a maximum throughput that the machine can handle, and if
>>>> requests appear quicker than that the system will collapse, especially
>>>> if the requests start timing out and being retried.
>>>
>>> But there's a HUGE difference between a machine that has bogged down
>>> under load so badly that you have to reset it and a machine that's had
>>> the postmaster slaughtered by the OOM killer. In the first situation,
>>> while the machine is unresponsive, it should come right back up with a
>>> coherent database after the restart.
>>>
>>> OTOH, a machine with a dead postmaster is far more likely to have a
>>> corrupted database when it gets restarted.
>>
>> wait a min here, postgres is supposed to be able to survive a complete box
>> failure without corrupting the database, if killing a process can corrupt
>> the database it sounds like a major problem.
>
> Yes it is a major problem, but not with postgresql. It's a major
> problem with the linux OOM killer killing processes that should not be
> killed.
>
> Would it be postgresql's fault if it corrupted data because my machine
> had bad memory? Or a bad hard drive? This is the same kind of
> failure. The postmaster should never be killed. It's the one thing
> holding it all together.

the ACID guarantees that postgres makes are supposed to mean that even
if the machine dies, the CPU goes up in smoke, etc., the transactions that
have been committed will not be corrupted.

if killing the process voids all the ACID protection then something is
seriously wrong.

it may lose transactions that are in flight, but it should not corrupt
the database.
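
to illustrate the point, here is a toy sketch of the general idea (log the
change, fsync it, and only then treat it as committed). this is NOT how
postgres implements its WAL; the class ToyWalStore, the helper
simulate_kill_mid_write and the JSON-lines log format are all made up for
the example. it only shows why a process killed mid-write should cost the
in-flight record and nothing else.

```python
import json
import os
import tempfile


class ToyWalStore:
    """Hypothetical toy key-value store, not PostgreSQL's real WAL."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()
        self._log = open(log_path, "ab")

    def _recover(self):
        """Replay complete records and discard a torn final record."""
        if not os.path.exists(self.log_path):
            return
        good_end = 0
        with open(self.log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # record was cut off mid-write
                try:
                    record = json.loads(raw)
                except ValueError:
                    break  # unreadable record, stop replaying here
                self.data[record["key"]] = record["value"]
                good_end += len(raw)
        # Drop anything after the last complete record so that later
        # appends start on a clean record boundary.
        with open(self.log_path, "r+b") as f:
            f.truncate(good_end)

    def commit(self, key, value):
        """A write counts as committed only after fsync returns."""
        line = json.dumps({"key": key, "value": value}).encode() + b"\n"
        self._log.write(line)
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def close(self):
        self._log.close()


def simulate_kill_mid_write(store):
    """Pretend the process died halfway through writing a record."""
    store._log.write(b'{"key": "c", "val')
    store._log.flush()
    store._log.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "toy.wal")
    store = ToyWalStore(path)
    store.commit("a", 1)
    store.commit("b", 2)
    simulate_kill_mid_write(store)  # in-flight write of "c" is lost
    recovered = ToyWalStore(path)   # a fresh start replays the log
    result = dict(recovered.data)
    recovered.close()
    return result


if __name__ == "__main__":
    print(demo())
```

in this sketch the half-written record for "c" is dropped during recovery,
while the two records that were fsynced before the kill are replayed
intact.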

David Lang
