Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: david(at)lang(dot)hm
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Matthew Dennis <mdennis(at)merfer(dot)net>, Matthew Wakeling <matthew(at)flymine(dot)org>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception
Date: 2008-08-29 02:49:39
Message-ID: 20080829024939.GN8424@alvh.no-ip.org
Lists: pgsql-performance
david(at)lang(dot)hm wrote:
> On Thu, 28 Aug 2008, Scott Marlowe wrote:

>> scenario 1:  There's a postmaster, it owns all the child processes.
>> It gets killed.  The Postmaster gets restarted.  Since there isn't one
>
> when the postmaster gets killed doesn't that kill all its children as
> well?

Of course not.  The postmaster gets a SIGKILL, which is instant death.
There's no way to signal the children.  If they were killed too then
this wouldn't be much of a problem.
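
To make the point above concrete, here is a minimal sketch, assuming a
POSIX system. It is not PostgreSQL code: PARENT_SRC and
child_survives_parent_sigkill are hypothetical names, and the "parent" is
just a Python process standing in for a postmaster with one child.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for a postmaster: starts one child, reports the
# child's PID on stdout, then idles. Nothing here is PostgreSQL code.
PARENT_SRC = """
import subprocess, sys, time
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    stdout=subprocess.DEVNULL,
)
print(child.pid, flush=True)
time.sleep(30)
"""


def child_survives_parent_sigkill() -> bool:
    """SIGKILL the parent and report whether its child is still running."""
    parent = subprocess.Popen(
        [sys.executable, "-c", PARENT_SRC],
        stdout=subprocess.PIPE,
        text=True,
    )
    child_pid = int(parent.stdout.readline())

    # SIGKILL cannot be caught, so the parent runs no cleanup code at all.
    parent.send_signal(signal.SIGKILL)
    parent.wait()
    parent.stdout.close()
    time.sleep(0.2)

    try:
        os.kill(child_pid, 0)  # signal 0 only checks that the PID exists
        alive = True
    except ProcessLookupError:
        alive = False

    if alive:
        os.kill(child_pid, signal.SIGKILL)  # clean up the orphan
    return alive
```

On a typical Linux system the function should report that the child
outlived its parent, which is the situation described above.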

>> running, it comes up.  starts new child processes.  Meanwhile, the old
>> child processes that don't belong to it are busy writing to the data
>> store.  Instant corruption.
>
> if so then the postmaster should not only check if there is an existing
> postmaster running, it should check for the presence of the child
> processes as well.

See my other followup.  There are only limited things it can check, and
against sysadmin stupidity there's no silver bullet.

> well, if you aren't going through the postmaster, what process is
> receiving network messages? it can't be a group of processes, only one
> can be listening to a socket at one time.

Huh?  Each backend has its own socket.
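
A rough illustration of why, using plain TCP sockets rather than anything
PostgreSQL-specific: accept() on the listening socket returns a new,
separate socket for every connection, and those sockets keep working
after the listener is gone.  The function name and the "query" messages
are made up for this sketch.

```python
import socket


def connections_outlive_listener():
    """Accept two connections, close the listener, then use both sockets."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    addr = listener.getsockname()

    clients = [socket.create_connection(addr) for _ in range(2)]

    # One accept() per client; each call returns a distinct connected
    # socket, keyed here by the client's address.
    backends = {}
    for _ in clients:
        conn, peer = listener.accept()
        backends[peer] = conn

    # The listening socket plays the postmaster's role; close it.
    listener.close()

    received = []
    for i, client in enumerate(clients):
        client.sendall(f"query {i}".encode())
        received.append(backends[client.getsockname()].recv(1024))

    for sock in list(backends.values()) + clients:
        sock.close()
    return received
```

Each per-connection socket still delivers its own client's data even
though nothing is listening any more.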

> and if the postmaster isn't needed for the child processes to write to 
> the datastore, how are multiple child processes prevented from writing to 
> the datastore normally? and why doesn't that mechanism continue to work?

They use locks.  Those locks are implemented using shared memory.  If a
new postmaster starts, it gets a new shared memory segment, and with it
a new set of locks that do not conflict with the ones already held by
the first gang of backends.  This is what causes the corruption.
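
A toy model of the problem, not the real lock manager: LockTable below is
a hypothetical stand-in for the lock state living in one shared memory
segment, and "page 42" is an arbitrary resource name.

```python
class LockTable:
    """Toy stand-in for the lock state held in one shared memory segment."""

    def __init__(self):
        self._holders = {}

    def try_acquire(self, resource, backend):
        """Grant the lock unless this table already records a holder."""
        if resource in self._holders:
            return False
        self._holders[resource] = backend
        return True


def two_segments_demo():
    old_segment = LockTable()  # used by the orphaned backends
    new_segment = LockTable()  # created by the restarted postmaster

    first = old_segment.try_acquire("page 42", "old backend A")
    # Within one segment, a second backend is correctly refused.
    second = old_segment.try_acquire("page 42", "old backend B")
    # The new segment knows nothing about the old one, so it grants the
    # "same" lock and both gangs believe they may write.
    third = new_segment.try_acquire("page 42", "new backend")
    return first, second, third
```

Within a single table the second request is refused; across two tables
both requests succeed, so two writers end up on the same data.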


> so are you saying that the only possible thing that can kill the
> postmaster is the OOM killer? it can't possibly exit in any other
> situation without the children being shut down first?
>
> I would be surprised if that was really true.

If the sysadmin sends a SIGKILL then obviously the same thing happens.

Any other signal gives it the chance to signal the children before
dying.
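
For contrast with SIGKILL, a minimal sketch of a catchable signal,
assuming a POSIX system.  Again this is not PostgreSQL code: PARENT_SRC
and child_gone_after_parent_sigterm are hypothetical names, and the
parent is a Python process that installs a SIGTERM handler which stops
its one child before exiting.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a postmaster that handles SIGTERM by shutting
# down its child first. Nothing here is PostgreSQL code.
PARENT_SRC = """
import signal, subprocess, sys, time
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    stdout=subprocess.DEVNULL,
)

def on_term(signum, frame):
    child.terminate()   # pass the shutdown request on to the child
    child.wait()        # and wait until it is really gone
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print(child.pid, flush=True)
time.sleep(30)
"""


def child_gone_after_parent_sigterm() -> bool:
    """SIGTERM the parent and report whether its child was shut down too."""
    parent = subprocess.Popen(
        [sys.executable, "-c", PARENT_SRC],
        stdout=subprocess.PIPE,
        text=True,
    )
    # The PID is printed only after the handler is installed.
    child_pid = int(parent.stdout.readline())

    parent.send_signal(signal.SIGTERM)  # catchable, so the handler runs
    parent.wait()
    parent.stdout.close()

    try:
        os.kill(child_pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return True            # the child was stopped and reaped
    os.kill(child_pid, signal.SIGKILL)  # unexpected survivor: clean up
    return False
```

Because the handler gets to run, the child is gone by the time the
parent exits, so no orphaned writer is left behind.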

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
