Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: david(at)lang(dot)hm
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Matthew Dennis <mdennis(at)merfer(dot)net>, Matthew Wakeling <matthew(at)flymine(dot)org>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception
Date: 2008-08-29 13:48:06
Message-ID: 20080829134806.GE3983@alvh.no-ip.org
Lists: pgsql-performance

david(at)lang(dot)hm wrote:
> On Thu, 28 Aug 2008, Alvaro Herrera wrote:
>
>> david(at)lang(dot)hm wrote:
>>> On Thu, 28 Aug 2008, Scott Marlowe wrote:
>>
>>>> scenario 1: There's a postmaster, it owns all the child processes.
>>>> It gets killed. The Postmaster gets restarted. Since there isn't one
>>>
>>> when the postmaster gets killed, doesn't that kill all its children as
>>> well?
>>
>> Of course not. The postmaster gets a SIGKILL, which is instant death.
>> There's no way to signal the children. If they were killed too then
>> this wouldn't be much of a problem.
>
> I'm not saying that it would signal its children; I thought that the OS
> killed children (unless steps were taken to allow them to re-parent)

Oh, you were mistaken then.
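
To make the point concrete, here is a minimal sketch of what happens on a
POSIX system when a parent receives SIGKILL. It is an illustration only, not
PostgreSQL code: the function name and the two sleeping processes standing in
for the postmaster and a backend are made up for this example.

```python
import os
import signal
import time


def child_survives_parent_sigkill():
    """Return True if a child process is still alive after its parent gets SIGKILL."""
    r, w = os.pipe()
    parent = os.fork()
    if parent == 0:
        # Hypothetical stand-in for the postmaster.
        os.close(r)
        child = os.fork()
        if child == 0:
            # Hypothetical stand-in for a backend: it just sleeps.
            os.close(w)
            time.sleep(30)
            os._exit(0)
        os.write(w, str(child).encode())   # report the child's PID to the observer
        os.close(w)
        signal.pause()                     # wait here until we are killed
        os._exit(0)

    # Observer: learn the child's PID, then kill the parent outright.
    os.close(w)
    child_pid = int(os.read(r, 32))
    os.close(r)
    os.kill(parent, signal.SIGKILL)
    os.waitpid(parent, 0)                  # the parent is now gone for good

    try:
        os.kill(child_pid, 0)              # signal 0 sends nothing, only checks existence
        alive = True
    except ProcessLookupError:
        alive = False
    if alive:
        os.kill(child_pid, signal.SIGKILL)  # clean up the orphan
    return alive
```

The orphaned child is simply re-parented (to init or a subreaper) and keeps
running; nothing in the kernel kills it on the parent's behalf.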

>>> well, if you aren't going through the postmaster, what process is
>>> receiving network messages? it can't be a group of processes, only one
>>> can be listening to a socket at one time.
>>
>> Huh? Each backend has its own socket.
>
> we must be talking about different things. I'm talking about the socket
> that would be used for clients to talk to postgres, this is either a TCP
> socket or a unix socket. in either case only one process can listen on
> it.

Obviously only one process (the postmaster) can call listen() on a given
TCP address/port. Once connected, the socket is passed to the
backend, and the postmaster is no longer involved in the communication
between backend and client. Each backend has its own socket. If the
postmaster dies, the established communication is still alive.
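
A small sketch of that arrangement, under stated assumptions: one process
listens and accepts, a forked child inherits the connected socket, and the
listener is then killed with SIGKILL. The function name, the loopback address,
and the upper-casing "backend" are invented for illustration; the real
postmaster does considerably more.

```python
import os
import signal
import socket


def connection_survives_listener_death():
    """Return the reply a client gets after the listening process is SIGKILLed."""
    r, w = os.pipe()
    listener_pid = os.fork()
    if listener_pid == 0:
        # Stand-in for the postmaster: the only process that calls listen().
        os.close(r)
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        listener.listen(1)
        os.write(w, b"%05d" % listener.getsockname()[1])
        conn, _ = listener.accept()        # accept() returns a new, connected socket
        if os.fork() == 0:
            # Stand-in for a backend: inherits conn and talks to the client directly.
            listener.close()
            os.close(w)
            while True:
                data = conn.recv(1024)
                if not data:
                    os._exit(0)            # client hung up
                conn.sendall(data.upper())
        conn.close()                       # the listener keeps no copy of the connection
        os.write(w, b"F")                  # tell the client side the backend was forked
        signal.pause()
        os._exit(0)

    # Client side.
    os.close(w)
    port = int(os.read(r, 5))
    client = socket.create_connection(("127.0.0.1", port))
    os.read(r, 1)                          # wait until the backend exists
    os.close(r)
    os.kill(listener_pid, signal.SIGKILL)  # instant death for the listener
    os.waitpid(listener_pid, 0)
    client.sendall(b"select 1")            # the established connection still works
    reply = client.recv(1024)
    client.close()
    return reply
```

The client still gets its answer after the listener is dead, because the
connected socket belongs to the backend process and not to the listener.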

>>> and if the postmaster isn't needed for the child processes to write to
>>> the datastore, how are multiple child processes prevented from writing to
>>> the datastore normally? and why doesn't that mechanism continue to work?
>>
>> They use locks. Those locks are implemented using shared memory. If a
>> new postmaster starts, it gets a new shared memory, and a new set of
>> locks, that do not conflict with the ones already held by the first gang
>> of backends. This is what causes the corruption.
>
> so the new postmaster needs to detect that there is a shared memory
> segment out there that is used by backends for this database.

> this doesn't sound that hard,

You're welcome to suggest actual improvements to our interlocking
system, after you've read the current code and understood its rationale.
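
As a rough analogy for the corruption scenario quoted above (not PostgreSQL's
lock manager, and every name here is hypothetical): a lock only excludes
processes that share the same lock object. Two generations of backends that
each got their own lock can both "hold the lock" at once.

```python
import multiprocessing


def both_can_enter(lock_seen_by_old_backend, lock_seen_by_new_backend):
    """Return True if both backends manage to hold their lock at the same time."""
    first = lock_seen_by_old_backend.acquire(block=False)
    second = lock_seen_by_new_backend.acquire(block=False)
    if second:
        lock_seen_by_new_backend.release()
    if first:
        lock_seen_by_old_backend.release()
    return first and second


# One lock shared by everybody: the normal, safe situation.
shared = multiprocessing.Lock()

# Two unrelated locks: what you get when a second postmaster creates a fresh
# shared memory segment while backends from the first one are still running.
old_generation = multiprocessing.Lock()
new_generation = multiprocessing.Lock()
```

With `shared` passed for both arguments the second acquire fails, which is the
mutual exclusion you want. With `old_generation` and `new_generation` both
acquires succeed, so two writers proceed at once.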

>> Any other signal gives it the chance to signal the children before
>> dying.
>
> are you sure that it's not going to die from a memory allocation error?
> or any other similar type of error without _always_ killing the children?

I am sure. There are no memory allocations in that code. It is
carefully written with that one purpose.

There may be bugs, but that's another matter. This code was written
eons ago and has proven very healthy.
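
For contrast with SIGKILL, here is a sketch of a parent that catches SIGTERM
and takes its child down before exiting. It shows only the signal-forwarding
behaviour; Python cannot demonstrate the allocation-free property mentioned
above, and the function and handler names are made up for this example.

```python
import os
import signal
import time


def child_dies_with_sigterm_parent():
    """Return True if the child is gone after its parent handles SIGTERM."""
    r, w = os.pipe()
    parent = os.fork()
    if parent == 0:
        # Hypothetical stand-in for the postmaster.
        os.close(r)
        child = os.fork()
        if child == 0:
            # Hypothetical stand-in for a backend: it just sleeps.
            os.close(w)
            time.sleep(30)
            os._exit(0)

        def on_term(signum, frame):
            os.kill(child, signal.SIGKILL)  # signal the child first ...
            os.waitpid(child, 0)            # ... make sure it is really gone ...
            os._exit(0)                     # ... and only then die

        signal.signal(signal.SIGTERM, on_term)
        os.write(w, str(child).encode())    # handler is installed; report the child's PID
        os.close(w)
        while True:
            signal.pause()

    # Observer: send a catchable signal instead of SIGKILL.
    os.close(w)
    child_pid = int(os.read(r, 32))
    os.close(r)
    os.kill(parent, signal.SIGTERM)
    os.waitpid(parent, 0)

    try:
        os.kill(child_pid, 0)               # signal 0 only checks for existence
    except ProcessLookupError:
        return True                         # the child did not outlive its parent
    os.kill(child_pid, signal.SIGKILL)      # unexpected survivor: clean it up
    return False
```

SIGKILL cannot be caught, so a handler like `on_term` never gets to run in
that case; that is the whole difference between the two scenarios.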

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
