Re: FATAL: could not reattach to shared memory (Win32)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Trevor Talbot <quension(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 15:06:27
Message-ID: 200708241506.l7OF6R700259@momjian.us
Lists: pgsql-general

Trevor Talbot wrote:
> On 8/23/07, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > Shelby Cain wrote:
>
> > > Wild guess on my part... could that error be the result of an attempt
> > > to map shared memory into a process at a fixed location that just
> > > happens to already be occupied by a dll that Windows had decided to
> > > relocate?
> >
> > Not that wild a guess, really :-) I'd say it's a very good possibility -
> > but I have no idea why it'd do that, since all backends load the same
> > DLLs at that stage.
>
> Not a valid assumption; you can't rely on consistent VM space among
> multiple [non-cloned] processes without a serious amount of effort.
> Anything can use that space, it's not just file views. Obviously it
> happens to work some of the time, but when it doesn't, it doesn't. I
> gather postgres depends on it being at the same address, and fixing
> that isn't trivial?
>
> If everything relevant is going through the intriguing
> internal_forkexec(), you could probably reserve address space there
> before resuming the thread. You'd want to combine this with picking
> address space that's less likely to be used before creating the shared
> memory section. (Actually, if you're doing that, you might as well
> just inject the backend variables too instead of going through the
> mapped file gymnastics.)
>
> Not a simple change, but would likely make this particular problem go
> away (assuming this is the problem). It's also the first time I've
> looked at the source, so perhaps I missed something.

I think this is accurate. When we created the native Win32 port there
was a lot of concern about how to handle shared memory in the
EXEC_BACKEND case, namely that postmaster children were not forked
copies that inherited the same shared memory mappings, but rather new
processes that had to attach to shared memory at a fixed address.

The WIN32 solution was to create the shared memory in the parent, and
then pass that address value down to the children to use in attaching to
the existing segment. We expected all sorts of problems with this but
in fact it seemed to work fine (most of the time).

As you can see, it doesn't work 100% of the time, but it has worked
more reliably than we expected. What we have been waiting for is
someone who can reproduce a failure so we can work out how best to make
it 100% reliable; as you can tell, we haven't had a flood of problem
reports to help us track this down.

If you want to help make it 100% reliable, we will work with you to
find the solution.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
