retry shm attach for windows (WAS: Re: OK, so culicidae is *still* broken)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: retry shm attach for windows (WAS: Re: OK, so culicidae is *still* broken)
Date: 2017-05-23 12:14:39
Message-ID: CAA4eK1+R6hSx6t_yvwtx+NRzneVp+MRqXAdGJZChcau8Uij-8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 20, 2017 at 5:56 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Apr 15, 2017 at 02:30:18PM -0700, Andres Freund wrote:
>> On 2017-04-15 17:24:54 -0400, Tom Lane wrote:
>> > Andres Freund <andres(at)anarazel(dot)de> writes:
>> > > On 2017-04-15 17:09:38 -0400, Tom Lane wrote:
>> > >> Why doesn't Windows' ability to map the segment into the new process
>> > >> before it executes take care of that?
>> >
>> > > Because of ASLR of the main executable (i.e. something like PIE).
>> >
>> > Not following. Are you saying that the main executable gets mapped into
>> > the process address space immediately, but shared libraries are not?
>
> At the time of the pgwin32_ReserveSharedMemoryRegion() call, the child process
> contains only ntdll.dll and the executable.
>
>> Without PIE/ASLR we can somewhat rely on pgwin32_ReserveSharedMemoryRegion
>> to find the space that PGSharedMemoryCreate allocated still unoccupied.
>
> I've never had access to a Windows system that can reproduce the fork
> failures. My best theory is that antivirus or similar software injects an
> additional DLL at that early stage.
>
>> > I wonder whether we could work around that by just destroying the created
>> > process and trying again if we get a collision. It'd be a tad
>> > inefficient, but hopefully collisions wouldn't happen often enough to be a
>> > big problem.
>>
>> That might work, although it's obviously not pretty.
>
> I didn't like that idea when Michael proposed it in 2015. Since disabling
> ASLR on the exe proved insufficient, I do like it now. It degrades nicely; if
> something raises the collision rate from 1% to 10%, that just looks like fork
> latency degrading.
>

So it seems both you and Tom are leaning towards some sort of retry
mechanism for shm reattach on Windows. I also think that is a viable
option to negate the impact of ASLR. Attached patch does that. Note
that, as I have mentioned above I think we need to do it for shm
reserve operation as well. I think we need to decide how many retries
are sufficient before bailing out. As of now, I have used 10 to have
some similarity with PGSharedMemoryCreate(), but we can choose some
different count as well. One might say that we can have "number of
retries" as a guc parameter, but I am not sure about it, so not used.
Another point to consider is that do we want the same retry mechanism
for EXEC_BACKEND builds (function
PGSharedMemoryReAttach is different on Windows and EXEC_BACKEND
builds). I think it makes sense to have retry mechanism for
EXEC_BACKEND builds, so done that way in the patch. Yet another point
which needs some thought is for reattach operation, before retrying do
we want to reserve the shm by using VirtualAllocEx?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
win_shm_retry_reattach_v1.patch application/octet-stream 6.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mahi Gurram 2017-05-23 12:22:33 Re: Regarding Postgres Dynamic Shared Memory (DSA)
Previous Message Rafia Sabih 2017-05-23 12:06:33 Re: Increasing parallel workers at runtime