From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heath Lord <heath(dot)lord(at)crunchydata(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: "could not reattach to shared memory" on buildfarm member dory |
Date: | 2018-05-01 02:59:14 |
Message-ID: | 20180501025914.GA2777381@rfd.leadboat.com |
Lists: | pgsql-hackers |
On Mon, Apr 30, 2018 at 08:01:40PM -0400, Tom Lane wrote:
> It's clear from dory's results that something is causing a 4MB chunk
> of memory to get reserved in the process's address space, sometimes.
> It might happen during the main MapViewOfFileEx call, or during the
> preceding VirtualFree, or with my map/unmap dance in place, it might
> happen during that. Frequently it doesn't happen at all, at least not
> before the point where we've successfully done MapViewOfFileEx. But
> if it does happen, and the chunk happens to get put in a spot that
> overlaps where we want to put the shmem block, kaboom.
>
> What seems like a plausible theory at this point is that the apparent
> asynchronicity is due to the allocation being triggered by a different
> thread, and the fact that our added monitoring code seems to make the
> failure more likely can be explained by that code changing the timing.
> But what thread could it be? It doesn't really look to me like either
> the signal thread or the timer thread could eat 4MB. syslogger.c
> also spawns a thread, on Windows, but AFAICS that's not being used in
> this test configuration. Maybe the reason dory is showing the problem
> is something or other is spawning a thread we don't even know about?
Likely some privileged daemon is creating a thread in every new process. (On
Windows, it's not unusual for one process to create a thread in another
process.) We don't have good control over that.
> I'm at a loss for a reasonable way to fix it
> for real. Is there a way to seize control of a Windows process so that
> there are no other running threads?
I think not.
> Any other ideas?
PostgreSQL could retry the whole process creation, analogous to
internal_forkexec() retries. Have the failed process exit after recording the
fact that it couldn't attach. Make the postmaster notice and spawn a
replacement. Give up after 100 failed attempts.
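A rough sketch of the retry loop I have in mind, in portable C rather than actual postmaster code. The names spawn_with_retry, spawn_child_fn and fake_spawn are hypothetical and exist nowhere in the tree; the callback stands in for "start a child and learn whether it attached to shared memory", which in reality would involve the child's exit status and postmaster bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REATTACH_ATTEMPTS 100	/* give up after this many failures */

/* Hypothetical callback: launch one child; true means it attached. */
typedef bool (*spawn_child_fn) (void *arg);

/*
 * Keep spawning replacement children until one attaches or the attempt
 * limit is reached.  Returns the number of attempts used on success,
 * or -1 if every attempt failed.
 */
static int
spawn_with_retry(spawn_child_fn spawn, void *arg)
{
	assert(spawn != NULL);

	for (int attempt = 1; attempt <= MAX_REATTACH_ATTEMPTS; attempt++)
	{
		if (spawn(arg))
			return attempt;
		/* Child recorded its failure and exited; try a fresh process. */
	}
	return -1;
}

/*
 * Stand-in for a real child launch (assumption, for illustration only):
 * fails as many times as *arg says, then succeeds.
 */
static bool
fake_spawn(void *arg)
{
	int		   *failures_left = arg;

	if (*failures_left > 0)
	{
		(*failures_left)--;
		return false;
	}
	return true;
}
```

With fake_spawn set to fail three times, spawn_with_retry() succeeds on the fourth attempt; a child that never attaches yields -1 after 100 tries, at which point the postmaster would report the error as it does today.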