Re: Unexpected "shared memory block is still in use"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Unexpected "shared memory block is still in use"
Date: 2019-09-04 22:27:21
Message-ID: 22079.1567636041@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Attached is a draft patch to change both shmem and sema key selection
> to be based on data directory inode rather than port.
> ...
> I'm not quite sure what's going on in src/test/recovery/t/017_shm.pl.
> As expected, the test for port number non-collision no longer sees
> a failure. After fixing that, the test passes, but it takes a
> ridiculously long time (minutes); apparently each postmaster start/stop
> cycle takes much longer than it ought to. I suppose this patch is
> breaking its assumptions, but I've not studied it.

After looking closer, the problem is pretty obvious: the initial
loop is trying to create a cluster whose shmem key matches its
port-number-based expectation. With this code, that will never
happen except by unlikely accident, so it wastes time with repeated
initdb/start/stop attempts. After 100 tries it gives up and presses
on with the test, resulting in the apparent pass with long runtime.

I now understand the point you made upthread that this test could
only be preserved if we invent some way to force the choice of shmem
key. While it wouldn't be hard to do that (say, invent a magic
environment variable), I really don't want to do so. In the field,
such a behavior would have no positive use, and it could destroy our
newly-improved guarantees about detecting conflicting old processes.

However, there's another way to skin this cat. We can have the
Perl test script create a conflicting shmem segment directly,
as in the attached second-draft patch. I simplified the test
script quite a bit, since I don't see any particular value in
creating more than one test postmaster with this approach.

This still isn't committable as-is, since the test will just curl up
and die on machines lacking IPC::SharedMem. (It could be rewritten
to rely only on the lower-level IPC::SysV module, but I doubt that's
worth the trouble, since either way it'd fail on Windows.) I'm not
sure whether we should just not bother to run the test at all, or
if we should run it but skip the IPC-related parts; and my Perl-fu
isn't really up to implementing either behavior.

Another thing that might be interesting is to do more than just create
the conflicting segment, ie, try to put some data into it that would
fool the postmaster. I'm not excited about that at all, but maybe
someone else is?

The attached patch is identical to the previous one except for the
changes in src/test/recovery/t/017_shm.pl.

regards, tom lane

Attachment Content-Type Size
use-data-dir-inode-for-ipc-keys-2.patch text/x-diff 18.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-09-04 22:28:37 Re: unexpected rowlock mode when trigger is on the table
Previous Message George Hafiz 2019-09-04 21:57:06 Re: Client Certificate Authentication Using Custom Fields (i.e. other than CN)