Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid

From: Noah Misch <noah(at)leadboat(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 9erthalion6(at)gmail(dot)com, sfrost(at)snowman(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
Date: 2019-03-07 03:24:22
Message-ID: 20190307032422.GB1797215@rfd.leadboat.com
Lists: pgsql-hackers

On Mon, Mar 04, 2019 at 06:04:20PM +0900, Kyotaro HORIGUCHI wrote:
> I found that I have 65(h) segments left alone on my environment:p

Did patched PostgreSQL create those, or did unpatched PostgreSQL create them?

> At Sat, 11 Aug 2018 23:48:15 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20180812064815(dot)GB2301738(at)rfd(dot)leadboat(dot)com>
> > still doesn't detect. I could pursue a fix via the aforementioned
> > sysv_shmem_key file, modulo the possibility of a DBA removing it. I could
> > also, when postmaster.pid is missing, make sysv_shmem.c check the first N
> > (N=100?) keys applicable to the selected port. My gut feeling is that neither
> > thing is worthwhile, but I'm interested to hear other opinions.
>
> # Though I don't get the meaning of the "modulo" there..

It wasn't clear. Adding a separate sysv_shmem_key file would not protect a
cluster if the DBA does "rm -f postmaster.pid sysv_shmem_key". It would,
however, fix this test case:

+# Fail to reject startup if shm key N has become available and we crash while
+# using key N+1.  This is unwanted, but expected.  Windows is immune, because
+# its GetSharedMemName() uses DataDir strings, not numeric keys.
+$flea->stop;    # release first key
+is( $gnat->start(fail_ok => 1),
+    $TestLib::windows_os ? 0 : 1,
+    'key turnover fools only sysv_shmem.c');

> I think the only thing we must avoid here is sharing the same
> shmem segment with a living-dead server. If we can do that
> without the pid file, it would be better than relying on it. We
> could remove orphaned segments automatically, but I don't think
> we should do that going so far as relying on a dedicated
> file.

There are two things to avoid. First, during postmaster startup, avoid
sharing a segment with any other process. Second, avoid sharing a data
directory with a running PostgreSQL process that was a child of some other
postmaster. I think that's what you mean by "living-dead server". The first
case is already prevented thoroughly, because we only proceed with startup
after creating a new segment with IPC_CREAT | IPC_EXCL. The second case is
what this patch seeks to improve.
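
To illustrate the first point with a minimal sketch (not the actual
sysv_shmem.c code), the guarantee comes entirely from the shmget() flags:

#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Illustration only, not the sysv_shmem.c implementation.  IPC_CREAT |
 * IPC_EXCL makes shmget() fail with EEXIST whenever any segment, ours or
 * foreign, already exists under "key", so startup can never end up sharing
 * a segment some other process created.
 */
static int
create_brand_new_segment(key_t key, size_t size)
{
    int shmid = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);

    if (shmid < 0 && errno == EEXIST)
        fprintf(stderr, "key %ld is taken; caller must try another key\n",
                (long) key);
    return shmid;
}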

> Also, I don't think it's worth stopping shmem id scanning
> at a certain number since I don't come up with an appropriate
> number for it. But looking "port * 1000", it might be expected
> that a usable segment id will found while scanning that number of
> ids (1000).

If we find thousands of unusable segments, we do keep going until we find a
usable segment. I was referring to a different situation. Today, if there's
no postmaster.pid file and we're using port 5432, we first try segment ID
5432000. If segment ID 5432000 is usable, we use it without examining any
other segment ID. If segment ID 5432010 were in use by a "living-dead
server", we would not discover that fact. We could increase the chance of
discovering that fact by checking every segment ID from 5432000 to 5432999.
(Once finished with that scan, we'd still create and use 5432000.) I didn't
implement that idea in the patch, but I shared it to see if anyone thought it
was worth adding.
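
For concreteness, the scan would amount to something like this; the function
name is mine, and none of it is in the patch:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Sketch of the unimplemented idea above.  shmget() without IPC_CREAT only
 * looks up an existing segment, so ENOENT means the key is free; any other
 * result means a segment exists under that key and deserves the same
 * scrutiny we give key port*1000.
 */
static void
scan_port_keys(int port)
{
    key_t first = (key_t) port * 1000;

    for (key_t key = first; key < first + 1000; key++)
    {
        if (shmget(key, 0, 0) >= 0 || errno == EACCES)
            fprintf(stderr, "segment with key %ld exists\n", (long) key);
        else if (errno != ENOENT)
            fprintf(stderr, "shmget(%ld): %s\n", (long) key, strerror(errno));
    }
}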

> PGSharedMemoryCreate changed to choose a usable shmem id using
> the IpcMemoryAnalyze(). But some of the statuses from
> IpcMemoryAnalyze() is concealed by failure of
> PGSharedMemoryAttach() and ignored silently opposed to what the
> code is intending to do.

SHMSTATE_FOREIGN is ignored silently. The patch modifies the
PGSharedMemoryAttach() header comment to direct callers to treat
PGSharedMemoryAttach() failure like SHMSTATE_FOREIGN. I don't think the code
had an unintentional outcome due to the situation you describe. Perhaps I
could have made the situation clearer.
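
In caller terms, the rule reduces to roughly the following; the enum and
function names are loose stand-ins of my own, not the patch's identifiers:

#include <stddef.h>
#include <sys/shm.h>

/* Loose stand-in for the patch's segment classification; names are mine. */
typedef enum
{
    SEGMENT_FOREIGN,            /* not ours: leave it alone, try another key */
    SEGMENT_OURS                /* created by a postmaster for this DataDir */
} SegmentVerdict;

/*
 * Paraphrase of the caller rule, not the patch verbatim: if we cannot even
 * attach the segment, we learn nothing more about it, so the caller files
 * it under "foreign" and moves on to the next candidate key.
 */
static SegmentVerdict
classify_segment(int shmid)
{
    void *addr = shmat(shmid, NULL, SHM_RDONLY);

    if (addr == (void *) -1)
        return SEGMENT_FOREIGN; /* attach failure treated like "foreign" */

    /* ... inspect the segment header here to tell ours from foreign ... */
    shmdt(addr);
    return SEGMENT_OURS;        /* placeholder verdict for this sketch */
}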

> (By the way SHMSTATE_EEXISTS seems
> suggesting the opposite thing from EEXIST, which would be confusing.)

Good catch. Is SHMSTATE_ENOENT better?

> PGSharedMemoryCreate() repeats shmat/shmdt twice in every
> iteration. It won't harm so much but it would be better if we
> could get rid of that.

I'll try to achieve that and see if the code turns out cleaner.

Thanks,
nm
