Re: OpenBSD versus semaphores

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Mikael Kjellström <mikael(dot)kjellstrom(at)gmail(dot)com>, Pierre-Emmanuel André <pea(at)openbsd(dot)org>
Subject: Re: OpenBSD versus semaphores
Date: 2019-01-08 07:05:12
Message-ID: CAEepm=2ndy5RSABeaf3L1hFhXoBSg09RvgudfTWfbn=DMUbJ3w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 8, 2019 at 7:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've been toying with OpenBSD lately, and soon noticed a seriously
> annoying problem for running Postgres on it: by default, its limits
> for SysV semaphores are only SEMMNS=60, SEMMNI=10. Not only does that
> greatly constrain the number of connections for a single installation,
> it means that our TAP tests fail because you can't start two postmasters
> concurrently (cf [1]).
>
> Raising the annoyance factor considerably, AFAICT the only way to
> increase these settings is to build your own custom kernel.
>
> So I looked around for an alternative, and found out that modern
> OpenBSD releases support named POSIX semaphores (though not unnamed
> ones, at least not shared unnamed ones). What's more, it appears that
> in this implementation, named semaphores don't eat open file descriptors
> as they do on macOS, removing our major objection to using them.
>
> I don't have any OpenBSD installation on hardware that I'd take very
> seriously for performance testing, but some light testing with
> "pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
> has just about the same performance as a build with SysV semaphores.
>
> This all leads to the thought that maybe we should be selecting
> PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD. At the very least,
> our docs ought to recommend it as a credible alternative for
> people who don't want to get into building custom kernels.
>
> I've checked that this works back to OpenBSD 6.0, and scanning
> their man pages suggests that the feature appeared in 5.5.
> 5.5 isn't that old (2014) so possibly people are still running
> older versions, but we could easily put in version-specific
> default logic similar to what's in src/template/darwin.
>
> Thoughts?

No OpenBSD here, but I was curious enough to peek at their
implementation. Like other implementations, theirs creates a tiny file
under /tmp for each semaphore, mmap()s it and closes the fd straight
away. They apparently don't support shared sem_init() yet (it fails
with EPERM). So your plan seems good to me. CC'ing Pierre-Emmanuel
(OpenBSD PostgreSQL port maintainer) in case he is interested.

Wild speculation: I wouldn't be surprised if POSIX named semaphores
perform better than SysV semaphores on a large enough system, since
each one will live on a different page. At a glance, their sys_semget
apparently allocates arrays of struct sem without padding, and I think
they probably get about 4 to a cache line, so semaphores used by
different backends would contend for the same line (false sharing);
see our experience with an 8-socket box leading to commit 2d306759,
where we added our own padding.

--
Thomas Munro
http://www.enterprisedb.com
