Re: DSA_ALLOC_NO_OOM doesn't work

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: DSA_ALLOC_NO_OOM doesn't work
Date: 2024-02-21 19:19:49
Message-ID: cf35fccb-957e-48de-8ea0-a083e9d7843d@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 14/02/2024 09:23, Robert Haas wrote:
> On Tue, Feb 13, 2024 at 7:53 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> However, I must say that the dsm_impl_op() interface is absolutely
>> insane. It's like someone looked at ioctl() and thought, "hey that's a
>> great idea!".
>
> As the person who wrote that code, this made me laugh.
>
> I agree it's not the prettiest interface, but I thought that was OK
> considering that it should only ever have a very limited number of
> callers. I believe I did it this way in the interest of code
> compactness. Since there are four DSM implementations, I wanted the
> implementation-specific code to be short and all in one place, and
> jamming it all into one function served that purpose. Also, there's a
> bunch of logic that is shared by multiple operations - detach and
> destroy tend to be similar, and so do create and attach, and there are
> even things that are shared across all operations, like the snprintf
> at the top of dsm_impl_posix() or the slightly larger amount of
> boilerplate at the top of dsm_impl_sysv().
>
> I'm not particularly opposed to refactoring this to make it nicer, but
> my judgement was that splitting it up into one function per operation
> per implementation, say, would have involved a lot of duplication of
> small bits of code that might then get out of sync with each other
> over time. By doing it this way, the logic is a bit tangled -- or
> maybe more than a bit -- but there's very little duplication because
> each implementation gets jammed into the smallest possible box. I'm OK
> with somebody deciding that I got the trade-offs wrong there, but I
> will be interested to see the number of lines added vs. removed in any
> future refactoring patch.

That's fair, I can see those reasons. Nevertheless, I do think it was a
bad tradeoff. A little bit of repetition would be better here, or we can
extract the common parts to smaller functions.

I came up with the attached:

25 files changed, 1710 insertions(+), 1113 deletions(-)

So yeah, it's more code, and there's some repetition, but I think this
is more readable. Some of that is extra boilerplate because I split the
implementations to separate files, and I also added tests.

I'm not 100% wedded on all of the decisions here, but I think this is
the right direction overall. For example, I decided to separate
dsm_handle used by the high-level interface in dsm.c and the
dsm_impl_handle used by the low-level interace in dsm_impl_*.c (more on
that below). That feels slightly better to me, but could be left out if
we don't want that.

Notable changes:

- Split the single multiplexed dsm_impl_op() function into multiple
functions for different operations. This allows more natural function
signatures for the different operations.

- The create() function is now responsible for generating the handle,
instead of having the caller generate it. Those implementations that
need to generate a random handle and retry if it's already in use, now
do that retry within the implementation.

- The destroy() function no longer detaches the segment; you must call
detach() first if the segment is still attached. This avoids having to
pass "junk" values when destroying a segment that's not attached, and in
case of error, makes it more clear what failed.

- Separate dsm_handle, used by backend code to interact with the high
level interface in dsm.c, from dsm_impl_handle, which is used to
interact with the low-level functions in dsm_impl.c. This gets rid of
the convention in dsm.c of reserving odd numbers for DSM segments stored
in the main shmem area. There is now an explicit flag for that the
control slot. For generating dsm_handles, we now use the same scheme we
used to use for main-area shm segments for all DSM segments, which
includes the slot number in the dsm_handle. The implementations use
their own mechanisms for generating the low-level dsm_impl_handles (all
but the SysV implementation generate a random handle and retry on
collision).

- Use IPC_PRIVATE in the SysV implementation to have the OS create a
unique identifier for us. Use the shmid directly as the (low-level)
handle, so that we don't need to use shmget() to convert a key to shmid,
and don't need the "cache" for that.

- create() no longer returns the mapped_size. The old Windows
implementation had some code to read the actual mapped size after
creating the mapping, and returned that in *mapped_size. Others just
returned the requested size. In principle reading the actual size might
be useful; the caller might be able to make use of the whole mapped size
when it's larger than requested. In practice, the callers didn't do
that. Also, POSIX shmem on FreeBSD has similar round-up-to-page-size
behavior but the implementation did not query the actual mapped size
after creating the segment, so you could not rely on it.

- Added a test that exercises basic create, detach, attach functionality
using all the different implementations supported on the current platform.

- Change datatype of the opaque types in dsm_impl.c from "void *" to
typedefs over uintptr_t. It's easy to make mistakes with "void *", as
you can pass any pointer without getting warnings from the compiler.
Dedicated typedefs give a bit more type checking. (This is in the first
commit, all the other changes are bundled together in the second commit.)

Overall, I don't think this is backpatchable. The handle changes and use
of IPC_PRIVATE in particular: they could lead to failure to clean up old
segments if you upgraded the binary without a clean shutdown. A slightly
different version of this possibly would be, but I'd like to focus on
what's best for master for now.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
v1-0001-Change-datatype-of-the-opaque-types-in-dsm_impl.c.patch text/x-patch 12.2 KB
v1-0002-Rewrite-the-dsm_impl-interface.patch text/x-patch 107.6 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-02-21 20:17:58 Re: BUG #18356: Casting values from jsonb_each_text does not respect WHERE filter with sub select
Previous Message Ed Herrmann 2024-02-21 19:19:15 Re: BUG #18356: Casting values from jsonb_each_text does not respect WHERE filter with sub select

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2024-02-21 20:24:42 Re: LogwrtResult contended spinlock
Previous Message Amonson, Paul D 2024-02-21 17:35:57 RE: Popcount optimization using AVX512