POSIX shared memory redux

From: A(dot)M(dot) <agentm(at)themactionfaction(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: POSIX shared memory redux
Date: 2010-11-14 00:48:57
Message-ID: 7DBB97E0-3890-4904-AC16-CAB24D89055D@themactionfaction.com
Lists: pgsql-hackers

The goal of this work is to address all of the shortcomings of previous POSIX shared memory patches as pointed out mostly by Tom Lane.

Branch: http://git.postgresql.org/gitweb?p=users/agentm/postgresql.git;a=shortlog;h=refs/heads/posix_shmem
Main file: http://git.postgresql.org/gitweb?p=users/agentm/postgresql.git;a=blob;f=src/backend/port/posix_shmem.c;h=da93848d14eeadb182d8bf1fe576d741ae5792c3;hb=refs/heads/posix_shmem

Design goals:
1) ensure that shared memory creation collisions are impossible
2) ensure that shared memory access collisions are impossible
3) ensure proper shared memory cleanup after backend and postmaster close
4) minimize API changes

This patch addresses the above goals and offers some benefits over SysV shared memory:

1) no kern.sysv management (one documentation page with platform-specific help can disappear)
2) shared memory allocation limited only by mmap usage
3) shared memory regions are completely cleaned up once the postmaster and all of its children have exited or been killed for any reason (including SIGKILL)
4) shared memory creation race conditions or collisions between postmasters or backends are impossible
5) after postmaster startup, the postmaster becomes the sole arbiter of which other processes are granted access to the shared memory region
6) mmap and munmap can be used on the shared memory region; this may be useful for offering the option to expand the memory region dynamically

The design goals are accomplished by a simple change in shared memory creation: after shm_open, the region name is immediately shm_unlink'd. Because POSIX shared memory relies on file descriptors, the shared memory is not deallocated in the kernel until the last referencing file descriptor is closed and the last mapping is removed (in this case, on process exit). The postmaster then becomes the sole arbiter of which processes receive the shared memory file descriptor (either by inheritance in its children or through explicit file descriptor passing, if necessary).
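
To make the create-then-unlink step concrete, here is a minimal sketch. It is not the code from posix_shmem.c: the helper name create_anonymous_shmem and the "/sketch_shmem.<pid>" name format are assumptions made for illustration only.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch: create a POSIX shared
 * memory object of `size` bytes and unlink its name immediately.
 * Returns the open descriptor, or -1 on failure.
 */
static int
create_anonymous_shmem(size_t size)
{
    char name[64];
    int  fd;

    assert(size > 0);

    /* Name derived from the pid; the exact format is an assumption. */
    snprintf(name, sizeof(name), "/sketch_shmem.%ld", (long) getpid());

    /* O_EXCL makes creation fail instead of reusing an existing object. */
    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* Remove the name right away; the object lives on while fd is open. */
    if (shm_unlink(name) < 0)
    {
        close(fd);
        return -1;
    }

    /* A new object has length zero, so give it its size. */
    if (ftruncate(fd, (off_t) size) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would then mmap the returned descriptor with MAP_SHARED. Once the name is unlinked, no other process can shm_open the region; it can only be reached through a copy of the descriptor. On older glibc versions this may need to be linked with -lrt.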

The patch is a reworked version of Chris Marcellino <cmarcellino(at)apple(dot)com>'s patch.


1) the shared memory name is based on getpid(); this ensures that no two starting postmasters (or other processes) will attempt to acquire the same shared memory segment.
2) the shared memory segment is created and immediately unlinked, preventing outside access to the shared memory region
3) the shared memory file descriptor is passed to backends via a static int file descriptor (normal file descriptor inheritance)
* perhaps there is a better location to store the file descriptor; advice is welcome.
4) shared memory segment detach occurs when the process exits (kernel-based cleanup instead of scheduled in-process cleanup)
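
The inheritance and kernel-side cleanup in points 3 and 4 can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: sketch_shmem_fd, demo_inherited_shmem, the region size, and the name format are all hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_SHMEM_SIZE 4096

/* Hypothetical stand-in for the "static int file descriptor". */
static int sketch_shmem_fd = -1;

/* Returns the byte the child wrote into the region, or -1 on error. */
static int
demo_inherited_shmem(void)
{
    char  name[64];
    char *region;
    pid_t child;
    int   status;
    int   result;

    assert(sketch_shmem_fd == -1);   /* create the region only once */

    snprintf(name, sizeof(name), "/sketch_inherit.%ld", (long) getpid());
    sketch_shmem_fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (sketch_shmem_fd < 0)
        return -1;
    shm_unlink(name);                /* name gone, descriptor still valid */
    if (ftruncate(sketch_shmem_fd, SKETCH_SHMEM_SIZE) < 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
    {
        /* Child: the descriptor was inherited across fork(). */
        char *p = mmap(NULL, SKETCH_SHMEM_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, sketch_shmem_fd, 0);

        if (p == MAP_FAILED)
            _exit(1);
        p[0] = 'B';
        _exit(0);   /* no explicit detach: the kernel drops fd and mapping */
    }

    if (waitpid(child, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: map the same object and read what the child wrote. */
    region = mmap(NULL, SKETCH_SHMEM_SIZE, PROT_READ | PROT_WRITE,
                  MAP_SHARED, sketch_shmem_fd, 0);
    if (region == MAP_FAILED)
        return -1;
    result = region[0];
    munmap(region, SKETCH_SHMEM_SIZE);
    return result;
}
```

The child never calls a detach routine; its mapping and descriptor go away when it exits, and the region itself is freed once the last process holding it is gone.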

Additional notes:
The "feature" whereby arbitrary postgres user processes could connect to the shared memory segment has been removed with this patch. If this is a desirable feature (perhaps for debugging or performance tools), it could be added back by implementing a file descriptor passing server in the postmaster. The server would use SCM_RIGHTS control messages to (a) verify that the remote process is running as the same user as the postmaster and (b) pass the shared memory file descriptor to that process. I am happy to implement this, if required.
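
For reference, the descriptor-passing half of that idea might look roughly like the sketch below. The helper names send_fd and recv_fd are hypothetical, and the same-user check is deliberately left out because the mechanism for it is platform-specific.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor `fd` over Unix-domain socket `sock`. */
static int
send_fd(int sock, int fd)
{
    struct msghdr   msg;
    struct iovec    iov;
    struct cmsghdr *cmsg;
    char            byte = 'F';   /* one byte of ordinary payload */
    union
    {
        struct cmsghdr align;     /* forces correct alignment */
        char           buf[CMSG_SPACE(sizeof(int))];
    } control;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    /* The descriptor travels in an SCM_RIGHTS control message. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor from `sock`, or -1 on error. */
static int
recv_fd(int sock)
{
    struct msghdr   msg;
    struct iovec    iov;
    struct cmsghdr *cmsg;
    char            byte;
    int             fd;
    union
    {
        struct cmsghdr align;
        char           buf[CMSG_SPACE(sizeof(int))];
    } control;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    /* The kernel has installed a new descriptor in this process. */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A real server would check the peer's credentials before calling send_fd, so that only processes running as the postmaster's user ever receive the descriptor.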

I am happy to continue work on this patch if the pg-hackers deem it worthwhile. Thanks!


